We curated a benchmark dataset called ReS. It comprises 100 paired images; in each pair, one image shows the subject repositioned while all other elements remain unchanged. The images were collected from over 20 indoor and outdoor scenes and feature subjects from more than 50 categories. This diversity enables effective simulation of real-world open-vocabulary applications. The dataset is available here.