Repositioning The Subject Within Image

[arxiv][dataset][Full-size PDF] [bib]
title={Repositioning the Subject within Image},
author={Wang, Yikai and Cao, Chenjie and Dong, Qiaole and Li, Yifan and Fu, Yanwei},
journal={arXiv preprint arXiv:2401.16861},

Watch the demo video at Bilibili.


Subject repositioning encompasses three generative sub-tasks, illustrated here:
i) Subject removal: Fill the void left behind when the subject is moved, keeping the background consistent and avoiding the generation of new, random subjects.
ii) Subject completion: Complete any portions of the moved subject that were occluded in its original position.
iii) Subject harmonization: Blend the appearance of the repositioned subject with its surrounding areas.
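To make the first sub-task concrete, the following is a minimal sketch of the mask arithmetic behind a move. It is not SEELE's implementation: the function names `shift` and `reposition_masks`, the H×W×C image layout, and the pure-translation move are all assumptions made for illustration. Given a subject mask and an offset, it computes the void region that a removal model would have to fill and pastes the subject at its new location. Completion and harmonization are left out because they require generative models.

```python
import numpy as np

def shift(arr, dy, dx):
    """Translate an array by (dy, dx) pixels, padding with zeros (no wrap-around)."""
    out = np.zeros_like(arr)
    h, w = arr.shape[:2]
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the content is moved entirely off the canvas
    # Source and destination windows that stay inside the canvas.
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = arr[src_y, src_x]
    return out

def reposition_masks(image, mask, dy, dx):
    """Hypothetical helper: move the masked subject and report the regions to repair.

    image: H x W x C array; mask: H x W boolean subject mask.
    Returns (composite, void_mask, moved_mask).
    """
    mask = mask.astype(bool)
    moved_mask = shift(mask, dy, dx)
    # Pixels vacated by the subject and not covered by its new position.
    void_mask = mask & ~moved_mask
    composite = image.copy()
    composite[void_mask] = 0  # placeholder: an inpainting model would fill this
    moved_pixels = shift(image * mask[..., None], dy, dx)
    composite[moved_mask] = moved_pixels[moved_mask]
    return composite, void_mask, moved_mask
```

In this sketch, `void_mask` is the input for subject removal, while `moved_mask` marks the region whose boundary a harmonization step would need to blend.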


SEELE employs an interactive pipeline of pre-processing, manipulation, and post-processing for subject repositioning. During pre-processing, SEELE identifies the subject with a segmentation model guided by user-provided conditions, and keeps the occlusion relationships between subjects intact. In the manipulation stage, SEELE manipulates the image to fill any gaps left behind. It also completes the obscured parts of the subject, using user-specified masks of the incomplete regions. In post-processing, SEELE resolves any disparities between the repositioned subject and its new surroundings.


We curated a benchmark dataset called ReS. This dataset includes 100 paired images, each with dimensions 4032×3024, where one image features a repositioned subject while the other elements remain constant. These images were collected from over 20 indoor and outdoor scenes, showcasing subjects from more than 50 categories. This diversity enables effective simulation of real-world open-vocabulary applications. The dataset is available here.


We present results of SEELE on 1024×1024 images.

We also assess the effectiveness of various components within SEELE during both pre-processing and post-processing phases.

Please refer to our paper (arXiv) for more details.