Towards Enhanced Image Inpainting:
Mitigating Unwanted Object Insertion and Preserving Color Consistency

Yikai Wang*, Chenjie Cao*, Junqiu Yu*, Ke Fan, Xiangyang Xue, Yanwei Fu†.
Fudan University
CVPR 2025

Abstract

ASUKA addresses two issues in current diffusion and rectified-flow inpainting models:
  • Unwanted object insertion, where random elements that are not aligned with the unmasked region are generated;
  • Color inconsistency, a color shift in the generated masked region that causes smear-like traces.
ASUKA proposes a post-training procedure for these models that significantly mitigates object hallucination and improves the color consistency of inpainted results.

Introduction

ASUKA enhances image inpainting by improving color consistency and mitigating object hallucination while leveraging the generation capacity of the frozen inpainting model. It achieves this through two main components:
  • Context-Stable Alignment: ASUKA aligns the stable Masked Autoencoder (MAE) prior with the generative model to provide a context-stable estimation of masked regions, replacing the text condition with the MAE prior.
  • Color-Consistent Alignment: ASUKA reformulates the decoding from latent to image as a local harmonization task, training an inpainting-specialized decoder to align masked and unmasked regions during decoding and thus mitigate color inconsistencies.
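The Context-Stable Alignment component above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the paper's implementation): `mae_prior` stands in for the MAE's context-stable estimate of the masked region, and `align_condition` stands in for the learned module that projects that estimate into the conditioning space of the frozen inpainting model, replacing the text embedding. All function names and shapes here are assumptions for illustration.

```python
import numpy as np

def mae_prior(image, mask):
    # Hypothetical stand-in for the MAE prior: fill masked pixels with the
    # mean of the unmasked pixels, giving a stable but low-texture estimate.
    est = image.copy()
    est[mask] = image[~mask].mean()
    return est

def align_condition(prior_est, proj):
    # Project the MAE estimate into the conditioning space expected by the
    # frozen inpainting model, in place of the usual text embedding.
    return prior_est.reshape(-1) @ proj

rng = np.random.default_rng(0)
image = rng.random((8, 8))          # toy single-channel image
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True               # masked region to inpaint

prior = mae_prior(image, mask)
proj = rng.random((64, 16))         # hypothetical learned projection
cond = align_condition(prior, proj) # conditioning vector for the frozen model
print(cond.shape)
```

The key design point is that the frozen generator stays untouched: only the conditioning pathway is swapped, so the MAE's stable estimate steers generation away from hallucinated objects.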

Results

  • MAE offers stable masked region estimates, yet falls short in texture detail.
  • GAN-based inpainting struggles with low fidelity.
  • SD is powerful but unstable, often introducing random elements and suffering from masked-unmasked color inconsistency.
  • ASUKA ensures consistency between masked and unmasked areas during both the diffusion and decoding processes, mitigating object hallucination and improving color consistency.

MISATO dataset

To validate across different domains and mask styles, we construct an evaluation dataset, dubbed MISATO, from Matterport3D, Flickr-Landscape, MegaDepth, and COCO 2014 to cover indoor, outdoor, building, and background inpainting, respectively. We select 500 representative examples of size 512x512 and 1024x1024 from each dataset, forming a total of 2,000 testing examples.
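The composition described above can be summarized in a short sketch. The source-to-domain mapping and counts follow the text; the dictionary layout itself is just an illustrative assumption, not the dataset's actual file structure.

```python
# MISATO composition as described in the text: four source datasets,
# one inpainting domain each, 500 examples per source.
SOURCES = {
    "Matterport3D": "indoor",
    "Flickr-Landscape": "outdoor",
    "MegaDepth": "building",
    "COCO 2014": "background",
}
PER_SOURCE = 500
SIZES = ("512x512", "1024x1024")  # evaluation resolutions

total = PER_SOURCE * len(SOURCES)
print(total)  # 2000 testing examples
```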

Please refer to our paper (preprint) for more details.

ASUKA and MISATO :)