Personalized Restoration via Dual-Pivot Tuning

¹University of California, Los Angeles   ²Snap Inc. (* corresponding author)

TL;DR: By using a few reference images of an individual, we personalize a diffusion prior within a blind image restoration framework. This results in a natural image that closely resembles the individual's identity, while retaining the visual attributes of the degraded image.

Abstract

Generative diffusion models can serve as a prior that ensures the solutions of image restoration systems adhere to the manifold of natural images. However, for restoring facial images, a personalized prior is necessary to accurately represent and reconstruct the unique facial features of a given individual. In this paper, we propose a simple yet effective method for personalized restoration, called Dual-Pivot Tuning - a two-stage approach that personalizes a blind restoration system while maintaining the integrity of the general prior and the distinct role of each component. Our key observation is that, for optimal personalization, the generative model should be tuned around a fixed text pivot, while the guiding network should be tuned in a generic (non-personalized) manner, using the personalized generative model as a fixed "pivot". This approach ensures that personalization does not interfere with the restoration process, resulting in a natural appearance with high fidelity to the person's identity and to the attributes of the degraded image. We evaluated our approach both qualitatively and quantitatively through extensive experiments on images of widely recognized individuals, comparing it against relevant baselines. Surprisingly, we found that our personalized prior not only achieves higher fidelity to the person's identity, but also outperforms state-of-the-art generic priors in terms of general image quality.

Personalized Face Restoration

Personalizing the restoration process enables high-fidelity restoration while retaining accurate subject identity. We compare against an unconditional, non-personalized restoration method (DiffBIR). While the baseline is able to restore the test images, it exhibits significant identity drift. Please move the slider for a better visualization.


Input image (degraded).



Restored Image: unconditional (left), ours (right).

Identity reference for the restored image.


Proposed Method


Our dual-pivot tuning approach personalizes a blind face restoration system (left) and consists of two main steps: (1) Textual pivoting: fine-tuning the generative prior G within the context of the system, around a fixed text pivot, so that it leverages the conditioning cues from E; and (2) Model-based pivoting: fine-tuning E while freezing G, in order to align its behavior with the strong personalized prior. At inference time, our system embeds the personalized prior and generates output images with high fidelity to the individual appearing in the reference images.
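
To make the two-stage procedure concrete, here is a minimal PyTorch-style sketch of dual-pivot tuning. It assumes a DiffBIR-like setup in which a text-conditioned diffusion prior G is guided by a restoration encoder E; the toy module definitions, the simplified noise-prediction loss, the optimizer settings, and the identity prompt embedding are illustrative stand-ins rather than the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPrior(nn.Module):
    """Stand-in for the text-conditioned diffusion prior G (noise predictor)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, x_noisy, t, text_emb, guidance):
        # A real prior would also use t and text_emb; this toy module only mixes in the guidance.
        return self.net(x_noisy + guidance)

class ToyGuidance(nn.Module):
    """Stand-in for the guiding network E that encodes the degraded image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, degraded):
        return self.net(degraded)

def denoising_loss(G, E, clean, degraded, text_emb):
    # Simplified epsilon-prediction loss; the real system uses the latent-diffusion
    # formulation with a proper noise schedule.
    noise = torch.randn_like(clean)
    t = torch.randint(0, 1000, (clean.shape[0],))
    x_noisy = clean + noise  # placeholder forward (noising) process
    return F.mse_loss(G(x_noisy, t, text_emb, E(degraded)), noise)

def textual_pivoting(G, E, reference_batches, identity_text_emb, steps=500):
    # Stage 1: fine-tune G on a few reference images of the person, conditioned on a
    # fixed identity prompt (the text pivot), while the guiding network E stays frozen.
    E.requires_grad_(False)
    opt = torch.optim.AdamW(G.parameters(), lr=1e-5)
    for _ in range(steps):
        clean, degraded = next(reference_batches)
        loss = denoising_loss(G, E, clean, degraded, identity_text_emb)
        opt.zero_grad(); loss.backward(); opt.step()

def model_based_pivoting(G, E, generic_batches, generic_text_emb, steps=500):
    # Stage 2: freeze the personalized G (the model pivot) and fine-tune E on generic,
    # non-personalized degraded/clean pairs so its guidance stays compatible with the new prior.
    G.requires_grad_(False)
    opt = torch.optim.AdamW(E.parameters(), lr=1e-5)
    for _ in range(steps):
        clean, degraded = next(generic_batches)
        loss = denoising_loss(G, E, clean, degraded, generic_text_emb)
        opt.zero_grad(); loss.backward(); opt.step()

The point the sketch illustrates is the asymmetry between the two stages: personalization is absorbed entirely by G around a fixed text condition, after which E is re-aligned on generic data without ever seeing the target identity.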


Understanding Personalization Strategies

Input image (degraded).


Unpersonalized (DiffBIR).

Only text-based pivoting.

Only model-based pivoting.

Identity reference for the restored image.


We compare our proposed dual-pivot tuning personalization strategy (rightmost image in each slider panel) against baseline methods. Given a degraded face image as input, whose identity appears in the reference image, the existing unpersonalized diffusion-based face restoration method (DiffBIR) is unable to retain identity. Textual pivoting of the generative prior alone is insufficient, since the restoration model cannot leverage the identity information. Model-based pivoting alone, i.e., tuning E pivoted around G, injects some identity cues (see the face structure and eyes), but at the cost of high-frequency detail. Our method injects identity information into the restoration process without losing the general image prior.

Text-Guided Editing

Our use of text anchoring (as opposed to prior unconditional models) enables text-guided editing. Using prompt modifiers such as "smiling" and "blue eyes" enables relevant edits alongside the restoration (please zoom in to the page to examine).

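As a small illustration of how edits combine with restoration through the text condition, the snippet below builds edited prompts by appending modifiers to a fixed identity prompt. The identity token "sks" and the exact prompt template are assumptions for illustration; only the modifier-appending pattern is the point.

# Hypothetical prompt construction; "sks" is a placeholder identity token.
base_prompt = "a photo of sks person"
for modifier in ["smiling", "blue eyes"]:
    edited_prompt = f"{base_prompt}, {modifier}"
    print(edited_prompt)  # this string would serve as the text condition during restoration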

Face Swap

We can also leverage personalized models for tasks such as face swapping. An input image can be blurred and then simply restored with a model personalized to a different identity to achieve this effect.

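A minimal sketch of this recipe follows, assuming a strong Gaussian blur as the degradation and a hypothetical personalized_restore() entry point for the system tuned on the target identity (both the blur radius and the function name are illustrative, not part of any released code):

from PIL import Image, ImageFilter

# Degrade an image of the source identity so that fine identity details are lost.
img = Image.open("source_face.png")
degraded = img.filter(ImageFilter.GaussianBlur(radius=8))  # blur radius is an assumption
degraded.save("degraded_face.png")

# The blurred image would then be restored with the system personalized to the
# *target* identity, which fills in that identity's features:
# swapped = personalized_restore(degraded)  # hypothetical entry point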

BibTeX

@article{chari2023personalized,
  author    = {Chari, Pradyumna and Ma, Sizhuo and Ostashev, Daniil and Kadambi, Achuta and Krishnan, Gurunandan and Wang, Jian and Aberman, Kfir},
  title     = {Personalized Restoration via Dual-Pivot Tuning},
  journal   = {arXiv preprint arXiv:2312.17234},
  year      = {2023},
}