This page presents DragGAN, a method for interactive point-based manipulation on the generative image manifold of generative adversarial networks (GANs). DragGAN lets users precisely control the pose, shape, expression, and layout of generated objects by dragging points on the image; the method combines feature-based motion supervision with a new point tracking approach. DragGAN produces realistic outputs and outperforms prior approaches in image manipulation and point tracking. A citation, license information, and acknowledgments are provided below.
Xingang Pan¹,², Ayush Tewari³, Thomas Leimkühler¹, Lingjie Liu¹,⁴, Abhimitra Meka⁵, Christian Theobalt¹,²
¹Max Planck Institute for Informatics  ²Saarbrücken Research Center for Visual Computing, Interaction and AI  ³MIT  ⁴University of Pennsylvania  ⁵Google AR/VR
Abstract
Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig. 1. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative GAN features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object's rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking. We also showcase the manipulation of real images through GAN inversion.
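To give a concrete picture of the two components, the sketch below illustrates them in PyTorch under stated assumptions: a StyleGAN-like generator whose intermediate feature map is differentiable with respect to the latent code, handle and target points given as (x, y) pixel coordinates, and illustrative hyperparameters (patch radii, unit step size). The helper names bilinear_sample, motion_supervision_loss, and track_point are our own; this is a minimal sketch, not the released DragGAN implementation.

# Minimal sketch of DragGAN's two components, assuming a StyleGAN-like
# generator whose intermediate feature map `feat` of shape (1, C, H, W)
# is differentiable w.r.t. the latent code. All names and hyperparameters
# here are illustrative assumptions, not the authors' released API.
import torch
import torch.nn.functional as F_nn


def bilinear_sample(feat, points):
    """Sample feature vectors at (x, y) pixel coordinates via grid_sample."""
    _, _, H, W = feat.shape
    grid = points.clone().float()
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    grid[:, 0] = grid[:, 0] / (W - 1) * 2 - 1
    grid[:, 1] = grid[:, 1] / (H - 1) * 2 - 1
    grid = grid.view(1, -1, 1, 2)
    out = F_nn.grid_sample(feat, grid, align_corners=True)  # (1, C, N, 1)
    return out[0, :, :, 0].t()  # (N, C)


def motion_supervision_loss(feat, handles, targets, radius=3):
    """Pull features in a small patch around each handle one unit step toward
    its target; the reference features are detached so the gradient moves the
    patch (through the latent code) rather than degrading the features."""
    loss = 0.0
    for p, t in zip(handles, targets):
        d = t - p
        d = d / (d.norm() + 1e-8)  # unit step direction toward the target
        dy, dx = torch.meshgrid(
            torch.arange(-radius, radius + 1),
            torch.arange(-radius, radius + 1),
            indexing="ij",
        )
        patch = p.view(1, 2) + torch.stack([dx.flatten(), dy.flatten()], dim=1)
        src = bilinear_sample(feat.detach(), patch)         # frozen reference
        dst = bilinear_sample(feat, patch + d.view(1, 2))   # shifted, differentiable
        loss = loss + F_nn.l1_loss(dst, src)
    return loss


def track_point(feat, init_feature, prev_point, radius=6):
    """Relocate a handle by nearest-neighbor search of its original feature
    vector inside a small window around its previous position."""
    dy, dx = torch.meshgrid(
        torch.arange(-radius, radius + 1),
        torch.arange(-radius, radius + 1),
        indexing="ij",
    )
    cand = prev_point.view(1, 2) + torch.stack([dx.flatten(), dy.flatten()], dim=1)
    cand_feat = bilinear_sample(feat, cand)                 # (N, C)
    dist = (cand_feat - init_feature.view(1, -1)).abs().sum(dim=1)
    return cand[dist.argmin()]

In use, one would presumably alternate a gradient step on the latent code driven by motion_supervision_loss with a call to track_point for each handle, repeating until the handles reach their targets.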
Demo videos: a main demo (accelerated) and per-category results for Cat, Dog, Horse, Face, Human, Car, Microscope, and Landscapes.
Citation
@inproceedings{pan2023_DragGAN,
title={Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold},
author={Pan, Xingang and Tewari, Ayush and Leimk{\"u}hler, Thomas and Liu, Lingjie and Meka, Abhimitra and Theobalt, Christian},
booktitle={ACM SIGGRAPH 2023 Conference Proceedings},
year={2023}
}
License
Images, text and video files on this site are made freely available for non-commercial use under the Creative Commons CC BY-NC 4.0 license. Feel free to use any of the material in your own work, as long as you give us appropriate credit by mentioning the title and author list of our paper.
Acknowledgments
This work was supported by ERC Consolidator Grant 4DReply (770784). Lingjie Liu was supported by the Lise Meitner Postdoctoral Fellowship. This project was also supported by the Saarbrücken Research Center for Visual Computing, Interaction and AI.