SynFMC

Abstract

Controlling the movements of dynamic objects and the camera within generated videos is a meaningful yet challenging task. Due to the lack of datasets with comprehensive 6D pose annotations, existing text-to-video methods cannot simultaneously control the motions of both the camera and objects in a 3D-aware manner, which limits controllability over the generated content. To address this issue and facilitate research in this field, we introduce a Synthetic Dataset for Free-Form Motion Control (SynFMC). SynFMC includes diverse object and environment categories and covers various motion patterns defined by specific rules, simulating both common and complex real-world scenarios. Its complete 6D pose information helps models learn to disentangle the motion effects of objects from those of the camera in a video. To provide precise 3D-aware motion control, we further propose Free-Form Motion Control (FMC), a method trained on SynFMC. FMC can control the 6D poses of objects and the camera independently or simultaneously, producing high-fidelity videos. Moreover, it is compatible with various personalized text-to-image (T2I) models for different content styles. Extensive experiments demonstrate that FMC outperforms previous methods across multiple scenarios.
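Why do both the camera and the object poses need annotating? The sketch below assumes each 6D pose is stored as a 4x4 rigid transform (camera-to-world and object-to-world, a common convention; the page does not specify SynFMC's storage format) and, for brevity, only rotates about the z-axis. It shows that a camera translating +1 along x and an object translating -1 along x give the same object-in-camera pose. The object's camera-relative motion alone cannot tell the two cases apart; the world-space poses can. All function names are illustrative.

```python
import math

Matrix = list[list[float]]


def pose(yaw: float, tx: float, ty: float, tz: float) -> Matrix:
    """4x4 rigid transform: rotation about the z-axis by `yaw` radians plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain 4x4 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t: Matrix) -> Matrix:
    """Inverse of a rigid transform [R | p], which is [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    q = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [q[0]], r_t[1] + [q[1]], r_t[2] + [q[2]], [0.0, 0.0, 0.0, 1.0]]


def object_in_camera(cam_to_world: Matrix, obj_to_world: Matrix) -> Matrix:
    """Object pose expressed in the camera frame: T_cam^-1 @ T_obj."""
    return matmul(invert_rigid(cam_to_world), obj_to_world)


# Case A: the camera moves +1 along x while the object stays at the world origin.
rel_a = object_in_camera(pose(0.0, 1.0, 0.0, 0.0), pose(0.0, 0.0, 0.0, 0.0))
# Case B: the camera stays at the origin while the object moves -1 along x.
rel_b = object_in_camera(pose(0.0, 0.0, 0.0, 0.0), pose(0.0, -1.0, 0.0, 0.0))
print(rel_a == rel_b)  # True: identical camera-relative poses
```

Only the separate world-space poses of camera and object distinguish Case A from Case B. This is the ambiguity that full 6D annotations of both are meant to resolve.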

1. Visualization of SynFMC

Environment Categories

⭐ The environments in SynFMC span five types: ground, near ground, sky, water surface, and underwater.

Ground

Near Ground

Sky

Water Surface

Underwater

Scene Categories

⭐ The scenes in SynFMC span four types: static single-object, static multi-object, dynamic single-object, and dynamic multi-object. "Static" means that object locations are fixed in world space, while the camera can still move.
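The four scene types can be written as a small rule over the objects' world-space trajectories. This is a minimal sketch under three assumptions. First, each 6D pose is a hypothetical `(x, y, z, roll, pitch, yaw)` tuple. Second, angle wrap-around is ignored. Third, a multi-object scene counts as dynamic as soon as any one object moves, since the page does not say how mixed cases are labeled. `is_fixed` and `scene_category` are illustrative names, not part of any released SynFMC tooling.

```python
from typing import Sequence

# Hypothetical pose layout: (x, y, z, roll, pitch, yaw) in world space.
Pose6D = tuple[float, float, float, float, float, float]


def is_fixed(trajectory: Sequence[Pose6D], tol: float = 1e-6) -> bool:
    """True if the object's world pose never deviates from its first-frame pose."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one frame")
    first = trajectory[0]
    # Angle wrap-around (e.g. -pi vs. pi) is ignored for brevity.
    return all(all(abs(a - b) <= tol for a, b in zip(p, first)) for p in trajectory)


def scene_category(object_trajectories: Sequence[Sequence[Pose6D]], tol: float = 1e-6) -> str:
    """Map per-object world-space trajectories to one of the four scene types.

    Camera poses are deliberately not inspected: the camera may move in every type.
    Assumption: a scene is 'dynamic' if any object moves.
    """
    if not object_trajectories:
        raise ValueError("a scene needs at least one object")
    count = "single-object" if len(object_trajectories) == 1 else "multi-object"
    motion = "static" if all(is_fixed(t, tol) for t in object_trajectories) else "dynamic"
    return f"{motion} {count}"
```

For example, one object whose x-coordinate changes across frames yields `"dynamic single-object"`, even if the camera is also moving.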

Static Single-Object

Dynamic Single-Object

Static Multi-Object

Dynamic Multi-Object

Auxiliary Annotation of SynFMC

⭐ Besides 6D poses of objects and the camera, SynFMC also provides auxiliary annotations, including instance segmentation maps, depth maps, and descriptions of both visual content and motion.
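The auxiliary annotations are most useful when linked to one another, for example through a shared instance id that ties a segmentation region to an object's 6D pose. This is a minimal sketch of such a per-frame record. It assumes the instance map stores integer ids with 0 as background. `FrameAnnotation`, `ClipAnnotation`, and every field name are hypothetical and are not SynFMC's actual file format.

```python
from dataclasses import dataclass, field

# Hypothetical pose layout: (x, y, z, roll, pitch, yaw) in world space.
Pose6D = tuple[float, float, float, float, float, float]


@dataclass
class FrameAnnotation:
    """Hypothetical per-frame record; field names are illustrative only."""
    camera_pose: Pose6D                  # camera 6D pose in world space
    object_poses: dict[int, Pose6D]      # instance id -> object 6D pose in world space
    instance_map: list[list[int]]        # H x W instance ids; 0 = background (assumed)
    depth_map: list[list[float]]         # H x W per-pixel depth

    def visible_instances(self) -> set[int]:
        """Instance ids present in the segmentation map, background excluded."""
        return {v for row in self.instance_map for v in row if v != 0}

    def mean_depth(self, instance_id: int) -> float:
        """Average depth over one instance's pixels, combining the two auxiliary maps."""
        values = [
            d
            for seg_row, depth_row in zip(self.instance_map, self.depth_map)
            for v, d in zip(seg_row, depth_row)
            if v == instance_id
        ]
        if not values:
            raise ValueError(f"instance {instance_id} is not visible in this frame")
        return sum(values) / len(values)


@dataclass
class ClipAnnotation:
    """Hypothetical per-clip record holding the two kinds of text description."""
    content_caption: str                 # description of the visual content
    motion_caption: str                  # description of camera/object motion
    frames: list[FrameAnnotation] = field(default_factory=list)
```

Keying `object_poses` and `instance_map` on the same id lets a training pipeline pair each segmented region with the pose that moved it.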


3. Results of FMC

Independent Control of Camera / Object

⭐ The first two examples show independent control of the camera; the last two show independent control of the object:

canyon rim with a view of red rocks

cactus in the garden

a balloon floating over the road

a butterfly flying over the ground

Simultaneous Control of Camera & Object

⭐ The first two examples are results from static single-object scenes; the last two are from dynamic single-object scenes:

a yellow mushroom on the road

a cat in the grass covered with leaves

a butterfly is flying over the ground

a balloon floating in the cloudy sky

⭐ The first two examples are results from static multi-object scenes; the last two are from dynamic multi-object scenes:

a deer and a man in the grass

two birds in meadow

two UFOs are flying over the city

a shark and a yellow fish are swimming in the sea

BibTeX

Please consider citing SynFMC if it helps your research.

@inproceedings{SynFMC,
  title={{Free-Form Motion Control}: Controlling the 6D Poses of Camera and Objects in Video Generation},
  author={Shuai, Xincheng and Ding, Henghui and Qin, Zhenyuan and Luo, Hao and Ma, Xingjun and Tao, Dacheng},
  booktitle={ICCV},
  year={2025}
}

Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation