Track2View: 4D-Consistent Camera-Controlled Video Generation via Paired 3D Point Tracks

Re-render any video under arbitrary target camera trajectories — with explicit, temporally continuous source-to-target correspondence.

Feng Qiao¹, Zhaochong An², Zhexiao Xiong¹, Serge Belongie², Nathan Jacobs¹

¹Washington University in St. Louis ²University of Copenhagen

📢 News

[2026.06] Project page and paper are now available. Code and data annotations are coming soon.

📖 Abstract

Re-rendering an existing video from a novel camera viewpoint requires the output to follow the prescribed camera trajectory while preserving the appearance and dynamics of the original scene across every frame. Existing methods rely on per-frame pose embeddings, noisy point-cloud renderings, or implicit learned correspondences, none of which provides an explicit, temporally continuous link between source and target pixels. We propose Track2View, which conditions a video diffusion transformer on paired 3D point tracks: sparse trajectories of scene points projected into both the source and target camera views. These tracks provide explicit spatiotemporal correspondences that are temporally continuous by construction, encoding what content should appear where and when. At the core of Track2View is a dual-view track conditioner that transfers visual context from source to target view through parameter-free geometric operations and learned temporal aggregation, ensuring generalization to arbitrary camera trajectories without memorizing specific motions. We further introduce a data curation pipeline that extracts one-to-one track correspondences by running a 3D point tracker on temporally concatenated multi-camera view pairs. On a 400-video benchmark spanning static and dynamic scenes, Track2View achieves state-of-the-art results across visual quality, view synchronization, and camera accuracy, reducing rotation error by 30–65% and translation error by 61–72% relative to leading baselines.
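The code has not been released yet. The following is a minimal NumPy sketch of the general idea the abstract describes; it is not the Track2View implementation. It projects the same 3D point tracks into a source and a target camera to form paired 2D tracks. It then copies source-frame features to the matching target pixels with a parameter-free gather/scatter. The sketch makes several assumptions:

- pinhole cameras with world-to-camera extrinsics (R, t);
- nearest-pixel sampling;
- no occlusion handling;
- no learned temporal aggregation.

The function names project_points, pair_tracks and transfer_features are hypothetical.

```python
import numpy as np


def project_points(points_world, K, R, t):
    """Pinhole projection of world points (N, 3) into pixel coordinates.

    Assumes a world-to-camera transform: x_cam = R @ x_world + t.
    Returns (uv (N, 2), depth (N,)).
    """
    cam = points_world @ R.T + t                              # (N, 3) camera-frame points
    depth = cam[:, 2]
    safe = np.where(np.abs(depth) > 1e-8, depth, 1e-8)        # avoid divide-by-zero
    uv = (cam @ K.T)[:, :2] / safe[:, None]
    return uv, depth


def pair_tracks(tracks_world, src_cams, tgt_cams):
    """Project T frames of N 3D tracks into the source and target views.

    tracks_world: (T, N, 3); src_cams / tgt_cams: length-T lists of (K, R, t).
    Returns src_uv (T, N, 2), tgt_uv (T, N, 2) and valid (T, N), where valid
    only checks that a point lies in front of both cameras (no occlusion test).
    """
    T, N, _ = tracks_world.shape
    src_uv = np.zeros((T, N, 2))
    tgt_uv = np.zeros((T, N, 2))
    valid = np.zeros((T, N), dtype=bool)
    for f in range(T):
        src_uv[f], d_src = project_points(tracks_world[f], *src_cams[f])
        tgt_uv[f], d_tgt = project_points(tracks_world[f], *tgt_cams[f])
        valid[f] = (d_src > 0) & (d_tgt > 0)
    return src_uv, tgt_uv, valid


def transfer_features(src_feat, src_uv, tgt_uv, valid):
    """Copy one source frame's features to target pixels along paired tracks.

    Parameter-free: nearest-pixel gather at src_uv, scatter to tgt_uv.
    src_feat: (H, W, C); src_uv, tgt_uv: (N, 2) as (u, v); valid: (N,).
    Returns target-view features (H, W, C) and a boolean mask of filled pixels.
    """
    H, W, _ = src_feat.shape
    su = np.rint(src_uv).astype(int)
    tu = np.rint(tgt_uv).astype(int)

    def inside(p):
        return (p[:, 0] >= 0) & (p[:, 0] < W) & (p[:, 1] >= 0) & (p[:, 1] < H)

    keep = valid & inside(su) & inside(tu)
    tgt_feat = np.zeros_like(src_feat)
    mask = np.zeros((H, W), dtype=bool)
    # If several tracks land on the same target pixel, only one is kept
    # (this sketch does no depth ordering).
    tgt_feat[tu[keep, 1], tu[keep, 0]] = src_feat[su[keep, 1], su[keep, 0]]
    mask[tu[keep, 1], tu[keep, 0]] = True
    return tgt_feat, mask


if __name__ == "__main__":
    K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
    I, zero = np.eye(3), np.zeros(3)
    shift = np.array([0.2, 0.0, 0.0])        # hypothetical target-camera offset
    tracks = np.array([[[0.0, 0.0, 2.0]]])   # T=1 frame, N=1 scene point
    src_uv, tgt_uv, valid = pair_tracks(tracks, [(K, I, zero)], [(K, I, shift)])
    print(src_uv[0, 0], tgt_uv[0, 0])        # [32. 32.] [42. 32.]
```

Because the pairing comes from projecting one set of 3D tracks, each target position is tied to exactly one source position in every frame. This is the one-to-one, temporally continuous correspondence the abstract refers to. In this sketch, what a target pixel receives depends only on geometry, not on learned weights.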

🚧 TODO

Release inference code and pretrained checkpoints
Release training code
Release data annotations (paired 3D point tracks)

🌟 Citation

If you find our work helpful, please consider leaving a star and citing our paper:

@article{track2view2026,
  title   = {Track2View: 4D-Consistent Camera-Controlled Video Generation
             via Paired 3D Point Tracks},
  author  = {Feng Qiao and Zhaochong An and Zhexiao Xiong and Serge Belongie and Nathan Jacobs},
  journal = {arXiv preprint arXiv:2606.15534},
  year    = {2026}
}
