Video-to-4D Gaussians viewer

We train a generative model that maps a monocular input video to our latent space of dynamic tokens, which our decoder then converts into 3D Gaussians. For an interactive look at the Gaussians produced by our video-to-4D model, just click any thumbnail below!

Cloth simulation (image-to-4D) Gaussian viewer

We train a generative model that maps an initial cloth position (given as an image) to our latent space of dynamic tokens, essentially solving an image-to-4D problem. Our trained decoder then converts the generated tokens into 3D Gaussians. For an interactive look at the Gaussians produced by our cloth simulation model, just click any thumbnail below!

3D tracking viewer

We train a separate model that takes an input RGBD video, encoded into our latent space of dynamic tokens, and learns to track query points from the first frame across the video in 3D. Click any thumbnail below for an interactive viewer of the predicted tracks!

Video gallery


Citation

@inproceedings{malik2025velox,
  author    = {Malik, Anagh and Chan, Dorian and Zhao, Xiaoming and Lindell, David B. and Tuzel, Oncel and Chang, Jen-Hao Rick},
  title     = {Velox: Learning Representations of 4D Geometry and Appearance},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}

Velox 🚀: Learning Representations of 4D Geometry and Appearance