Video-to-4D Gaussians viewer
We train a generative model that maps a monocular input video to our latent space of dynamic tokens, which our decoder then converts into 3D Gaussians. For an interactive look at the Gaussians produced by our video-to-4D model, just click any thumbnail below!
Cloth simulation (image-to-4D) Gaussians viewer
We train a generative model to map an initial cloth position (given as an image) to our latent space of dynamic tokens, essentially solving an image-to-4D problem. The generated tokens can be decoded into 3D Gaussians using our trained decoder. For an interactive look at the Gaussians produced by our cloth simulation model, just click any thumbnail below!
3D tracking viewer
We train a separate model that, given an input RGBD video (encoded into the latent space of dynamic tokens), learns to track query points from the first frame across the video in 3D. Click any thumbnail below for an interactive viewer of the predicted tracks!
Video gallery
Citation
@inproceedings{malik2025velox,
author = {Malik, Anagh and Chan, Dorian and Zhao, Xiaoming and Lindell, David B. and Tuzel, Oncel and Chang, Jen-Hao Rick},
title = {Velox: Learning Representations of 4D Geometry and Appearance},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}