LiTo is a 3D latent representation that jointly captures object geometry and view-dependent appearance. Built upon this unified representation, a latent flow matching model enables high-quality image-to-3D generation.
Image-to-3D Generation Comparison
Note that TRELLIS (Xiang et al., 2025) does not respect the camera coordinate system, so its output objects are sometimes oriented incorrectly.
Conditioning Image
Ours
TRELLIS
Interactive 3DGS Comparison
Please click each image to open the side-by-side 3DGS viewer for comparison between LiTo and TRELLIS (Xiang et al., 2025).
Apple (reconstruction)
The 3D asset was created by DigitalSouls and is distributed under a CC Attribution-NonCommercial license. We accessed it from here in March 2026.
Steampunk (generation)
The 3D asset was created by 3d-coat and is distributed under a CC Attribution license. We accessed it from here in March 2026.
Beetle (generation)
The given input image is AI-generated.
Bone (generation)
The given input image is captured in the wild.
BibTeX
@inproceedings{chang2026lito,
author = {Jen-Hao Rick Chang$^\ast$ and Xiaoming Zhao$^\ast$ and Dorian Chan and Oncel Tuzel},
title = {{LiTo: Surface Light Field Tokenization}},
booktitle = {International Conference on Learning Representations},
year = {2026},
}