LiTo: Surface Light Field Tokenization

ICLR 2026


Apple


LiTo is a 3D latent representation that jointly captures object geometry and view-dependent appearance. Built upon this unified representation, a latent flow matching model enables high-quality image-to-3D generation.

Reconstruction Comparison

Ground Truth
Ours
TRELLIS

Image-to-3D Generation Comparison

Note that TRELLIS (Xiang et al., 2025) does not respect the camera coordinate system, so its output objects are sometimes oriented incorrectly.

Conditioning Image
Ours
TRELLIS

Interactive 3DGS Comparison

Click any image to open a side-by-side 3D Gaussian splatting (3DGS) viewer comparing LiTo and TRELLIS (Xiang et al., 2025).

BibTeX



@inproceedings{chang2026lito,
  author = {Jen-Hao Rick Chang$^\ast$ and Xiaoming Zhao$^\ast$ and Dorian Chan and Oncel Tuzel},
  title = {{LiTo: Surface Light Field Tokenization}},
  booktitle = {International Conference on Learning Representations},
  year = {2026},
}