Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

01

Abstract

We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, enabling training with many high-resolution input views. We train and evaluate our model on an internal dataset with more than 10,000 subjects, which is an order of magnitude larger than existing multi-view human head datasets. HeadsUp achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization. We extensively analyze the scaling behavior of our model across identities, views, and model capacity, revealing practical insights for quality-compute trade-offs. Finally, we highlight the strength of our latent space by showcasing two downstream applications: generating novel 3D identities and animating the 3D heads with expression blendshapes.
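The decoupling described above can be illustrated with a small NumPy sketch. It is not the HeadsUp architecture: the pooling encoder, the random linear decoder, the sphere used in place of a neutral head template, the 14-channel Gaussian layout, and every function name are assumptions made for illustration. It shows only the structural point that the Gaussian count is fixed by the UV resolution, whatever the number or resolution of the input views.

```python
import numpy as np

# Assumed per-texel layout: offset(3) + log-scale(3) + quaternion(4) + opacity(1) + RGB(3)
N_CHANNELS = 14


def template_positions(uv_res):
    """Toy stand-in for a neutral head template: a unit sphere sampled on a UV grid."""
    u, v = np.meshgrid(np.linspace(0.0, 1.0, uv_res, endpoint=False),
                       np.linspace(0.05, 0.95, uv_res), indexing="xy")
    theta, phi = 2.0 * np.pi * u, np.pi * v
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.cos(phi),
                     np.sin(phi) * np.sin(theta)], axis=-1)  # (uv_res, uv_res, 3)


def encode_views(views, latent_dim=64, seed=0):
    """Pool any number of (H, W, 3) images, of any resolution, into one fixed-size latent."""
    proj = np.random.default_rng(seed).normal(size=(6, latent_dim))
    # Hypothetical per-view feature: channel-wise mean and standard deviation.
    per_view = [np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1))])
                for img in views]
    return np.mean(per_view, axis=0) @ proj  # (latent_dim,)


def decode_gaussians(latent, uv_res=32, seed=1):
    """Decode a latent into a UV map of Gaussian parameters anchored to the template."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=(latent.shape[0], uv_res * uv_res * N_CHANNELS))
    uv_map = (latent @ weights).reshape(uv_res, uv_res, N_CHANNELS)
    offset, log_scale, quat, opacity, rgb = np.split(uv_map, [3, 6, 10, 11], axis=-1)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    n = uv_res * uv_res  # one Gaussian per UV texel
    return {
        # Each Gaussian stays close to its anchor point on the template.
        "position": (template_positions(uv_res) + 0.05 * np.tanh(offset)).reshape(n, 3),
        "scale": (0.01 * np.exp(log_scale)).reshape(n, 3),
        "rotation": (quat / (np.linalg.norm(quat, axis=-1, keepdims=True) + 1e-8)).reshape(n, 4),
        "opacity": sigmoid(opacity).reshape(n, 1),
        "rgb": sigmoid(rgb).reshape(n, 3),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    few_low_res = [rng.random((64, 64, 3)) for _ in range(4)]
    many_high_res = [rng.random((128, 96, 3)) for _ in range(16)]
    for views in (few_low_res, many_high_res):
        gaussians = decode_gaussians(encode_views(views), uv_res=32)
        print(len(views), "views ->", gaussians["position"].shape[0], "Gaussians")
```

In this sketch, changing `uv_res` is the only way to change the number of Gaussians; adding views or raising their resolution changes the encoder's workload but not the size of the output.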

02

Novel Identity Reconstruction Results (Internal10k)

Per-subject reconstruction results on our internal multi-camera capture dataset with 10,000 subjects under controlled diffuse lighting.

03

Novel Identity Reconstruction Results (Ava-256)

Per-subject reconstruction results on Ava-256, showing renderings from novel viewpoints for different expressions.

Validation subjects: PGO261, UHV563, BGR645, IBQ026, LCJ763, PDG961, APP152, PSV686, TCE049, YJF815, INQ807, KWL586.

04

Text-driven Identity Generation

Text-conditioned generation of novel 3D Gaussian head identities.

05

Blendshape-driven Latent Animation

Our reconstructed Gaussian heads are riggable via blendshape parameters. We extract blendshape parameters from the input images and use them to drive and animate our latent representation.

Panels: Source Identity, followed by the Driving Expression and Rendered Expression for two expressions, Smile and Raise Brows.
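For readers unfamiliar with blendshape rigs, the sketch below shows the classic linear (delta) blendshape formulation, in which a posed shape is the neutral shape plus a weighted sum of per-expression offsets. This is general background only: the page states that blendshape parameters drive the latent, and it does not say that Gaussian positions are blended linearly in this way. The function name `apply_blendshapes`, the array shapes, and the toy "smile" and "raise brows" offsets are assumptions made for illustration.

```python
import numpy as np


def apply_blendshapes(neutral, deltas, weights):
    """Linear delta blendshapes: posed = neutral + sum_k weights[k] * deltas[k].

    neutral: (N, 3) neutral point positions
    deltas:  (K, N, 3) per-expression offsets from the neutral shape
    weights: (K,) blendshape coefficients, typically in [0, 1]
    """
    neutral = np.asarray(neutral, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if deltas.shape[1:] != neutral.shape or weights.shape != deltas.shape[:1]:
        raise ValueError("expected neutral (N, 3), deltas (K, N, 3), weights (K,)")
    # Contract the expression axis: (K,) with (K, N, 3) gives (N, 3).
    return neutral + np.tensordot(weights, deltas, axes=1)


if __name__ == "__main__":
    neutral = np.zeros((4, 3))
    deltas = np.zeros((2, 4, 3))
    deltas[0, 0, 0] = 1.0  # toy "smile": moves point 0 along x
    deltas[1, 1, 1] = 2.0  # toy "raise brows": moves point 1 along y
    posed = apply_blendshapes(neutral, deltas, weights=[0.5, 1.0])
    print(posed)
```

Setting all weights to zero returns the neutral shape unchanged, which is why such coefficients are convenient as sliders in an interactive viewer.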

06

Blendshape UI Visualization

Manipulation of reconstructed 3D Gaussian heads with direct blendshape and camera control.

07

Citation

If you find this work useful, please cite our paper.

@inproceedings{ntavelis2026headsup,
  title={Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures},
  author={Ntavelis, Evan and Wu, Sean and Shahbazi, Mohamad and Maninchedda, Fabio and Kostiaev, Dmitry and Sevastopolsky, Artem and Megaro, Vittorio and Phillips, Trevor and Blumentals, Alejandro and Ravikumar, Shridhar and Gupta, Mehak and Knothe, Reinhard and Bayer, Jeronimo and Vestner, Matthias and Schaefer, Simon and Etterlin, Thomas and Zimmermann, Christian and Artemov, Alexey and Deschler, Mathias and Kaufmann, Peter and Brugger, Stefan and Martin, Sebastian and Amberg, Brian and Runia, Tom},
  booktitle={European Conference on Computer Vision ({ECCV})},
  year={2026},
  publisher={Springer}
}