Sharp Monocular View Synthesis in Less Than a Second

Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan R. Richter, Vladlen Koltun

Apple

Paper (arXiv) Code (GitHub)

Abstract

We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25–34% and DISTS by 21–43% versus the best prior model, while lowering the synthesis time by three orders of magnitude.
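The abstract does not spell out how each Gaussian is parametrized. The sketch below is minimal and assumes the layout common in 3D Gaussian splatting, which SHARP may not use exactly: a mean, per-axis scales, a unit rotation quaternion, an opacity, and an RGB color. Each Gaussian then has an anisotropic covariance Σ = R S Sᵀ Rᵀ. Because the representation is metric, the mean and scales are in meters. The names `Gaussian3D`, `quat_to_rotmat`, and `covariance` are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Gaussian3D:
    """One 3D Gaussian primitive (hypothetical 3DGS-style layout, assumed, not SHARP's).

    Units are assumed metric: mean and scale are in meters.
    """
    mean: np.ndarray      # (3,) center position in meters
    scale: np.ndarray     # (3,) standard deviation along each local axis, meters
    rotation: np.ndarray  # (4,) quaternion (w, x, y, z); normalized in quat_to_rotmat
    opacity: float        # in [0, 1]
    color: np.ndarray     # (3,) RGB in [0, 1]


def quat_to_rotmat(q: np.ndarray) -> np.ndarray:
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def covariance(g: Gaussian3D) -> np.ndarray:
    """Sigma = R S S^T R^T, symmetric positive semi-definite by construction."""
    M = quat_to_rotmat(g.rotation) @ np.diag(g.scale)
    return M @ M.T


# Usage: an elongated Gaussian 2 m in front of the input camera.
g = Gaussian3D(
    mean=np.array([0.0, 0.0, 2.0]),
    scale=np.array([0.05, 0.01, 0.01]),
    rotation=np.array([1.0, 0.0, 0.0, 0.0]),  # identity rotation
    opacity=0.8,
    color=np.array([0.9, 0.4, 0.1]),
)
print(covariance(g))
```

Storing scale and rotation instead of a raw 3×3 covariance keeps any regressed output a valid covariance.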

Figure: Input | Views synthesized by SHARP

SHARP synthesizes a photorealistic 3D representation from a single photograph in less than a second. The synthesized representation supports high-resolution rendering of nearby views, with sharp details and fine structures, at more than 100 frames per second on a standard GPU. The examples use photographs from Unsplash.
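Rendering a nearby view means projecting each Gaussian into a moved camera. The sketch below is minimal and makes several assumptions. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`. It also uses the first-order (EWA-style) projection from 3D Gaussian splatting, which is not necessarily SHARP's renderer. Under this approximation, the 2D covariance is J W Σ Wᵀ Jᵀ, where W is the camera rotation and J is the Jacobian of the perspective projection. Because the representation is metric, a 10 cm camera move is a literal 0.1 m translation. The frame rate also sets a time budget: above 100 fps, each frame must render in under 10 ms. The helper names are hypothetical.

```python
import numpy as np

# More than 100 frames per second leaves under 10 ms per frame.
FRAME_BUDGET_MS = 1000.0 / 100


def project_gaussian(mean_w, cov_w, R_cw, t_cw, fx, fy, cx, cy):
    """Project a 3D Gaussian (world frame, meters) to a 2D pixel-space Gaussian.

    R_cw, t_cw map world points into the target camera: x_c = R_cw @ x_w + t_cw.
    Returns the pixel-space mean (2,) and covariance (2, 2).
    """
    x, y, z = R_cw @ mean_w + t_cw
    if z <= 0:
        raise ValueError("Gaussian is behind the camera")
    mean_px = np.array([fx * x / z + cx, fy * y / z + cy])
    # Jacobian of the perspective projection, evaluated at the Gaussian center.
    J = np.array([
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ])
    cov_px = J @ R_cw @ cov_w @ R_cw.T @ J.T
    return mean_px, cov_px


def translated_camera(offset_m):
    """Hypothetical nearby view: input-camera orientation, center moved by offset_m meters.

    Moving the camera center to c maps world points to x_c = x_w - c.
    """
    return np.eye(3), -np.asarray(offset_m, dtype=float)


# Usage: move the camera 10 cm to the right and re-project one Gaussian.
mean = np.array([0.0, 0.0, 2.0])
cov = np.diag([0.0025, 0.0001, 0.0001])
fx = fy = 1000.0
cx, cy = 640.0, 360.0
R, t = translated_camera([0.1, 0.0, 0.0])
mu, cov2d = project_gaussian(mean, cov, R, t, fx, fy, cx, cy)
print(mu, cov2d)
```

The Gaussian shifts 50 px to the left, which is the parallax expected for a 0.1 m move at a depth of 2 m with a 1000 px focal length.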

Video Comparisons

Figure: side-by-side video comparisons on six sets of five example scenes each

Citation

@article{Sharp2025:arxiv,
  title      = {Sharp Monocular View Synthesis in Less Than a Second},
  author     = {Lars Mescheder and Wei Dong and Shiwei Li and Xuyang Bai and Marcel Santos and Peiyun Hu and Bruno Lecouat and Mingmin Zhen and Ama\"{e}l Delaunoy and Tian Fang and Yanghai Tsin and Stephan R. Richter and Vladlen Koltun},
  journal    = {arXiv preprint arXiv:2512.10685},
  year       = {2025},
  url        = {https://arxiv.org/abs/2512.10685},
}

© 2025 Apple. All rights reserved.