Investigating Challenges in Generalizing Neural Radiance Fields with Learned Scene Priors
This thesis was carried out at the Visualization Research Center (VISUS) under the supervision of Shohei Mori and explores how neural radiance fields (NeRFs) can move beyond single-scene specialization toward generalizable models. While conventional NeRFs require costly retraining for each individual scene, a generalized approach offers the possibility of precomputability: networks can reuse learned priors and adapt to new environments without starting from scratch. This capability is of particular interest for AI foundation models, autonomous navigation, and immersive AR/VR applications, where fast adaptation and scalability are critical.
The work therefore investigates implicit scene embeddings as priors for generalized NeRFs and analyzes the architectural and training challenges that emerge from this shift in focus. The aim is to provide both theoretical groundwork and practical experiments that highlight the opportunities of scene-generalized NeRFs as well as the open problems that currently prevent robust generalization.
I have also prepared some slides for an easier overview; you can view them here. The thesis itself can be viewed here.
A short abstract:
This thesis investigates methods for generalizing neural radiance fields across multiple 3D scenes. Unlike prevailing approaches that emphasize fine-grained priors on rays or sample positions, often combined with classical 3D spatial (neural) processing, this work explores implicit deep scene embeddings as a prior for a generalized neural radiance field. It provides the theoretical groundwork for transitioning gradually from per-scene retraining toward a perceiving network that extracts scene geometry by analyzing images and building useful latent representations. On the practical side, the framework is implemented and analyzed; several key issues in the conceptualization are identified that must be addressed through changes to the training process and model redesign. Overall, this thesis outlines a potential path toward scene-generalized NeRFs and highlights new problems that emerge with this shift in research focus.
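The core idea described above, conditioning one shared radiance field on a per-scene latent embedding instead of retraining the network's weights for every scene, can be sketched roughly as follows. This is a minimal NumPy illustration only; the layer sizes, embedding dimension, and all names are my own assumptions, not the thesis implementation:

```python
import numpy as np

def positional_encoding(x, n_freqs=4):
    # NeRF-style encoding: map coordinates to sin/cos features at
    # exponentially increasing frequencies.
    freqs = 2.0 ** np.arange(n_freqs)
    angles = x[..., None] * freqs                       # (..., dim, n_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)               # (..., dim * 2 * n_freqs)

class ConditionedRadianceField:
    """A single MLP shared across scenes: a latent scene embedding is
    concatenated with the encoded sample position, so switching scenes
    means swapping the embedding rather than retraining the network."""

    def __init__(self, embed_dim=16, hidden=64, n_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 3 * 2 * n_freqs + embed_dim            # encoded xyz + scene code
        self.n_freqs = n_freqs
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 4))     # outputs (r, g, b, sigma)

    def query(self, xyz, scene_embedding):
        enc = positional_encoding(xyz, self.n_freqs)
        z = np.broadcast_to(scene_embedding,
                            (*enc.shape[:-1], scene_embedding.shape[-1]))
        h = np.maximum(np.concatenate([enc, z], axis=-1) @ self.w1, 0.0)  # ReLU
        return h @ self.w2                              # per-sample radiance + density
```

In the generalized setting the thesis targets, the scene embedding would itself be produced by an encoder analyzing the input images; here it simply stands in as a fixed vector per scene.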