PVSNet

Real-Time Light Field Reconstruction and Neural Rendering

The overall aim of this PhD project is to advance the field of immersive telepresence by developing and evaluating novel, learning-based methods for real-time immersive view synthesis from a single image.
PVSDNet Project Webpage

Abstract

 

Building on the foundations of spherical light field reconstruction and rendering, this work introduces a series of progressively more capable and efficient view synthesis networks. These methods evolve from full light field reconstruction to a direct, position-aware approach capable of rendering novel views within a large volume from a single input image at interactive frame rates. To enhance these synthesized views for augmented reality (AR) applications, a unified multimodal network is developed that jointly predicts a new view and its corresponding depth map from a shared latent space. This multimodal approach keeps the synthesized appearance and the predicted geometry consistent with each other and across views, which is crucial for preventing visual artifacts such as flickering.

 

PVSNet: Real-Time Position-Aware View Synthesis from Single-View Input

Watch the video on YouTube

We introduce a lightweight, position-aware network designed for real-time view synthesis from a single input image and a target camera pose. The proposed framework consists of a Position Aware Embedding, which efficiently maps positional information from the target pose to high-dimensional feature maps. These feature maps, along with the input image, are fed into a Rendering Network that merges features from dual encoder branches to resolve both high- and low-level details, producing a realistic new view of the scene.
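To make the idea of turning a target pose into feature maps concrete, here is a minimal sketch. The page does not say how the Position Aware Embedding is built, so the sketch swaps in a generic sinusoidal encoding tiled over a spatial grid; it is not the PVSNet design. The names `encode_position` and `broadcast_to_feature_map` and the parameter `num_frequencies` are hypothetical.

```python
import math


def encode_position(position, num_frequencies=4):
    """Sinusoidal encoding of a 3D target position.

    Assumption: a generic positional encoding used for illustration only,
    standing in for the learned embedding described in the text.
    Returns 3 * num_frequencies * 2 values.
    """
    features = []
    for coord in position:
        for k in range(num_frequencies):
            freq = (2.0 ** k) * math.pi
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features


def broadcast_to_feature_map(features, height, width):
    """Tile a feature vector over a grid, indexed as result[channel][row][col]."""
    return [[[f] * width for _ in range(height)] for f in features]


# Example: encode a target camera offset and expand it to a 2 x 3 feature map
# that could be concatenated with image features of the same spatial size.
pose_features = encode_position((0.1, -0.05, 0.2))
feature_map = broadcast_to_feature_map(pose_features, height=2, width=3)
```

In a real network the encoding would typically be passed through learned layers before being merged with the image branch; the tiling step only shows how a pose vector can share a spatial layout with image features.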

Code on GitHub

PVSDNet: Joint Depth Prediction and View Synthesis via Shared Latent Spaces in Real-Time

Watch the video on YouTube

We propose a unified multimodal network capable of jointly synthesizing new views and predicting consistent depth maps from a single input image. Our framework integrates an additional depth prediction module into a state-of-the-art view synthesis architecture by leveraging a shared latent representation, thereby ensuring geometric coherence between synthesized views and their depth maps.

Code on GitHub

LFSphereNet: Real Time Spherical Light Field Reconstruction from a Single Omnidirectional Image

Watch the video on YouTube

We propose a fully learning-based method for spherical light field reconstruction from a single omnidirectional image. The proposed LFSphereNet utilizes two different networks: The first network learns to reconstruct the light field in cubemap projection (CMP) format given the six cube faces of an omnidirectional image and the corresponding cube face positions as input. The cubemap format implies a linear re-projection, which is more appropriate for a neural network. The second network refines the reconstructed cubemaps in equirectangular projection (ERP) format by removing cubemap border artifacts.
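The relationship between the two projection formats mentioned above can be sketched in a few lines. The example maps a normalized ERP coordinate to a viewing direction and then to a cube face, where each face is a plain perspective projection (a division by the dominant axis). The axis convention, the face names and the function names are assumptions chosen for illustration; they are not taken from the LFSphereNet code.

```python
import math


def erp_to_direction(u, v):
    """Map normalized ERP coordinates (u, v in [0, 1]) to a unit direction.

    Assumed convention: u = 0.5, v = 0.5 looks along +z, and y points up.
    """
    lon = (u - 0.5) * 2.0 * math.pi  # longitude in [-pi, pi]
    lat = (0.5 - v) * math.pi        # latitude in [-pi/2, pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def direction_to_cube_face(d):
    """Return (face, s, t) for a non-zero direction d.

    face is the cube face the direction hits; s and t are coordinates on
    that face in [-1, 1], obtained by dividing by the dominant axis.
    """
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face = "+x" if x > 0 else "-x"
        s, t = z / ax, y / ax
    elif ay >= az:
        face = "+y" if y > 0 else "-y"
        s, t = x / ay, z / ay
    else:
        face = "+z" if z > 0 else "-z"
        s, t = x / az, y / az
    return face, s, t


# Example: the center of the ERP image lands in the middle of the +z face.
face, s, t = direction_to_cube_face(erp_to_direction(0.5, 0.5))
```

Within one face, straight lines in the scene stay straight, whereas ERP bends them increasingly toward the poles. The price is the set of seams between faces, which is what the ERP refinement stage described above addresses.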

Code on GitHub

Publications

List of relevant publications

PVSDNet: Joint Depth Prediction and View Synthesis Via Shared Latent Spaces in Real-Time

Manu Gond, Emin Zerman, Sebastian Knorr, and Mårten Sjöström
IEEE Access 2026

We propose a unified multimodal network capable of jointly synthesizing new views and predicting consistent depth maps from a single input image.

A Visual Quality of Experience Toolkit for Realistic Immersive Telepresence Applications

Manu Gond, Emin Zerman, Mohammadreza Shamshirgarha, Sebastian Knorr, and Mårten Sjöström
17th International Conference on Quality of Multimedia Experience (QoMEX) 2025

We introduce a visual quality of experience toolkit for realistic immersive telepresence applications.

Real-Time View Synthesis with Multiplane Image Network using Multimodal Supervision

Manu Gond, Mohammadreza Shamshirgarha, Emin Zerman, Sebastian Knorr, and Mårten Sjöström
IEEE 27th International Workshop on Multimedia Signal Processing (MMSP) 2025

We introduce a framework that directly predicts multi-plane-image (MPI) parameters from a single RGB image for real-time view synthesis.

LFSphereNet: Real Time Spherical Light Field Reconstruction from a Single Omnidirectional Image

Manu Gond, Emin Zerman, Sebastian Knorr, and Mårten Sjöström
20th ACM SIGGRAPH Conference on Visual Media Production (CVMP 2023)

Considering the applications in real-time immersive telepresence, this paper investigates how a single omnidirectional image (ODI) can be used to extend 3DoF to 6DoF. To achieve this, we propose a fully learning-based method for spherical light field reconstruction from a single omnidirectional image.
