Meta’s Latest Artificial Intelligence Technology Enhances Virtual Reality Video Immersion

Image: Meta


Meta shows off HyperReel, a new way to store and render 6-DoF video. HyperReel could be used in AR and VR applications.

For years, 3D 180° or 360° videos have represented the peak of efforts to produce the most immersive video possible for virtual reality, with better cameras and higher resolutions steadily becoming available.

But an important step has not yet been taken: immersive six-degree-of-freedom (6-DoF) videos, which make it possible to change the position of the head in space in addition to the direction of gaze.

There have already been early attempts to make these particularly immersive videos suitable for mass consumption, such as Google’s Lightfields technology or experiments with volumetric videos such as Sony’s Joshua Bell video.

In recent times, research has increasingly focused on "view synthesis" methods. These are AI methods that can render new views of an environment. Neural Radiance Fields (NeRFs) are an example of such a technique: they learn 3D representations of objects or entire scenes from a video or many photos.

6-DoF video should be fast, high-quality, and compact

Despite many advances in view synthesis, no method provides high-quality representations that are simultaneously rendered quickly and require little memory. For example, even with current methods, synthesizing a single megapixel image can take nearly a minute, while dynamic scenes quickly require terabytes of memory. Additionally, capturing reflections and refraction is a major challenge.

Researchers from Carnegie Mellon University, Reality Labs Research, Meta, and the University of Maryland are now demonstrating HyperReel, a memory-efficient method that can perform high-resolution rendering in real time.

To do this, the team relies on a neural network that takes rays as input and outputs parameters, such as color, for a set of geometric primitives and displacement vectors. Instead of sampling the hundreds of points along each ray that are common in NeRFs, the method predicts geometric primitives in the scene, such as planes or spheres, and computes the intersections between rays and those primitives.
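To illustrate why this is cheaper, here is a minimal sketch of a ray-plane intersection test in NumPy. This is not Meta's code, just the standard geometric calculation: one closed-form intersection per primitive replaces evaluating a network at hundreds of sample points along the ray.

```python
import numpy as np

def intersect_ray_plane(origin, direction, plane_normal, plane_offset):
    """Intersect a ray origin + t*direction with the plane n.x = d.

    Returns the intersection point, or None if the ray is parallel
    to the plane or the plane lies behind the ray origin.
    """
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-8:
        return None  # ray runs (nearly) parallel to the plane
    t = (plane_offset - np.dot(plane_normal, origin)) / denom
    if t < 0:
        return None  # intersection is behind the camera
    return origin + t * direction

# A single closed-form test, versus hundreds of network queries per ray in a NeRF.
hit = intersect_ray_plane(
    origin=np.array([0.0, 0.0, 0.0]),
    direction=np.array([0.0, 0.0, 1.0]),   # looking down the z-axis
    plane_normal=np.array([0.0, 0.0, 1.0]),
    plane_offset=2.0,                       # plane z = 2
)
# hit is the point [0., 0., 2.]
```

In HyperReel the plane parameters themselves come from the network's prediction for each ray; the sketch above only shows the intersection step that makes rendering fast.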

Additionally, the team uses a memory-efficient method to render dynamic scenes with a high compression ratio and interpolation between individual frames.

Meta's HyperReel reaches between 6.5 and 29 frames per second

The quality of the dynamic and static scenes shown surpasses most other approaches. The team achieves between 6.5 and 29 frames per second on an Nvidia RTX 3090 GPU, depending on the scene and model size. However, 29 fps is currently only possible with the Tiny model, which renders at significantly lower resolution.

Video: Meta

Video: Meta

Unlike NeRFPlayer, HyperReel does not yet support streaming. According to Meta, this would be easy to add because the files are small: NeRFPlayer requires around 17 megabytes per frame, Google's Immersive Light Field Video 8.87 megabytes per frame, and HyperReel only 1.2 megabytes.

HyperReel is not yet suitable for real-time virtual reality applications, where ideally at least 72 frames per second must be rendered in stereo. However, since the method is implemented in vanilla PyTorch, a significant speed boost could be achieved in the future with additional technical effort, Meta said.

More information, examples, and code are available on GitHub.
