Nvidia’s Magic3D turns text into high-resolution 3D objects

Summary

Nvidia’s Magic3D can create 3D objects based on text input. The model is said to significantly outperform Google’s Dreamfusion text-to-3D model, which was introduced only in September.

Like Dreamfusion, Magic3D is essentially based on an image generation model that uses text to create images from different perspectives, which in turn serve as input for 3D generation. Nvidia’s research team uses its in-house eDiffi image model for this, while Google relies on Imagen.

The advantage of this method is that the generative AI model does not have to be trained on 3D models, which are scarce as training data. Unlike Nvidia’s freely available 3D generation model Get3D, Magic3D can also generate 3D models across many different categories without additional training.

From coarse to fine

With Magic3D, Nvidia works from coarse to fine: first, eDiffi generates low-resolution images from the text prompt, which are then turned into an initial 3D representation via Nvidia’s Instant NGP framework. In a second step, this coarse representation is refined into a high-resolution, textured 3D model.

The Magic3D generation process: since 3D data for AI training is scarce, the system generates 3D models from AI-generated 2D images of an object seen from different perspectives. Using a coarse-to-fine approach, Nvidia achieves better results in less time than Google’s Dreamfusion. | Image: Nvidia

The result is a 3D model with a resolution of up to 512 x 512 pixels that can be imported and viewed in standard graphics software, according to Nvidia.
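To make the described flow concrete, here is a minimal sketch of the two-stage process in Python. All class and function names are hypothetical placeholders invented for illustration; Nvidia has not published a Magic3D API, and the stubs below only mirror the coarse-to-fine structure described above.

```python
from dataclasses import dataclass
from typing import Optional

# Minimal sketch of the coarse-to-fine idea; every name here is a
# hypothetical placeholder, not part of any released Nvidia API.

@dataclass
class CoarseField:
    prompt: str
    representation: str  # e.g. an Instant NGP-style hash-grid neural field

@dataclass
class TexturedMesh:
    prompt: str
    texture_resolution: int  # rendered at up to 512 x 512 per view

def coarse_stage(prompt: str) -> CoarseField:
    """Stage 1: build a low-resolution 3D representation guided by
    low-resolution images from a text-to-image model (eDiffi in Nvidia's setup)."""
    return CoarseField(prompt=prompt, representation="instant-ngp hash grid")

def fine_stage(coarse: CoarseField, prompt: Optional[str] = None) -> TexturedMesh:
    """Stage 2: extract a textured mesh from the coarse representation and
    refine it at higher resolution; the prompt can optionally be changed here."""
    return TexturedMesh(prompt=prompt or coarse.prompt, texture_resolution=512)

def magic3d_sketch(prompt: str) -> TexturedMesh:
    """Run both stages in sequence, as in the coarse-to-fine pipeline."""
    return fine_stage(coarse_stage(prompt))

if __name__ == "__main__":
    mesh = magic3d_sketch("a stone statue of an owl")
    print(mesh)  # in the real system, the result can be exported to graphics software
```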

Augmenting 3D content creation with natural language could dramatically help democratize 3D content creation for novices and expert artists alike.

Paper

Magic3D surpasses Dreamfusion in resolution and speed

According to the Nvidia research team, Magic3D takes half the time to create a 3D model compared to Dreamfusion – about 40 minutes instead of an hour and a half on average – at eight times the resolution.

The following video explains the creation process and, starting at minute 2:40, shows comparisons between Magic3D and Dreamfusion models. In initial tests, 61 percent of users preferred Magic3D’s models over Dreamfusion’s.

Video: Nvidia

Magic3D also offers editing functions typical of image AI systems that carry over to the 3D generation process. For example, the text prompt can be adjusted after the initial generation: a squirrel on a bike turns into a rabbit on a scooter.

An example of prompt-based editing with Magic3D. | Image: Nvidia
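Carrying the placeholder stubs from the sketch above one step further, prompt-based editing can be pictured as running the coarse stage with the original prompt and the fine stage with the edited one; again, this is only an illustration of the idea, not Nvidia’s implementation.

```python
# Continuation of the hypothetical sketch above (same placeholder stubs):
# the coarse result from the original prompt is refined with a new prompt.

coarse = coarse_stage("a squirrel riding a bike")
edited_mesh = fine_stage(coarse, prompt="a rabbit riding a scooter")
print(edited_mesh)  # the refined mesh now follows the edited prompt
```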

Fine-tuning the eDiffi diffusion model with Dreambooth also helps optimize generated 3D models for specific subjects. The model can also transfer the style of an input image to a 3D model.

Magic3D can transfer the style of a 2D image to a generated 3D model. | Image: Nvidia

Nvidia’s research team hopes that Magic3D can “democratize 3D synthesis” and encourage creativity in 3D content creation. Silicon Valley venture capital firm Andreessen Horowitz seems to share this view: it speculates that generative AI will transform the gaming industry, which relies on all kinds of media formats and on 3D content in particular.
