Nvidia’s Magic3D can create 3D objects based on text input. The model is said to significantly outperform Google’s text-to-3D model Dreamfusion, which was introduced only in September.
Like Dreamfusion, Magic3D is essentially based on an image generation model that uses text to create images from different perspectives, which in turn serve as input for 3D generation. Nvidia’s research team uses its in-house eDiffi image model for this, while Google relies on Imagen.
The advantage of this method is that the generative AI model does not need to be trained on 3D models, which are scarce. Unlike Nvidia’s freely available 3D generation model Get3D, Magic3D can also generate 3D models across many different categories without additional training.
From coarse to fine
With Magic3D, Nvidia works from coarse to fine: first, eDiffi generates low-resolution images from the text prompt, which are then turned into an initial 3D representation via Nvidia’s Instant NGP framework. In a second step, this coarse representation is refined at higher resolution.
The end result is a 3D model with a resolution of up to 512 x 512 pixels that, according to Nvidia, can be imported and viewed in standard graphics software.
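The general idea behind a coarse-to-fine approach can be illustrated with a toy example. The sketch below is not Nvidia's code and calls neither eDiffi nor Instant NGP. It assumes a known one-dimensional target signal as a stand-in for the real image guidance, and all names (`refine`, `mse`, `target_fine` and so on) are hypothetical. It only shows why a cheap low-resolution fit can be a useful starting point for the expensive high-resolution stage.

```python
import math

def refine(estimate, target, steps, lr=0.2):
    # Plain gradient descent on 0.5 * sum((estimate - target)^2).
    for _ in range(steps):
        estimate = [e - lr * (e - t) for e, t in zip(estimate, target)]
    return estimate

def mse(a, b):
    # Mean squared error between two equally long sequences.
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

size, factor = 64, 8  # assumed toy sizes: 64 fine samples, 8 coarse samples

# Toy "ground truth" at fine resolution and its block-averaged coarse version.
target_fine = [0.5 + 0.5 * math.sin(2 * math.pi * i / size) for i in range(size)]
target_coarse = [sum(target_fine[i:i + factor]) / factor
                 for i in range(0, size, factor)]

# Stage 1: many cheap steps on only 8 values.
coarse = refine([0.0] * len(target_coarse), target_coarse, steps=30)

# Upsample by repeating each coarse value, then stage 2: a few fine steps.
init_fine = [value for value in coarse for _ in range(factor)]
coarse_to_fine = refine(init_fine, target_fine, steps=10)

# Baseline: the same number of fine steps, but starting from zeros.
direct = refine([0.0] * size, target_fine, steps=10)

error_coarse_to_fine = mse(coarse_to_fine, target_fine)
error_direct = mse(direct, target_fine)
print(error_coarse_to_fine < error_direct)
```

With the same budget of fine-resolution steps, the run that starts from the upsampled coarse result ends closer to the target than the run that starts from scratch. The real system optimizes a 3D representation against an image model rather than a known target, so this is an analogy only.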
Augmenting 3D content creation with natural language could go a long way toward democratizing it for novices and expert artists alike.
Magic3D surpasses Dreamfusion in resolution and speed
According to the Nvidia research team, Magic3D takes about half as long as Dreamfusion to create a 3D model, roughly 40 minutes instead of an hour and a half on average, at eight times the resolution.
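As a quick sanity check, the quoted figures can be worked through in a few lines of Python. The timings and the 512-pixel figure come from the article. Reading "eight times the resolution" as eight times per side is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Figures quoted in the article, in minutes per generated model.
magic3d_minutes = 40
dreamfusion_minutes = 90  # "an hour and a half"

speedup = dreamfusion_minutes / magic3d_minutes
time_fraction = magic3d_minutes / dreamfusion_minutes

# Assumption: "eight times the resolution" is meant per side.
resolution_factor = 8
magic3d_side = 512  # pixels, as stated in the article
implied_baseline_side = magic3d_side // resolution_factor
pixel_count_factor = resolution_factor ** 2

print(speedup)                   # 2.25
print(round(time_fraction, 2))   # 0.44
print(implied_baseline_side)     # 64
print(pixel_count_factor)        # 64
```

On these numbers, "half the time" is a slight understatement: 40 minutes is about 44% of 90 minutes, a speedup of 2.25 times. Under the per-side reading, the comparison resolution would be 64 pixels per side, which means 64 times as many pixels per rendered image.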
The following video explains the creation process and, starting at 2:40, compares Magic3D's 3D models with those from Dreamfusion. In initial tests, 61% of users preferred the Magic3D models over the Dreamfusion models.
Magic3D also offers editing functions typical of image AI systems, which can be transferred to the 3D generation process. For example, text prompts can be adjusted after the initial build: a squirrel on a bike turns into a rabbit on a scooter.
Fine-tuning the eDiffi diffusion model with Dreambooth also helps optimize generated 3D models for specific subjects. The model can also transfer the style of an input image to a 3D model.
Nvidia’s research team hopes that Magic3D can “democratize 3D synthesis” and encourage creativity in 3D content creation. The same idea seems to be on the mind of Silicon Valley venture capital firm Andreessen Horowitz, which speculates that generative AI will transform the gaming industry, an industry that relies on all sorts of media formats and on 3D content in particular.