Artificial intelligence is advancing rapidly, and Nvidia’s latest creation, LATTE3D, is a clear example of that pace. Following the debut of its Blackwell superchip, designed for training advanced AI models, at the NVIDIA GTC event held March 18–21, Nvidia has introduced LATTE3D, a text-to-3D generative AI model.

LATTE3D acts like a virtual 3D printer, transforming text prompts into detailed 3D models of objects and animals within seconds. What sets LATTE3D apart is its combination of speed and quality. Unlike previous models, LATTE3D can generate intricate 3D shapes almost instantly on a single GPU, such as the NVIDIA RTX A6000 used for the NVIDIA Research demo.

Credit: Nvidia

This advancement means creators can now achieve near-real-time text-to-3D generation, revolutionizing how ideas are brought to life, according to Sanja Fidler, Nvidia’s vice president of AI research.

“A year ago, it took an hour for AI models to generate 3D visuals of this quality – and the current state of the art is now around 10 to 12 seconds. We can now produce results an order of magnitude faster, putting near-real-time text-to-3D generation within reach for creators across industries,” Fidler said.

The researchers said they trained LATTE3D on specific datasets, such as animals and everyday objects, but developers can use the same model architecture to train it on other types of data.

For example, if LATTE3D is trained on a collection of 3D plant models, it could help a landscape designer quickly add trees, bushes, and succulents to a digital garden design while working with a client. Likewise, if it’s trained on household items, the model could create objects to fill virtual home environments, assisting developers in preparing personal assistant robots for real-life tasks.

To train LATTE3D, NVIDIA used its powerful A100 Tensor Core GPUs. Additionally, the model learned from a variety of text prompts generated by ChatGPT, enhancing its ability to understand different descriptions of 3D objects. For example, it can recognize that prompts about various dog breeds should all result in dog-like shapes.
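The prompt-enhancement idea is straightforward to sketch. The snippet below is a minimal, hypothetical illustration of how an LLM could expand a single object category (e.g. “dog”) into many varied training prompts; the model name, the enhance_prompts helper, and the prompt wording are assumptions chosen for illustration, not Nvidia’s actual training pipeline.

```python
# Illustrative sketch: expanding an object category into varied text prompts
# with an LLM, similar in spirit to the ChatGPT-generated prompts described
# above. The helper name, model choice, and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def enhance_prompts(category: str, n: int = 5) -> list[str]:
    """Ask the LLM for n varied text prompts describing one object category."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                f"Write {n} short, distinct text prompts describing a 3D model "
                f"of a {category}, varying breed or style, pose, and materials. "
                "Return one prompt per line."
            ),
        }],
    )
    text = resp.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    for prompt in enhance_prompts("dog"):
        print(prompt)  # e.g. "a fluffy corgi sitting, stylized low-poly look"
```

Training on many such paraphrases of the same category is what lets the model learn that prompts about different dog breeds should all map to dog-like shapes.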

3D dogs generated by the Nvidia LATTE3D AI model. Image Credit: Nvidia

NVIDIA Research comprises hundreds of scientists and engineers worldwide, with teams dedicated to fields including AI, computer graphics, computer vision, self-driving cars, and robotics.

“This leap is huge. DreamFusion circa 2022 was slow and low quality, but kicked off this generative 3D revolution. Efforts like ATT3D (Amortized Text-to-3D Object Synthesis) chased speed at the cost of quality,” AI creator Bilawal Sidhu wrote on X (formerly Twitter).
