Unlocking AI's 3D Narrative: Fei-Fei Li and Google Take the Lead

The 3D track of AIGC has suddenly come alive.
On December 5th, Google DeepMind released Genie 2, its new-generation world model, which can "generate a playable 3D game world lasting up to one minute from a single image", prompting netizens to exclaim that "The Matrix is here".
Just two days earlier, World Labs, the company founded by "godmother of AI" Fei-Fei Li, officially announced a "spatial intelligence" model that supports "generating a 3D world from one image".
This is another wave of discussion on world models after Sora. From text to images, and then to videos and interactive 3D worlds, AIGC has made significant leaps overall.
For industry, this gives strong support to creative design and interactive experience workflows. World models can provide highly diverse and controllable 3D environments for agent training, embodied intelligence training, complex animation production, game production, physics modeling, and other fields.
Some industry insiders also say that progress in world models brings the ultimate goal of artificial general intelligence (AGI) one step closer.
Google builds breadth on the way to AGI
Genie 2 is Google's second-generation world model. Given a single image, it generates a 3D environment that can be controlled with keyboard and mouse input.
The model identifies the character in the image and responds intelligently to keyboard input, moving that character accordingly.
The same starting frame can generate different motion trajectories.
Genie 2 keeps a consistent memory of the scene: parts of the world that move out of view are rendered without distortion when they come back into view.
Notably, Genie 2 can generate new scene content in real time based on the visuals, for up to one minute.
The resulting interaction closely resembles a video game.
"Games play a crucial role in the field of artificial intelligence research. Their captivating graphics, unique combinations of challenges, and measurable progress make them an ideal environment for safely testing and advancing AI capabilities," Google said. In fact, games have always been important to Google DeepMind and are an important way for Google to train agents.
However, the industry has encountered bottlenecks in the training of embodied intelligence.
A sufficiently rich and diverse training environment is necessary for practical progress in embodied intelligence. 21st Century Business Herald reporters learned from insiders in the humanoid robot industry that generalization ability is currently a major pain point for humanoid robots.
Genie 2 is expected to help embodied intelligence solve training bottlenecks.
In terms of interactivity, Genie 2 can model object interactions such as popping balloons, opening doors, and shooting explosive barrels.
This makes it much simpler to create diverse interactive scenes. By using Genie 2 to rapidly prototype interactive experiences, researchers can quickly train and test embodied AI agents in new environments.
For example, researchers can prompt Genie 2 with different images generated by Imagen 3 to model the differences between flying a paper airplane, a dragon, an eagle, or a parachute, and to test Genie's ability to control different objects.
That is to say, AI agents can obtain almost infinite training scenarios and interaction systems in the world model.
Although this research is still in its early stages, Google researchers believe that Genie 2 is an effective path to solving the structural problem of training embodied agents safely, unlocking the next wave of capabilities in embodied intelligence, and achieving the breadth and generality required to move towards AGI.
Fei-Fei Li puts the concept of spatial intelligence into practice
World Labs, established in January 2024, is the first startup of renowned AI scholar and Chinese-American scientist Fei-Fei Li. Within six months of its founding, the company's valuation had exceeded $1 billion.
It is a spatial intelligence company dedicated to building large world models that can perceive, generate, and interact with the 3D world. The plan is to generate virtual 3D spaces in which users can manipulate variables, allowing people to "create their own 3D worlds". World Labs says its software will be useful to a range of practitioners, including artists, designers, developers, and engineers.
On December 3rd, World Labs delivered its version 1.0 result.
It can generate a 3D world from a single image, so that users can essentially "step into" any image and explore it in 3D.
The tool also provides sliders for adjusting the simulated depth of field and a simulated dolly zoom. It supports adjusting the camera's position and field of view, changing object colors, creating spotlight effects, applying automatic dynamic effects, and other forms of interaction, which enrich the visual experience and give users a stronger sense of control.
Like Genie 2, World Labs' spatial intelligence model keeps the 3D world consistent: scenes are persistent and remain in place once generated, and users can control and move through a scene in real time and inspect its details closely.
Because the generated worlds follow the basic physical rules of 3D geometry, they combine realism with a sense of depth. This improves the controllability and consistency of the content and could change the way movies, games, simulators, and other digital representations of the physical world are made.
Jim Fan, Senior Research Scientist at NVIDIA, commented that "GenAI is creating increasingly high-dimensional snapshots of human experiences. Stable Diffusion is a 2D snapshot; Sora is a snapshot of 2D+time dimension; and World Labs is a 3D, fully immersive snapshot."
World Labs has now opened a waitlist to the public, and some creators can already integrate the tool into their existing workflows.
In film and television production, AI's 3D narrative capability is expected to greatly improve the efficiency and quality of content creation while reducing production costs. Creators can generate virtual scenes and characters more quickly and use AI-generated 3D worlds to build richer and more diverse story settings, bringing audiences a brand new visual experience.
For example, using World Labs technology to generate virtual shooting scenes before filming helps directors and cinematographers better plan shots and scene arrangements, improving shooting efficiency and accuracy.
For the gaming industry, 3D generation will open up more possibilities for game development. Developers can use AI to generate more realistic and finely detailed game scenes and characters, enhancing the immersion of the game.
In education, 3D content generated by large models can create more vivid and intuitive teaching scenarios, enriching how subjects such as science and history are experienced.
Fei-Fei Li believes that "spatial intelligence" is a key part of the AI puzzle. She said in a TED talk in April this year, "Vision becomes insight; insight becomes understanding; understanding drives action. All of this generates intelligence."
The spatial intelligence field represented by Genie 2 and World Labs is an important new direction for the development of AI technology. It moves beyond the limitations of traditional AI, which works on a two-dimensional plane, and extends AI's perception and understanding to three-dimensional space, making it more intuitive and closer to the essence of interaction.
