Unlocking AI's 3D Narrative: Li Feifei and Google Take the Lead
hughmini
Posted 6 days ago
The 3D track of AIGC has suddenly come alive.
On December 5th, Google DeepMind released its new-generation world model, Genie 2, which can "generate a playable 3D game world lasting up to one minute from a single image", prompting netizens to exclaim that "The Matrix is here".
Just two days earlier, World Labs, the company of "AI godmother" Li Feifei, officially announced a "spatial intelligence" model that supports "generating a 3D world from one image".
This is another wave of discussion on world models after Sora. From text to images, and then to videos and interactive 3D worlds, AIGC has made significant leaps overall.
For the industrial sector, creative design work and interactive experience workflows have received strong support. The world model can provide infinitely diverse and controllable 3D environments for agent training, embodied intelligence training, complex animation production, game production, physics modeling, and other fields.
Some industry insiders also say that progress in world models brings the ultimate goal of AGI (artificial general intelligence) one step closer.
Google expands the breadth towards AGI
Genie 2 is Google's second-generation world model. Given a single image, it generates a 3D environment that can be controlled with keyboard and mouse input.
It identifies the character in the image and moves it correctly in response to key presses.
The same starting frame can produce different motion trajectories.
Genie 2 also has a consistent memory: parts of the scene that move out of view are rendered without distortion when they become visible again.
Notably, Genie 2 generates new scene content in real time as the view changes and keeps the world consistent for up to one minute.
This mode of interaction resembles a game.
"Games play a crucial role in the field of artificial intelligence research. Their engaging nature, unique combination of challenges, and measurable progress make them an ideal environment for safely testing and advancing AI capabilities," Google said. In fact, games have always been important to Google DeepMind and an important way for Google to train agents.
However, the industry has encountered bottlenecks in the training of embodied intelligence.
A sufficiently rich and diverse training environment is necessary for practical progress in embodied intelligence. 21st Century Business Herald reporters learned from insiders in the humanoid robot industry that generalization ability is currently a major pain point for humanoid robots.
Genie 2 is expected to help embodied intelligence solve training bottlenecks.
In terms of interaction, Genie 2 can model object interactions such as popping balloons, opening doors, and shooting explosive barrels.
This makes it much simpler to create diverse interactive scenes. By using Genie 2 to rapidly prototype interactive experiences, researchers can quickly train and test embodied agents in new environments.
For example, researchers used different images generated by Imagen 3 to prompt Genie 2 to model the differences between flying a paper airplane, a dragon, an eagle, or a parachute, testing Genie's ability to control different objects.
That is to say, AI agents can obtain almost infinite training scenarios and interaction systems in the world model.
Although this research is still in its early stages, Google researchers believe that Genie 2 is an effective path to solving the structural problem of training embodied agents safely, unlocking the next wave of capabilities in embodied intelligence, and achieving the breadth and generality required to move towards AGI.
Li Feifei realizes the concept of spatial intelligence
World Labs is the first entrepreneurial project of renowned AI scholar and Chinese scientist Li Feifei, established in January 2024. Within about six months of its founding, its valuation had exceeded $1 billion.
This is a spatial intelligence company dedicated to building large-scale world models that can perceive, generate, and interact with the 3D world. The plan is to generate virtual 3D spaces where users can manipulate variables, allowing people to "create their own 3D worlds". World Labs says its software will be helpful to a range of practitioners, including artists, designers, developers, and engineers.
On December 3rd, World Labs delivered its version 1.0 result.
A 3D world can be generated from a single image, and users can essentially "step into" any image and explore in 3D.
The tool is also equipped with sliders for adjusting the simulated depth of field and a simulated dolly zoom. It supports adjusting the camera's position and field of view, changing object colors, creating spotlight effects, automatic dynamic effects, and other interactive methods, enriching the visual experience and providing a stronger sense of control.
Like Genie 2, World Labs' spatial intelligence model keeps the 3D world consistent: once generated, a scene persists. Users can control and move through the scene in real time and closely inspect its details.
The world model follows the basic physical rules of 3D geometry, combining realism and depth, effectively improving the controllability and consistency of content, and changing the way movies, games, simulators, and other digital representations of the physical world are made.
Jim Fan, Senior Research Scientist at NVIDIA, commented that "GenAI is creating increasingly high-dimensional snapshots of human experiences. Stable Diffusion is a 2D snapshot; Sora is a snapshot of 2D+time dimension; and World Labs is a 3D, fully immersive snapshot."
At present, World Labs has opened a waitlist to the public, and some creators can already integrate this AI tool into their existing workflows.
In the field of film and television production, AI's 3D narrative capability will greatly improve the efficiency and quality of content creation and reduce production costs. Creators can generate virtual scenes and characters more quickly, and use AI-generated 3D worlds to build richer and more diverse story backgrounds, bringing audiences a brand new visual experience.
For example, using World Labs technology to generate virtual shooting scenes before filming helps directors and cinematographers better plan shots and scene arrangements, improving shooting efficiency and accuracy.
For the gaming industry, 3D generation will bring more possibilities for game development. Developers can use AI to generate more realistic and detailed game scenes and characters, enhancing the immersion of the game.
In the field of education, 3D content generated by large models can create more vivid and intuitive teaching scenarios, enhancing the experience of subjects such as science and history.
Li Feifei believes that "spatial intelligence" is a key part of the AI puzzle. She said in a TED talk in April this year, "Vision becomes insight; insight becomes understanding; understanding drives action. All of this generates intelligence."
The spatial intelligence field represented by Genie 2 and World Labs is an important new direction for the development of AI technology. It breaks through the limitations of traditional AI on a two-dimensional plane, expanding AI's perception and understanding capabilities to three-dimensional space, making it more intuitive and closer to the essence of interaction.
CandyLake.com is an information publishing platform and only provides information storage space services.
Disclaimer: The views expressed in this article are those of the author only, this article does not represent the position of CandyLake.com, and does not constitute advice, please treat with caution.