Technology giant Google has launched a long-awaited new model that can run on mobile phones and significantly reduce computing costs.
On December 6 local time, Google announced the launch of Gemini, which it describes as its "largest, strongest, and most versatile" large language model. Gemini will be the first large-scale model to run directly on a mobile phone, and it will be used in the Google Pixel 8 Pro smartphone and the chatbot Bard. Google plans to license Gemini to customers through Google Cloud and will integrate it into other Google products and services in the coming months.
Google invented many of the computer science concepts that make generative AI applications possible, but it was put on the back foot by OpenAI's chatbot ChatGPT, released last year. Faced with the threat posed by the collaboration between OpenAI and Microsoft, one of Google's biggest competitors, Google unveiled its own chatbot Bard in February this year. Not long after, OpenAI released the more powerful GPT-4, which became a major benchmark in the field of AI. Now, in response to GPT-4, Google has launched Gemini.
"Google has found its rightful place in the AI competition"
Demis Hassabis, CEO of Google DeepMind and spokesperson for the Gemini team, said at a press conference that Google ran 32 comprehensive multimodal benchmarks to compare Gemini with OpenAI's GPT-4, and that Gemini is "significantly ahead" on 30 of the 32.
According to Google, Gemini performed strongly across a variety of tasks in the later stages of training. For example, MMLU (Massive Multitask Language Understanding) is one of the most popular methods for testing an AI model's knowledge and problem-solving abilities, and Gemini scored 90.0% on it, becoming the first model to surpass human experts on MMLU.
Gemini's MMLU score surpassed that of human experts for the first time. Source: Official Video
Gemini comes in three sizes: Gemini Ultra is the largest and most capable model, positioned as a competitor to GPT-4; Gemini Pro is a mid-range model that outperforms GPT-3.5 and scales across a wide range of tasks; Gemini Nano is designed for specific tasks and mobile devices.
Of the three, Gemini Nano will ship on the Pixel 8 Pro, the latest smartphone in the Google Pixel series, where it supports new features such as "summary" in the recorder app and "smart reply" in Google's keyboard app Gboard. According to foreign media reports, Google has said that Gemini Nano runs "locally" on the device and is specially optimized for mobile devices, so Android developers can easily build AI applications and features that work offline or that use personal information kept on the device.
Analysts suggest that this progress can help solve a major economic problem in the technology sector. Using the computing power of mobile phones to run generative AI, rather than relying on cloud servers operated by large technology companies, will greatly reduce the cost of operating such systems. For those who wish to keep their personal data on their own devices, it also provides a layer of security. Samsung Electronics publicly showcased its first generative AI model, Gauss, in November, but it is currently limited to internal employees and is expected to be installed on Galaxy S24 series phones in the first half of next year.
"I believe that the AI transformation we are witnessing will be the most profound in our lives, much larger than the previous transformation in mobile technology or the internet. This new era of models represents one of the largest scientific and engineering efforts our company has ever made," wrote Sundar Pichai, CEO of Alphabet, Google's parent company, in a blog post.
On the eve of Gemini's release, Pichai said in an interview that one of the main reasons Gemini has attracted attention is that it is fundamentally a multimodal model. He added that the transition to AI is profound, is still in its early stages, and holds enormous opportunities: "When we developed Gemini, we applied a lot of previous experience. We spent more time developing Gemini Ultra, partly to conduct strict safety testing. At the same time, we are also fine-tuning it to fully unleash its potential."
On the X (formerly Twitter) platform, Elon Musk commented "Impressive" under Pichai's post introducing Gemini. Musk also replied to a post by Hassabis to congratulate him, and agreed with a comment on Gemini by SpaceX founding employee Tom Mueller, which reads: "I know it's difficult to define what AGI (artificial general intelligence) is, but no matter what it is, it's closer than you imagine."
According to Google, Gemini is a collaborative effort among teams across Google, including Google Research. It can extract insights from hundreds of thousands of documents by reading, filtering, and understanding information, and it also handles numbers well. For example, given a chart and some new data, Gemini can produce the code behind the chart and generate an updated chart that incorporates the new data.
Gemini generates the right image from the left image and new data. Source: Official Video
Gemini can understand many forms of input and output, including text, code, audio, images, and video. It is able to grasp subtle distinctions in information and answer questions on complex topics, which makes it particularly good at explaining its reasoning in demanding subjects such as mathematics and physics.
Gemini is able to answer questions step by step based on photos. Source: Official Video
Google also released a six-minute video showcasing some interesting interactions between testers and Gemini, including asking Gemini to recognize images and describe them in multiple languages, using a map to design a quiz game, and playing a cup-and-ball game and reasoning games with Gemini.
Throughout the video, Gemini responds very quickly. It also generates audio and pictures to support its answers and uses colloquial and even humorous expressions, which makes for an eye-opening demonstration. In the comments section, netizens praised the video as "shocking" and celebrated Google's return to its rightful position in the AI competition.
Gemini provides animal shapes that can be made based on two balls of yarn. Source: Official Video
When asked which direction the duck should go, Gemini said it should go to the left side with companions. Source: Official Video
In terms of coding, Gemini can understand, explain, and generate high-quality code in the world's most popular programming languages, including Python, Java, C++, and Go. It can work across languages and reason about complex information, and it can also serve as the engine for higher-level coding systems.
Starting from December 13th, developers and enterprise clients will be able to access Gemini Pro through the Gemini API (Application Programming Interface) in Google AI Studio or Google Cloud Vertex AI, and Android developers will be able to build using Gemini Nano.
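As a rough illustration of what access through the Gemini API could look like, the sketch below builds (but does not send) a text prompt request using only the Python standard library. The endpoint path, the model name "gemini-pro", and the request body layout are assumptions based on Google's public REST documentation around launch, not details given in this article, and they may have changed since.

```python
import json
import urllib.request

# Assumed values, not stated in the article: the endpoint root and the
# model name follow Google's public REST documentation around launch.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-pro"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    url = f"{API_ROOT}/models/{MODEL}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("Summarize today's AI news in one sentence.", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # To actually call the service, pass req to urllib.request.urlopen
    # with a valid API key obtained from Google AI Studio.
```

Keeping the request construction separate from the network call makes it possible to inspect the payload without an API key.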
Gemini will bring the largest update since its release to the Google chatbot Bard. Google announced that starting from the day of the launch event, Bard will use Gemini Pro to achieve advanced reasoning, planning, understanding, and other functions, providing English services in over 170 countries and regions. Google plans to expand to different modalities, support new languages and regions in the coming months. At the beginning of next year, Google will launch Bard Advanced, which will use Gemini Ultra.
However, for regulatory reasons, Bard with Gemini will not be available in EU countries or the UK for now. "We will definitely work hard to solve this problem and are collaborating with local regulatory agencies to ensure that we have sufficient communication with relevant parties before launching the service in any specific region," said Sissie Hsiao, Google's Vice President and Bard Project Leader.
Exaggerated promotional videos?
However, shortly after the release of Gemini, some netizens pointed out questionable aspects of the promotional materials.
According to a 60-page technical report released by Google, Gemini's MMLU result carries the small-print annotation "CoT@32", indicating that it was obtained with chain-of-thought prompting over 32 attempts, with the best result selected from among them. By comparison, GPT-4's figure was obtained with 5-shot prompting, in which the model is given five worked examples. Under that same standard, Gemini Ultra's result is actually 83.7%, lower than GPT-4's 86.4%.
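To see why a 32-attempt score is not directly comparable with a single-answer score, consider a toy simulation. It assumes the attempts are aggregated by majority vote, which is one common way of combining sampled answers; the article itself only says that 32 attempts were made. The `sample_answer` function is a hypothetical stand-in for a model call, and the 60% per-attempt accuracy is an arbitrary illustrative number, not a Gemini or GPT-4 figure.

```python
import random
from collections import Counter


def sample_answer(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one sampled chain-of-thought answer.

    A real system would query a model here. This stub returns the correct
    choice "B" about 60% of the time and a random wrong choice otherwise.
    """
    if rng.random() < 0.6:
        return "B"
    return rng.choice(["A", "C", "D"])


def majority_vote(question: str, k: int, rng: random.Random) -> str:
    """Sample k answers and return the most frequent one."""
    votes = Counter(sample_answer(question, rng) for _ in range(k))
    return votes.most_common(1)[0][0]


def accuracy(k: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated questions answered correctly with k samples each."""
    rng = random.Random(seed)
    correct = sum(majority_vote("q", k, rng) == "B" for _ in range(trials))
    return correct / trials


if __name__ == "__main__":
    print(f"1 sample per question:   {accuracy(1):.3f}")
    print(f"32 samples per question: {accuracy(32):.3f}")
```

In this toy setting, aggregating 32 samples scores far higher than a single sample from the same simulated model, which is why results reported under different prompting setups should be compared with care.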
Moreover, in the graph comparing MMLU scores, Gemini's 90.0% is only marginally higher than the human-expert score of 89.8%, yet the two points are drawn far apart.
Philipp Schmid, technical lead at Hugging Face, redrew the graph using the data disclosed in the technical report. The two data points below show the GPT-4 (left) and Gemini (right) scores with 5-shot prompting. Source: X
Subsequently, Jeff Dean, Chief Scientist of Google DeepMind, responded to this question in a discussion on the X platform, writing, "We reported on these two methods. We believe it would be interesting for the community to see our newly developed CoT method and understand its differences from other methods."
Some viewers also found problems with the interactive demonstration video, starting with the disclaimer in its opening text. Machine learning instructor Santiago Valdarrama believes the statement may imply that the footage shown was carefully selected and edited rather than recorded in real time. In the statement, Google wrote, "We have been capturing footage to test it on a wide range of challenges, showing it a series of images, and asking it to reason about what it sees."
Disclaimer at the beginning of the demonstration video. Source: Official Video
Subsequently, Google explained the multimodal interaction process in a blog post and indirectly acknowledged that the effects in the demonstration video could only be achieved by piecing together static images and multiple prompts. For example, in the video the tester shows a fist, a scissors gesture, and an open palm in turn, and Gemini immediately concludes that the tester is playing rock paper scissors. In the blog post, Google acknowledges that Gemini reached this conclusion only when it was shown all three gestures at once and told that they were part of a game.
Of course, even allowing for some exaggeration in the promotion, Gemini's performance should not be underestimated.
Who can win the technology giant competition?
Since the beginning of this year, the major technology giants have been making one move after another in the field of AI, each with its own approach.
Among them, Microsoft, one of Google's biggest competitors, is particularly prominent. In February of this year, Microsoft built an AI chatbot into its search engine Bing. A month later, it announced Microsoft 365 Copilot, which brings the capabilities of the large language model GPT-4 to Office software. To maintain its lead in bringing AI to office tools, Microsoft officially launched the enterprise edition of Microsoft 365 Copilot on November 1, with a monthly subscription fee of $30. More than a month ago, Microsoft announced that the AI assistant Copilot would be officially integrated into Windows 11.
At its first developer conference in November, OpenAI launched a new model, GPT-4 Turbo, along with a series of upgrades to the chatbot ChatGPT, including custom GPTs. GPT-4 Turbo supports a context length of up to 128,000 tokens and accepts visual input. It joins the text-to-image model DALL·E 3 and a new text-to-speech (TTS) model in OpenAI's multimodal API.
For many years, Facebook's parent company Meta has also been an active participant in the AI field. In July of this year, Meta announced that its large model Llama 2, a competitor to GPT-4, was officially open source, so that anyone can download it, modify it, and add it to their products for free. This approach has won praise from some tech startups that are concerned that Google, Microsoft, and OpenAI will try to monopolize the AI market and shut out competitors. But Meta's move has also been criticized for making it easier to use AI technology for malicious purposes, such as designing computer viruses or generating audio and images to commit fraud.
The e-commerce giant Amazon, long considered a laggard in the AI competition, is also accelerating. At the 2023 re:Invent global conference last week, Amazon Web Services (AWS) launched a generative AI assistant called "Amazon Q", which can "easily chat, generate content, and take action." Amazon Q will focus on the workplace rather than on consumers. Amazon plans to charge enterprise users a monthly subscription fee of $20, while the version for developers and IT personnel will cost $25 per month.