A watershed for the open-source community: Meta releases Llama 3, with the largest version expected to exceed 400 billion parameters
六月清晨搅
Posted on 2024-4-19 16:12:24
To maintain its position in open-source large AI (artificial intelligence) models, social media giant Meta has launched its latest open-source model.
On April 18th local time, Meta announced on its official website the release of its latest large model, Llama 3. So far, Meta has released two smaller versions of Llama 3, with 8 billion (8B) and 70 billion (70B) parameters, each with an 8k context window. Meta stated that by using higher-quality training data and instruction fine-tuning, Llama 3 has achieved a "significant improvement" over the previous generation, Llama 2.
Meta plans to launch a larger version of Llama 3 with over 400 billion parameters. It also plans to add new capabilities such as multimodality and longer context windows, and to publish a Llama 3 research paper.
Meta wrote in the announcement, "Through Llama 3, we are committed to building open-source models that can compete with today's best proprietary models. We want to address developer feedback, improve the overall helpfulness of Llama 3, and continue to play a leading role in the responsible use and deployment of LLMs (large language models)."
On the 18th, Meta's stock price (Nasdaq: META) closed at $501.80 per share, up 1.54%, with a total market value of $1.28 trillion.
"The best open-source large model currently on the market"
According to Meta, Llama 3 demonstrates state-of-the-art performance on a range of industry benchmarks, offers new capabilities including improved reasoning, and is currently the best open-source large model on the market.
At the architecture level, Llama 3 uses a standard decoder-only Transformer architecture and a tokenizer with a vocabulary of 128K tokens. Llama 3 was pretrained on two 24K-GPU clusters built by Meta, using over 15T tokens of publicly available data, including 5% non-English data covering over 30 languages. The training dataset is seven times larger than that of the previous generation, Llama 2, and contains four times as much code.
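The training-data figures above imply a couple of rough quantities that are easy to work out. The sketch below is only back-of-the-envelope arithmetic: it assumes "over 15T" means exactly 15 trillion tokens and that the 5% and 7x figures are exact, so the results are approximations rather than numbers reported by Meta. All variable and function names are illustrative.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
# Assumption: "over 15T" is treated as exactly 15 trillion tokens, and the
# 5% and 7x figures are treated as exact, so the outputs are rough estimates.

LLAMA3_TOKENS = 15e12       # pretraining tokens quoted for Llama 3
NON_ENGLISH_SHARE = 0.05    # share of non-English data
SCALE_VS_LLAMA2 = 7         # Llama 3 dataset is about 7x Llama 2's


def in_trillions(tokens: float) -> float:
    """Convert a raw token count to trillions of tokens."""
    return tokens / 1e12


# Non-English portion of the pretraining data.
non_english_tokens = LLAMA3_TOKENS * NON_ENGLISH_SHARE

# Llama 2 dataset size implied by the "seven times" comparison.
implied_llama2_tokens = LLAMA3_TOKENS / SCALE_VS_LLAMA2

print(f"Non-English tokens:     ~{in_trillions(non_english_tokens):.2f}T")
print(f"Implied Llama 2 tokens: ~{in_trillions(implied_llama2_tokens):.2f}T")
```

Under these assumptions, the non-English portion comes to roughly 0.75T tokens, and the implied Llama 2 dataset to roughly 2.14T tokens.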
According to Meta's test results, the Llama 3 8B model outperforms Gemma 7B and Mistral 7B Instruct on multiple benchmarks such as MMLU, GPQA, and HumanEval, while the 70B model surpasses Sonnet, the mid-tier version of the well-known closed-source model Claude 3, and scores three wins and two losses against Google's Gemini Pro 1.5.
Llama 3 performs exceptionally well on multiple performance benchmarks. Source: Meta official website
Beyond conventional benchmarks, Meta has also worked to optimize Llama 3's performance in real-world scenarios, and developed a high-quality human evaluation set for this purpose. The set contains 1,800 prompts covering 12 key use cases such as asking for advice, closed-ended question answering, brainstorming, coding, and writing, and is kept confidential from the development team.
On this evaluation set, Llama 3 significantly outperforms Llama 2 and also surpasses well-known models such as Claude 3 Sonnet, Mistral Medium, and GPT-3.5.
Llama 3 achieved excellent results on the human evaluation set. Source: Meta official website
Although the 400B+ model of Llama 3 is still being trained, Meta has shown some of its preliminary test results, seemingly to benchmark it against Opus, the strongest version of Claude 3. However, Meta has not released any comparison between the larger Llama 3 model and GPT-4-class competitors.
The 400B+ model of Llama 3 is still being trained. Source: Meta official website
The Llama 3 models will soon be available to developers on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM watsonx, Microsoft Azure, Nvidia NIM, and Snowflake, with hardware platform support from AMD, AWS, Dell, Intel, Nvidia, and Qualcomm. To support responsible development with Llama 3, Meta will also provide new trust and safety tools, including Llama Guard 2, Code Shield, and CyberSec Eval 2.
Meanwhile, Meta has released an official web version of Meta AI based on Llama 3. The platform is still in its early stages and offers only two main functions: chat and image generation. Users can chat without registering, while image generation requires registering and logging in to an account.
Injecting vitality into the open source community
Meta's AI path has always been closely linked to open source, and once Llama 3 was launched, it was warmly welcomed by the open source community.
Although some have criticized the small size of Llama 3's 8k context window, Meta said that it will expand the context window soon. Matt Shumer, CEO and co-founder of email startup Otherside AI, is also optimistic and said, "We are entering a new world where GPT-4 level models are open source and accessible for free."
According to Jim Fan, a senior research scientist at Nvidia, the upcoming larger Llama 3 model marks a "watershed" for the open-source community, one that could change how many academic research groups and startups make decisions, and he "expects a surge in vitality throughout the entire ecosystem."
However, it is worth noting that Meta has not released the training data for Llama 3, stating only that it comes entirely from publicly available sources. Strictly speaking, "open source" software should be fully open to the public throughout development and distribution, including the source code, the training data, and other materials. DBRX, the "strongest open source model" released earlier by data company Databricks, not only required hardware far beyond that of an ordinary computer to run, but also had the same issue.
The launch of Llama 3 closely follows progress on Meta's self-developed chips. Just last week, Meta announced the latest version of its in-house chip, MTIA, a custom chip series that Meta designed specifically for AI training and inference. Compared with MTIA v1, Meta's first-generation AI inference accelerator officially announced in May last year, the latest version delivers significantly better performance and is designed specifically for the ranking and recommendation systems of Meta's social apps. Analysts say that Meta's goal is to reduce its dependence on chip manufacturers such as Nvidia.
CandyLake.com is an information publishing platform and only provides information storage space services.
Disclaimer: The views expressed in this article are those of the author only, this article does not represent the position of CandyLake.com, and does not constitute advice, please treat with caution.