
After OpenAI's surprise launch of the "small model" GPT-4o mini, Meta answered with a blockbuster of its own: a model with a massive parameter count.
On July 24, Meta released the open-source Llama 3.1 model family: the flagship 405B, plus upgraded models in two smaller sizes, 70B and 8B.
Llama 3.1 405B is widely regarded as the strongest open-source model currently available. According to Meta, the model supports a 128K context length and adds support for eight languages. It is comparable to flagship models such as GPT-4o and Claude 3.5 Sonnet in general knowledge, steerability, mathematics, tool use, and multilingual translation; in human evaluation comparisons, its overall performance even edges out both.
Meanwhile, the upgraded 8B and 70B models are likewise multilingual and have also been extended to a 128K context length.
Llama 3.1 405B is Meta's largest model to date. Meta said training involved over 15 trillion tokens and that, to achieve the desired results in a reasonable time, the team optimized the entire training stack and used more than 16,000 H100 GPUs, making this the first Llama model trained at that scale of compute.
The team broke this demanding training run into several key steps. To maximize training stability, Meta chose not to use an MoE (mixture-of-experts) architecture, instead adopting a standard decoder-only Transformer architecture with minor adaptations.
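For readers unfamiliar with the term, a decoder-only Transformer processes tokens with causal (masked) self-attention: each position attends only to itself and earlier positions. A minimal single-head sketch in plain NumPy, purely illustrative and not Meta's implementation:

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a (seq_len, d_model) input.

    Each position attends only to itself and earlier positions -- the
    defining property of a decoder-only Transformer. Toy code for
    illustration, not Meta's training stack.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)          # (seq_len, seq_len)
    # Causal mask: position i may not attend to positions j > i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                  # 4 tokens, d_model = 8
w = [rng.normal(size=(8, 8)) for _ in range(3)]
out = causal_self_attention(x, *w)
print(out.shape)  # (4, 8)
```

Because of the mask, the first token's output depends only on the first token's value vector, which is what lets such models generate text autoregressively.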
According to Meta, the team also used an iterative post-training process, applying supervised fine-tuning (SFT) and direct preference optimization (DPO) in each round and generating the highest-quality synthetic data per round to improve performance on each capability. Compared with previous versions of Llama, the team improved both the quantity and quality of the data used for pre- and post-training.
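DPO, mentioned above, trains directly on preference pairs without a separate reward model: for each (chosen, rejected) response pair, the loss rewards the policy for preferring the chosen response more strongly than a frozen reference model does. A toy sketch of the per-pair loss, following the published DPO formulation rather than Meta's actual code:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin).

    The margin measures how much more the policy prefers the chosen
    response over the rejected one, relative to the reference model.
    Toy sketch following the DPO paper, not Meta's implementation.
    """
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If policy and reference agree exactly, the margin is 0 and loss = log 2.
print(round(dpo_loss(-5.0, -6.0, -5.0, -6.0), 4))  # 0.6931
# If the policy favors the chosen response more than the reference, loss drops.
print(dpo_loss(-4.0, -7.0, -5.0, -6.0) < math.log(2))  # True
```

Iterating rounds of SFT plus DPO, each on freshly generated synthetic data, is how the post-training loop described above refines each capability.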
Alongside the Llama 3.1 405B release, Mark Zuckerberg published a statement titled "Open Source AI Is the Path Forward", once again emphasizing the significance and value of open-source large models and taking direct aim at companies such as OpenAI that have gone the closed-source route.
Zuckerberg retold the story of open-source Linux versus closed-source Unix: the former came to support more features and a broader ecosystem, and became the industry-standard foundation for cloud computing and the operating systems that run most mobile devices. He believes artificial intelligence will develop in a similar way.
He pointed out that while several technology companies are developing leading closed-source models, open-source models are closing the gap fast. The most direct evidence: Llama 2 was only comparable to an older generation of models, whereas Llama 3 is competitive with the latest models and already leads in some areas.
He expects that starting next year, future Llama models will become the most advanced in the industry; even before then, Llama already leads in openness, modifiability, and cost efficiency.
Zuckerberg cited several reasons the world needs open-source models. For developers, beyond a more transparent environment in which to train, fine-tune, and refine their own models, another important factor is the need for efficient, affordable models.
He explained that for user-facing and offline inference workloads, developers can run Llama 3.1 405B on their own infrastructure at roughly 50% of the cost of closed-source models such as GPT-4o.
The debate between the open-source and closed-source paths has been aired extensively in the industry, but the prevailing view had been that each has its own value: open source benefits developers cost-effectively and helps large language model technology iterate and evolve, while closed source can concentrate resources to push through performance bottlenecks faster and deeper, and was seen as more likely to reach AGI (artificial general intelligence) first.
In other words, the industry has generally believed that open source struggles to catch up with closed source on model performance. Llama 3.1 405B may prompt the industry to revisit that conclusion, which could sway the many enterprises and developers currently inclined toward closed-source model services.
Meta's ecosystem is already substantial: with the launch of Llama 3.1, more than 25 partners will offer related services, including Amazon AWS, Nvidia, Databricks, Groq, Dell, Microsoft Azure, and Google Cloud.
However, Zuckerberg's timeline for the Llama series to take the lead is next year, and closed-source models could overtake it in the meantime. During this period, outside attention may also fall on closed-source large models that cannot match the performance of Llama 3.1 405B; their position is indeed somewhat awkward.
He also specifically addressed US-China competition in large models, arguing it is unrealistic to expect the United States to stay several years ahead of China in this field, but that even a small lead of a few months can compound over time and give the United States a clear advantage.
"The advantage of the United States is decentralized and open innovation. Some people argue that we must close our models to prevent China from acquiring them, but I think that will not work and will only put the United States and its allies at a disadvantage." In Zuckerberg's view, a world of only closed models would leave leading models in the hands of a few large companies and geopolitical rivals, while startups, universities, and small businesses miss out. Moreover, restricting American innovation to closed development raises the odds of not leading at all.
"On the contrary, I believe our best strategy is to build a strong open ecosystem and have our leading companies work closely with our government and allies to ensure they can best take advantage of the latest advances and achieve a sustainable first-mover advantage over the long term," Zuckerberg said.