
To demonstrate his commitment to open-source AI models, Musk made a completely different choice from Altman. On March 17th, Musk announced the open-sourcing of Grok-1, which, at 314 billion parameters, became the open-source large language model with the largest parameter count currently available, far exceeding the 175 billion of OpenAI's GPT-3.5.
Interestingly, the cover image for the Grok-1 open-source announcement was generated by Midjourney, making it a case of "AI helping AI".
Musk, who has repeatedly mocked OpenAI for not being open, naturally took the opportunity to jab at the company on the social platform X: "We want to know more about the open part of OpenAI."
Grok-1's model weights and architecture are released under the Apache 2.0 license. This means users are free to use, modify, and distribute the software, for both personal and commercial purposes, an openness that encourages broader research and application development. Since its release, the project has earned 6.5k stars on GitHub, and its popularity is still growing.
The project description clearly emphasizes that because Grok-1 is a large (314B-parameter) model, a machine with sufficient GPU memory is required to run the example code. Community estimates suggest this may mean around 628 GB of GPU memory.
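That estimate is simple arithmetic: at two bytes per parameter, 314 billion parameters occupy about 628 GB before any activations or framework overhead are counted. A minimal sketch of the calculation (the precision options shown are illustrative assumptions, not a statement of what the repository actually ships):

```python
# Back-of-the-envelope check of the 628 GB figure quoted above.
# Assumption: weight memory only, ignoring activations and overhead.
PARAMS = 314e9  # Grok-1 parameter count

for name, bytes_per_param in [("bf16/fp16", 2), ("fp32", 4), ("int8", 1)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.0f} GB of GPU memory for weights alone")

# bf16/fp16: ~628 GB  -> matches the community estimate
# fp32:     ~1256 GB
# int8:      ~314 GB
```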
In addition, the MoE layer implementation in this repository is not efficient; this implementation was chosen deliberately, to avoid the need for custom kernels when verifying the model's correctness.
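To see the trade-off being described, here is a minimal, illustrative MoE layer in JAX (the framework Grok-1 is built on); the shapes, routing, and toy single-matrix "experts" are assumptions for the sketch, not xAI's actual code. Every expert runs on every token and the results are mixed by router weights, so correctness can be checked with stock operations, at the cost of computing experts whose router weight is zero:

```python
import jax
import jax.numpy as jnp

def naive_moe(x, w_router, w_experts, top_k=2):
    """Naive MoE layer: runs *all* experts on *all* tokens, then mixes.

    x:         (tokens, d_model)
    w_router:  (d_model, n_experts)
    w_experts: (n_experts, d_model, d_model)  -- toy single-matrix experts

    Verifiable with standard einsum ops (no custom gather/scatter
    kernels), but O(n_experts) compute per token instead of O(top_k).
    """
    logits = x @ w_router                                 # (tokens, n_experts)
    # Keep only the top-k router weights per token; zero out the rest.
    top_vals, top_idx = jax.lax.top_k(logits, top_k)
    mask = jnp.zeros_like(logits).at[
        jnp.arange(x.shape[0])[:, None], top_idx].set(jax.nn.softmax(top_vals))
    # Every expert processes every token -- the wasteful part.
    expert_out = jnp.einsum('td,edf->tef', x, w_experts)  # (tokens, n_experts, d_model)
    return jnp.einsum('te,tef->tf', mask, expert_out)     # (tokens, d_model)

# Example: 4 tokens, d_model=16, 8 experts
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (4, 16))
w_r = jax.random.normal(key, (16, 8))
w_e = jax.random.normal(key, (8, 16, 16))
out = naive_moe(x, w_r, w_e)   # (4, 16)
```

An efficient implementation would instead dispatch each token only to its selected experts, which typically calls for custom gather/scatter kernels — exactly the complexity the repository says it chose to avoid.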
Popular open-source models currently include Meta's Llama 2 and France's Mistral. Generally speaking, releasing a model as open source lets the community test it at scale and provide feedback, which in turn can accelerate the model's own iteration.
Grok-1 is a Mixture-of-Experts (MoE) large model developed over the past four months by xAI, Musk's AI startup. A review of the model's development:
After announcing the founding of xAI, its researchers first trained a 33-billion-parameter prototype language model (Grok-0). On standard language-model benchmarks, this prototype approached the capabilities of LLaMA 2 (70B) while using fewer training resources;
Subsequently, the researchers made significant improvements to the model's reasoning and coding capabilities, culminating in Grok-1, released in November 2023. This more powerful SOTA language model achieved 63.2% on the HumanEval coding task and 73% on MMLU, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1.
What are the advantages of Grok-1 compared to other large models?
xAI emphasizes that Grok-1 is a large model it trained from scratch: starting in October 2023, it was trained on a custom training stack built on JAX and Rust, without fine-tuning for any specific task (such as dialogue);
A unique and fundamental advantage of Grok-1 is that it can understand the world in real time through the X platform, which lets it answer spicy questions that most other AI systems would reject. The training data for the released version of Grok-1 consists of Internet data up to the third quarter of 2023 plus data supplied by xAI's AI trainers;
As a Mixture-of-Experts model with 314 billion parameters, Grok-1 activates roughly 25% of its weights for each token, and its large parameter count gives it powerful language comprehension and generation capabilities.
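The 25% figure is consistent with a top-2-of-8 expert routing scheme, in which two of eight experts participate per token. A quick sanity check, under the simplifying assumption that the ratio applies uniformly to all 314B parameters (in reality the shared, non-expert weights are always active, so the real breakdown differs somewhat):

```python
TOTAL_PARAMS = 314e9   # Grok-1 total parameter count
N_EXPERTS = 8          # assumed experts per MoE layer
TOP_K = 2              # assumed experts activated per token

active_fraction = TOP_K / N_EXPERTS
print(f"Active fraction: {active_fraction:.0%}")   # 25%
print(f"Approx. active params per token: "
      f"{TOTAL_PARAMS * active_fraction / 1e9:.1f}B")  # ~78.5B (rough estimate)
```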
xAI has previously said that Grok-1 will serve as the engine behind Grok for natural language processing tasks, including question answering, information retrieval, creative writing, and coding assistance. Going forward, long-context understanding and retrieval, as well as multimodal capabilities, are among the directions the model will explore.