Alibaba Cloud Releases Tongyi Qianwen 2.0: Performance Exceeds GPT-3.5 and Is Closing In on GPT-4

On October 31, Alibaba Cloud officially released Tongyi Qianwen 2.0, a large model with parameters on the scale of hundreds of billions. Across 10 authoritative benchmarks, the overall performance of Tongyi Qianwen 2.0 exceeds that of GPT-3.5 and is rapidly closing the gap with GPT-4. On the same day, the Tongyi Qianwen app went live in the major mobile app stores, so anyone can try the latest model capabilities directly through the app.

Tongyi Qianwen 72B to be open-sourced soon

Over the past six months, Tongyi Qianwen has made a major leap in performance. Compared with version 1.0, released in April, Tongyi Qianwen 2.0 is significantly better at understanding complex instructions, literary writing, general mathematics, knowledge recall, and resisting hallucination. Its overall performance now exceeds that of GPT-3.5 and is rapidly approaching that of GPT-4.

The overall performance of Tongyi Qianwen 2.0 exceeds GPT-3.5 and is closing in on GPT-4

On 10 mainstream benchmarks, including MMLU, C-Eval, GSM8K, HumanEval, and MATH, the overall score of Tongyi Qianwen 2.0 surpasses that of Meta's Llama-2-70B. Against OpenAI's GPT-3.5 it wins on nine benchmarks and loses on one, and against GPT-4 it wins on four and loses on six, further narrowing the gap with GPT-4.
The ability to understand both Chinese and English is a fundamental skill for a large language model. On English tasks, Tongyi Qianwen 2.0 scores 82.5 on the MMLU benchmark, second only to GPT-4. With its significantly larger parameter count, Tongyi Qianwen 2.0 can better understand and handle complex language structures and concepts. On Chinese tasks, Tongyi Qianwen 2.0 achieved the highest score on the C-Eval benchmark by a clear margin. This is because the model learned from more Chinese-language material during training, which further strengthened its Chinese comprehension and expression.
Tongyi Qianwen 2.0 has also made significant progress in areas such as mathematical reasoning and code understanding. On the reasoning benchmark GSM8K, Tongyi Qianwen ranked second, demonstrating strong computational and logical reasoning abilities. On the HumanEval test, its score ranks just behind those of GPT-4 and GPT-3.5. This test mainly measures a large model's ability to understand and write code, which is the foundation for applications such as programming assistance and automatic code repair.
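To make the HumanEval-style evaluation more concrete, here is a minimal sketch of how a code benchmark of this kind can score a model: the model is given a function signature and docstring, its completion is executed, and unit tests decide whether the attempt passes. The task, the completion, the tests, and the `passes` helper below are all hypothetical illustrations, not items or code from the actual benchmark, and a real harness would run untrusted model output in a sandbox rather than calling `exec` directly.

```python
# Hypothetical HumanEval-style task; the prompt, completion, and tests are
# illustrative only and are not taken from the benchmark itself.
PROMPT = '''def running_max(values):
    """Return a list whose i-th element is the maximum of values[:i+1]."""
'''

# Stand-in for the text a model might generate to complete the prompt.
COMPLETION = '''    result, best = [], None
    for v in values:
        best = v if best is None or v > best else best
        result.append(best)
    return result
'''

# Unit tests that decide whether the completion is functionally correct.
TESTS = '''
assert running_max([1, 3, 2, 5, 4]) == [1, 3, 3, 5, 5]
assert running_max([]) == []
assert running_max([-2, -7]) == [-2, -2]
'''


def passes(prompt: str, completion: str, tests: str) -> bool:
    """Define the function from prompt + completion, then run the tests.

    Returns True only if neither step raises an exception.
    """
    namespace = {}
    try:
        exec(prompt + completion, namespace)  # build the candidate function
        exec(tests, namespace)                # run the assertions against it
    except Exception:
        return False
    return True


if __name__ == "__main__":
    print(passes(PROMPT, COMPLETION, TESTS))
    print(passes(PROMPT, "    return values\n", TESTS))
```

A benchmark score of this type is then simply the share of tasks for which a generated completion passes all of its tests.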

Tongyi Qianwen 2.0 Release

Tongyi Qianwen has also become more mature and more useful. Tongyi Qianwen 2.0 has been technically optimized for instruction following, tool use, and fine-grained content creation, so it can be integrated more easily into downstream applications. The official Tongyi large model website has launched multimodal and plugin features, supporting specialized tasks such as image input and document parsing.
At the same time, eight industry models trained on the Tongyi large model were launched: Tongyi Lingma (intelligent coding assistant), Tongyi Zhiwen (AI reading assistant), Tongyi Tingwu (AI assistant for audio and video content), Tongyi Xingchen (personalized character creation platform), Tongyi Dianjin (intelligent investment research assistant), Tongyi Xiaomi (intelligent customer service), Tongyi Renxin (personal health assistant), and Tongyi Farui (AI legal advisor). These eight industry models target today's most popular vertical scenarios and are trained on domain-specific data. Users can try the models directly on the official website, and developers can integrate their capabilities into their own large model applications and services through web page embedding, API/SDK calls, and other methods.

The Tongyi large model family has been fully upgraded, and eight industry models have been launched

As of October, Alibaba Cloud had been working closely with more than 60 industry leaders to deploy Tongyi Qianwen in fields including office work, culture and tourism, electric power, government affairs, medical insurance, transportation, manufacturing, finance, and software development.
Zhou Jingren revealed that Alibaba Cloud plans to open-source the 72B version of Tongyi Qianwen in the near future. Alibaba Cloud has already open-sourced the 7B and 14B versions of the model, which have been downloaded more than 1 million times in total. Alibaba Cloud will continue to support developers across all industries in building new models and applications on the open-source Tongyi Qianwen models.
