
After the market close on Wednesday local time, Nvidia is set to release its Q2 report, the last heavyweight earnings release of the season, and global investors are on edge. A day earlier (August 27 local time), the US AI chip unicorn Cerebras Systems unveiled what it called the world's fastest AI inference service, built on its own chip-based computing system and claimed to be 10 to 20 times faster than systems built with Nvidia's H100 GPU.
Nvidia GPUs currently dominate both AI training and AI inference. Cerebras, which launched its first AI chip in 2019, has until now focused on selling AI chips and computing systems and on challenging Nvidia in AI training.
According to a report by the American technology media The Information, OpenAI's revenue is expected to reach $3.4 billion this year, driven largely by its AI inference services. With the AI inference market this large, Andrew Feldman, co-founder and CEO of Cerebras, said Cerebras needs to claim a place in it as well.
By launching an AI inference service, Cerebras is no longer just selling AI chips and computing systems; it is opening a second, usage-based revenue curve and mounting a broader challenge to Nvidia. "We want to take enough market share from Nvidia to make them angry," Feldman said.
Fast and cheap
Cerebras' AI inference service shows clear advantages in both speed and cost. According to Feldman, measured by the number of tokens output per second, Cerebras' inference is 20 times faster than AI inference services run on clouds such as Microsoft Azure and Amazon AWS.
At the launch event, Feldman ran Cerebras' inference service side by side with one hosted on Amazon AWS. Cerebras finished the inference and returned its output almost instantly, at 1,832 tokens per second, while the AWS-hosted service took several seconds to complete, at only 93 tokens per second.
Feldman said that faster inference makes real-time interactive voice responses possible, and that chaining multiple rounds of queries, consulting more external sources, and processing longer documents yields more accurate and relevant answers, a qualitative leap for AI inference.
Beyond speed, Cerebras also claims a large cost advantage. Feldman said Cerebras' AI inference service delivers 100 times the price-performance of AWS and other clouds. Taking Meta's open-source Llama 3.1 70B large language model as an example, Cerebras charges 60 cents per million tokens, while typical cloud providers charge about $2.90 per million tokens for the same service.
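As a rough sanity check on the figures quoted above (a sketch only; the 1,000-token response size is a hypothetical example, and the prices are the per-million-token figures cited in the article):

```python
# Back-of-envelope check of the speed and price figures quoted in the article.

cerebras_tps = 1832      # tokens per second, Cerebras demo (article figure)
aws_tps = 93             # tokens per second, AWS-hosted demo (article figure)

speedup = cerebras_tps / aws_tps
print(f"Speedup: {speedup:.1f}x")                 # ~19.7x, consistent with "20 times faster"

# Time to generate a hypothetical 1,000-token answer on each service.
answer_tokens = 1_000
print(f"Cerebras: {answer_tokens / cerebras_tps:.2f} s")   # ~0.55 s
print(f"AWS:      {answer_tokens / aws_tps:.2f} s")        # ~10.8 s

# Price comparison for Llama 3.1 70B, per million tokens (article figures).
cerebras_price = 0.60    # USD per million tokens
cloud_price = 2.90       # USD per million tokens
print(f"Price gap: {cloud_price / cerebras_price:.1f}x cheaper")   # ~4.8x on price alone
```

Combining the roughly 20x speed advantage with the roughly 4.8x price gap gives about 95x, which may be how the "100 times" price-performance figure is arrived at; the article does not spell out the calculation.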
56 times the area of today's largest GPU
Cerebras' inference service is fast and cheap because of the design of its WSE-3 chip, the third-generation processor Cerebras launched in March this year. The chip is enormous: it covers nearly the entire surface of a 12-inch (300 mm) wafer, larger than a book, with an area of about 462.25 square centimeters, roughly 56 times that of the largest current GPU.
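A quick check of the area comparison (a sketch only; the H100 die area used below is an outside assumption based on Nvidia's published figure of roughly 814 mm², not a number from the article):

```python
# Area comparison: WSE-3 vs. a single large GPU die.

wse3_area_mm2 = 462.25 * 100      # 462.25 cm^2 -> 46,225 mm^2 (article figure)
h100_die_mm2 = 814                # assumed H100 die area (vendor figure, not in article)

print(f"WSE-3 area: {wse3_area_mm2:,.0f} mm^2")
print(f"Ratio: {wse3_area_mm2 / h100_die_mm2:.0f}x")   # ~57x, in line with "roughly 56 times"
```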
Unlike Nvidia's GPUs, the WSE-3 does not rely on separate high-bandwidth memory (HBM) that must be accessed over an external interface; instead, its memory is embedded directly on the chip.
Thanks to its size, the WSE-3 carries up to 44 GB of on-chip memory, almost 900 times that of the Nvidia H100, and its memory bandwidth is about 7,000 times that of the H100.
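The 900x and 7,000x ratios are reproducible if the comparison is against the H100's on-chip SRAM and HBM bandwidth. The H100 numbers below (about 50 MB of on-chip L2 cache, about 3.35 TB/s of HBM3 bandwidth) and the WSE-3's roughly 21 PB/s bandwidth are assumptions drawn from the vendors' public specifications, not from the article:

```python
# Reconstructing the memory ratios quoted above.
# WSE-3: 44 GB on-chip SRAM (article); ~21 PB/s bandwidth (assumed, vendor spec).
# H100:  ~50 MB on-chip L2 cache, ~3.35 TB/s HBM3 bandwidth (assumed, vendor spec).

wse3_sram_gb = 44
h100_l2_gb = 0.050                 # 50 MB

wse3_bw_tb_s = 21_000              # 21 PB/s expressed in TB/s
h100_bw_tb_s = 3.35

print(f"On-chip memory ratio: {wse3_sram_gb / h100_l2_gb:.0f}x")   # ~880x ("almost 900 times")
print(f"Bandwidth ratio:      {wse3_bw_tb_s / h100_bw_tb_s:.0f}x") # ~6,270x, same order as "7,000 times"
```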
Feldman said that memory bandwidth is the fundamental factor limiting the inference performance of language models. Cerebras integrates logic and memory on a single giant chip, and the huge on-chip memory and extremely high memory bandwidth let it process data and generate inference results at a speed that, he said, GPUs cannot reach.
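The bandwidth argument can be made concrete with a simple roofline-style estimate: generating each token of a dense transformer requires streaming essentially all of the model's weights through the compute units, so single-stream decode speed is capped at memory bandwidth divided by model size. The figures below (16-bit weights, the bandwidth numbers from the previous sketch) are illustrative assumptions, not numbers from the article:

```python
# Rough upper bound on single-stream decode speed:
# each generated token reads every weight once, so tokens/s <= bandwidth / weight_bytes.

params = 70e9                           # Llama 3.1 70B parameters
bytes_per_param = 2                     # assumed 16-bit weights
weight_bytes = params * bytes_per_param # ~140 GB of weights

h100_bw = 3.35e12                       # ~3.35 TB/s HBM3 (assumed)
wse3_bw = 21e15                         # ~21 PB/s on-chip bandwidth (assumed)

print(f"H100 bound:  {h100_bw / weight_bytes:.0f} tokens/s")   # ~24 tokens/s per GPU
print(f"WSE-3 bound: {wse3_bw / weight_bytes:.0f} tokens/s")   # ~150,000 tokens/s (theoretical ceiling)
```

In practice batching, interconnect, and compute limits all matter, but the estimate shows why a bandwidth-bound workload benefits from keeping the weights in on-chip memory rather than in external HBM.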
Beyond its speed and cost advantages, the WSE-3 chip is designed to handle both AI training and inference, performing well across a range of AI workloads.
According to its plan, Cerebras will set up AI inference data centers in multiple locations and charge for inference capacity by the number of requests. At the same time, Cerebras will continue to try to sell its WSE-3-based CS-3 computing systems to cloud service providers.