
In the early hours of March 19th Beijing time, at NVIDIA GTC (GPU Technology Conference), NVIDIA CEO Jensen Huang announced the successor to the Hopper architecture: the Blackwell-architecture B200 chip. Demand for NVIDIA's Hopper-generation H100 chips and GH200 Grace Hopper superchips remains high, as they supply computing power to many of the world's most powerful supercomputing centers, and the B200 promises a further generational leap in compute.
The Blackwell-architecture B200 is not a traditional single-die GPU. Instead, it consists of two tightly coupled dies which, according to NVIDIA, act as one unified CUDA GPU. The two dies are connected by a 10 TB/s NV-HBI (NVIDIA High-Bandwidth Interface) link, allowing them to function as a single chip.
Multi-card interconnection is the key to the B200's compute gains. The GB200, which pairs two B200 GPUs with a single Grace CPU, can deliver up to 30 times the performance of an H100 for large-language-model inference work, with potentially much higher efficiency. NVIDIA claims that, compared with the H100, the B200 can cut the computational cost and energy consumption of generative AI by up to 25 times.
The headline compute gains in NVIDIA's AI chips rely heavily on reduced numerical precision. Moving down from FP64, FP32, FP16, and FP8 to the B200's new FP4 format, the chip's theoretical peak throughput at FP4 is 20 petaflops. FP4 delivers twice the throughput of FP8: storing each value in 4 bits instead of 8 halves the memory footprint, effectively doubling compute, bandwidth, and the model size that fits in memory. If the B200 is compared with the H100 at the same FP8 precision, it theoretically offers only about 2.5 times the H100's computing power, so a large part of the B200's improvement comes from the interconnection of its two dies.
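The precision arithmetic above can be checked back-of-envelope. A minimal sketch, using only the figures quoted in the article (the implied H100 number is derived from them, not an official datasheet value):

```python
# All numbers in petaflops (PFLOPS), taken from the article's claims.
B200_FP4 = 20.0            # claimed theoretical peak throughput at FP4

# Halving the bits per value doubles throughput, so FP8 is half of FP4.
B200_FP8 = B200_FP4 / 2    # -> 10.0 PFLOPS

# The article says the B200 at FP8 is about 2.5x an H100 at FP8,
# which implies an H100 FP8 peak of:
H100_FP8 = B200_FP8 / 2.5  # -> 4.0 PFLOPS

print(f"B200 FP8 peak:         {B200_FP8} PFLOPS")
print(f"Implied H100 FP8 peak: {H100_FP8} PFLOPS")
```

The per-generation gain at matched precision (2.5x) is thus much smaller than the 5x headline figure (20 vs. 4 petaflops), which mixes a precision change with an architecture change.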
Moore's Law from the CPU era (the number of transistors on an integrated circuit doubles roughly every 18 months) has entered its twilight years. TSMC's breakthrough 3nm process has not produced a generational jump in chip performance: the Apple A17 Pro, launched in September 2023 as the first chip on TSMC's 3nm process, improved CPU performance by only about 10%. Advanced-process chips are also costly to develop. According to the Far East Research Institute, TSMC's wafer foundry prices in 2023 rose by roughly 16% (advanced processes) to 34% (mature processes) compared with two years earlier.
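The doubling rule quoted above can be written as a simple exponential, N(t) = N0 · 2^(t/T) with doubling period T ≈ 18 months. A small sketch (the function name and 18-month period are illustrative, not from a specific source):

```python
def transistors(n0: float, months: float, doubling_months: float = 18.0) -> float:
    """Projected transistor count after `months`, doubling every `doubling_months`."""
    return n0 * 2 ** (months / doubling_months)

# Starting from 1 billion transistors, 36 months is two doubling
# periods, so the count quadruples to 4e9.
print(transistors(1e9, 36))
```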
Besides Apple, TSMC's other major chip customer is NVIDIA: NVIDIA's hot-selling H100 AI chip is built on TSMC's custom 4N process (a 5nm-class node) and uses TSMC's advanced CoWoS packaging capacity.
As Moore's Law fades, Jensen Huang's "Huang's Law" holds that GPU efficiency will more than double every two years, and that the innovation comes not just from the chip but from the entire stack.
NVIDIA continues to push toward multi-card interconnection. Since 3nm offers limited gains, the B200 instead places two 4nm-class dies side by side, joined by a high-speed on-chip interconnect into one oversized chip with more than 200 billion transistors. At GTC, Jensen Huang touched only briefly on the chip itself, focusing instead on the DGX systems built around it.
In multi-card interconnection, NVIDIA's NVLink and NVSwitch technologies are its moat. NVLink is a point-to-point high-speed interconnect that directly links multiple GPUs into a high-performance computing cluster or deep-learning system. NVLink also introduces the concept of unified memory, supporting a pooled memory space across connected GPUs, a crucial feature for tasks that work on large datasets.
NVSwitch is a high-speed switching technology that connects many GPUs together into a single high-performance computing system.
With the support of NVLink Switch, NVIDIA has connected 72 B200s into the "new-generation computing unit" GB200 NVL72. One such "computing unit" cabinet delivers up to 720 petaflops of FP8 training compute, approaching an entire H100-era DGX SuperPod supercomputing cluster (1,000 petaflops).
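The cabinet figure above can be sanity-checked against the per-GPU numbers. A minimal sketch using only the article's own figures:

```python
NUM_GPUS = 72        # B200 GPUs per GB200 NVL72 cabinet
CABINET_FP8 = 720.0  # claimed FP8 training compute per cabinet, in PFLOPS

# Implied per-GPU FP8 throughput: 720 / 72 = 10 PFLOPS per B200.
per_gpu = CABINET_FP8 / NUM_GPUS
print(f"Implied per-GPU FP8 throughput: {per_gpu} PFLOPS")

# Against a ~1000-PFLOPS H100-era DGX SuperPod, one cabinet reaches
# 72% of the whole cluster's training throughput.
print(f"Fraction of a 1000-PFLOPS SuperPod: {CABINET_FP8 / 1000:.0%}")
```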
NVIDIA says the new chip will ship later in 2024. Amazon, Dell, Google, Meta, Microsoft, OpenAI, and Tesla all plan to use Blackwell GPUs.
Selling GPUs bundled into complete systems also fits how large-model companies buy compute: packaging multiple GPUs into a data-center-scale product matches the purchasing patterns of large-model companies and cloud service providers. According to NVIDIA's 2023 financial report, 40% of its data center revenue came from hyperscale data centers and cloud service providers.
As of the US market close on March 18th Eastern Time, NVIDIA's stock price stood at $884.55, for a total market value of $2.21 trillion.