China’s giant GPU-free supercomputer: LineShine with 2.4 million cores

China is taking a different tack against US export restrictions on advanced AI GPUs. LineShine, a new supercomputer commissioned by the National Supercomputing Center in Shenzhen, has drawn attention by delivering 1.54 exaflops of BF16 AI training performance without using any GPUs. The system contains more than 2.45 million Armv9 CPU cores in total.

Summary in 10 Seconds

LineShine was built on a fully CPU-based architecture instead of GPUs. The system uses 40,960 LX2 processors and 2,451,840 CPU cores. The CPU-only design reduces dependence on GPUs, but it carries significant disadvantages in power efficiency and dense AI performance.

GPU-free exascale move from China

Today’s most powerful supercomputers and AI clusters generally use a combined CPU+GPU architecture. CPUs handle general workloads and coordination, while GPUs take on the massively parallel computation.

China’s new LineShine system, however, breaks with this trend. LineShine takes a fully CPU-based approach to AI and high-performance computing (HPC) workloads. This choice is as technically interesting as it is politically and strategically important, because US restrictions on advanced GPU exports make it difficult for China to source large-scale AI accelerators from manufacturers such as Nvidia.

LineShine works with 2.4 million CPU cores

LineShine consists of 20,480 compute nodes, each containing two Armv9-based LX2 processors with 304 CPU cores apiece. Across the whole system, that amounts to 40,960 LX2 processors, and the total core count is given as 2,451,840. The nodes are connected by the LingQi high-speed network, which, according to the published information, offers 1.6 Tb/s of bandwidth per node.
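The totals can be checked by straight multiplication of the quoted per-node figures. A minimal Python sketch: note that 40,960 × 304 comes to 12,451,840, not the 2,451,840 cores quoted, so the published per-processor core count and system total do not agree with each other. The bandwidth line is only a naive sum of per-node link figures; it is not the network’s bisection bandwidth, which the article does not give.

```python
# Figures as quoted in the article.
NODES = 20_480
CPUS_PER_NODE = 2
CORES_PER_CPU = 304
LINK_TBPS_PER_NODE = 1.6  # LingQi bandwidth per node, Tb/s


def system_totals(nodes: int, cpus_per_node: int, cores_per_cpu: int) -> tuple[int, int]:
    """Return (total processors, total cores) by straight multiplication."""
    processors = nodes * cpus_per_node
    return processors, processors * cores_per_cpu


processors, cores = system_totals(NODES, CPUS_PER_NODE, CORES_PER_CPU)
# Naive sum of per-node link bandwidth (not bisection bandwidth).
aggregate_tbps = NODES * LINK_TBPS_PER_NODE

print(f"processors: {processors:,}")   # processors: 40,960
print(f"cores:      {cores:,}")        # cores:      12,451,840
print(f"sum of per-node bandwidth: {aggregate_tbps:,.0f} Tb/s")
```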

This per-node bandwidth figure shows that data transport, not just raw processor power, is central to LineShine’s design.

LX2 processors designed specifically for AI

The LX2 processor at the heart of LineShine is structured differently from classical server CPUs. Each processor uses two compute chiplets and divides its 304 cores into eight clusters of 38 cores each.
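Schedulers and runtimes that place threads cluster by cluster need a mapping from a core number to its position in this hierarchy. The Python sketch below illustrates one under two assumptions the article does not confirm: cores are numbered contiguously from 0 to 303, and the eight clusters are split evenly (four each) between the two chiplets. `locate_core` and `CoreLocation` are hypothetical names.

```python
from typing import NamedTuple

# Structure quoted for LX2: 2 chiplets, 8 clusters, 38 cores per cluster.
CHIPLETS = 2
CLUSTERS = 8
CORES_PER_CLUSTER = 38
# Assumption: clusters are split evenly across chiplets (4 each).
CLUSTERS_PER_CHIPLET = CLUSTERS // CHIPLETS
CORES_PER_CPU = CLUSTERS * CORES_PER_CLUSTER  # 304


class CoreLocation(NamedTuple):
    chiplet: int      # 0 or 1
    cluster: int      # global cluster index, 0..7
    local_core: int   # index within the cluster, 0..37


def locate_core(core_id: int) -> CoreLocation:
    """Map a hypothetical contiguous core id (0..303) to chiplet and cluster."""
    if not 0 <= core_id < CORES_PER_CPU:
        raise ValueError(f"core_id must be in [0, {CORES_PER_CPU})")
    cluster, local_core = divmod(core_id, CORES_PER_CLUSTER)
    chiplet = cluster // CLUSTERS_PER_CHIPLET
    return CoreLocation(chiplet, cluster, local_core)


print(CORES_PER_CPU)     # 304
print(locate_core(151))  # CoreLocation(chiplet=0, cluster=3, local_core=37)
print(locate_core(303))  # CoreLocation(chiplet=1, cluster=7, local_core=37)
```

With this numbering, core 152 is the first core on the second chiplet; a real LX2 may enumerate cores differently.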

Each core includes Arm’s SVE (Scalable Vector Extension) and SME (Scalable Matrix Extension) units. These units are critical for AI training and scientific computing, especially for vector and matrix operations. The processor also has an unusual memory design: each LX2 has 32 GB of on-package HBM and up to 256 GB of external DDR5 memory, with up to 4 TB/s of bandwidth on the HBM side. Although this structure is reminiscent of the Arm-based A64FX used in Fujitsu’s Fugaku supercomputer, LX2’s Armv9-based, AI/HPC-oriented design sets it apart.
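The memory figures translate into capacities and transfer times as follows. A Python sketch, assuming decimal units (1 PB = 10⁶ GB) and, for the upper bound, that every processor is fully populated with 256 GB of DDR5, which the article’s “up to” does not guarantee. The sweep time is an ideal best case at the quoted peak HBM bandwidth.

```python
# Per-processor memory figures quoted for LX2 (decimal units).
HBM_GB = 32
DDR5_MAX_GB = 256
HBM_TB_PER_S = 4.0
PROCESSORS = 40_960  # 20,480 nodes x 2 LX2


def hbm_sweep_ms(gigabytes: float, tb_per_s: float = HBM_TB_PER_S) -> float:
    """Best-case time to read `gigabytes` once at peak bandwidth.

    GB divided by TB/s yields milliseconds (1e9 / 1e12 s).
    """
    return gigabytes / tb_per_s


per_cpu_max_gb = HBM_GB + DDR5_MAX_GB            # HBM + fully populated DDR5
system_hbm_pb = PROCESSORS * HBM_GB / 1e6        # GB -> PB
system_max_pb = PROCESSORS * per_cpu_max_gb / 1e6

print(per_cpu_max_gb)            # 288
print(round(system_hbm_pb, 2))   # 1.31
print(round(system_max_pb, 2))   # 11.8
print(hbm_sweep_ms(HBM_GB))      # 8.0
```

In other words, each processor could in principle read its entire HBM in about 8 ms, while the much larger DDR5 pool trades that bandwidth for capacity.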

The Huawei connection is unclear

Some analyses refer to the processors as “Huawei LX2”. However, the National Supercomputing Center in Shenzhen (NSCC) has not publicly named the processor’s developer. It therefore remains unclear whether Huawei developed the processor directly, whether it is the result of a joint effort between NSCC and Huawei, or whether it comes from another Chinese HPC design team.

This distinction matters, because LineShine is not just a technical supercomputer project; it has also become one of the symbols of China’s quest to overcome GPU embargoes with its own domestic processor, network, storage and software ecosystem.

Performance figures are remarkable

According to the published data, a single LX2 processor delivers 60.3 TFLOPS of FP64, 240 TFLOPS of BF16/FP16 and 960 TOPS of INT8 performance. At the full-system level, LineShine provides 1.54 exaflops of BF16 training performance.
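The system-level peaks follow from the per-processor figures by simple multiplication. A Python sketch, assuming all 40,960 LX2 processors run at their quoted peak rates at once: it reproduces the 2.47-exaflop theoretical FP64 figure cited for the system and implies a BF16 peak of roughly 9.8 exaflops. If the 1.54-exaflop BF16 figure is a sustained training rate (the article does not say so explicitly), it corresponds to roughly 16% of that peak.

```python
PROCESSORS = 40_960
# Per-LX2 peak figures as quoted (TFLOPS, or TOPS for INT8).
PER_CPU_TERA = {"FP64": 60.3, "BF16/FP16": 240.0, "INT8 (TOPS)": 960.0}


def system_peak_exa(per_cpu_tera: float, processors: int = PROCESSORS) -> float:
    """Aggregate peak in exa-units: tera-rate x processor count / 1e6."""
    return per_cpu_tera * processors / 1e6


peaks = {name: system_peak_exa(rate) for name, rate in PER_CPU_TERA.items()}
for name, exa in peaks.items():
    print(f"{name:12s} {exa:6.2f} exa")

# Share of the implied BF16 peak represented by the quoted 1.54 EF figure.
print(f"{1.54 / peaks['BF16/FP16']:.1%}")
```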

A figure of 2.16 exaflops is also reported as observed while training an Earth observation model with 6.3 billion parameters, and the system’s theoretical FP64 peak is given as 2.47 exaflops. There is an important caveat, however: theoretical peak performance and real application performance are not the same thing. Actual efficiency depends on the model’s structure, memory layout, network traffic, software optimization and how well the workload suits the CPU architecture.
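One standard way to reason about the gap between peak and delivered performance is the roofline model, which is not something the article itself describes. The Python sketch below assumes the quoted per-LX2 BF16 peak and HBM bandwidth are the only limits, ignoring caches, DDR5, the network and how well code uses the SVE/SME units. Under those assumptions a kernel must perform about 60 floating-point operations per byte read from HBM before it can even in principle reach peak.

```python
PEAK_BF16_TFLOPS = 240.0  # per LX2, as quoted
HBM_TB_PER_S = 4.0        # per LX2, as quoted


def attainable_tflops(intensity: float,
                      peak: float = PEAK_BF16_TFLOPS,
                      bw: float = HBM_TB_PER_S) -> float:
    """Roofline bound: min(peak, arithmetic intensity x bandwidth).

    `intensity` is FLOP per byte moved from memory; TB/s x FLOP/byte = TFLOPS.
    """
    return min(peak, intensity * bw)


ridge = PEAK_BF16_TFLOPS / HBM_TB_PER_S  # FLOP/byte needed to reach peak
print(ridge)  # 60.0
for ai in (1, 10, 60, 200):
    print(ai, attainable_tflops(ai))
```

Memory-bound kernels (low FLOP-per-byte ratios) sit on the sloped part of the roofline, which is one reason memory layout and software optimization weigh so heavily on real-world efficiency.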

What is the advantage of CPU-only systems?

CPU-only supercomputers such as LineShine can offer significant advantages for some workloads. Because everything runs on the same processors and in the same memory space, there is less need to move data between a CPU and a GPU. This can be especially advantageous for scientific applications that combine large-scale data reading, preprocessing, simulation, storage access and AI training in a single pipeline.
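The CPU-to-GPU data movement that a CPU-only node avoids can be estimated with a simple bandwidth model. This Python sketch uses illustrative assumptions, not LineShine figures: a link of roughly 64 GB/s, in the range of a PCIe 5.0 x16 link per direction, and a hypothetical 500 GB working set copied to an accelerator and back. Real systems often overlap such transfers with computation, hiding part of the cost.

```python
def transfer_seconds(data_gb: float, link_gb_per_s: float, trips: int = 1) -> float:
    """Idealized time to move `data_gb` across a host-device link `trips` times."""
    if link_gb_per_s <= 0:
        raise ValueError("link bandwidth must be positive")
    return data_gb * trips / link_gb_per_s


# Illustrative assumption: ~64 GB/s host-device link.
LINK_GB_PER_S = 64.0
# Hypothetical pipeline: 500 GB sent to the accelerator and results copied back.
print(transfer_seconds(500, LINK_GB_PER_S, trips=2))  # 15.625
```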

In addition, combining HBM and DDR5 makes it possible to build larger, coherent memory pools. This can be useful for large scientific datasets, long-context models and retrieval-augmented generation (RAG) systems.

Big problem: Efficiency

But this approach comes at a serious price. CPU-only systems are generally less efficient than GPU-based systems at dense matrix computation. This is exactly where accelerators from Nvidia, AMD and others stand out: they deliver much more AI compute per watt.

That’s why CPU+GPU or CPU+accelerator architectures remain the industry’s preferred design. LineShine matters not because it is “better than GPUs”, but because it shows how far CPU-centric architectures can go when access to GPUs is restricted.

Strategic message for China

LineShine stands out as one of the key examples of China’s drive to reduce its foreign dependence in AI and supercomputing.

While US restrictions make it difficult for companies like Nvidia to sell their most powerful AI GPUs directly to China, Chinese institutions are working to develop alternative architectures built on domestic processors, networks and software stacks. LineShine is therefore more than a supercomputer announcement: it also reflects a search for a new direction in the global AI hardware race. In an era dominated by GPUs, China is testing the harder but more independent path of “doing everything with the CPU.”

Editor’s note

LineShine should not be seen as an architecture that will completely replace GPUs.

It is nonetheless notable for showing how China has developed alternative AI and HPC solutions under embargo conditions. The real question is whether this approach will remain a stopgap born of necessity in the long run, or whether CPU-only systems will once again become a strong option for some scientific workloads. Do you think China’s strategy of building CPU-intensive supercomputers instead of GPU-based ones can succeed in the long run?

You can share your opinions in the comments.