Huawei is reportedly preparing to unveil a major breakthrough in AI inference technology that could lessen China’s dependence on advanced High-Bandwidth Memory (HBM) chips, a critical component currently restricted by U.S. export controls. The announcement is expected to take place today at the 2025 Financial AI Inference Application Implementation and Development Forum, according to Economic Daily News citing STAR Market Daily.
The development could mark a significant milestone for China’s AI ecosystem, particularly as the industry shifts its focus from maximizing model capabilities to enhancing real-world application value. Inference, the stage where a trained model is put to work generating outputs from new inputs, has emerged as the new frontier for performance optimization.
HBM supplies the memory bandwidth needed to keep large AI models fed with data, making it a major limiting factor in inference performance: when HBM runs short, tasks stall, responses slow, and efficiency drops. The bottleneck has been exacerbated by U.S. restrictions introduced in December 2024, which bar top suppliers SK hynix, Micron, and Samsung from exporting HBM2 or more advanced chips to China.
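To see why bandwidth rather than raw compute is the binding constraint, consider autoregressive decoding: each generated token requires streaming roughly the full set of model weights from memory. The short Python sketch below works through that roofline-style estimate; all of its numbers (model size, per-chip bandwidth) are illustrative assumptions, not figures from Huawei or the article.

```python
# Back-of-the-envelope estimate of why HBM bandwidth caps inference speed.
# During autoregressive decoding, each generated token requires streaming
# (roughly) all model weights from memory, so single-stream decode throughput
# is bounded by memory bandwidth, not raw compute. All numbers below are
# illustrative assumptions, not figures from the article.

model_params = 70e9          # assumed 70B-parameter model
bytes_per_param = 2          # FP16/BF16 weights
weight_bytes = model_params * bytes_per_param  # ~140 GB read per token

hbm_bandwidth = 3.2e12       # assumed ~3.2 TB/s of HBM on one accelerator

# Upper bound on single-stream decode speed (ignores KV cache, overlap, etc.)
max_tokens_per_sec = hbm_bandwidth / weight_bytes
print(f"Bandwidth-bound ceiling: ~{max_tokens_per_sec:.0f} tokens/s per stream")
# -> ~23 tokens/s; halve the bandwidth and the ceiling halves with it.
```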
Turning Constraints Into Innovation
Faced with these restrictions, Huawei and Chinese research institutions have accelerated efforts to develop alternative solutions. In March 2025, Peking University, in collaboration with Huawei, launched the DeepSeek full-stack open-source inference platform. Built on the university’s self-developed SCOW (Super Computing On Web) platform and CraneSched task-scheduling system, it is designed for efficient large-scale inference on Huawei’s Ascend AI chips, integrating multiple open-source components to maximize performance without direct reliance on restricted HBM supplies.
CloudMatrix 384: Scaling Beyond HBM Limits
Huawei is also pushing the boundaries of AI compute with its CloudMatrix 384 system, which integrates 384 Ascend 910C processors. While a single 910C chip offers roughly one-third of the performance of NVIDIA’s Blackwell GPU, the sheer scale of CloudMatrix 384 lets it reach an estimated 300 PFLOPs of dense BF16 compute, roughly 1.7 times the 180 PFLOPs of NVIDIA’s GB200 NVL72 system.
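Those figures are internally consistent, as a quick check shows. The sketch below derives the implied per-chip performance and the system-level ratio using only the numbers quoted above.

```python
# Sanity check of the scaling arithmetic, using only figures from the article.
cloudmatrix_chips = 384
cloudmatrix_pflops = 300.0   # dense BF16, per the article
nvl72_gpus = 72
nvl72_pflops = 180.0         # dense BF16, per the article

per_910c = cloudmatrix_pflops / cloudmatrix_chips   # ~0.78 PFLOPs per chip
per_blackwell = nvl72_pflops / nvl72_gpus           # 2.5 PFLOPs per GPU

print(f"Ascend 910C: ~{per_910c:.2f} PFLOPs, Blackwell: {per_blackwell:.1f} PFLOPs")
print(f"Per-chip ratio: ~{per_910c / per_blackwell:.2f}")            # ~0.31, i.e. ~1/3
print(f"System ratio:   ~{cloudmatrix_pflops / nvl72_pflops:.2f}x")  # ~1.67x
```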
The platform’s architecture also delivers 2.1 times the total memory bandwidth and over 3.6 times the total HBM capacity of NVIDIA’s GB200 NVL72, despite relying on older-generation HBM2E memory. The approach leverages scale and system-level optimization to sidestep some of the limitations posed by restricted access to the latest HBM generations.
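Those ratios can be roughly reconstructed from per-chip specifications. The per-chip figures in the sketch below are assumptions drawn from public reporting (about 128 GB of HBM2E at ~3.2 TB/s for the 910C, and about 192 GB at ~8 TB/s per Blackwell GPU in the NVL72), not numbers confirmed in the article.

```python
# Rough reconstruction of the bandwidth/capacity ratios quoted above.
# Per-chip figures are assumptions based on public reporting, not the article:
#   Ascend 910C: ~128 GB HBM2E at ~3.2 TB/s
#   Blackwell (in GB200 NVL72): ~192 GB HBM3E at ~8 TB/s

cm_chips, cm_gb, cm_tbs = 384, 128, 3.2
nv_gpus, nv_gb, nv_tbs = 72, 192, 8.0

capacity_ratio = (cm_chips * cm_gb) / (nv_gpus * nv_gb)     # ~3.56
bandwidth_ratio = (cm_chips * cm_tbs) / (nv_gpus * nv_tbs)  # ~2.13

print(f"Total HBM capacity:  ~{capacity_ratio:.1f}x GB200 NVL72")   # ~3.6x
print(f"Total HBM bandwidth: ~{bandwidth_ratio:.1f}x GB200 NVL72")  # ~2.1x
```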
A Strategic Signal Ahead of Potential Trade Talks
This potential breakthrough comes as China reportedly seeks concessions from the U.S. on HBM export restrictions in the lead-up to a possible high-level summit between the two nations. While policy negotiations remain uncertain, Huawei’s progress in AI inference suggests that domestic innovation could reduce China’s reliance on imported components, especially in high-stakes sectors like AI and supercomputing.
If confirmed, Huawei’s latest achievement would not only boost China’s AI capabilities under current trade constraints but also demonstrate how technical ingenuity can help overcome critical supply chain challenges.
