Chinese tech giant Alibaba has published a paper detailing scheduling tech it has used to achieve dramatic utilization improvements across the GPU fleet it uses to power inference workloads – which is good, but not a breakthrough that will worry AI investors.

Titled “Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market”, the paper [PDF] opens by noting that model mart Hugging Face lists over one million AI models, although customers mostly run just a few of them. Alibaba Cloud nonetheless offers many models, but found it had to dedicate 17.7 percent of its GPU fleet to serving just 1.35 percent of customer requests.

The reason for that discrepancy is that service providers typically configure their GPUs to run only two or three models, which is all that GPUs can handle because they don’t have enough memory to run more. That approach means an outfit like Alibaba Cloud could have hundreds of idle GPUs dedicated to seldom-used models.

That’s clearly untenable given the cost of GPUs and, for Alibaba, the difficulty of acquiring kit from Nvidia and AMD due to US sanctions.

The Chinese cloud champ therefore developed GPU pooling and memory management tech that means it can run more models on each GPU and offload data into a host’s memory or other storage.
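The general idea – sharing each GPU among many models and parking cold model state in host memory rather than pinning a whole GPU per model – can be sketched in a few lines of Python. Everything below (class names, slot counts, the LRU eviction policy) is a hypothetical illustration of the pooling concept, not Aegaeon's actual design.

```python
from collections import OrderedDict

class GpuPool:
    """Toy sketch of GPU pooling: a fixed number of model slots stay
    resident on the GPU, and the least-recently-used model is offloaded
    to host memory when a new one is needed. Illustrative only."""

    def __init__(self, gpu_slots: int):
        self.gpu_slots = gpu_slots      # models the GPU can hold at once
        self.on_gpu = OrderedDict()     # model name -> weights, in LRU order
        self.on_host = {}               # model state offloaded to host RAM

    def serve(self, model: str) -> str:
        if model in self.on_gpu:
            self.on_gpu.move_to_end(model)      # mark as most recently used
        else:
            if len(self.on_gpu) >= self.gpu_slots:
                victim, state = self.on_gpu.popitem(last=False)
                self.on_host[victim] = state    # evict coldest model to host
            # reload from host memory if previously offloaded, else "load" fresh
            self.on_gpu[model] = self.on_host.pop(model, f"weights:{model}")
        return f"served {model} on shared GPU"

pool = GpuPool(gpu_slots=3)
for m in ["llm-a", "llm-b", "llm-c", "llm-d", "llm-a"]:
    pool.serve(m)
print(sorted(pool.on_gpu))   # the three hottest models stay resident
print(sorted(pool.on_host))  # the cold model sits in host memory
```

One shared GPU here serves four models with only three resident slots – the same intuition, scaled up, behind replacing dedicated per-model GPUs with a pooled fleet.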

The headline figures from that approach are impressive: Alibaba once dedicated 1,192 GPUs to running little-used models for customers’ inference workloads. During a three-month beta test of Aegaeon, Alibaba was able to use just 213 GPUs, an 82 percent GPU resource saving. The company has also managed to get some of its GPUs running “tens” of models.

Alibaba Cloud claims Aegaeon is superior to alternative solutions, and the fact that its paper was accepted for presentation at last week’s ACM SIGOPS 31st Symposium on Operating Systems Principles – an academic computer science conference – suggests the work is sound.

But it’s not necessarily a breakthrough, because hyperscalers are justifiably reticent to reveal all the tech that powers their platforms. It’s therefore entirely possible that other hyperscalers have already addressed this issue – and perhaps done even better than Alibaba.

Another thing to note is that hyperscalers are old masters at increasing utilization rates for their hardware, as doing so improves their profits. So while this paper describes some clever work by Alibaba, it also reveals the Chinese giant’s previous setup was not efficient.

The Register believes the paper is nonetheless significant because we’re often told that as AI matures, developers will create many industry-specific or scenario-specific models. Clouds need to be able to run them all efficiently, and Alibaba’s approach suggests it’s on the way to making that possible – which should mean the price of running obscure models won’t blow out because using them requires extra GPU resources.

That’s welcome. But this paper won’t panic AI investors as happened in January 2025’s “DeepSeek moment”, when it looked like Chinese techies had found ways to dramatically reduce the quantity of GPUs required to train models. ®
