Analysis Amazon Web Services has signaled with its new Graviton3E silicon that the future of cloud computing cannot rely on general-purpose chips alone, joining AMD and Intel in introducing specialized central processing units that are meant to run certain applications faster and more efficiently.

While the computing world has increasingly turned to GPUs for workloads like AI training that benefit from many cores working in parallel, Intel, AMD, and now AWS are finding benefits from customizing CPUs for some data-intensive apps that are important to businesses, governments, and academia.

This means the cadence of CPU improvements will not be as straightforward in the future, given that all three companies will soon have both general-purpose and specialized central processing units available. For organizations with high-performance needs, this will demand more scrutiny over system configurations as chip designers look to eke out performance and efficiency gains in new ways.

In the case of Graviton3E, AWS has the same target as recent and upcoming CPUs from AMD and Intel, at least on a broader level: high-performance computing. We’re talking about a wide range of applications used by scientists, engineers, and other data-concerned professionals, such as computational fluid dynamics, weather modeling, and molecular dynamics, to name a few.

AWS said this week that the Graviton3E is well-suited for HPC applications because it’s been optimized for floating point and vector math. Executive Peter DeSantis claimed this fine-tuning allows the Arm-based chip to run faster on benchmarks for life sciences and financial modeling workloads than the general-purpose Graviton3, which started powering instances earlier this year.

While AWS didn’t reveal many details about Graviton3E, we can look at the new HPC-tuned CPUs from AMD and Intel to understand how general-purpose chips can be tweaked to benefit a set of applications.

AMD ups the cache to serve technical computing apps

Earlier this year, AMD rolled out a new variant of Epyc server chips, known under the codename Milan-X, that are designed to speed up a narrower set of apps in the HPC world. The targeted workloads consist of electronic design automation, computational fluid dynamics, finite element analysis, and structural analysis simulations, which AMD puts under the umbrella of “technical computing.”

The bulk pricing for the Milan-X chips comes at a “modest premium” over vanilla 3rd-Gen Epyc processors with similar attributes, but in exchange, AMD said users can expect a major boost in performance for targeted workloads thanks to a massive amount of cache fused on top of the CPU.

The extra performance comes in the form of 768MB in L3 cache, triple the amount contained within the general-purpose 3rd-Gen Epycs that were introduced in 2021. This means a dual-socket server can sport more than 1.5GB in total L3 cache.
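
The cache arithmetic above can be checked with a few lines of Python. This is only a sketch built from the figures quoted in the article: the 256MB baseline is inferred from the word “triple,” and the function and variable names are made up for illustration.

```python
# Figures quoted in the article. The baseline is an inference from
# "triple", not a number stated in the text.
MILAN_X_L3_MB = 768
BASELINE_L3_MB = MILAN_X_L3_MB // 3  # implied general-purpose Epyc L3


def total_l3_mb(per_socket_mb: int, sockets: int) -> int:
    """Total L3 cache across all sockets in a server, in MB."""
    return per_socket_mb * sockets


dual_socket_mb = total_l3_mb(MILAN_X_L3_MB, 2)

print(f"Implied baseline L3 per chip: {BASELINE_L3_MB}MB")
print(f"Dual-socket Milan-X L3: {dual_socket_mb}MB")
```

Counting 1,000MB to the gigabyte, 1,536MB works out to just over 1.5GB, which matches the “more than 1.5GB” figure.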

The enlarged L3 cache allows the CPU to hold much more data closer to the processor’s cores, which is important for technical computing workloads that regularly move around vast amounts of data.

For instance, AMD claimed that a 16-core Milan-X chip can perform 40.6 jobs per hour for Synopsys’s VCS software used for chip design. In comparison, AMD’s vanilla 16-core Epyc from the same generation could only do 24.4 jobs per hour, making the Milan-X chip 66 percent faster.
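
For readers who want to sanity-check vendor math like this, the percentage can be reproduced from the two throughput figures. The snippet below is a minimal sketch using only the numbers AMD supplied; the helper name percent_faster is hypothetical.

```python
def percent_faster(new_rate: float, old_rate: float) -> float:
    """Return how much faster new_rate is than old_rate, as a percentage."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (new_rate / old_rate - 1.0) * 100.0


# Jobs per hour for Synopsys VCS, as claimed by AMD.
milan_x_jobs_per_hour = 40.6
vanilla_epyc_jobs_per_hour = 24.4

gain = percent_faster(milan_x_jobs_per_hour, vanilla_epyc_jobs_per_hour)
print(f"Milan-X is about {gain:.1f} percent faster")  # about 66.4 percent
```

The same helper turns a “times better” claim into a percentage: a chip that is five times faster is 400 percent faster, not 500.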

The company has also claimed that Milan-X can run 23-88 percent faster than Intel’s 3rd-Gen Xeon Scalable chips from last year for a variety of technical computing apps. As always, these and other claims made by vendors should be taken with a large grain of salt.

Intel tackles HPC with high-bandwidth memory

Intel is also tackling the problem of keeping more data closer to the cores for HPC apps, except instead of creating a larger cache, the company has designed a CPU with 64GB of high-bandwidth memory.

This is in reference to Intel’s upcoming Xeon Max Series processors, which are an HPC variant of the Sapphire Rapids server chips coming out early next year.

The x86 giant recently claimed that the Xeon Max chips will have better performance than its 3rd-Gen Xeon Scalable processors and AMD’s Milan-X chips for a wide range of HPC apps. It made this claim by showing nearly 20 HPC benchmarks where the top Xeon Max chip performed anywhere from 20 percent to nearly five times better than the last-generation processors. Remember, grains of salt.

By putting 64GB of high-bandwidth memory right into the chip, Intel is also giving operators more flexibility in how servers are configured. For instance, datacenter operators can forgo DRAM completely in a server by relying only on Xeon Max’s high-bandwidth memory, with no code changes needed. This, in turn, is expected to reduce the cost of buying memory DIMMs and the energy they consume.

Xeon Max can also be used to expand a system’s total memory by using DRAM in tandem with the high-bandwidth memory, though this requires code changes in software. Alternatively, users can configure Xeon Max’s high-bandwidth memory to act as a cache for the DDR memory, which doesn’t need any code changes.
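
The three memory configurations described above can be summarized in a small lookup table. This is purely an illustrative sketch: the mode names and fields are hypothetical labels for this article’s descriptions, not Intel’s terminology or any real configuration interface.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryMode(Enum):
    """Hypothetical labels for the three setups described in the article."""
    HBM_ONLY = "hbm_only"            # no DRAM installed at all
    HBM_PLUS_DRAM = "hbm_plus_dram"  # HBM and DRAM used in tandem
    HBM_AS_CACHE = "hbm_as_cache"    # HBM acts as a cache for DDR


@dataclass(frozen=True)
class ModeTraits:
    uses_dram: bool
    needs_code_changes: bool


# Traits as stated in the article for each configuration.
TRAITS = {
    MemoryMode.HBM_ONLY: ModeTraits(uses_dram=False, needs_code_changes=False),
    MemoryMode.HBM_PLUS_DRAM: ModeTraits(uses_dram=True, needs_code_changes=True),
    MemoryMode.HBM_AS_CACHE: ModeTraits(uses_dram=True, needs_code_changes=False),
}


def modes_without_code_changes() -> list[MemoryMode]:
    """List the configurations that existing software can use unmodified."""
    return [mode for mode in MemoryMode if not TRAITS[mode].needs_code_changes]


for mode in modes_without_code_changes():
    print(mode.value)
```

Laid out this way, the trade-off is easier to see: only the tandem configuration, which offers the most total memory, asks developers to touch their code.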

While the added high-bandwidth memory is the defining feature of Xeon Max, the processor has other bells and whistles meant to boost certain HPC and AI apps, such as Intel Advanced Vector Extensions 512, Intel Deep Learning Boost, Intel Data Streaming Accelerator, and Intel Advanced Matrix Extensions.

A fragmented processing future

Specialized CPUs aren’t completely new. For instance, Intel has been churning out Xeon processors tuned for telecom workloads. But this new batch represents a bigger coming wave of central processing units that won’t be designed to serve the widest possible array of applications.

Nvidia, for instance, is planning to release its Arm-based Grace CPU early next year for HPC and AI purposes. AMD, on the other hand, is working on future generations of Epyc chips that are optimized not just for HPC but also edge and telecom workloads. Both Intel and AMD are developing CPUs that are optimized for cloud computing too.

Then we need to consider that Intel, Nvidia, and AMD are working on ways to bring the CPU and GPU closer to each other for apps that need a lot of horsepower. For Nvidia, this will come in the form of the Grace Hopper Superchip next year. Intel is planning to accomplish this with its Falcon Shores “XPU” in 2024. Meanwhile, AMD intends to do this with next year’s Instinct MI300 chip.

All of this means that processor roadmaps are becoming more complex and will require much more homework in the future if you’re working in an IT shop with high-performance needs. Good luck. ®

