HPE is set to build a successor to the Frontier exascale system for America’s Oak Ridge National Laboratory, based on the next generation of its Cray supercomputer platform, plus a separate AI cluster to advance machine learning with a multi-tenant cloud-like platform.

The Discovery system will “bolster productivity up to 10x,” according to HPE, and like many other supercomputers will be used for scientific research in areas including medicine, cancer research, nuclear energy, and aerospace.

Mock-up of HPE’s forthcoming Discovery GX5000 system

Oak Ridge issued a request for proposals (RFP) for a successor to Frontier last year, with an expected delivery date of late 2027 to early 2028 and an expected budget of $500 million.

HPE now says delivery of Discovery is expected in 2028, with user operations set to begin in 2029.

The national laboratory will also receive a second HPE-built system, Lux, the AI cluster intended to support both training and inference work on site. This is expected to be installed early in 2026.

Discovery will be based on HPE’s Cray Supercomputing GX5000, the next iteration of its supercomputing architecture, and will also feature a new Cray Storage Systems K3000 running the DAOS object storage platform, plus the next generation of Cray’s Slingshot high-performance networking.

HPE says the Discovery nodes will be built with AMD’s “Venice” (a code name) server processors, which aren’t due to be released until next year, plus Instinct MI430X GPUs – also due next year – for the level of performance required for modeling, simulation, and AI projects.

However, HPE didn’t disclose how many nodes or CPUs and GPUs will go into building Discovery, or how much memory the system will have.

For interconnect, it will use the next generation of the Slingshot networking HPE gained when it acquired Cray, although this has yet to launch and the company did not give a date as to when it will. The current Slingshot 11 supports 200 Gbps per port, and can be regarded as a superset of Ethernet.

Discovery will be supported by Cray Storage Systems K3000, which HPE claims will support up to 75 million input/output operations per second per storage rack, 4x more performance than the next 30 storage systems on the IO500 list, according to the firm.

This will be based on the open source DAOS (Distributed Asynchronous Object Storage) platform, but will complement rather than replace the Lustre file system-based Cray Storage Systems E2000, which will also be included in Discovery.

DAOS was developed by Intel, but farmed out to an independent foundation after the chipmaker canceled its Optane memory technology in 2022 and lost interest. HPE then hired Intel’s DAOS engineers and brought them into its own storage team.

Lux, meanwhile, is set to be an all-AMD affair, based on liquid-cooled HPE ProLiant Compute XD685 nodes with Epyc CPUs and Instinct MI355X GPUs, linked together using AMD’s Pensando SmartNIC networking.

Liquid cooling improvements

Trish Damkroger, HPE’s senior VP for HPC and AI Infrastructure Solutions, told The Register that the GX5000 had been in the works for years, but the company had “made some pivots over the last year and a half, as we have seen the growth of TDPs (thermal design points), the growth of different silicon coming out from all the vendors, and the need to be able to support all of these different workloads.”

She said the racks will be able to accommodate up to 25 kilowatts per compute slot, 127 percent higher than before. But she seemed prouder of the liquid cooling for the GX5000 infrastructure, which now supports 40°C (104°F) water to meet new energy requirements for a lot of customers in Europe.
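
As a rough sanity check on those figures, here is a minimal Python sketch. It assumes “127 percent higher” means the new limit is 2.27 times the earlier one, which HPE did not spell out, so the implied earlier figure is an estimate and not a number from the company. The function names are illustrative only.

```python
# Back-of-envelope check of the per-slot power and water temperature figures.
# Assumption (not confirmed by HPE): "127 percent higher" means
# new_limit = old_limit * (1 + 1.27).


def implied_previous_kw(new_kw: float, percent_higher: float) -> float:
    """Return the earlier per-slot power implied by a percentage increase."""
    return new_kw / (1 + percent_higher / 100)


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


previous_kw = implied_previous_kw(25, 127)
water_f = celsius_to_fahrenheit(40)

print(f"Implied earlier limit: about {previous_kw:.1f} kW per compute slot")
print(f"40°C water in Fahrenheit: {water_f:.0f}°F")
```

Under that reading, the earlier limit works out to roughly 11 kW per compute slot, and the temperature conversion matches the 104°F quoted.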

HPE next-gen cooling

This means extra chillers and fridges aren’t needed, which cuts power, so it’s a much more energy-efficient system for upcoming deployments.

“It’s a bookend design,” she said. “So basically, the cooling pump is designed to be more compact. And can be placed on the side of the system instead of in the middle. And each pump is going to have redundancy to ensure that there’s always-on operation.”

Damkroger added that users can now control the water flow rate, so instead of every single blade having the same flow, it can be optimized for each blade and its workloads.

HPE said there will be an opportunity to see the new GX5000 infrastructure at the SC25 high-performance computing conference in St. Louis, Missouri, next month, though the platform isn’t expected to be available to customers until early 2027. ®

