Nvidia’s prefill accelerator was shelved, but Chipzilla’s Crescent Island may fill the void
COMPUTEX 2026 Intel has provided new insights into its next-gen datacenter GPU, codenamed Crescent Island. Alongside supporting enterprise AI deployments, the GPU may fill the void left by Nvidia’s Rubin CPX GPUs, which were seemingly shelved late last year following Nvidia’s acquisition of Groq.
As datacenter GPUs go, Intel’s Crescent Island is definitely an odd duck. It’s going to ship in a PCIe form factor at a time when most high-end GPUs use socketed designs. It also won’t use HBM or even GDDR memory.
Instead, Intel has opted for LPDDR5x memory, the same kind used in high-end notebooks and smartphones, and quite a bit of it too.
Crescent Island will be offered with up to 480 GB of memory, considerably more than you’ll find on Nvidia’s flagship GPUs, which currently top out at 288 GB.
It’s also cheap, at least relative to HBM or GDDR, which should keep prices down despite a global semiconductor supply chain that has seen memory prices surge by more than 3x since last year.
The one thing that LPDDR5x isn’t is fast. Intel hasn’t shared bandwidth figures just yet but, assuming a large 1024-bit memory bus, we’re looking at around 1.2 TB/s. Crescent Island’s actual bandwidth will depend heavily on how wide the memory bus ends up being, but for reference, Nvidia and AMD’s latest GPUs are pushing 20 TB/s.
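The 1.2 TB/s estimate follows from simple arithmetic: theoretical peak bandwidth is the bus width multiplied by the per-pin data rate. Below is a minimal sketch, assuming a 9,600 MT/s LPDDR5x data rate. That speed grade is our assumption for illustration; Intel has confirmed neither the data rate nor the bus width.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_mt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits / 8          # bits -> bytes moved per transfer
    return bytes_per_transfer * data_rate_mt_s / 1_000  # MB/s -> GB/s


# Assumption: 9,600 MT/s per pin. Not an Intel-published figure.
ASSUMED_DATA_RATE_MT_S = 9_600

# Compare a narrower hypothetical bus with the 1024-bit case discussed above.
for width_bits in (512, 1024):
    gb_s = peak_bandwidth_gb_s(width_bits, ASSUMED_DATA_RATE_MT_S)
    print(f"{width_bits}-bit bus: {gb_s:.1f} GB/s")
```

Halving the bus width halves the result, which is why the final figure hinges so heavily on the bus Intel settles on.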
How quickly a GPU can churn out tokens is largely determined by how fast the memory is, making bandwidth a major bottleneck. Or at least that was the case.
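A back-of-the-envelope way to see why: generating each new token requires streaming roughly all of the model’s weights out of memory, so bandwidth divided by the size of the weights gives a rough ceiling on tokens per second for a single request. The sketch below ignores KV-cache traffic, batching, and compute time, and the 70 GB model size is a hypothetical picked purely for illustration.

```python
def decode_tokens_per_s_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on single-stream decode speed.

    Assumes every generated token requires reading all weights once
    and that nothing else (KV cache, compute) limits throughput.
    """
    return bandwidth_gb_s / weights_gb


# Hypothetical model footprint, e.g. 70B parameters at one byte each.
HYPOTHETICAL_WEIGHTS_GB = 70.0

# Bandwidth figures taken from the estimates quoted above.
lpddr_ceiling = decode_tokens_per_s_ceiling(1_200, HYPOTHETICAL_WEIGHTS_GB)
hbm_ceiling = decode_tokens_per_s_ceiling(20_000, HYPOTHETICAL_WEIGHTS_GB)

print(f"~1.2 TB/s part: {lpddr_ceiling:.0f} tokens/s ceiling")
print(f"~20 TB/s part:  {hbm_ceiling:.0f} tokens/s ceiling")
```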
Over the past year, we’ve seen a shift toward disaggregated compute architectures that break inference into two phases: prefill and decode.
Prefill is the compute-heavy phase of the pipeline. If you’ve ever used an AI chatbot, you’ve experienced prefill as the wait between submitting a prompt and the model starting to respond. The faster the compute, the shorter the wait.
While prefill operations still consume a large amount of memory, they’re mostly compute bound, which means you can get away with using slower GDDR or LPDDR memory rather than pricey HBM.
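One way to make “compute bound” concrete is the roofline model: a workload is compute bound when its arithmetic intensity (FLOPs per byte moved) exceeds the hardware’s ratio of peak FLOPS to memory bandwidth. The sketch below uses a deliberately simplified estimate in which each weight is read once per pass and used for one multiply-add per token. The 1 petaFLOPS and 1.2 TB/s hardware figures are placeholders, not Crescent Island specifications.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float) -> float:
    """Approximate FLOPs per byte of weights read.

    Simplification: 2 FLOPs (multiply + add) per parameter per token,
    with each parameter read from memory once per pass.
    """
    return 2 * tokens_per_pass / bytes_per_param


def is_compute_bound(tokens_per_pass: int, bytes_per_param: float,
                     peak_flops: float, bandwidth_bytes_s: float) -> bool:
    """True when the workload sits on the compute side of the roofline."""
    machine_balance = peak_flops / bandwidth_bytes_s  # FLOPs the chip can do per byte moved
    return arithmetic_intensity(tokens_per_pass, bytes_per_param) > machine_balance


# Placeholder hardware figures for illustration only.
PLACEHOLDER_PEAK_FLOPS = 1e15        # 1 petaFLOPS
PLACEHOLDER_BANDWIDTH_B_S = 1.2e12   # 1.2 TB/s

# Decode handles one new token per pass; prefill handles the whole prompt at once.
decode = is_compute_bound(1, 1.0, PLACEHOLDER_PEAK_FLOPS, PLACEHOLDER_BANDWIDTH_B_S)
prefill = is_compute_bound(8192, 1.0, PLACEHOLDER_PEAK_FLOPS, PLACEHOLDER_BANDWIDTH_B_S)
print(f"decode compute bound:  {decode}")
print(f"prefill compute bound: {prefill}")
```

With these placeholder numbers, a single-token decode step lands on the memory-bound side, while an 8,192-token prompt lands firmly on the compute-bound side, which is the property that lets a prefill part trade bandwidth for capacity and cost.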
Pairing cheaper memory with plenty of compute for prefill was the idea behind Nvidia’s Rubin CPX when it was announced late last summer.
The accelerator promised 128 GB of GDDR7 memory and up to 30 petaFLOPS of NVFP4 performance. For heavy workloads that require ingesting large quantities of tokens, code assistants for example, prefill operations would be offloaded to CPX accelerators while token generation would continue to run on Nvidia’s HBM4-equipped Vera Rubin Superchips.
With AI agents rapidly driving up the number of input tokens, the architecture made a lot of sense. Yet, by March, Nvidia had shelved the idea in order to prioritize its new Groq LPU-based LPX racks.
Announced at GTC, LPX addressed the opposite end of the spectrum. Rather than accelerating prefill, Nvidia’s Groq accelerators aimed to improve user experiences and inference economics by juicing token generation.
However, the use case for something like Rubin CPX hasn’t gone away. In a round table with press this spring, Ian Buck, VP of Hyperscale and HPC at Nvidia, said CPX was still a good idea and that we may see the concept resurface in future generations.
Intel clearly sees an opportunity to fill the void. The company, which has grown closer to Nvidia since CEO Lip-Bu Tan took the reins last year, hasn’t said much about Crescent Island’s intended use case, but it has suggested that Nvidia Dynamo is coming to the platform.
Dynamo is Nvidia’s framework for disaggregating prefill and decode across multiple GPUs.
Whether Crescent Island actually makes sense for this use case will depend heavily on its performance profile, something for which we have very few data points right now.
Intel hasn’t shared FLOPS figures yet, but we know the GPU will use its Xe-3P microarchitecture, which adds support for FP8 and FP4 datatypes, and will ship as a 350 watt air-cooled PCIe card.
While Intel has signaled support for disaggregated inference via Dynamo, it isn’t the company’s only option.
Back in February, Intel and friends funneled $350 million into AI chip startup SambaNova. Then in April, the company revealed plans for a disaggregated inference platform using Intel Xeons, SambaNova RDUs, and what turned out to be Nvidia GPUs. That platform went live this week.
However, there is no reason that Intel couldn’t use something like llm-d, the open source, open vendor contemporary of Dynamo, to combine its own GPUs with SambaNova RDUs instead. ®


