- OpenAI’s hardware chief warns future AI models need real-time hardware kill switches
- Richard Ho highlights networking, memory, and power challenges in scaling infrastructure
- He calls for benchmarks, observability, and cross-industry partnerships to address reliability and trust
A senior OpenAI executive has warned that future AI infrastructure will require hardware-level safety features, including kill switches.
Richard Ho, the company's head of hardware, made the remarks during his keynote at the AI Infra Summit in Santa Clara.
“It has to be built into the hardware,” Ho said. “Today a lot of safety work is in the software. It assumes that your hardware is secure. It assumes that your hardware will do the right thing. It assumes that you can pull the plug on the hardware. I am not saying that we can’t pull the plug on that hardware, but I am telling you that these things are devious, the models are really devious, and so as a hardware guy, I want to make sure of that.”
Silicon-level safety measures
Ho argued that the growth of generative AI is forcing a rethink of system architecture and described how future agents will be long-lived, interacting in the background even when a user is not actively engaged.
This shift requires memory-rich, low-latency infrastructure to manage continuous sessions and communication across multiple agents.
Networking, Ho said, is becoming a bottleneck. “We’re going to have to have real-time tools in these – meaning that these agents communicate with each other. Some of them might be looking at a tool, some might be doing a website search. Others are thinking, and others need to talk to each other.”
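To make that multi-agent pattern concrete, here is a minimal sketch, assuming a single-process Python setup in which asyncio queues stand in for the cluster network: several long-lived agents do independent work and message one another, and it is these agent-to-agent hops that turn networking into the bottleneck at scale. The agent names and message format are illustrative, not anything Ho described.

```python
# Hypothetical sketch (not OpenAI code): long-lived agents running
# concurrently, each doing different work (tool use, search, "thinking")
# while exchanging messages over shared queues.
import asyncio
import random


async def agent(name: str, inbox: asyncio.Queue, peers: list[asyncio.Queue]) -> None:
    """A long-lived agent: alternates between local work and peer messages."""
    for step in range(3):
        # Simulate local work (a tool call, a web search, or reasoning).
        await asyncio.sleep(random.uniform(0.01, 0.05))
        # Tell every peer what we just did; in a real cluster this hop is
        # where network latency and bandwidth become the constraint.
        for peer in peers:
            await peer.put(f"{name}: finished step {step}")
        # Drain anything peers have sent us so far.
        while not inbox.empty():
            msg = inbox.get_nowait()
            print(f"{name} received -> {msg}")


async def main() -> None:
    names = ["tool_agent", "search_agent", "planner_agent"]
    inboxes = {n: asyncio.Queue() for n in names}
    tasks = [
        agent(n, inboxes[n], [q for m, q in inboxes.items() if m != n])
        for n in names
    ]
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    asyncio.run(main())
```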
Ho outlined several hardware challenges that must be addressed, including limits on high-bandwidth memory, the need for 2.5D and 3D chip integration, advances in optics, and extreme power requirements that could reach 1 megawatt per rack.
The safety measures OpenAI put forward include real-time kill switches built into AI clusters, telemetry to detect signs of abnormal behavior, and secure execution paths in CPUs and accelerators.
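The article names these ideas rather than any concrete interface, but a telemetry-driven kill path could look roughly like the watchdog below. The telemetry source and kill-line hook are simulated placeholders, assumptions standing in for out-of-band management hardware (for example a BMC or power interlock) that the article does not specify.

```python
# Hypothetical sketch, not OpenAI's design: a watchdog that polls accelerator
# telemetry and trips a kill line when readings cross hard limits.
import random
import time
from dataclasses import dataclass


@dataclass
class Telemetry:
    power_watts: float        # accelerator/rack power draw
    link_errors_per_s: float  # interconnect error rate


def read_telemetry() -> Telemetry:
    # Simulated readings; a real implementation would query management hardware.
    return Telemetry(
        power_watts=random.uniform(800_000, 1_050_000),
        link_errors_per_s=random.uniform(0, 10),
    )


def assert_kill_line(reason: str) -> None:
    # Stand-in for driving a physical kill switch built into the cluster.
    print(f"KILL LINE ASSERTED: {reason}")


def watchdog(max_power_watts: float = 1_000_000.0,   # ~1 MW per rack ceiling
             max_link_errors_per_s: float = 50.0,
             poll_interval_s: float = 0.05) -> None:
    """Poll telemetry and cut power the moment a hard limit is exceeded."""
    while True:
        t = read_telemetry()
        if t.power_watts > max_power_watts:
            assert_kill_line(f"power {t.power_watts:.0f} W over limit")
            return
        if t.link_errors_per_s > max_link_errors_per_s:
            assert_kill_line(f"link errors {t.link_errors_per_s:.1f}/s over limit")
            return
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    watchdog()
```

Ho's point is that this kind of check should live in silicon rather than in a software polling loop like this one, which depends on the very hardware it is meant to police.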
Ho wrapped things up by saying, “We don’t have good benchmarks for agent-aware architectures and hardware, and I think it is important to know about latency walls and latency tails, what is the efficiency and power and things like that. We need to have good observability as a hardware feature, not just as a debug tool, but built in and constantly monitoring our hardware.”
“Networking is a real important thing, and as we head towards optical, it is unclear that the reliability of the network is there today. We need to get there with enough testing of these optical testbeds and these other communication testbeds that show that we actually have the reliability.”
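As a rough illustration of the "latency tail" measurement Ho says is missing for agent-aware hardware, the sketch below times many small operations and reports tail percentiles (p99, p99.9) rather than averages, since the slowest round-trips are what stall a cluster of communicating agents. The measured operation is a stand-in; a real benchmark would time a network or accelerator hop.

```python
# Hypothetical latency-tail benchmark: run an operation many times and
# summarise the distribution by its tail percentiles, not its mean.
import statistics
import time


def benchmark(op, iterations: int = 10_000) -> dict[str, float]:
    """Run `op` repeatedly and report latency percentiles in microseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        op()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    qs = statistics.quantiles(samples, n=1000)
    return {
        "p50_us": statistics.median(samples),
        "p99_us": qs[989],    # 99th percentile
        "p999_us": qs[998],   # 99.9th percentile
        "max_us": max(samples),
    }


if __name__ == "__main__":
    # Stand-in operation for illustration only.
    print(benchmark(lambda: sum(range(1000))))
```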