Who, Me? “Expect the unexpected” is a cliché regularly trotted out during disaster planning. But how far should those plans go? Welcome to an episode of Who, Me? where a reader finds an entirely new failure mode.
Today’s tale comes from “Brian” (not his name) and is set during a period when the US state of California was facing rolling blackouts.
Our reader was working for a struggling hardware vendor in the state, a once-mighty power now reduced to a mere 1,400 employees thanks to that old favourite of the HR axe-wielder: “restructuring.”
Brian worked in the data centre as a Unix/Linux sysadmin while the sole remaining facilities engineer was located at another site, not far down the highway.
“We were warned that California was going to start having ‘rolling blackouts’,” he told us, “but we were assured that the diesel generator was fuelled up and could provide power for a day or two, and we had battery backup for the data center part of the building for at least 30 minutes (just in case).”
What could possibly go wrong?
On the day of the first blackout, the lights in Brian’s cubicle farm went out and the desktops died, as expected. Employees switched to machines running on the UPS while the big generator prepared to start.
Sure enough, the diesel kicked in. However, the power did not flow. The building remained dark. Brian popped round to the back of the data centre to double-check and, yes, the generators were definitely running. But for some reason the lights weren’t coming on.
He couldn’t get into the generator enclosure to troubleshoot further because it was, sensibly, securely locked. And the key? With the facilities engineer. Who was at the other site.
The time was now 4:30pm, and anyone familiar with just how bad the traffic on that stretch of the I-280 could be between 4pm and 7pm knows that the engineer’s chances of navigating the snarl-ups within 30 minutes were pretty much zero.
“So our facilities guy was racing to the data center at somewhere around walking speed,” said Brian.
Even worse, the air conditioning for the data centre ran on mains power, not the UPS. After all, there was only supposed to be a short blip before the generator kicked in. But it hadn’t, and things were heating up.
The team began desperately shutting everything down. Development kit, test hardware, even redundant production systems. Anything that drew precious juice from the UPS or emitted heat, and wasn’t absolutely essential, did not escape a jab of the power button.
“By the time the Facilities guy made it to the data center, about an hour later, the house UPS had been drained dry,” recalled Brian, “even though we’d pared down to the bare minimum of servers and network gear and we had all the doors to the building and data center wide open in a vain effort to keep things cool.”
But what had happened? The generators were running, but the switchover had not occurred. With the aid of the key to the enclosure, the facilities engineer investigated and reported back.
“Turns out, everything had worked as planned except for one switch in the generator enclosure that was supposed to switch the building over to the generators.
“It had been a favorite perch of the neighborhood birds and was so encrusted with poop that it wouldn’t actually switch.”
At least that was the explanation given by the engineer.
“Ah well, there’s always the unexpected, eh?” said Brian.
We never saw “poop-encrusted switches” in any of our disaster recovery plans, but maybe we should have. Or perhaps the facilities engineer used the antics of his feathered friends to cover up a cock-up of his own. Let us know your thoughts in the comments and submit your own technological tumbles with an email to Who, Me?