In IT, downtime chooses you

Even amid the best-laid plans, systems can go down without good reason, so do what you can -- and remember to take a breath

Unlike some other industries, IT tends to be quieter during the holiday season than at other times of the year. That depends on the main focus of the business, of course, but many IT folks get to enjoy a break around Christmas and New Year's. This year I decided to do the same, swapping activity for passivity and responding only to actual data center problems, of which I encountered mercifully few. But I had some help in my goal to disconnect, and it gave me a glimpse of a different perspective, though it wasn't so relaxing.

On New Year's Eve, the fates decided I should be stricken with a bizarre set of technological problems that defied logic and reason. At roughly the same time, both of my data circuits dropped and my analog phone lines went dead. According to my monitoring, this all happened within the same five-minute window.

We're talking about a business-class cable connection and a backup DSL connection through two different companies, using two different technologies, riding two different wires, terminated by different hardware devices. The only logical explanation was that a tree had fallen and cut off these services, or the physical cables had somehow been severed. However, the power was still on, and cable TV worked fine. Visual inspection of all lines showed no disturbance.

Thus, on the cusp of a new year, my afternoon was spent on my cellphone with Fairpoint and Time Warner Cable trying to figure all of this out. Phone support for both was clueless; they initially tried to blame my equipment, then later acquiesced and decided to send out a tech -- on Jan. 2.

I was still puzzled by the coincidental nature of the problem and wasn't entirely sure we could rule out a physical problem with the line somewhere, perhaps in a frozen conduit or the like. The phones, cable, and cable Internet worked fine for adjacent customers, and the odds of two different service providers having simultaneous outages that affected only my circuits were far too low to be believable. However, that was the only viable answer I could come up with, and both companies swore it wasn't their fault.

There was naught to do about it. It was the last day of 2012, and I spent it cut off from the Internet, except for my cellphone, despite my planning and best intentions.

We like to joke about first-world problems, and lacking Internet access is not really a hardship. But for those of us who are the last line of defense for infrastructures that can and do develop emergent problems in an instant, it's more of a concern. Not to mention it's hardly OK for an IT professional to have multiple circuit outages, no matter what the cause. Our nature dictates that we feel a sense of unease and discomfort until all the pieces to the puzzle are back in place and the bits flow again as they should.

However, I decided to dispense with that concern, if only for a day. I would silence the internal alarms that ring throughout an outage or unscheduled downtime. I would at least try to stop the background processing that rolls through unabated, mulling over the various data points of the problem, looking for correlations or clues that would fix the problem -- all of the normal trappings of an IT ninja in a crisis. After all, it was New Year's Eve, and I had my phone and tablet if something horrible happened at a remote facility.

I was moderately successful. I had a good time at a few parties that evening, and I found myself drifting back to the problem only occasionally, during lulls in conversation and other natural pauses. A good troubleshooter never rests, I suppose.

The problem was eventually resolved. The first sign was the phone ringing on New Year's Day with a call from Fairpoint telling us that our connections had been restored. Both analog lines happened to be controlled by a switching card that failed on New Year's Eve.

All Internet circuits were still down, however. I grabbed a beer and watched a few bowl games, and within a few hours, the DSL circuit came back up of its own accord. The next day, a nice fellow from Time Warner Cable showed up to check the circuit and quickly figured out they'd somehow managed to disable my modem remotely. Only a higher-tier tech could determine that and fix it, he said. He was out the door in less than 10 minutes, and all was back to normal.

If there's a lesson here, it's that even when the odds are seemingly stacked against a particular root cause, you still can't count it out. The other lesson, perhaps, is that I need to learn how to take time off -- but those instructions will have to wait for another day.

This story, "In IT, downtime chooses you," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com.

Copyright © 2013 IDG Communications, Inc.