The VA Computer System Meltdown

One story that didn’t make the news this year was the one-day meltdown of the VA’s IT infrastructure on 8/31. This article from Computerworld is an illuminating look at enterprise-level information technology. The principals interviewed blamed the problem on employees failing to follow procedure. I’ve heard this excuse in the IT department of every organization I’ve worked for (all two of them). It’s a drum that process owners beat relentlessly.

It’s also an impossible standard, for one simple reason: people aren’t computers. There is no way to tell a person to follow a set of rules every single time and expect them to do it. Computers are good at that. That’s why we use them, but don’t confuse computers with people.

This article completely misses two points. First, this is going to happen again, without a doubt. Second, the infrastructure is poorly designed.

The VA should plan for IT mistakes just like any other disaster scenario. They seem to have planned for natural disasters and network problems quite well, but no forethought was given to getting the data back in spec afterward, presumably because that’s simply not supposed to happen. Paper forms made specifically for resyncing data should be prepared and distributed to replace any old forms that existed prior to the technology “upgrade”. There should also be workaround plans for getting vital information from medical instrumentation. There is no excuse for test results or machine readings to be unavailable.

When the computers failed, people did the natural thing: they reverted to the prior process, which was paper-based. It took two weeks of data entry to get one day of paper records into the computer. This is the real disaster. Where is the plan for the computers being down for 1 day, 1 week, 1 month? Apparently, there isn’t one. Zero downtime and 100 percent uptime are impossible. Anyone who sells you either is lying. Get over it and plan for it.

Why did the VA IT infrastructure fail? We’ve all seen the Internet weather several storms. It slows down and some systems fail, but the Internet as a whole keeps going. If this system can be brought down by a single port misconfiguration, it has not been properly designed. Sites on the Internet are exposed to large-scale intentional attacks every day and survive. The architectural descriptions of high-volume sites like Slashdot are very interesting. Where is this attention to detail? Where is this level of skill? Where is this fault tolerance?

This time it happened to the Veterans Administration, but it applies just as much to any system that puts people’s lives in the hands of technology. When your family brings the medical malpractice suit, do you want the cause of death to be Bob, the 23-year-old computer geek? You can’t blame him. He’s not a computer.

The VA’s computer systems meltdown: What happened and why
Computerworld
Dian Schaffhauser
