Planning for Failure

We recently had a defect that required a code change so big it was sure to cause lots of unpredictable baby defects. The thought of a never-ending cycle of defects was nauseating. I knew I could not do this successfully. Or, to put it more accurately, “There was a high probability that portions of the system would be affected in unanticipated ways.” This change had to go into production between versions; in fact, between maintenance releases.

How do you reduce the risk on something like this? First, draw a circle around the defect and fix the intended issue. Then draw a second, bigger circle around the likely risks and add or modify features that mitigate them.

In this case, encryption was added to lots of fields. That could confuse the old code, which did not anticipate the encryption and might mistake the encrypted data for real data. So the second, additional feature was a kill switch on each field: a DontEncrypt flag.
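To make the idea concrete, here is a minimal sketch of a per-field kill switch. It is not our production code. The names FieldConfig, store, and load, and the "enc:" marker, are all hypothetical, and base64 stands in for a real cipher purely so the example runs on its own.

```python
import base64
from dataclasses import dataclass

# Hypothetical marker that tags a stored value as encrypted.
# Assumes real plaintext values never begin with this prefix.
ENC_PREFIX = "enc:"


def encrypt(value: str) -> str:
    # Stand-in only: base64 is NOT encryption. A real system would call its cipher here.
    return ENC_PREFIX + base64.b64encode(value.encode("utf-8")).decode("ascii")


def decrypt(stored: str) -> str:
    return base64.b64decode(stored[len(ENC_PREFIX):]).decode("utf-8")


@dataclass
class FieldConfig:
    name: str
    dont_encrypt: bool = False  # the kill switch: True means store the field as plain text


def store(config: FieldConfig, value: str) -> str:
    # Write path: skip encryption entirely when the kill switch is on.
    if config.dont_encrypt:
        return value
    return encrypt(value)


def load(config: FieldConfig, stored: str) -> str:
    # Read path: accept both forms, so values written before the flag
    # was flipped can still be read afterwards.
    if stored.startswith(ENC_PREFIX):
        return decrypt(stored)
    return stored


if __name__ == "__main__":
    ssn = FieldConfig("ssn")
    saved = store(ssn, "123-45-6789")   # encrypted form
    ssn.dont_encrypt = True             # support flips the switch for this field
    print(store(ssn, "123-45-6789"))    # now stored as plain text
    print(load(ssn, saved))             # older encrypted value still reads back
```

The one design choice worth noting is that the read path does not consult the flag at all. Under that assumption, flipping DontEncrypt only changes how new values are written, so nothing already stored becomes unreadable.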

This solution adds its own issues. The DontEncrypt code had defects too. In essence, two significant features were added at the same time. It gave me the willies at first, because I was afraid the DontEncrypt code wouldn’t work or, worse yet, would cause unrelated defects.

Now that we are in production, issues continue to arise from the encryption code. However, we are able to quickly and easily fix clients by setting the DontEncrypt flag to true. It’s saving significant development time and keeping the clients up and working. We have yet to see an issue, outside of development, with DontEncrypt itself, or an encryption defect that this flag doesn’t fix.

My point is that you should think ahead to that day when your feature is in place and users are reporting bugs. Ask yourself what you could have done to prevent this. Then remember that that day hasn’t happened yet and you can do something today to prevent issues.
