When things go wrong...

...find a torch.

When incidents arise, I feel the architect's role is to mediate: to gather the information needed to steer the technical team, to help identify the problem, and to ensure that the solution can be deployed without making things worse.

Recently I've been involved in several incidents where concurrency issues, session hijacking, sequence number conflicts, etc. were floated as possible causes for complaints that were being raised on a website.

While any of these could have caused the symptoms, none is particularly plausible. It reminds me of a tale of a man who drops his keys on a dark street. He keeps walking until he finds a street light and starts looking for his keys there. While they're not likely to be there, he'd be more likely to find them if they were. I've seen people look for explanations for production issues in outlying places before trying to shed some light on the problem. I've certainly done the same when the adrenaline's pumping.

In the case of my recent production incidents, a look at the data showed that the most likely explanations were human error and an ISP recycling a user's email address.

About the author

Kevin has been working with Java for 10 years, in defence research through dot com to investment banking. Currently he works at JPMorgan developing front-office trading solutions.

While getting on well with server-side Java, Kevin's also a keen Swing developer (and possibly masochist).

E-mail: kevin.seal at codingthearchitecture.com


Re: When things go wrong...

The idea of looking in the accessible / well-understood areas of a system, rather than in the areas which are likely to hold the flaw, can apply in all sorts of places.

It reminds me of an argument Mandelbrot used when pointing out the deficiencies of the current tools used for analysing financial risk (paraphrased slightly): if the only tool you have is a hammer, all you see are nails to hit.

