In many cases the failure modes, the means of detecting them, or the appropriate response aren't understood in advance. These scenarios don't test well in isolation, and seem to prefer emerging in production environments over time. Experience and basic probability tell us we should expect this to be the case, yet the tools we have for monitoring, diagnosing and correcting aberrant behaviour often aren't particularly good.
The dead man's handle is neither the start nor the end of fail-safe mechanisms for public transport. Similarly, our initial attempts at coping with failure are unlikely to be ideal. Automated responses to failure are great, but manual intervention may be the only way to keep your application alive long enough for you to implement them.