The design process doesn't have to be complicated
The right design is what really matters
One of the many discussions we had on the training course in London recently was about the actual process of software architecture and design. There's a perception that software design is a complex process steeped in formality, where you must produce a large number of diagrams drawn using a formal notation. We like to present a very straightforward, lightweight approach to software design, but the really important part of the process is understanding what drives the architecture: the functional requirements, the non-functional requirements, the environmental constraints that are imposed upon you and the architectural principles that you want to adopt. Understanding these, and how to work within them, contributes more to successful software projects than sophisticated modelling tools and formal methodologies. I'm not saying that you shouldn't use them, but you do need to understand the drivers so that you can make the right decisions and come up with the right design. And coming up with the right design is really what matters.
Training in June - Oslo and London
Just a quick note to say that I'm running our Software Architecture for Developers training course twice in June, in Oslo and in London.
You can preview the slides online, and here is what people have said about the course recently.
- "Excellent course over the last couple of days on Software Architecture held by @simonbrown at @skillsmatter LDN. Highly Recommended. #sa4d"
- "Great real world discussions, experiences. Exercise was based on a real world problem and experience. Great slides, presentation and website to match."
- "Very knowledgeable with obvious previous working knowledge & experience of the subject matter - not just working from a script as others do"
There are a couple of other events planned for the summer months, so stay tuned for information about those.
Rescheduled London events at Skills Matter
Following last month's disruption from the volcanic ash cloud, I'm pleased to say that our training course and my "In the brain" session at Skills Matter in London have been rescheduled. Software Architecture for Developers is now running on the 17th and 18th of May, while the free "In the brain" session entitled Where do you start? is taking place on the 17th of May at 6:30pm. I'm catching the ferry rather than the aeroplane, so I hope to see you there!
Build processes as architectural health indicators
A poor build process can highlight a lack of architecture
I did some consultancy work recently where my primary task was to review the development environment and build processes for a software system. From an external perspective it's a relatively simple-looking system, but behind the scenes there is a more complex set of interactions between a number of disparate systems. To understand and improve how a software system is developed, built and deployed, you need to understand a number of things, from the basics of the technology stack through to the structure of the overall system (components and interactions). When I started exploring the system with the team, it became apparent that there was a lack of technical leadership and overall architectural guidance.
As an example, nobody had a single big-picture understanding of the overall system, its components and the interfaces with other systems. Everybody knew about their own areas of work, but the same level of effort hadn't been placed on integrating those areas into a single cohesive system. The result is that the development, build and deployment processes in use across the system feel different and disconnected. Essentially each team member is taking a siloed approach to building components, and this also reflects the lack of a clear and consistent approach to dealing with architectural aspects such as data access, configuration, etc. While the team may be able to keep on top of the system at the moment, future development won't be able to take advantage of common infrastructure services because they simply don't exist. Reuse opportunities are being missed, and the system will be harder to maintain in the future because standard approaches to solving problems aren't being adopted.
The situation here is common to many software projects, and issues like those I've mentioned are warning signs that a project is lacking technical leadership. This technical leadership doesn't have to come in the form of upfront architecture, but it does require team members to be architecturally aware, defining reusable approaches and services to tackle common problems where appropriate. My task was only to look at improving the development processes, but those existing processes highlighted some problems with the overall architecture of the system. The build process sits on the sidelines for most software projects, but it *can* be an excellent architectural health indicator.
Fail Safe
a much abused term
One of the most misunderstood engineering terms is 'fail safe'. Most people from a non-engineering background (including many software developers) believe it means something won't fail. Last week even the Economist used it incorrectly.
A 'fail safe' device/system is expected to fail eventually, but when it does it will fail in a safe way. Classic examples include the brakes on trains, which engage when they fail, and the ratchet mechanisms in lifts/elevators that stop them dropping if the cable breaks. Well-engineered physical devices will state their Mean Time Between Failures (MTBF) and define how they can fail and what happens when they do. A well-maintained physical device may never fail over its lifetime, but you know what will happen if it does.
A fail safe physical device may also define what happens when a user error causes it to behave in an undesired manner. For example, the “dead man's handles” on lawnmowers or electric drills. I own an angle grinder, and in order to turn it on I have to flick a switch and then pull a trigger. Importantly, if I let the trigger go, the cutting blade stops. This means that if I drop it I'm much less likely to lose a foot. When the trigger is released the switch is also reset, so the grinder can't start up again if the trigger is pressed by bouncing off an object.
As there is no physical wear and tear on a software system, the concept of MTBF is arguably not applicable. However, software systems can and do fail all the time, so it's perhaps surprising that many software systems I've experienced don't cope with failure very well or don't have defined actions for when they fail. For example, the following may happen:
- Underlying hardware failure. Networks and external disks are the ones I encounter most.
- External system failure. Obviously your system is perfect but external systems you rely on start to feed you garbage.
- User error. If you create an idiot-proof system, then I guarantee they will employ a better idiot.
It's tempting to try to correct a failure situation and keep on running but this can lead to a system getting into an unknown state and creating more issues. For example:
- The network is not responding, but you keep on processing inputs and queuing outputs, hoping it comes back. Your caches and disks fill up, affecting other systems. Eventually it does come back online and your system stops responding as it processes hours' worth of stale data.
- An external data provider starts sending blanks in a numeric field. A developer had previously decided to 'interpret' an empty value as zero (whereas it was actually missing data). This fed through a bank's pricing systems and was forwarded on to other systems, which then tried to execute buys (as the instruments were obviously a bargain at zero!).
- In finance we worry about 'fat fingers', where a trader hits the wrong keys and buys 12 million rather than 1 million...
All of the above are real examples I have come across. How would I have changed the failure handling? I prefer to put the system into a known, safe state if possible.
- Put limits on anything you do for recovery situations e.g. retry only three times, put a time limit on caches etc. Don't continually do something that isn't working.
- Don't make generic assumptions about correcting data across a system. If it's not a good input then fail that input as you have no idea what it really means and you are hiding the error. Note that I'm not suggesting the entire system should be suspended but the transactions that are in error should be suspended and reported upon.
- User inputs are often sanity checked but “are you sure” dialogs are automatically clicked (without reading them) or the “never show this again” checkbox is selected. Ultimately, there is only so much you can do to save the user from themselves but you might want to save an audit of the user's decisions...
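The first two recommendations above can be sketched in a few lines of Java. This is a minimal illustration under our own assumptions, not code from any of the systems described: the names `FailSafeSketch`, `retry`, `parsePrice` and `InvalidInputException` are hypothetical, the retry limit is a parameter rather than a hard-coded three, and a blank price is rejected so that the transaction can be suspended and reported instead of being treated as zero.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical fail-safe helpers: bounded retries, and strict parsing that
 * rejects bad input instead of guessing what it meant.
 */
public class FailSafeSketch {

    /** Signals that an input can't be trusted; the caller should suspend that transaction. */
    static class InvalidInputException extends Exception {
        InvalidInputException(String message) { super(message); }
    }

    /**
     * Calls the task at most maxAttempts times, then gives up by rethrowing
     * the last failure rather than retrying forever.
     */
    static <T> T retry(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again, up to the limit
            }
        }
        throw last; // known outcome: the caller decides which safe state to enter
    }

    /** Parses a price strictly: a blank or malformed field is never "interpreted" as zero. */
    static double parsePrice(String field) throws InvalidInputException {
        if (field == null || field.trim().isEmpty()) {
            throw new InvalidInputException("price is missing");
        }
        try {
            return Double.parseDouble(field.trim());
        } catch (NumberFormatException e) {
            throw new InvalidInputException("price is not a number: " + field);
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        try {
            retry(() -> { calls[0]++; throw new IllegalStateException("network down"); }, 3);
        } catch (IllegalStateException e) {
            System.out.println("gave up after " + calls[0] + " attempts");
        }
        try {
            parsePrice("   ");
        } catch (InvalidInputException e) {
            System.out.println("suspend transaction: " + e.getMessage());
        }
    }
}
```

The retry helper deliberately has no back-off or overall time limit; a real system would probably want both, in the same spirit as putting a time limit on caches.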
It's important not just to put the system (or transaction) into a safe state but also to inform those who can resolve the situation. As developers we often write
`LOG.warn("Transaction X has failed");`
and think nothing more about it. It's eye-opening to run a reporting tool like Splunk over the logs of a mature system and extract all the worrying messages. Would it be more appropriate to send an e-mail, pager message or text message, or to change a dashboard status?
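One way to act on this, sketched under stated assumptions: keep the log entry, but also push the failure to channels that an operator will actually see. `FailureReporter` and `AlertChannel` are hypothetical names invented for this example; only `java.util.logging.Logger` is a real API, and in practice each channel would wrap whatever e-mail, paging, SMS or dashboard service you already use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical reporter that logs a failure AND notifies people who can resolve it. */
public class FailureReporter {

    /** Hypothetical notification channel (pager, e-mail, text message, dashboard...). */
    interface AlertChannel {
        void send(String message);
    }

    private static final Logger LOG = Logger.getLogger(FailureReporter.class.getName());
    private final List<AlertChannel> channels;

    FailureReporter(List<AlertChannel> channels) {
        this.channels = List.copyOf(channels);
    }

    /** Logs the failure, then actively pushes it to every configured channel. */
    void transactionFailed(String transactionId, String reason) {
        String message = "Transaction " + transactionId + " has failed: " + reason;
        LOG.warning(message);            // keep the log entry for later analysis...
        for (AlertChannel channel : channels) {
            channel.send(message);       // ...but don't rely on someone reading it
        }
    }

    public static void main(String[] args) {
        List<String> dashboard = new ArrayList<>();
        AlertChannel toDashboard = dashboard::add; // stand-in for a real dashboard feed
        FailureReporter reporter = new FailureReporter(List.of(toDashboard));
        reporter.transactionFailed("X", "price field was blank");
        System.out.println(dashboard);
    }
}
```

Which channels a given failure goes to is exactly the kind of decision that benefits from being designed up front rather than added after the first incident.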
We need to design the error reporting and monitoring services up front and define how the operators should be kept informed. We also need to allow the operators to resolve issues speedily and safely.
To conclude:
- How can a system fail?
- What safe state can be entered?
- How can the failure be reported?
- How can the issue be resolved?
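The four questions above can be made concrete with a small state model. The sketch below is hypothetical (the `SuspendableTransaction` class and its states are our own illustration, not a prescribed design): a failing transaction moves to a known `SUSPENDED` state rather than limping on, it cannot complete until an operator resolves it, and each step is recorded in an audit trail.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical transaction with a known safe state and an operator-driven resolution. */
public class SuspendableTransaction {

    enum State { PENDING, COMPLETED, SUSPENDED, REJECTED }

    private final String id;
    private State state = State.PENDING;
    private final List<String> audit = new ArrayList<>();

    SuspendableTransaction(String id) { this.id = id; }

    State state() { return state; }

    List<String> audit() { return List.copyOf(audit); }

    /** Safe state: a failure suspends this transaction only, not the whole system. */
    void suspend(String reason) {
        requireState(State.PENDING);
        state = State.SUSPENDED;
        audit.add("suspended: " + reason);
    }

    void complete() {
        requireState(State.PENDING);
        state = State.COMPLETED;
        audit.add("completed");
    }

    /** Resolution: an operator either releases the transaction for reprocessing or rejects it. */
    void resolve(String operator, boolean release) {
        requireState(State.SUSPENDED);
        state = release ? State.PENDING : State.REJECTED;
        audit.add((release ? "released by " : "rejected by ") + operator);
    }

    /** Refuses any transition that isn't legal from the current state. */
    private void requireState(State expected) {
        if (state != expected) {
            throw new IllegalStateException(id + " is " + state + ", expected " + expected);
        }
    }

    public static void main(String[] args) {
        SuspendableTransaction tx = new SuspendableTransaction("X");
        tx.suspend("price field was blank");
        tx.resolve("on-call operator", true); // data corrected, so reprocess
        tx.complete();
        System.out.println(tx.state() + " " + tx.audit());
    }
}
```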
QCon London 2010
After running the abridged version of the Software Architecture for Developers course at QCon London on Monday with Simon, I returned for the conference "proper".
QCon felt quite diverse this year, with tracks on Agile, Java, .NET, architecture, craftsmanship, design, web and mobile development, testing and more. As such, it’s quite hard to come away with an over-arching feel for what’s driving the industry at the moment. Perhaps that’s really the take-away message this year: that there are numerous movements bubbling under, along with a growing sense of pragmatism over technology choice and process adoption.
However, a few candidate themes were discernible that might constitute some sort of zeitgeist.
Firstly, there was the contentious issue of software craftsmanship. At its heart was the suggestion that this is about being proud of the software we produce and developing techniques to enable, and perhaps enforce, the delivery of “good code”. Around the edges was the suggestion that it describes a walled garden, in which development becomes inbred to the detriment of broader collaboration. Despite primarily being a developer, I found myself disagreeing with the software craftsmanship argument more than I expected. The argument seemed to be predicated on the assertion that we simply knew that we had a lot of bad code and that bad code was, well, bad. It was left to other tracks to present any evidence, namely that software project success rates have not improved over the last decade, and even that may simply be bad science! In a successful bit of scheduling, there were talks representing both sides of the argument, so this is definitely a topic to keep watching.
The craftsmanship track was also complemented by the development and operations track. This presented development, deployment and production support as a partnership, ideally as part of the same process. Craftsmanship in the output of this sort of team feels like a more valuable ambition, where the notion of code quality reflects the code's ability to survive outside the bubble of our QA processes.
The Agile movement also came under some scrutiny in the Agile Evolution track. Agile was (finally!) viewed pragmatically, as an industry that has seemingly sprung up off the back of some well-intentioned recommendations. In a somewhat ironic retrospective, we saw how Agile has caused us to throw away things that we have since come to reinvent, while clinging to rituals regardless of their real value; “we have ADHD, retrograde amnesia and OCD,” says Keith Braithwaite. Rachel Davies, author of “Agile Coaching”, referred to "w-agile": mini-waterfall with daily stand-ups, the point at which many attempts at agile adoption stall. However, it was refreshing to see this recognised as a good start, and as a reminder that the goal is not a particular process but continuous improvement.
There was a huge amount more at the conference, including a couple of consistent technical themes that are worth tracking down on the QCon site. Not least, Dan Ingalls provided a reminder of the enjoyment we can get from developing software, and that we shouldn't lose sight of this in our work.
Structuring the software design process
And avoiding complex and cluttered diagrams
I had a great time last week discussing software architecture across a mix of QCon, our software architecture training and the IASA session that I ran. I mentioned this earlier in the year, but we've enhanced our material around the architecture definition process to include much more guidance on how you go about actually designing software when all you have is a set of requirements and a blank sheet of paper. In addition to understanding the requirements (functional and non-functional), constraints and principles, it's really about putting some structure into the diagrams that you might draw during your initial agile modelling, rather than drawing a single, very complex and cluttered picture that is hard to explain or understand.
I've already written about not needing a UML tool to undertake the software design process and I normally use either a whiteboard, flip chart or index cards, especially when I'm collaborating on the design with others. Tooling aside, here are a few essays that summarise some guidelines and my own approach to designing software.
- Start with the big picture - every picture should tell a different part of the same story, but where do you start?
- Architectural constructs - what are the building blocks?
- Systems - what is the system landscape?
- Containers - what are the executables that make up the system?
- Components - what are the major components and services?
- Interfaces - do you understand the architecturally significant interfaces?
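Purely to make the hierarchy in this list concrete, the constructs could be written down as a few Java records (Java 16 or later). This is a hypothetical sketch, not a tool recommendation, since the approach described here relies on whiteboards, flip charts and index cards: a system is made up of containers (the executables), each container holds components, and the architecturally significant interfaces are listed separately. The "Online store" example and every name in it are invented for illustration.

```java
import java.util.List;

/** Hypothetical model of the constructs: system -> containers -> components, plus interfaces. */
public class ArchitectureSketch {

    record Component(String name, String responsibility) {}

    record Container(String name, String technology, List<Component> components) {}

    record SoftwareSystem(String name, List<Container> containers) {}

    /** An architecturally significant interface between two containers. */
    record Interface(String from, String to, String protocol) {}

    /** Total number of components across all containers in the system. */
    static int componentCount(SoftwareSystem system) {
        return system.containers().stream().mapToInt(c -> c.components().size()).sum();
    }

    public static void main(String[] args) {
        SoftwareSystem store = new SoftwareSystem("Online store", List.of(
                new Container("Web application", "Java web app", List.of(
                        new Component("Catalogue", "lists products"),
                        new Component("Checkout", "takes payments"))),
                new Container("Database", "relational database", List.of())));
        List<Interface> interfaces = List.of(
                new Interface("Web application", "Database", "JDBC"));

        // Walk the model top-down, in the same order you might draw the pictures
        System.out.println(store.name() + " (" + componentCount(store) + " components)");
        for (Container container : store.containers()) {
            System.out.println("  " + container.name() + " [" + container.technology() + "]");
            for (Component component : container.components()) {
                System.out.println("    " + component.name() + ": " + component.responsibility());
            }
        }
        interfaces.forEach(i -> System.out.println("  " + i.from() + " -> " + i.to() + " via " + i.protocol()));
    }
}
```

Each level of the model corresponds to one of the pictures in the list, which is the point: several simple views rather than one cluttered diagram.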
I don't particularly want to define yet another software design process, but I do want to help people design software simply and effectively. While the guidelines don't replace the need to have a deep technical knowledge and a broad understanding of the available options plus their tradeoffs, they do help people to organise their thinking and design software. I do believe that *some* up-front design is necessary for most software systems but also that it should take hours rather than weeks. I'm sure we'll be refining this sort of content as the year progresses, but please do let us know if you have any feedback.
Architectural constructs
What are the building blocks?
The code for any software system is where most of the focus remains for the majority of the software development life cycle, and this makes sense because the code is the ultimate deliverable. But if you had to explain to somebody how that system worked, would you start with the code?
Where do you start?
Slides from my IASA session are available to view and download
I had a fantastic time presenting and discussing software architecture at my IASA session called Where do you start? on Tuesday evening. We went through the things that you should do if tasked with designing a software system given a wish-list and a blank sheet of paper, covering the requirements (both functional and non-functional), constraints and principles before looking at some guidelines for structuring the actual process of designing software.
The slides from the session are available to view online and download and some additional essays covering some of the same content will be published on the site very soon. Thanks again to Matt Deacon and Onalytica (who have stunning views across London) for organising and hosting the evening.
Upcoming events
Our plans for the next couple of weeks
Just a short post to provide a rundown of the events that we'll be speaking at over the next couple of weeks.
Kevin and I are presenting a one-day tutorial at QCon London 2010 called Software Architecture for Developers, which covers the majority of the content from our two-day training course without the practical case-study exercises. Following this, Kevin is presenting a session for the "Cool Stuff with Java" track about OSGi and industrial-strength Swing.
I'm running our two-day training course at Skills Matter in London and, as I mentioned at the start of the year, this has been enhanced to include more content about the actual process of architecture, with a focus on coming up with a high-level software design in a few hours rather than a few weeks. You can preview the slides, and there are a few places remaining if you want to come along.
I'm running a session at the IASA UK Chapter that will look at where to start with designing a software system from a blank sheet of paper. The focus here is on pragmatic software design without the need for lots of process or expensive modelling tools.
Finally, I'm running two sessions at DevWeek 2010: one called "Improving quality with an automated build process" (it has a .NET focus) and one called "A developer’s guide to load testing" that covers the basics for evaluating architectures where performance and/or scalability is important.
It's going to be a busy couple of weeks, but I'm planning on spending some time in the audience at both QCon and DevWeek. If you want to meet or catch-up, feel free to send me an e-mail or a message via Twitter.















