In response to my System Context diagram as code post yesterday was this question:
@simonbrown why is that information not already in the system's code?— Nat Pryce (@natpryce) January 12, 2015
I've often asked the same thing and, if the code is the embodiment/implementation of the architecture, this information really should be present in the code. But my experience suggests this is rarely the case.
My starting point for describing a software system is to draw a system context diagram. This shows the system in question along with key user types (e.g. actors, roles, personas, etc) and system dependencies.
I should be able to get a list of user roles from the code. For example, many web applications will have some configuration that describes the various user roles, Active Directory groups, etc and the parts of the web application that they have access too. This will differ from codebase to codebase and technology to technology, but in theory this information is available somewhere.
The key system dependencies are a little harder to extract from a codebase. Again, we can scrape security configuration to identify links to systems such as LDAP and Active Directory. We could also search the codebase for links to known libraries or APIs, and make the assumption that these are a system dependencies. But what about those system interactions that are done by copying a file into a network share? I know this sounds archaic, but it still happens. Understanding inbound dependencies is also tricky, especially if you don't keep track of your API consumers.
The next level in my C4 model is a container diagram, which basically shows the various web applications, mobile apps, databases, file systems, standalone applications, etc and how they interact to form the overall software system. Again, some of this information will be present, in one form or another, in the codebase. For example, you could scrape this information out of an IDE such as IntelliJ IDEA (i.e. modules) or Visual Studio (i.e. projects). The output from build scripts for code (e.g. Ant, Maven, MSBuild, etc) and infrastructure (e.g. Puppet, Chef, Vagrant, Docker, etc) will probably result in deployable units, which can again be identified and this information used to create the containers model.
The third level of the C4 model is components (or modules, services, layers, etc). Since even a relatively small application may consist of a large number of components, this is a level that we certainly want to automate. But it turns out that even this is tricky. Usually there's a lack of an architecturally-evident coding style, which means you get a conflict between the software architecture model and the code. This is particularly true in older systems where the codebase lacks modularity and looks like a sea of thousands of classes interacting with one another. As Robert Annett suggests, there are a number of strategies that we can use to identify "components" from a codebase though; including annotations/attributes, packaging conventions, naming conventions, module systems (e.g. OSGi), library dependencies and so on.
Ultimately, I'd like to auto-generate as much of the software architecture model as possible from the code, but this isn't currently realistic. Why?
We face two key challenges here. First of all, we need to get people thinking about software architecture once again so that they are able to think about, describe and discuss the various structures needed to reason about a large and/or complex software system. And secondly, we need to find a way to get these structures into the codebase. We have a way to go but, in time, I hope that the thought of using Microsoft Visio for drawing software architecture diagrams will seem ridiculous.
Simon is an independent consultant specializing in software architecture, and the author of Software Architecture for Developers (a developer-friendly guide to software architecture, technical leadership and the balance with agility). He’s also the creator of the C4 software architecture model and the founder of Structurizr, which is a collection of open source and commercial tooling to help software teams visualise, document and explore their software architecture.