Why isn't the architecture in the code?

In response to my System Context diagram as code post yesterday was this question:

I've often asked the same thing and, if the code is the embodiment/implementation of the architecture, this information really should be present in the code. But my experience suggests this is rarely the case.

System context

My starting point for describing a software system is to draw a system context diagram. This shows the system in question along with key user types (e.g. actors, roles, personas, etc) and system dependencies.

I should be able to get a list of user roles from the code. For example, many web applications will have some configuration that describes the various user roles, Active Directory groups, etc. and the parts of the web application that they have access to. This will differ from codebase to codebase and technology to technology, but in theory this information is available somewhere.
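As a rough illustration of scraping roles out of configuration, here is a minimal Python sketch. It assumes a Java web application whose web.xml declares security constraints; the sample fragment, the role names and the roles_to_paths function are all hypothetical, and a real web.xml would normally carry an XML namespace that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free web.xml fragment, for illustration only.
WEB_XML = """
<web-app>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Admin pages</web-resource-name>
      <url-pattern>/admin/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>administrator</role-name>
    </auth-constraint>
  </security-constraint>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Reports</web-resource-name>
      <url-pattern>/reports/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>administrator</role-name>
      <role-name>analyst</role-name>
    </auth-constraint>
  </security-constraint>
</web-app>
"""


def roles_to_paths(xml_text):
    """Map each role name to the sorted URL patterns it is allowed to access."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for constraint in root.findall("security-constraint"):
        patterns = [
            p.text.strip()
            for p in constraint.findall("web-resource-collection/url-pattern")
        ]
        roles = [
            r.text.strip()
            for r in constraint.findall("auth-constraint/role-name")
        ]
        for role in roles:
            mapping.setdefault(role, set()).update(patterns)
    return {role: sorted(paths) for role, paths in mapping.items()}


if __name__ == "__main__":
    for role, paths in sorted(roles_to_paths(WEB_XML).items()):
        print(role, "->", ", ".join(paths))
```

The roles that come out are candidates for the user types on a context diagram; somebody still needs to decide which of them are architecturally significant.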

The key system dependencies are a little harder to extract from a codebase. Again, we can scrape security configuration to identify links to systems such as LDAP and Active Directory. We could also search the codebase for links to known libraries or APIs, and make the assumption that these are system dependencies. But what about those system interactions that are done by copying a file into a network share? I know this sounds archaic, but it still happens. Understanding inbound dependencies is also tricky, especially if you don't keep track of your API consumers.
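The "search for known libraries" idea might look something like the following Python sketch, which scans Java source text for import statements. The KNOWN_LIBRARIES table and the guess_dependencies function are my own assumptions for illustration; the mapping from package prefix to external system is a guess that would need curating for each codebase, and nothing here would spot a file being copied to a network share.

```python
import re

# Hypothetical lookup table: package prefix -> the external system it hints at.
KNOWN_LIBRARIES = {
    "javax.naming.ldap": "LDAP directory",
    "com.amazonaws.services.s3": "Amazon S3",
    "java.sql": "Relational database",
}

# Captures the dotted name in "import a.b.C;", "import static a.b.C.x;"
# and the "a.b." part of "import a.b.*;".
IMPORT_PATTERN = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)", re.MULTILINE)


def guess_dependencies(sources):
    """Return the sorted names of external systems hinted at by imports."""
    found = set()
    for text in sources:
        for imported in IMPORT_PATTERN.findall(text):
            for prefix, system in KNOWN_LIBRARIES.items():
                if imported == prefix or imported.startswith(prefix + "."):
                    found.add(system)
    return sorted(found)


# Two made-up source files standing in for a real codebase.
SAMPLE_SOURCES = [
    "import javax.naming.ldap.LdapContext;\nimport java.util.List;\nclass Login {}",
    "import java.sql.*;\nclass Orders {}",
]

if __name__ == "__main__":
    print(guess_dependencies(SAMPLE_SOURCES))
```

This only produces a list of suspects. An import tells you a library is referenced, not which external system sits at the other end of it.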

Containers

The next level in my C4 model is a container diagram, which basically shows the various web applications, mobile apps, databases, file systems, standalone applications, etc. and how they interact to form the overall software system. Again, some of this information will be present, in one form or another, in the codebase. For example, you could scrape this information out of an IDE such as IntelliJ IDEA (i.e. modules) or Visual Studio (i.e. projects). The build scripts for code (e.g. Ant, Maven, MSBuild, etc.) and infrastructure (e.g. Puppet, Chef, Vagrant, Docker, etc.) will probably produce deployable units, which can again be identified and used to create the containers model.
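To make the build script idea a little more concrete, here is a hedged Python sketch that reads Maven POM files and treats certain packaging types as candidate containers. The PACKAGING_TO_CONTAINER mapping, the container_candidates function and the sample POMs are assumptions made up for this example. Plain jar modules are skipped because they may be libraries rather than deployable units, which is exactly the kind of judgement a tool can't make on its own.

```python
import xml.etree.ElementTree as ET

# Maven POM files use this XML namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Assumed mapping from Maven packaging type to a kind of container.
PACKAGING_TO_CONTAINER = {
    "war": "Web application",
    "ear": "Enterprise application",
}


def container_candidates(poms):
    """Return {artifactId: container type} for modules that look deployable."""
    candidates = {}
    for pom_text in poms:
        root = ET.fromstring(pom_text.strip())
        artifact = root.findtext("m:artifactId", namespaces=POM_NS)
        # Maven treats a missing <packaging> element as "jar".
        packaging = root.findtext("m:packaging", default="jar", namespaces=POM_NS)
        kind = PACKAGING_TO_CONTAINER.get(packaging.strip())
        if artifact and kind:
            candidates[artifact.strip()] = kind
    return candidates


# Two made-up POMs: one web application and one shared library.
SAMPLE_POMS = [
    """
    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.example</groupId>
      <artifactId>shop-web</artifactId>
      <version>1.0</version>
      <packaging>war</packaging>
    </project>
    """,
    """
    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.example</groupId>
      <artifactId>shop-common</artifactId>
      <version>1.0</version>
    </project>
    """,
]

if __name__ == "__main__":
    print(container_candidates(SAMPLE_POMS))
```

Databases and file systems won't show up in a scan like this, so infrastructure scripts would need their own treatment.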

Components

The third level of the C4 model is components (or modules, services, layers, etc.). Since even a relatively small application may consist of a large number of components, this is a level that we certainly want to automate. But it turns out that even this is tricky. Usually there's a lack of an architecturally-evident coding style, which means you get a conflict between the software architecture model and the code. This is particularly true in older systems where the codebase lacks modularity and looks like a sea of thousands of classes interacting with one another. As Robert Annett suggests, though, there are a number of strategies that we can use to identify "components" from a codebase, including annotations/attributes, packaging conventions, naming conventions, module systems (e.g. OSGi), library dependencies and so on.
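Of those strategies, a naming convention is probably the simplest to sketch. The Python below assumes, purely for illustration, that any class whose name ends in Controller, Service or Repository is a component; the suffix list, the find_components function and the class names are hypothetical, and a codebase without such a convention would need one of the other strategies.

```python
# Assumed naming convention: classes with these suffixes are "components".
COMPONENT_SUFFIXES = ("Controller", "Service", "Repository")


def find_components(class_names):
    """Return {simple class name: matched suffix} for classes that follow the convention."""
    components = {}
    for fully_qualified in class_names:
        simple_name = fully_qualified.rsplit(".", 1)[-1]
        for suffix in COMPONENT_SUFFIXES:
            if simple_name.endswith(suffix):
                components[simple_name] = suffix
                break
    return components


# Made-up class names standing in for the output of a classpath scan.
SAMPLE_CLASSES = [
    "com.example.orders.OrderController",
    "com.example.orders.OrderService",
    "com.example.orders.OrderRepository",
    "com.example.orders.Order",
    "com.example.util.StringHelper",
]

if __name__ == "__main__":
    for name, kind in sorted(find_components(SAMPLE_CLASSES).items()):
        print(name, "(" + kind + ")")
```

Everything that doesn't match (Order and StringHelper here) is treated as an implementation detail of some component, and working out which component it belongs to is the harder part.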

Auto-generating the software architecture model

Ultimately, I'd like to auto-generate as much of the software architecture model as possible from the code, but this isn't currently realistic. Why?

We face two key challenges here. First of all, we need to get people thinking about software architecture once again so that they are able to think about, describe and discuss the various structures needed to reason about a large and/or complex software system. And secondly, we need to find a way to get these structures into the codebase. We have a way to go but, in time, I hope that the thought of using Microsoft Visio for drawing software architecture diagrams will seem ridiculous.

About the author

Simon is an independent consultant specializing in software architecture, and the author of Software Architecture for Developers (a developer-friendly guide to software architecture, technical leadership and the balance with agility). He’s also the creator of the C4 software architecture model and the founder of Structurizr, which is a collection of open source and commercial tooling to help software teams visualise, document and explore their software architecture.

You can find Simon on Twitter at @simonbrown ... see simonbrown.je for information about his speaking schedule, videos from past conferences and software architecture training.



Re: Why isn't the architecture in the code?

We did use a construct that we called Feature (though they are actually more like components) to build up our system from them, and used the classes specifying them to generate a system composition graph. Maybe that is one step in the direction you describe. A short description is here: http://www.planetgeek.ch/2014/06/17/effective-teams-know-your-code/ The library I built for this is https://github.com/ursenzler/ninject.features

some appreciation

I'm coming to this from a Python-oriented perspective, because that's what I currently use. As you probably know, the Python community generally eschews monolithic IDE-based solutions. In that spirit, any partial solutions I'm envisioning would be lightweight (e.g. if a rendering is helpful, then do it from a small standalone tool, which could subsequently be integrated into IDEs, be they monolithic or ad hoc like a developer's customised Vim). I'm loving your recent posts / videos, and feel like these ideas are at least as pertinent for us over here in the Python world. Keep up the good work.

some appreciation

Thanks Jonathan ... it's great to hear that this has some relevance outside of my Java/.NET world. :-)
