Despite the seemingly high unpopularity of UML these days, it continues to surprise me how many software teams tell me that they use PlantUML. If you've not seen it, PlantUML is basically a tool that allows you to create UML diagrams using text. While the use of text is very developer friendly, PlantUML isn't a modelling tool. By this I mean you're not creating a single consistent model of a software system and creating different UML diagrams (views) based upon that model. Instead, you're simply creating a single UML diagram at a time, which means that you need to take responsibility for the consistent naming of elements across diagrams.
Some teams I've worked with have solved this problem by writing small applications that generate the PlantUML diagram definitions based upon a model of their software systems, but these tend to be bespoke solutions that are never shared outside of the team that created them. Something I've done recently is create a PlantUML exporter for Structurizr. Using PlantUML in conjunction with the Structurizr open source library allows you to create a model of your software system and have the resulting PlantUML diagrams be consistent with that model. If you use Structurizr's component finder, which uses reflection to identify components in your codebase, you can create component level UML diagrams automatically.
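To make the "generate the diagram definition from a model" idea concrete, here is a minimal sketch in plain Java. It is not the Structurizr API or its PlantUML exporter: the `TinyModel` class and its methods are hypothetical names invented for illustration. The point it shows is that element names live in one place, and a relationship that refers to an unknown name is rejected before any diagram text is produced.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, hand-rolled model for illustration only (not the Structurizr API).
final class TinyModel {

    // A directed relationship between two components that exist in the model.
    private static final class Relationship {
        final String source;
        final String destination;
        final String description;

        Relationship(String source, String destination, String description) {
            this.source = source;
            this.destination = destination;
            this.description = description;
        }
    }

    private final Set<String> components = new LinkedHashSet<>();
    private final List<Relationship> relationships = new ArrayList<>();

    void addComponent(String name) {
        components.add(name);
    }

    void addRelationship(String source, String destination, String description) {
        // Reject names that are not in the model, so a diagram cannot drift from it.
        if (!components.contains(source) || !components.contains(destination)) {
            throw new IllegalArgumentException("Unknown component in relationship");
        }
        relationships.add(new Relationship(source, destination, description));
    }

    // Renders the model as a PlantUML component diagram definition.
    String toPlantUml() {
        StringBuilder out = new StringBuilder("@startuml\n");
        for (String name : components) {
            out.append("[").append(name).append("]\n");
        }
        for (Relationship r : relationships) {
            out.append("[").append(r.source).append("] --> [")
               .append(r.destination).append("] : ").append(r.description).append("\n");
        }
        return out.append("@enduml\n").toString();
    }
}
```

Every diagram rendered from the same `TinyModel` instance uses the same element names, which is the consistency that hand-written PlantUML files leave up to you.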
Even if you're not interested in using my Structurizr software as a service, I would encourage you to at least use the open source library to create a model of your software system, extracting components from your code where possible. Once you have a model, you can visualise that model in a number of different ways.
Ransomware is an increasing threat to many organisations - I recently had a conversation with a (non-IT) friend whose employer had been affected, which is why I’m writing this. These are attacks where a system or its data is made inaccessible until a ransom is paid. This form of extortion actually dates back to the 1980s but recent variants, such as CryptoLocker, are very dangerous and destructive on modern networks.
Often the initial infection is via a phishing email that contains a link to a website that, if clicked, will download the malware. This will scan all files that the user has access to and start encrypting them. Once the files are encrypted the user will be sent a message telling them of the infection and offering to decrypt in return for payment (usually in bitcoins). Of course the user has no guarantee that their files will be decrypted even if the ransom is paid.
If an individual's machine is infected then they might lose all their personal documents. If they are using remote drives and shares, which have multiple users, then the infection may also lock other people's files. If a user has access to a large number of files across an organisation then this could be devastating.
These are all files that a person has access to. This includes any files used by applications along with documents etc. Therefore if a developer or operational user becomes infected then the system files they have access to can be affected. It’s very common for technical employees to have access to the files of production servers in order to make issue resolution easy. For example: log files, configuration files, data exports/imports etc.
If the technical users have write access to a mapped drive on a production server then it is trivial for the malware to encrypt these files. This may take down the service (if runtime files are affected) or even destroy the data making the service impossible to run even after a reinstall. Remember that your databases will ultimately have their data stored in files on a disk somewhere.
If people with elevated privileges are infected, you can lose entire systems as well as that person's individual files.
I won't give advice here on Endpoint Protection (antiviruses etc.) as that is out of scope for this blog but there are many data related actions you should consider with respect to your applications.
Many of you will be reading this and thinking "well we don't allow access as you've described here" but technical staff will set up systems to make their jobs easier. Has your organisation ever performed a data audit and classification? Do you know what files, shares and sections of your network each user has access to? If you haven't then I'd strongly advise you do so - you may be surprised at what you find. There are many commercial and free tools to assist you in doing this.
You should define your users, what groups they are in and what data they have access to. This is good practice anyway (for reasons of privacy, data loss prevention etc) but if you reduce the total number of files accessible then any infection will have less effect.
If someone really needs access to files do they require write access? Log files and configuration files are a perfect example. A user shouldn't be writing to a log file and if they want to change some configuration then they should go through your normal release process rather than hacking it in manually. If you can't release configuration quickly enough, then your release process may be your real issue...
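As a rough illustration of the kind of check an audit might include, the sketch below lists every regular file under a directory that the current account is able to modify. It is only a starting point, not a replacement for a proper audit tool: the class and method names are hypothetical, and it reports what the account running the code can write to, so it would need to be run as each user or service account of interest.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not a substitute for a real audit tool.
final class WritableFileAudit {

    // Returns every regular file under root that the current account can write to.
    // Files.walk can fail part-way through if a directory is unreadable, so a
    // production version would need to handle that case.
    static List<Path> findWritableFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(Files::isWritable)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Pointing this at a mapped production share from a developer's everyday login is a quick way to see how many files ransomware running as that user could encrypt. Ideally the list for log and configuration directories is empty.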
A person shouldn't be using an account used by an application and the applications shouldn't be using personal accounts. Again you may claim this isn't happening but technical users often take shortcuts like this to release quickly (or get around approval processes). A good audit should pick up on this.
It's tempting (for ease of management) to create a single account and get all applications to run as this account. If this account is compromised then all data for all applications are vulnerable. Use specific accounts for applications to reduce lateral movement between systems.
If a login account is used to run a web browser or email then it should have restricted permissions. Likewise any administrative account should not be able to run a web browser or email. Separate the concerns!
How do you back up your data? If you are using online backups that are accessible to an infected user, then all your backups may get corrupted too! Maybe you should consider using WORM (write once read many) technology or at least use separate processes to move and permission backups appropriately once they have been taken.
Some malware may be stealthy and stay on your system for a long time before making itself known. Therefore incremental backups can be corrupted far back in time. Make sure you regularly test your restoration processes too.
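One small, automatable part of a restoration test is checking that a restored file matches a digest recorded when the backup was taken. The sketch below assumes you store a SHA-256 digest alongside each backup, which is an assumption made for illustration rather than something every backup product does. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only.
final class RestoreCheck {

    // Computes the SHA-256 digest of a file.
    // Reads the whole file into memory: fine for a sketch, not for very large files.
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
    }

    // True when the restored file's digest matches the one recorded at backup time.
    static boolean restoredCorrectly(byte[] digestAtBackupTime, Path restoredFile)
            throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(digestAtBackupTime, sha256(restoredFile));
    }
}
```

A mismatch tells you the backup copy was altered after the digest was recorded, for example by being encrypted in place. It does not tell you whether the data was already corrupted when the backup was taken, which is why keeping older generations still matters.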
It's important to remember that your data is the most important part of your application and valuable to your organisation. If something has value then nefarious parties can seek to take advantage of this. It's hard to stop some attacks but you can minimise the damage if you are attacked.
The architecture of a system should take into account where data is stored, how it is permissioned and who/what has access to it. It's very easy to become obsessed with the latest design patterns but basic data management is important and shouldn't be forgotten.
I rolled out a new feature to Structurizr at the weekend called Structurizr Express, which is basically a way to create software architecture diagrams using text. Although the core concept behind Structurizr is to create a software architecture model using code, there are times when you simply want a quick diagram, perhaps for a presentation, pre-sales proposal, etc. Structurizr Express will let you do just that - quickly create a single software architecture diagram using a textual definition. Much like tools such as PlantUML, yUML, WebSequenceDiagrams, etc.
Despite the name, this is all still based around the C4 model although it only targets one diagram at a time. The three types of diagrams currently supported are System Context, Container and Component diagrams. Structurizr Express is available to use now and the help page provides a description and examples of the syntax. I hope you find it useful.
"We value working software over comprehensive documentation" is what the manifesto for agile software development says. I know it's now a cliche, but the typical misinterpretation of these few words is "don't write documentation". Of course, that's not actually what the manifesto says and "no documentation" certainly wasn't the intent. To be honest, I think many software teams never produced or liked producing any documentation anyway, and they're now simply using the manifesto as a way to justify their approach. What's done is done, and we must move on.
One of the most common questions I get asked is how to produce "agile documentation", specifically with regards to documenting how a software system works. I've met many people who have tried the traditional "software architecture document" approach and struggled with it for a number of reasons, irrespective of whether the implementation was a Microsoft Word document or a wiki like Atlassian Confluence. My simple advice is to think of such documentation as being supplementary to the code, describing what you can't get from the code alone.
Readers of my Software Architecture for Developers ebook will know that I propose something akin to a travel guidebook. Imagine you arrive in a new city. Without any maps or a sense of direction, you'll end up just walking up and down every street trying to find something you recognise or something of interest. You can certainly have conversations with the people who you meet, but that will get tiring really quickly. If I was a new joiner on an existing software development team, what I'd personally like is something that I can sit down and read over a coffee, perhaps for an hour or so, that will give me a really good starting point to jump into and start exploring the code.
Although the content of this document will vary from team to team (after all, that's the whole point of being agile), I propose the following section headings as a starting point.
The definitions of these sections are included in my ebook and they're now available to read for free on the Structurizr website (see the hyperlinks above). This is because the next big feature that I'm rolling out on Structurizr is the ability to add lightweight supplementary documentation into the existing software architecture model. The teams I work with seem to really like the guidebook approach, and some even restructure the content on their wiki to match the section headings above. Others don't have a wiki though, and are stuck using tools like Microsoft Word. There's nothing inherently wrong with using Microsoft Word, of course, in the same way that using Microsoft Visio to create software architecture diagrams is okay. But it's 2016 and we should be able to do better.
The basic premise of the documentation support in Structurizr is to create one Markdown file per guidebook section and to link that with an appropriate element in the software architecture model, embedding software architecture diagrams where necessary. If you're interested to see what this looks like, I've pushed an initial release and there is some documentation for the techtribes.je and the Financial Risk System that I use in my workshops. The Java code and Markdown look like this.
Even if you're not using Structurizr, I hope that this blog post and publishing the definitions of the sections I typically include in my software architecture documentation will help you create better documentation to complement your code. Remember, this is all about lightweight documentation that describes what you can't get from the code and only documenting something if it adds value.
This blog post is a follow-up to the discussions I've had with people after my recent Modular Monoliths talks. I've been enthusiastically told that the "ports & adapters" (hexagonal) architectural style is "vastly", "radically" and "hugely" different to a traditional layered architecture. I remain unconvinced, hence this blog post, which has a Java spin, but I'm also interested in how the concepts map to other programming languages. I'm also interested in exploring how we can better structure our code to prevent applications becoming big balls of mud. Layers are not the only option.
Imagine you're building a simple web application where users interact with a web page and information is stored in a database. The UML class diagrams that follow illustrate some of the typical ways that the source code elements might be organised.
Let's first list out the types in the leftmost diagram:
I'll talk about the use of interfaces later, but let's assume we're going to use interfaces for the purposes of dependency injection, substitution, testing, etc. Now let's look at the four UML class diagrams, from left to right.
On the face of it, these do all look like different ways to organise code and, therefore, different architectural styles. This starts to unravel very quickly once you start looking at code examples though. Take a look at the following example implementations of the ports & adapters style.
Spot anything? Yes, the interface (port) and implementation class (adapter) are both public. Most of the code examples I've found on the web have liberal usage of the public access modifier. And the same is true for examples of layered architectures. Marking all types as public means you're not taking advantage of the facilities that Java provides with regards to encapsulation. In some cases there's nothing preventing somebody writing some code to instantiate the concrete repository implementation, violating the architecture style. Coaching, discipline, code reviews and automated architecture violation checks in the build pipeline would catch this, assuming you have them. My experience suggests otherwise, especially when budgets and deadlines start to become tight. If left unchecked, this is what can turn a codebase into a big ball of mud.
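The following is a minimal sketch of the problem, not one of the examples found on the web. In a real codebase each type would be a public top-level type in its own package. They are nested inside one class here only so that the sketch compiles as a single file, and `InMemoryCustomerRepository` is a hypothetical stand-in for a database-backed adapter.

```java
// Sketch only: imagine the port in a "domain" package, the adapter in a
// "database" package and the controller in a "web" package, all public.
final class EverythingPublicSketch {

    // The port.
    public interface CustomerRepository {
        String findName(long id);
    }

    // The adapter. Because it is public, any code can construct it directly.
    public static class InMemoryCustomerRepository implements CustomerRepository {
        @Override
        public String findName(long id) {
            return "customer-" + id;
        }
    }

    // A web controller that bypasses the port and depends on the adapter.
    // This violates the architectural style, yet the compiler accepts it.
    public static class CustomerController {
        private final InMemoryCustomerRepository repository = new InMemoryCustomerRepository();

        public String show(long id) {
            return repository.findName(id);
        }
    }
}
```

Nothing in the language stops `CustomerController` from doing this. Only review or tooling outside the compiler would catch it.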
Looking at this another way, when you make all types in your application public, the packages are simply an organisation mechanism (a grouping, like folders) rather than being used for encapsulation. Since public types can be used from anywhere in a codebase, you can effectively ignore the packages. The net result is that if you ignore the packages (because they don't provide any means of encapsulation and hiding), a ports & adapters architecture is really just a layered architecture with some different naming. In fact, if all types are public, all four options presented before are exactly the same.
Conceptually ports & adapters is different from a traditional layered architecture, but syntactically it's really the same, especially if all types are marked as public. It's a well implemented n-layer architecture, where n is the number of layers through a slice of the application (e.g. 3; web-domain-database).
The way Java types are placed into packages can actually make a huge difference to how accessible (or inaccessible) those types can be when Java's access modifiers are applied appropriately. Ignoring the controllers ... if I bring the packages back and mark (by fading) those types where the access modifier can be made more restrictive, the picture becomes pretty interesting.
The use of Java's access modifiers does provide a degree of differentiation between a layered architecture and a ports & adapters architecture, but I still wouldn't say they are "vastly" different. Bundling the types into a smaller number of packages (options 3 & 4) allows for something a little more radical. Since there are fewer inter-package dependencies, you can start to restrict the access modifiers. Java does allow interfaces to be marked as package-private (the default when no modifier is given) although if you do this you'll notice that the methods must still be marked as public in the implementing class. Having public methods on a type that's inaccessible outside of the package is a little odd, but it's not the end of the world.
With option 3, "vertical slicing", you can take this to the extreme and make all types package-private. The caveat here is that no other code (e.g. web controllers) outside of the package will be able to easily reuse functionality provided by the CustomerService. This is not good or bad, it's just a trade-off of the approach. I don't often see interfaces being marked as package-private, but you can use this to your advantage with frameworks like Spring. Here's an example from Oliver Gierke that does just this (the implementation is created by the framework). Actually, Oliver's blog post titled "Whoops! Where did my architecture go", which is about reducing the number of public types in a codebase, is a recommended read.
I'm not keen on how the presentation tier (CustomerController) is coupled in option 3, so I tend to use option 4. Re-introducing an inter-package dependency forces you to make the CustomerComponent interface public again, but I like this because it provides a single API into the functionality contained within the package. This means I can easily reuse that functionality across other web controllers, other UIs, APIs, etc. Provided you're not cheating and using reflection, the smaller number of public types results in a smaller number of possible dependencies. Options 3 & 4 don't allow callers to go behind the service, directly to the DAO. Again, I like this because it provides an additional degree of encapsulation and modularity. The architecture rules are also simpler and easier to enforce, because the compiler can do some of this work for you. This echoes the very same design principles and approach to modularity that you'll find in a modern microservices architecture: a remotable service interface with a private implementation. This is no coincidence. Caveats apply (e.g. don't have all of your components share a single database schema) but a well-structured modular monolith will be easier to transform into a microservices architecture.
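Here is a minimal sketch of what option 4 might look like in code. It is an illustration under stated assumptions rather than a canonical implementation: the method names, the in-memory DAO and the static factory are all hypothetical. In a real codebase these types would share one package and `CustomerComponent` would be its only public type. It is left package-private here only so that the sketch compiles as a single file.

```java
import java.util.HashMap;
import java.util.Map;

// The single API into the package. In a real codebase: public interface CustomerComponent.
interface CustomerComponent {
    String getCustomerName(long id);

    // Hypothetical factory, so callers never need to name the implementation class.
    static CustomerComponent create() {
        return new CustomerComponentImpl(new InMemoryCustomerDao());
    }
}

// Package-private: code outside the package cannot go behind the component to the DAO.
interface CustomerDao {
    String findName(long id); // implicitly public, like all interface methods
}

// Package-private stand-in for a database-backed DAO.
class InMemoryCustomerDao implements CustomerDao {
    private final Map<Long, String> names = new HashMap<>();

    InMemoryCustomerDao() {
        names.put(1L, "Alice");
    }

    @Override
    public String findName(long id) { // must be declared public to implement the interface
        return names.get(id);
    }
}

// Package-private implementation of the component.
class CustomerComponentImpl implements CustomerComponent {
    private final CustomerDao dao;

    CustomerComponentImpl(CustomerDao dao) {
        this.dao = dao;
    }

    @Override
    public String getCustomerName(long id) {
        String name = dao.findName(id);
        return name != null ? name : "unknown";
    }
}
```

A web controller in another package could call `CustomerComponent.create().getCustomerName(1L)`, but an attempt to reference `CustomerDao` or `InMemoryCustomerDao` from there would fail to compile, which is the compiler enforcing the architecture rule for you.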
In the spirit of YAGNI, you might realise that some of those package-private DAO interfaces in options 3 and 4 aren't really necessary because there is only a single implementation. This post isn't about testing, so I'm just going to point you to "Unit and integration are ambiguous names for tests". As I mention in my "Modular Monoliths" talk though, I think there's an interesting relationship between the architecture, the organisation of the code and the tests. I would like to see a much more architecturally-aligned approach to testing.
I've had the same discussion about layers vs ports & adapters with a number of different people and opinions differ wildly as to how different the two approaches really are. A Google search will reveal the same thing, with numerous blog posts and questions on Stack Overflow about the topic. In my mind, a well implemented layered architecture isn't that different to a hexagonal architecture. They are certainly conceptually different but this isn't necessarily apparent from the typical implementations that I see. And that raises another interesting question: is there a canonical ports & adapters example out there? Of course, module systems (OSGi, Java 9, etc) change the landscape because they allow us to differentiate between public and published types. I wonder how this will affect the code we write and, in particular, whether it will allow us to build more modular monoliths. Feel free to leave a comment or tweet me @simonbrown with any thoughts.
Regular readers will already know about Structurizr - a set of open source libraries to create a software architecture model as code, plus a SaaS product to visualise those models. Having created and helped create a number of models with Structurizr now, I've noticed an interesting side-effect. In the absence of architectural information being present in the code, the power of using something like Structurizr to define a software architecture model using code is in extracting information algorithmically, by codifying the rules that you've ultimately used to structure your codebase.
Let me give you an example. Imagine you're building a web-MVC web application in Java, C#, etc and you have tens or hundreds of controller classes, each of which uses a number of other components to implement some functionality. Drawing a single diagram to visualise the static structure of the entire web application is a bad idea because it shows too much information. A better approach is to create one view per vertical slice, where there could be one vertical slice per web controller. This results in smaller, simpler diagrams like this.
So far so good, and this is relatively easy to do using static analysis techniques. But you'll notice this diagram includes an "Authenticated User", which isn't part of the code itself. This raises the question of how the user ends up getting included on the diagram. There are a number of options:
The ability to codify the rules you've used to organise the controllers in your codebase obviously depends on how much thought you've put into doing this. For example, did you dump all of these controller classes into a single package or namespace without giving it much thought at all? Or perhaps you took Martin Fowler's advice and modularised further, creating one package/namespace per functional area or aggregate root, for example. Another possibility is that you grouped controllers together based upon whether unauthenticated users, authenticated users or other software systems are using them. Organising your code well provides you with another angle to extract architectural information, because you can codify rules such as, "the Anonymous User uses all controllers in the com.mycompany.mywebapp.unsecured package/namespace".
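A rule like that can be expressed in a few lines of code. The sketch below is plain Java and deliberately does not use the Structurizr API. The `PackageRules` class is a hypothetical name, as are the `secured` package and the controller names in the usage note. It maps a package prefix to the user who uses every controller beneath it and returns the resulting relationships as text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rule engine for illustration only (not the Structurizr API).
final class PackageRules {

    // Package name -> the user who uses every controller in or beneath that package.
    private final Map<String, String> userByPackage = new LinkedHashMap<>();

    void addRule(String packageName, String user) {
        userByPackage.put(packageName, user);
    }

    // Returns "<user> uses <controller>" for each controller matched by a rule.
    List<String> apply(List<String> controllerClassNames) {
        List<String> relationships = new ArrayList<>();
        for (String className : controllerClassNames) {
            for (Map.Entry<String, String> rule : userByPackage.entrySet()) {
                // The trailing dot stops "a.b" from also matching "a.bc.SomeController".
                if (className.startsWith(rule.getKey() + ".")) {
                    relationships.add(rule.getValue() + " uses " + className);
                }
            }
        }
        return relationships;
    }
}
```

With a rule for `com.mycompany.mywebapp.unsecured` mapped to "Anonymous User" and another for a `com.mycompany.mywebapp.secured` package mapped to "Authenticated User", a list of controller class names obtained from static analysis turns into user-to-controller relationships without anyone drawing them by hand. If the controllers all sit in one package, there is no rule to write, which is the point about code organisation.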
With hindsight this is fairly obvious, but we often don't put enough thought into how we organise our code, possibly because we perceive that it doesn't actually matter that much and modern IDEs provide powerful ways to navigate large and/or complex codebases. Trying to codify the rules used to organise a codebase certainly gets you thinking, and often refactoring too.
The initial version of Structurizr was targeted at the Java ecosystem (see "Structurizr for Java"), for no other reason than it's what I'm most familiar with. Although this works for a good portion of the organisations that I visit when doing training/consulting, an equally sized portion use the Microsoft stack. For this reason, I've put together Structurizr for .NET, which is more or less a direct port of the Java version, with some automatically generated code from Swagger used as a starting point. It's by no means "feature complete" yet, especially since none of the component finder code (the part that extracts components automatically from a codebase) is present, but there's enough to create some basic diagrams. Here's some example code that creates a software model for the "Financial Risk System" case study that I use in my workshops.
It creates the following Context, Container and Component diagrams.
I'm pleased to say I'll be in the United States next month for the DevNexus 2016 conference that is taking place in Atlanta, GA. In addition to a number of talks about software architecture, I'll also be running my popular "The Art of Visualising Software Architecture" workshop. Related to the (free) book with the same name, this hands-on workshop is about improving communication and specifically software architecture diagrams. We'll talk about UML and some anti-patterns of "boxes and lines" diagrams, but the real focus is on my "C4 model" and some lightweight techniques for communicating software architecture. The agenda and slides for this 1-day workshop are available online. I hope you'll be able to join me.
Happy new year and I wish you all the best for 2016. My first trip of the year starts next week and I'll be doing some work in Shenzhen, China. As a result, I'll also be in Hong Kong on January 15th, presenting "The Art of Visualising Software Architecture" at a meetup organised by Agile Hong Kong. You can register on the Meetup page. See you there!
p.s. If anybody would like a private, in-house 1-day software architecture sketching workshop on the 15th, please drop me a note.
Last week I gave a presentation titled "2015 - A CyberSecurity Year" to the London Java Community's Open Conference. I like to present at the LJC's Open Conference on whatever topic has occupied the majority of my time in the previous year. This is partly because it's always advisable to "present on what you know" but also as a cathartic exercise to vent my frustrations! The slides can be found here but won't make much sense without the following context.
This year a huge amount of my time has been spent on Cybersecurity concerns as 2015 was the year these issues were forced into everyone's mind. The threats have been increasing for several years but many high profile (and often salacious) events mean that the press, and therefore the public, have realised this is a serious issue. The first part of my presentation described what had happened over the last few years to cause the current situation.
Once the press and the public have a concern, the politicians will pick up on it and this means... laws, regulation, 'guidelines' and consequences. The second part of the presentation discussed the large range of regulators and regulations that have been (and are still being) created. These can be complex, incomplete and sometimes contradictory. I only touched the surface of what is happening. Interestingly a few of the people watching had similar concerns and experiences but many seemed unaware of even the most basic provisions of the Data Protection Act - I suspect this could be a HUGE business risk in the next few years.
These concerns have led to an increase in actual and planned expenditure (including large announcements from governments) but many in the group expressed doubts about how effectively the money would be spent.
Lots of money means... lots of companies offering products and services! We spent a while discussing some of these and again, there was concern about their maturity and effectiveness.
So to bring this back on topic! Security (whether at application, system or data level) is a highly complex quality attribute. It is also constantly changing. A good architecture will take the current security concerns into account and provide foundations for not only providing this now but also for solving future issues. You not only have to address the threat but also do this in a legally compliant way. It is possible to be secure but still in breach of the law.
It is also a concern throughout the system and cannot be considered in isolation. If you are writing an application you need to think about all the services you rely upon, the sources and destination of your data and most importantly the people using it. Your developers, system administrators, database administrators and operational teams need to communicate with each other on these issues.
Good luck, I'm sure it'll all be different by the end of 2016!