<?xml version="1.0"?>
<rss version="2.0">
<channel>
  <title>Coding the Architecture - monitoring tag</title>
  <link>http://www.codingthearchitecture.com/tags/monitoring/</link>
  <description>Software architecture for developers</description>
  <language>en</language>
  <copyright>Coding the Architecture</copyright>
  <lastBuildDate>Mon, 21 May 2012 09:41:00 GMT</lastBuildDate>
  <generator>Pebble (http://pebble.sourceforge.net)</generator>
  <docs>http://backend.userland.com/rss</docs>
  
  
  <item>
    <title>Adding the fuel gauge</title>
    <link>http://www.codingthearchitecture.com/2009/04/16/adding_the_fuel_gauge.html</link>
    
      
        <description>
          &lt;p&gt;
One of my talks at the recent DevWeek conference was about the &lt;a href=&#034;http://www.codingthearchitecture.com/2009/03/30/pitfalls_for_software_architects.html&#034;&gt;pitfalls software architects face&lt;/a&gt; and I covered some of the problems associated with technology selection. Probably &lt;em&gt;the&lt;/em&gt; biggest problem is vendor marketing and hype, with many project teams simply taking it at face value. Sometimes a piece of technology will do exactly what it says on the tin, but sometimes it won&#039;t. There are truly no silver bullets, and every technology, large or small, has trade-offs. You&#039;ve probably seen this yourself at some point: vendors (open source and commercial) making claims that range from features they haven&#039;t yet implemented through to bold promises about performance or scalability. Depending on your project context, these promises can make or break your project.
&lt;/p&gt;

&lt;p&gt;
One of the analogies that I made during the session was about the fuel consumption figures quoted by car manufacturers in their glossy brochures. Imagine that you need to travel from one side of the country to the other, so you work out the mileage and then buy or rent a car based upon the fuel consumption figures quoted in a brochure. The quoted figures are usually based on optimum conditions, but real-world figures will vary according to the way that you drive, the speed you drive at, the ambient temperature, the gradient, the road surface and so on. Depending on all of this, you may or may not reach your destination on the fuel you planned for.
&lt;/p&gt;

&lt;p&gt;
When we undertake a technology selection exercise, we&#039;ll typically evaluate candidates against a set of criteria and choose the one that we think best suits our needs before plugging it into our projects. Adopting a technology without testing it first is like setting off across the country on the strength of the brochure figures alone: you&#039;re relying on somebody else&#039;s claims and the journey might not go as expected. Literally, your mileage may vary!
&lt;/p&gt;

&lt;p&gt;
Of course, the key difference is that a car has a fuel gauge that gives you constant feedback on how much fuel remains in the tank. In addition, newer cars have onboard computers that can provide real-time consumption figures and estimate the mileage remaining. All of this information lets you monitor what is happening so that you can adjust (or fill up!) as necessary. Laptop batteries are the same. The manufacturers quote maximum battery life figures and, while you might not get that in real-world usage, you do get to see how much battery life remains.
&lt;/p&gt;

&lt;p&gt;
With this in mind, it&#039;s worth thinking about why we don&#039;t usually add fuel gauges to our own software systems. These systems are usually composed of many complex technologies, each of which makes its own claims and has its own trade-offs. Yet we often deploy and run our systems as a black box. Often this will work, but sometimes it won&#039;t. Worse still, without a fuel gauge you have no idea when your system will slow to a crawl or stop working completely.
&lt;/p&gt;

&lt;p&gt;
Adding a monitoring capability is fairly easy to do and can give you important insight into the health of your software. For example, it might allow you to monitor how many database connections are being used, or how many messages are waiting to be processed, or how many worker threads are busy servicing user requests. &lt;a href=&#034;http://www.codingthearchitecture.com/2007/11/09/monitoring_java_systems.html&#034;&gt;Here are some thoughts on how to cater for monitoring in your architecture&lt;/a&gt;, and they&#039;re particularly relevant if you&#039;re building Java applications.
&lt;/p&gt;
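&lt;p&gt;
As a rough sketch of such a gauge in Java, the example below exposes three counters as a JMX standard MBean and registers it with the platform MBean server, where JConsole or any other JMX client can read them. The class names, the attribute names and the &lt;code&gt;com.example.app:type=SystemGauges&lt;/code&gt; object name are all hypothetical; in a real system the counters would be updated by your own connection pool, message consumer and worker threads rather than by hand.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class FuelGaugeDemo {

    // Standard MBean convention: the public management interface is named after
    // the implementing class plus "MBean". Each getter becomes a read-only attribute.
    public interface SystemGaugesMBean {
        int getDatabaseConnectionsInUse();
        int getMessagesWaiting();
        int getBusyWorkerThreads();
    }

    // Hypothetical gauges; the application updates the counters as it works.
    public static class SystemGauges implements SystemGaugesMBean {
        final AtomicInteger connectionsInUse = new AtomicInteger();
        final AtomicInteger messagesWaiting = new AtomicInteger();
        final AtomicInteger busyWorkers = new AtomicInteger();

        public int getDatabaseConnectionsInUse() { return connectionsInUse.get(); }
        public int getMessagesWaiting() { return messagesWaiting.get(); }
        public int getBusyWorkerThreads() { return busyWorkers.get(); }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        SystemGauges gauges = new SystemGauges();
        ObjectName name = new ObjectName("com.example.app:type=SystemGauges");
        server.registerMBean(gauges, name);

        // Simulate some activity.
        gauges.connectionsInUse.incrementAndGet();
        gauges.messagesWaiting.addAndGet(42);

        // Read the values back through the MBean server, as a JMX client would.
        System.out.println(server.getAttribute(name, "DatabaseConnectionsInUse")); // 1
        System.out.println(server.getAttribute(name, "MessagesWaiting"));          // 42
    }
}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
Using atomic counters keeps the gauges cheap to update from many threads, which matters because the fuel gauge should not itself slow the engine down.
&lt;/p&gt;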

&lt;p&gt;
As with cars and laptops, there are benefits to be had by adding some simple feedback devices to our software systems. After all, wouldn&#039;t it be great if you could understand the health of your system and proactively deal with problems before they become major issues?
&lt;/p&gt;
        </description>
      
      
    
    
    
    <category>How do you define software architecture?</category>
    
    <comments>http://www.codingthearchitecture.com/2009/04/16/adding_the_fuel_gauge.html#comments</comments>
    <guid isPermaLink="true">http://www.codingthearchitecture.com/2009/04/16/adding_the_fuel_gauge.html</guid>
    <pubDate>Thu, 16 Apr 2009 06:22:15 GMT</pubDate>
  </item>
  
  <item>
    <title>The Other Interface</title>
    <link>http://www.codingthearchitecture.com/2009/02/07/the_other_interface.html</link>
    
      
        <description>
          &lt;p&gt;
One of the most succinct definitions of a technical architect is: a technologist who is responsible for a system meeting its Non-Functional Requirements.
&lt;/p&gt;&lt;p&gt;
The NFRs that are often perceived as the most interesting relate to performance, stability and availability.  However, recently I&#039;ve been paying a lot of attention to perhaps the least glamorous of all the non-functionals: supportability.  In a mature system, the lion&#039;s share of the time it takes to fix a fault is taken up by diagnosing where the fault lies.  Once you&#039;ve diagnosed it, fixing it is often trivial (testing the fix less so, but that&#039;s a discussion for another day).
&lt;/p&gt;&lt;p&gt;
So how do you decrease this diagnosis time?  It boils down to logging and monitoring.  There are some excellent monitoring tools available, and I&#039;ve seen some good home-grown applications, which provide a very informative real-time view of what&#039;s going on under the hood of a process (for Java systems, JMX greatly facilitates rolling your own, although you get a lot out of the box with Sun&#039;s Java distribution these days).  Historical concerns about monitoring tools slowing processes down have all but disappeared: such tools are used on the most latency-sensitive of trading systems.  While it&#039;s relatively easy to recognise a good monitoring tool, a good approach to logging is less self-evident.
&lt;/p&gt;&lt;p&gt;
I&#039;ve encountered dramatically different views on application logging, ranging from the view that the log of a healthy long-running process should be short and readable, no bigger than one screen from top to bottom, to the view that a log file should be exhaustive, often gigabytes in size, so that carefully designed post-processing scripts (yes, not just grep) can build a picture of what was going on in the process at a given point in time, or in response to a given event.
&lt;/p&gt;&lt;p&gt;
The best approach will depend on the nature of your system and how it is supported.  I&#039;m currently working on a system supported by several different teams; the development team forms a third or even fourth level of support.  Therefore what the system dumps out in its logs feeds into human processes: messages logged at Error level should require manual intervention and possibly escalation to the next level of support, whereas warnings and below should be ignorable.
&lt;/p&gt;&lt;p&gt;
Everyone who can change the code needs to be aware of this, so a logging policy needs to be defined, published and enforced.  Ideally this policy will make your system as close to self-diagnosing as possible.  Without such a policy, the black art of knowing which errors can be ignored, or where to look if a process fails with no log information at all, can hugely increase the support costs of the system.  It slows the resolution of support incidents, steepens the learning curve for new joiners to the team, makes testing more difficult, and reduces software quality by hiding or delaying the discovery of bugs.
&lt;/p&gt;&lt;p&gt;
If there is one approach which is relevant to all logging policies, it is this: don&#039;t cry wolf, and don&#039;t die quietly.  To put it another way:
&lt;ul&gt;
&lt;li&gt;
Messages logged as Errors / Severe / Fatal should actually be problems with the system, and should not be ignorable.
&lt;/li&gt;&lt;li&gt;
When the system fails, it should log its current state wherever the code gives it the scope to do so.
&lt;/li&gt;
&lt;/ul&gt;
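&lt;p&gt;
To make the two rules concrete, here is a minimal sketch using &lt;code&gt;java.util.logging&lt;/code&gt;, assuming a hypothetical order-processing component with a hypothetical &lt;code&gt;com.example.orders&lt;/code&gt; logger. Recoverable bad input is logged at WARNING so that it never triggers manual intervention; a genuine failure is logged at SEVERE together with the state the code was in; and a default uncaught exception handler acts as a last resort so that no thread dies without a trace. The method names are illustrative only, not part of any particular system.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingPolicyDemo {

    // Hypothetical logger name for a hypothetical order-processing component.
    static final Logger LOG = Logger.getLogger("com.example.orders");

    // Don't cry wolf: bad input that the code can recover from is logged at
    // WARNING, so it never escalates to the support team.
    static int parseQuantity(String raw, int fallback) {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            LOG.log(Level.WARNING, "Unparseable quantity ''{0}'', using {1}",
                    new Object[] {raw, fallback});
            return fallback;
        }
    }

    // Don't die quietly: a failure that needs a human is logged at SEVERE,
    // together with the state the code was in, before it propagates.
    static void process(String orderId, String rawQuantity) {
        int quantity = parseQuantity(rawQuantity, 1);
        try {
            if (orderId.isEmpty()) {
                throw new IllegalArgumentException("missing order id");
            }
            // ... the real processing would happen here ...
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Failed to process order id=[" + orderId
                    + "] quantity=" + quantity, e);
            throw e;
        }
    }

    // Last resort: log any exception that would otherwise kill a thread silently.
    static void installLastResortHandler() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) ->
                LOG.log(Level.SEVERE, "Thread " + thread.getName() + " died", error));
    }

    public static void main(String[] args) throws InterruptedException {
        installLastResortHandler();
        System.out.println(parseQuantity("12", 1));     // 12
        System.out.println(parseQuantity("twelve", 1)); // 1, after a WARNING
        Thread worker = new Thread(() -> process("", "3"), "worker-1");
        worker.start();
        worker.join(); // SEVERE records from process() and the handler go to stderr
    }
}
```
&lt;/code&gt;&lt;/pre&gt;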
&lt;/p&gt;&lt;p&gt;
This may sound obvious, but I&#039;m finding out, to my cost, that applying even such a simple logging policy to a mature system after the fact can be very expensive.
&lt;/p&gt;&lt;p&gt;
There&#039;s perhaps no right answer as to what makes an ideal application log; however, there are many wrong answers.  The worst of all is to ignore this interface to your system.  So define your logging standard at the same time as you define your other non-functional requirements, and enforce it in the same way as the system evolves.
&lt;/p&gt;

        </description>
      
      
    
    
    
    <category>How do you define software architecture?</category>
    
    <comments>http://www.codingthearchitecture.com/2009/02/07/the_other_interface.html#comments</comments>
    <guid isPermaLink="true">http://www.codingthearchitecture.com/2009/02/07/the_other_interface.html</guid>
    <pubDate>Sat, 07 Feb 2009 14:53:00 GMT</pubDate>
  </item>
  
  <item>
    <title>Podcast #2 : QCon revisited</title>
    <link>http://www.codingthearchitecture.com/2008/03/25/podcast_2_qcon_revisited.html</link>
    
      
        <description>
          &lt;p&gt;
&lt;a href=&#034;http://www.codingthearchitecture.com/2008/03/06/the_first_cta_podcast.html&#034;&gt;As promised&lt;/a&gt; the 2nd CTA podcast is a roundtable discussion between some of the CTA contributors - namely &lt;a href=&#034;http://www.codingthearchitecture.com/authors/sbrown/&#034;&gt;Simon Brown&lt;/a&gt;, &lt;a href=&#034;http://www.codingthearchitecture.com/authors/sdalton/&#034;&gt;Sam Dalton&lt;/a&gt; and &lt;a href=&#034;http://www.codingthearchitecture.com/authors/kseal/&#034;&gt;Kevin Seal&lt;/a&gt;. In this podcast we discuss some of the themes emerging from the recent QCon conference held in London and our views on those themes.
&lt;/p&gt;

&lt;p&gt;
The podcast can be downloaded &lt;a href=&#034;http://static.codingthearchitecture.com/podcasts/podcast-march2008.mp3&#034;&gt;here&lt;/a&gt; or you can &lt;a href=&#034;http://www.codingthearchitecture.com/tags/podcast/rss.xml&#034;&gt;subscribe&lt;/a&gt;. Full show notes follow below:
&lt;/p&gt;

&lt;h3&gt;1. Introduction&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Details of the QCon London conference can be found &lt;a href=&#034;http://jaoo.dk:80/london-2008/conference/&#034;&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;2. Monitoring and Management (0:27)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Roll your own vs. &#034;off the shelf&#034; monitoring&lt;/li&gt;
&lt;li&gt;Asynchronous monitoring&lt;/li&gt;
&lt;li&gt;&lt;a href=&#034;http://www.jinspired.com&#034;&gt;JXInsight&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The need for a System Architect with an overall understanding of WHAT to monitor and a holistic view of the system&lt;/li&gt;
&lt;li&gt;Including monitoring NFRs upfront in the SAD (Software Architecture Document)&lt;/li&gt;
&lt;li&gt;Monitoring the monitoring vs. best effort&lt;/li&gt;
&lt;li&gt;eBay and 1.5TB of monitoring and logfile information per day&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;3. Rehashing Refactoring (11:56)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Refactoring is now mainstream, with the emphasis now at the micro-level&lt;/li&gt;
&lt;li&gt;Code-debt - how can you measure it?&lt;/li&gt;
&lt;li&gt;Refactoring is not something to be &#034;saved up&#034;&lt;/li&gt;
&lt;li&gt;Lots of rearchitecture projects sold as refactoring&lt;/li&gt;
&lt;li&gt;Sometimes starting again is a better solution than large scale refactoring&lt;/li&gt;
&lt;li&gt;Why do things end up in such a mess so often?&lt;/li&gt;
&lt;li&gt;Need to get back to writing code for humans to understand&lt;/li&gt;
&lt;li&gt;Worry about emphasis on low-level code optimizations rather than architecture&lt;/li&gt;
&lt;li&gt;Is refactoring its own worst enemy by being treated as a discipline separate from regular software development?&lt;/li&gt;
&lt;li&gt;We are our own worst enemies - we (especially the architects) need to stand up for the quality of our code and stop allowing ourselves to be persuaded to take shortcuts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;4. The Return of the Architect? (25:08)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Thinking about performance, monitoring, etc. upfront&lt;/li&gt;
&lt;li&gt;&lt;a href=&#034;http://www.terracotta.org/&#034;&gt;Terracotta&lt;/a&gt;, &lt;a href=&#034;http://www.oracle.com/technology/products/coherence/index.html&#034;&gt;Coherence&lt;/a&gt;, &lt;a href=&#034;http://www.gigaspaces.com/&#034;&gt;Gigaspaces&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#034;http://www.codingthearchitecture.com/2007/03/15/qcon_open_terracotta.html&#034;&gt;Kevin&#039;s views on Terracotta&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;How much performance could you squeeze from a system in 2 weeks with code changes alone?&lt;/li&gt;
&lt;li&gt;Real-time VMs and performance improvements&lt;/li&gt;
&lt;li&gt;The benefits of continuous performance testing&lt;/li&gt;
&lt;li&gt;Premature optimization&lt;/li&gt;
&lt;li&gt;Understanding a system&#039;s deficiencies can be as valuable as fixing them&lt;/li&gt;
&lt;/ul&gt;
        </description>
      
      
    
    
    
    <comments>http://www.codingthearchitecture.com/2008/03/25/podcast_2_qcon_revisited.html#comments</comments>
    <guid isPermaLink="true">http://www.codingthearchitecture.com/2008/03/25/podcast_2_qcon_revisited.html</guid>
    <pubDate>Tue, 25 Mar 2008 10:23:20 GMT</pubDate>
  </item>
  
  <item>
    <title>Monitoring Java systems</title>
    <link>http://www.codingthearchitecture.com/2007/11/09/monitoring_java_systems.html</link>
    
      
        <description>
          &lt;p&gt;
Earlier in the week I wrote about &lt;a href=&#034;http://www.codingthearchitecture.com/2007/11/05/performance_tuning_java_systems.html&#034;&gt;performance tuning Java systems&lt;/a&gt; and I hinted that being able to monitor a system is a really useful first step in proving and diagnosing the cause of performance issues. Of course, there are times when you need a profiler, but that&#039;s a different story.
&lt;/p&gt;

&lt;p&gt;
Back to monitoring: I&#039;m still surprised by the number of mission-critical systems, particularly in the financial markets sector, where the only way to monitor the system is by tailing a log file. As for management, that&#039;s typically non-existent.
&lt;/p&gt;

&lt;p&gt;
One of the things that I now do when I&#039;m architecting a new system is to always spend a little time thinking about it from an operations/support perspective, which I reflect by including a &#034;Monitoring and Management View&#034; in my architecture documents. Ultimately, you need to work out what the monitoring and management non-functional requirements are, and a good way to do this is to sit down with the support team and run through the following sorts of questions:
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who will be supporting the system?&lt;/li&gt;
&lt;li&gt;Where are they located?&lt;/li&gt;
&lt;li&gt;What is their skillset and experience?&lt;/li&gt;
&lt;li&gt;Will remote support be required?&lt;/li&gt;
&lt;li&gt;Do you think you&#039;ll want to reconfigure the system at runtime?&lt;/li&gt;
&lt;li&gt;How much control do you want over the log files?&lt;/li&gt;
&lt;li&gt;Do you have a standard monitoring infrastructure that the system should publish information to? (e.g. Tivoli, BMC, etc.)&lt;/li&gt;
&lt;li&gt;Do you have existing scripts/code/etc for sending (for example) SNMP traps?&lt;/li&gt;
&lt;li&gt;What granularity of monitoring do you need?&lt;/li&gt;
&lt;li&gt;What granularity of management do you need?&lt;/li&gt;
&lt;li&gt;Do you need a single dashboard for the entire system?&lt;/li&gt;
&lt;/ul&gt;
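&lt;p&gt;
Two of these questions, runtime reconfiguration and control over the logs, can often be answered without building anything bespoke. As a sketch, assuming the application uses &lt;code&gt;java.util.logging&lt;/code&gt; on Java 7 or later, the JVM registers a &lt;code&gt;PlatformLoggingMXBean&lt;/code&gt; whose operations let a support engineer change a logger&#039;s level while the system is running, either from JConsole or programmatically as below. The &lt;code&gt;com.example.orders&lt;/code&gt; logger name is hypothetical.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
import java.lang.management.ManagementFactory;
import java.lang.management.PlatformLoggingMXBean;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RuntimeLogLevelDemo {

    // Hypothetical application logger. A strong reference is kept because
    // java.util.logging only holds weak references to named loggers.
    static final Logger ORDERS = Logger.getLogger("com.example.orders");

    public static void main(String[] args) {
        // The platform logging MXBean is registered automatically, so the same
        // operations can be invoked remotely through any JMX client.
        PlatformLoggingMXBean logging =
                ManagementFactory.getPlatformMXBean(PlatformLoggingMXBean.class);

        // Turn on detailed logging for one component while diagnosing a problem...
        logging.setLoggerLevel("com.example.orders", "FINE");
        System.out.println(logging.getLoggerLevel("com.example.orders")); // FINE
        System.out.println(ORDERS.isLoggable(Level.FINE));                // true

        // ...then hand control back to the parent logger's level (null = inherit).
        logging.setLoggerLevel("com.example.orders", null);
        System.out.println(ORDERS.getLevel());                            // null
    }
}
```
&lt;/code&gt;&lt;/pre&gt;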

&lt;p&gt;
Questions like these are important to ask because you really do need to tailor the monitoring and management to the people that will be supporting the application. Of course, like all functionality, monitoring and management features need to be prioritised because they have an associated cost. Similarly, you also need to perform some cost-benefit analysis because it&#039;s no good building a comprehensive web-based administration system for a &lt;a href=&#034;http://www.codingthearchitecture.com/2007/06/27/designing_the_tactical_solution.html&#034;&gt;tactical solution&lt;/a&gt; that only has a 3-month lifespan, for example.
&lt;/p&gt;

&lt;p&gt;
With Java, a really effective way to monitor and manage applications is using the Java Management Extensions (JMX). As &lt;a href=&#034;http://www.simongbrown.com/blog/2007/01/16/what_can_jmx_do_for_you.html&#034;&gt;I&#039;ve said before&lt;/a&gt;, JMX is one of those technologies that rarely gets a look in, but once you get into it, you&#039;ll become &lt;a href=&#034;http://junit.sourceforge.net/doc/testinfected/testing.htm&#034;&gt;infected&lt;/a&gt;. Using JMX is straightforward enough, but you do need to work out what you want to monitor and manage. With this in mind, I would recommend getting those hooks in as early as you can. Finally, even if you &lt;em&gt;really&lt;/em&gt; don&#039;t have the time to instrument your code, you can get a certain level of JVM monitoring for &lt;em&gt;free&lt;/em&gt;, just by &lt;a href=&#034;http://java.sun.com/j2se/1.5.0/docs/guide/management/agent.html#local&#034;&gt;enabling the JMX agent&lt;/a&gt;. Try it; you have no excuse not to!
&lt;/p&gt;
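&lt;p&gt;
The &lt;em&gt;free&lt;/em&gt; monitoring comes from the platform MXBeans that every JVM registers, and which JConsole displays once it is connected. As a sketch, the same figures can be read in-process through &lt;code&gt;java.lang.management.ManagementFactory&lt;/code&gt;; the class name below is illustrative. On Java 5, external tools needed the JVM to be started with &lt;code&gt;-Dcom.sun.management.jmxremote&lt;/code&gt; as described on the linked page, whereas later JVMs allow local tools to attach without it.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadMXBean;

public class FreeJvmGauges {

    public static void main(String[] args) {
        // Platform MXBeans: registered by the JVM itself, no instrumentation required.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();

        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.println("Heap used (bytes): " + heap.getUsed());
        System.out.println("Heap max (bytes):  " + heap.getMax()); // -1 if undefined
        System.out.println("Live threads:      " + threads.getThreadCount());
        System.out.println("Peak threads:      " + threads.getPeakThreadCount());
        System.out.println("Uptime (ms):       " + runtime.getUptime());
    }
}
```
&lt;/code&gt;&lt;/pre&gt;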
        </description>
      
      
    
    
    
    <category>How do you define software architecture?</category>
    
    <comments>http://www.codingthearchitecture.com/2007/11/09/monitoring_java_systems.html#comments</comments>
    <guid isPermaLink="true">http://www.codingthearchitecture.com/2007/11/09/monitoring_java_systems.html</guid>
    <pubDate>Fri, 09 Nov 2007 11:18:45 GMT</pubDate>
  </item>
  
  </channel>
</rss>

