<?xml version="1.0"?>
<rss version="2.0">
<channel>
  <title>Coding the Architecture - data integrity tag</title>
  <link>http://www.codingthearchitecture.com/tags/data integrity/</link>
  <description>Software architecture for developers</description>
  <language>en</language>
  <copyright>Coding the Architecture</copyright>
  <lastBuildDate>Wed, 16 May 2012 08:01:04 GMT</lastBuildDate>
  <generator>Pebble (http://pebble.sourceforge.net)</generator>
  <docs>http://backend.userland.com/rss</docs>
  
  
  <item>
    <title>Data Integrity and System Design</title>
    <link>http://www.codingthearchitecture.com/2011/04/15/data_integrity_and_system_design.html</link>
    
      
        <description>
          &lt;p&gt;Having been involved in several upgrade projects over the last few years, one thing I&#039;ve often noticed is the poor quality of the data that can build up in a large, long-running system. This causes problems when upgrading and usually means that you have to spend a significant amount of time fixing the data first.
 
&lt;p&gt;Poor-quality data makes upgrading difficult and causes regression tests to fail because:
 
&lt;ul&gt;
&lt;li&gt; The new system may apply stricter validation and reject the old data.
&lt;li&gt; The new system may be more precise, e.g. it may not round a number, or it may take a sign into account.
&lt;li&gt; The new system may use data that was previously ignored, e.g. data entry staff may not have bothered to enter a product&#039;s weight into a form because &#039;it is never used&#039;.
&lt;li&gt; Years of copy-and-paste can leave behind a vast amount of junk data that fails consistency checks.
&lt;/ul&gt;
 
&lt;p&gt;Once you have corrected the data for the upgrade, the original system benefits too: its data is of much higher quality, and other issues and inconsistencies have been resolved along the way. On one recent system we also saw large performance improvements from removing duplicate and junk data. On another, we saved the operations staff many hours of work a week, because the improved data meant that a large number of post-report corrections were no longer needed.
 
&lt;p&gt;So why isn&#039;t this analysis done on a regular basis to help keep a system healthy? The main reason is simply that it&#039;s too hard for the operations staff to do. Therefore, when you&#039;re designing a system, you should take this into account and support these kinds of maintenance tasks. That means providing reports that surface problematic data and tools that can correct it in bulk.
 
&lt;p&gt;Some things to consider:
 
&lt;ul&gt;
&lt;li&gt; How easy is it to identify and delete orphaned data? If you can&#039;t get to some data, is it still required?
&lt;li&gt; Can a user identify data that has not been used for a long time? Can they then archive it?
&lt;li&gt; Can you identify identical or similar data? A common example is user information, such as an address, that differs only by capitalisation.
&lt;li&gt; Can the user run arbitrary consistency checks that go beyond the database rules? For example, I recently wrote a tool that lets an operations manager run XPath expressions over the data to check for bad bookings.
&lt;li&gt; Can the user bulk-load sets of missing or corrected data?
&lt;/ul&gt;
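&lt;p&gt;The orphaned-data and duplicate-data checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author&#039;s tool: the customers and orders records are hypothetical and held in memory, &quot;orphaned&quot; is taken to mean a row whose foreign key points at a parent that no longer exists, and two addresses count as duplicates if they match after case-folding and collapsing whitespace. A real tool would run the same logic against the live data and present the results as a report that operations staff can act on.

```python
from collections import defaultdict

# Hypothetical in-memory records standing in for database rows.
customers = [
    {"id": 1, "address": "12 High Street, Leeds"},
    {"id": 2, "address": "12 high street, leeds"},
    {"id": 3, "address": "4 Mill Lane, York"},
]
orders = [
    {"id": 100, "customer_id": 1},
    {"id": 101, "customer_id": 3},
    {"id": 102, "customer_id": 99},  # points at a customer that does not exist
]


def normalise(value: str) -> str:
    """Case-fold and collapse runs of whitespace so near-identical values compare equal."""
    return " ".join(value.casefold().split())


def find_case_duplicates(rows, field):
    """Return groups of row ids whose field differs only by capitalisation or spacing."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalise(row[field])].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def find_orphans(children, parents, fk):
    """Return ids of child rows whose foreign key matches no parent row (assumed definition of orphan)."""
    parent_ids = {p["id"] for p in parents}
    return [c["id"] for c in children if c[fk] not in parent_ids]


if __name__ == "__main__":
    print(find_case_duplicates(customers, "address"))  # [[1, 2]]
    print(find_orphans(orders, customers, "customer_id"))  # [102]
```

&lt;p&gt;Grouping rows by a normalised key, rather than comparing every pair, keeps the duplicate check roughly linear in the number of rows. Fuzzier matching, such as edit distance, would catch more merely similar data, but at a higher cost and with more false positives for staff to review.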
 
&lt;p&gt;Please don&#039;t rely on database tools for this: your operations staff probably won&#039;t know how to use them, and your DBAs won&#039;t understand the business domain well enough to analyse the data. You need tools pitched at the right level for the right people, and you need to consider the complete lifecycle of your product.
        </description>
      
      
    
    
    
    <category>What is the role of a software architect?</category>
    
    <comments>http://www.codingthearchitecture.com/2011/04/15/data_integrity_and_system_design.html#comments</comments>
    <guid isPermaLink="true">http://www.codingthearchitecture.com/2011/04/15/data_integrity_and_system_design.html</guid>
    <pubDate>Fri, 15 Apr 2011 14:29:43 GMT</pubDate>
  </item>
  
  </channel>
</rss>

