<?xml version="1.0"?>
<rss version="2.0">
<channel>
  <title>Coding the Architecture - database tag</title>
  <link>http://www.codingthearchitecture.com/tags/database/</link>
  <description>Software architecture for developers</description>
  <language>en</language>
  <copyright>Coding the Architecture</copyright>
  <lastBuildDate>Wed, 16 May 2012 08:01:04 GMT</lastBuildDate>
  <generator>Pebble (http://pebble.sourceforge.net)</generator>
  <docs>http://backend.userland.com/rss</docs>
  
  
  <item>
    <title>To SQL or not to SQL?</title>
    <link>http://www.codingthearchitecture.com/2009/07/21/to_sql_or_not_to_sql.html</link>
    
      
        <description>
          &lt;p&gt;
Eric Lai wrote an interesting article for Computerworld entitled &lt;a href=&#034;http://www.computerworld.com/s/article/9135086/No_to_SQL_Anti_database_movement_gains_steam_&#034;&gt;No to SQL? Anti-database movement gains steam&lt;/a&gt; that highlighted the small but growing trend for not using a traditional relational database for managing data. Nati Shalom&#039;s &lt;a href=&#034;http://natishalom.typepad.com/nati_shaloms_blog/2009/07/no-to-sql-anti-database-movement-gains-steam-my-take.html&#034;&gt;No to SQL? Anti-database movement gains steam - My Take&lt;/a&gt; is a good follow-up.
&lt;/p&gt;

&lt;p&gt;
I&#039;m sure that most of us have had good experiences using a relational database in our projects and, despite the occasional pain of mapping data into a relational schema, relational databases provide an easy-to-use, known quantity
for managing data. With this in mind, you should certainly let your &lt;a href=&#034;http://www.codingthearchitecture.com/2008/07/30/experience_should_guide_not_constrain.html&#034;&gt;experience guide you&lt;/a&gt; but bear in mind that relational databases aren&#039;t the answer to every question.
&lt;/p&gt;

&lt;p&gt;
The case study that we use as a basis for the &lt;a href=&#034;http://www.softwarearchitecturefordevelopers.com&#034;&gt;exercises in our software architecture training course&lt;/a&gt; is relatively small, yet most people decide to use a relational database to store the data without giving the alternatives a second thought. Most of the time this comes down to experience: most people are comfortable with a relational database in their architecture. But there are many viable alternatives, from flat files and object databases through to in-memory data structures, data grids and the cloud. So while a relational database might be a solution for your particular problem, it&#039;s always worth spending a couple of minutes assessing whether it is the &lt;em&gt;best&lt;/em&gt; solution. Here are some things to think about before deciding whether to go down the SQL or non-SQL route.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you have special non-functional requirements that would be hard to satisfy with a relational database? (e.g. high performance/low latency, massive scalability, etc.)&lt;/li&gt;
&lt;li&gt;What is the available skillset of the team?&lt;/li&gt;
&lt;li&gt;Do you have existing licenses for an RDBMS?&lt;/li&gt;
&lt;li&gt;Could an open source RDBMS be appropriate?&lt;/li&gt;
&lt;li&gt;Do you need to access legacy systems where the data is already in an RDBMS?&lt;/li&gt;
&lt;li&gt;Do you have the hardware available to run an RDBMS?&lt;/li&gt;
&lt;li&gt;Do you already have existing backup and archival processes and procedures for relational databases?&lt;/li&gt;
&lt;li&gt;What are your management information and reporting requirements (scheduled and ad hoc)? Is it possible to satisfy these with a non-SQL solution?&lt;/li&gt;
&lt;li&gt;Do other systems need access to your data via a SQL interface? (a service gateway might be a better approach, but that&#039;s another issue)&lt;/li&gt;
&lt;li&gt;Do you have data migration requirements from an existing relational database?&lt;/li&gt;
&lt;li&gt;Can the data management problem be split up into transactional and non-transactional partitions, maybe using a relational database for only one of them?&lt;/li&gt;
&lt;li&gt;Do you really &lt;em&gt;need&lt;/em&gt; persistence?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Relational databases are established, mainstream solutions that are applicable in many cases. Just don&#039;t forget that there are alternatives too.
&lt;/p&gt;
        </description>
      
      
    
    
    
    <category>How do you define software architecture?</category>
    
    <comments>http://www.codingthearchitecture.com/2009/07/21/to_sql_or_not_to_sql.html#comments</comments>
    <guid isPermaLink="true">http://www.codingthearchitecture.com/2009/07/21/to_sql_or_not_to_sql.html</guid>
    <pubDate>Tue, 21 Jul 2009 10:07:00 GMT</pubDate>
  </item>
  
  <item>
    <title>The joy of sets</title>
    <link>http://www.codingthearchitecture.com/2008/02/13/the_joy_of_sets.html</link>
    
      
        <description>
          &lt;p&gt;
I&#039;ve recently seen impressive performance gains in a data-centric process, from an approach that is generic enough to be of general interest.
&lt;/p&gt;&lt;p&gt;
Imagine a system which consolidates the trades done in 10 different branches of a supermarket chain.  We receive these trades on a batch basis: a file per branch every night.  This file contains de-normalised rows of data containing information about a customer (e.g. identified through their loyalty card number), a product, and how much was purchased.  We need to convert this data into a normalised schema where product and customer have their own set of relational tables.  Our database doesn&#039;t store an exhaustive set of every product sold by the supermarket chain, so we only bother to store information about products when they appear in the input files.  Let&#039;s also assume we&#039;re doing this by hand through stored procedures -- no fancy ETL tools here.
&lt;/p&gt;&lt;p&gt;
The iterative-programmer&#039;s style of dealing with this problem is likely to be looping through the input file, and having conditional checks on each row, e.g.:
&lt;/p&gt;&lt;p&gt;
For each row in the input file:
&lt;ul&gt;
&lt;li&gt;
if the product doesn&#039;t exist in our database, insert the product
&lt;/li&gt;
&lt;li&gt;
if the customer doesn&#039;t exist, insert the customer
&lt;/li&gt;
&lt;li&gt;
finally, insert the price and quantity which was purchased
&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;&lt;p&gt;
We apply the above algorithm to each file we receive from each supermarket branch we are importing data for, and, because major relational databases pride themselves on their ability to deal with concurrent access, we have a thread (or process) per branch, where each thread imports all its trades in its own transaction.  We kick them off all at once, hence finishing in approximately as much time as it takes to import the branch with the most data.  Right?
&lt;/p&gt;&lt;p&gt;
Wrong.  When certain conditions arise, this process serialises so it takes 10 times longer than expected, as if we&#039;d done each branch one at a time.  (In fact, with naive error handling, this approach may never work due to race conditions).
&lt;/p&gt;&lt;p&gt;
If one product appears for the first time in two different branches, let&#039;s say the West London branch and the South Manchester branch, then whichever branch encounters that product first in its feed file will insert it.  When the other branch then tries to insert this product, it needs to wait for the first branch&#039;s transaction to commit or roll back before knowing whether it can create the same row (in our database we naturally have primary key constraints to ensure each product is represented by exactly one row).  So the second branch waits for the whole of the first branch&#039;s transaction to finish!
&lt;/p&gt;&lt;p&gt;
When you&#039;re writing PL/SQL and you&#039;re thinking at the level of cursors and if-then-else statements, the problems outlined above may not seem obvious.  If you&#039;re purely a Java/C#/C++ programmer you may start thinking along the lines of &#034;well, perhaps we should reduce the size of our transaction&#034;, or &#034;perhaps we can ignore this row and come back to it&#034;.  These are also illustrations of thinking about the problem in the wrong way.
&lt;/p&gt;&lt;p&gt;
The correct approach is not to consider the input data as a succession of rows, but rather as a set of sets: identify the set of products you are importing, the set of customers, and the set of prices and quantities.  Logic can then be applied to each set: identify the union of new customers and the union of new products across all branches, and insert these first.  The price and quantity information can then be inserted concurrently, because no branch needs to create a product or customer row that another branch might also be creating.
&lt;/p&gt;&lt;p&gt;
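As a rough illustration of the set-based approach described above, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the real RDBMS. The table names, column names and sample rows (&lt;code&gt;staging&lt;/code&gt;, &lt;code&gt;product&lt;/code&gt;, &lt;code&gt;customer&lt;/code&gt;, &lt;code&gt;purchase&lt;/code&gt;) are hypothetical and not taken from the original system. All branch files are loaded into one staging table, the new products and customers are inserted with one statement each, and only then are the purchase rows inserted. SQLite does not reproduce the locking behaviour discussed here; on a server database the final step could be run per branch in parallel, since the shared reference rows already exist.
&lt;/p&gt;
&lt;pre&gt;
```python
import sqlite3

# Hypothetical de-normalised feed rows from two branches:
# (branch, loyalty_card, product_code, quantity, price)
FEED = [
    ("west_london", "C1", "P1", 2, 1.50),
    ("west_london", "C2", "P2", 1, 3.00),
    ("south_manchester", "C3", "P1", 4, 1.50),
    ("south_manchester", "C1", "P3", 1, 0.99),
]


def import_feed(conn, rows):
    """Import feed rows set by set instead of row by row."""
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE IF NOT EXISTS product (code TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS customer (card TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS purchase (
            branch TEXT, card TEXT, code TEXT, quantity INTEGER, price REAL);
        CREATE TEMP TABLE IF NOT EXISTS staging (
            branch TEXT, card TEXT, code TEXT, quantity INTEGER, price REAL);
        DELETE FROM staging;
    """)
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?, ?, ?)", rows)

    # Step 1: insert the set of new products and the set of new customers,
    # one statement each, and commit before any purchase is written.
    cur.execute("""
        INSERT INTO product (code)
        SELECT DISTINCT code FROM staging
        WHERE code NOT IN (SELECT code FROM product)
    """)
    cur.execute("""
        INSERT INTO customer (card)
        SELECT DISTINCT card FROM staging
        WHERE card NOT IN (SELECT card FROM customer)
    """)
    conn.commit()

    # Step 2: insert the purchases. Every product and customer they refer
    # to already exists, so this step needs no existence checks.
    cur.execute("""
        INSERT INTO purchase (branch, card, code, quantity, price)
        SELECT branch, card, code, quantity, price FROM staging
    """)
    conn.commit()


def counts(conn):
    """Return row counts for (product, customer, purchase)."""
    tables = ("product", "customer", "purchase")
    return tuple(
        conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables
    )


conn = sqlite3.connect(":memory:")
import_feed(conn, FEED)
print(counts(conn))  # prints (3, 3, 4)
```
&lt;/pre&gt;
&lt;p&gt;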
The overhead of the pre-processing required is quickly recouped by the excellent performance achievable once the problem has been reduced in such a way that the lion&#039;s share of the work is embarrassingly parallel.
&lt;/p&gt;&lt;p&gt;
The original problem here arose again due to a &lt;a href=&#034;http://www.codingthearchitecture.com/2008/01/08/the_clash_of_the_paradigms.html&#034;&gt;paradigm clash&lt;/a&gt;: procedural programming, such as the constructs at your fingertips in languages like PL/SQL, encourages loop-based solutions to problems which are probably better solved using relational algebra, which ANSI SQL captures pretty well.  Of course, PL/SQL, T-SQL and the like will always have their place.  But if you consider yourself an iterative or object-oriented programmer, and you find yourself writing stored procedures, the most elegant solutions will often be achieved if you approach the problem thinking in sets, not loops.
&lt;/p&gt;

        </description>
      
      
    
    
    
    <category>How do you define software architecture?</category>
    
    <comments>http://www.codingthearchitecture.com/2008/02/13/the_joy_of_sets.html#comments</comments>
    <guid isPermaLink="true">http://www.codingthearchitecture.com/2008/02/13/the_joy_of_sets.html</guid>
    <pubDate>Wed, 13 Feb 2008 17:30:00 GMT</pubDate>
  </item>
  
  </channel>
</rss>

