<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>SQL - Blogs at Near Infinity</title>


        <link>http://www.nearinfinity.com/blogs/</link>
        <description>Employee Blogs</description>
        <language>en</language>
        <copyright>Copyright 2010</copyright>
        <lastBuildDate>Sun, 04 Oct 2009 13:18:55 -0500</lastBuildDate>
        <generator>http://www.sixapart.com/movabletype/</generator>
        <docs>http://www.rssboard.org/rss-specification</docs>
        
        <item>
            <title>Hive - The next great data warehouse</title>
            <description><![CDATA[In the past few weeks I have been spending more and more time working with Hadoop and Hive.&nbsp; For those of you who don't know what Hadoop is, check out what <a href="http://en.wikipedia.org/wiki/Hadoop">Wikipedia</a> has to say.&nbsp; Hive is built on top of Hadoop; simply stated, it is a SQL engine that submits <a href="http://en.wikipedia.org/wiki/Map_Reduce">map/reduce</a> jobs to Hadoop for execution.<br /><br />So next you ask yourself, "why do I care?"&nbsp; Well, with Hive using Hadoop for all the heavy lifting, the amount of data that you can process is limited only by the amount of hardware you have in your cluster.&nbsp; Hive is used for data warehousing, which means that it is designed to work on huge datasets, huge joins, huge data loads, huge query results, etc.&nbsp; However, before you start thinking about getting rid of that MySQL database, think again.&nbsp; Hive is not and never will be low latency.&nbsp; All queries submit map/reduce jobs to Hadoop, which then operates on files stored in HDFS.<br /><br />Hive has a lot of nice features built in, like:<br /><ul><li>It can operate on <i>raw</i> files located in HDFS, such as logs from your application or CSV files from your database(s).&nbsp; This can reduce your load time, because you don't have to actually load the data into a database before you can use it.</li><li>It can operate on compressed files.&nbsp; I started using this feature last week because I am getting a 4 to 1 compression ratio with no difference in performance (I am using sequence files with block compression).</li><li>In your SQL statements you can actually use the Hadoop streaming API to build your own mappers and reducers, and they don't even have to be written in Java!</li><li>You can also create your own user defined functions, so when you have to do something crazy with the data, you can!</li></ul><br />And there are lots more, so go check it out!<br /><br /><a href="http://wiki.apache.org/hadoop/Hive">Hive</a>, the real Netezza killer.<br />]]></description>
            <link>http://www.nearinfinity.com/blogs/aaron_mccurry/hive_-_the_next_great_data_war.html</link>
            <guid>http://www.nearinfinity.com/blogs/aaron_mccurry/hive_-_the_next_great_data_war.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Hadoop</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">Java</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">hadoop</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">hdfs</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">hive</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">java</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">sql</category>
            
            <pubDate>Sun, 04 Oct 2009 13:18:55 -0500</pubDate>
        </item>
        
        <item>
            <title>Finding similar strings using character level bigrams</title>
            <description><![CDATA[<p>Earlier I <a href="http://www.nearinfinity.com/blogs/page/seths?entry=counting_beignets_soundex_levenshtein_or">looked at</a> soundex as a more robust way of comparing strings. Soundex was more robust than normalizing case &amp; spaces... but it missed some things I wanted and found some things I didn't want. Here's an example based on a rather limited wine rack:
</p><pre class="prettyprint">CREATE TABLE winerack (
      wine_type TEXT
    , ready DATE
    , soundex_val TEXT
);

INSERT INTO winerack (wine_type, ready, soundex_val)
    VALUES ('cabernet sauvingon', now() + interval '5 years', 'C165');

INSERT INTO winerack (wine_type, ready, soundex_val)
    VALUES ('cabermet sauvingon', now() + interval '6 years', 'C165');

INSERT INTO winerack (wine_type, ready, soundex_val)
    VALUES ('caberet sauvingon', now() + interval '7 years', 'C163');

INSERT INTO winerack (wine_type, ready, soundex_val)
    VALUES ('cabernet franc', now() + interval '8 years', 'C165');
</pre>

<p>Now I'd like to find all of the Cabernets Sauvignon. Using a literal match one bottle shows up. Three bottles show up using soundex... but not the right three bottles:
</p><pre class="prettyprint">bigrams=# SELECT * FROM winerack WHERE soundex_val = 'C165';
     wine_type      |   ready    | soundex_val 
--------------------+------------+-------------
 cabernet sauvingon | 2012-10-08 | C165
 cabermet sauvingon | 2013-10-08 | C165
 cabernet franc     | 2015-10-08 | C165
(3 rows)
</pre>
<p>What happened? Well, soundex picked up the "cabermet" which was good. But it also picked up the Cabernet Franc which was bad. Also, it missed the "caberet sauvingon".
</p><p>Cabernet Franc was picked up because a soundex value keeps only the first letter plus codes for the next three consonants. 'C', 'b', 'r', 'n' in this case. Nothing else matters afterwards... shudder to imagine "cabernet sausage" showing up in the search results!
</p><p>The "caberet sauvingon" bottle was missed because soundex doesn't compare strings. It's almost like a really primitive string hashing function. String in, string out: drop the 'n' and the code changes from C165 to C163, so an equality test no longer matches. Easy to use though.
</p><p>I approached <a href="http://www.nearinfinity.com/blogs/page/rdonaway">Rob</a> with this and he suggested I give character-level bigrams a try. Basically this treats each string as a vector of counts of its adjacent character pairs and scores two strings by how closely those vectors line up. Rob is being very patient with me as I work through the gory details, but this is my general understanding:
</p><pre class="prettyprint">similarity(left_str, right_str)
    left_sum = dot_product(left_str, left_str);
    right_sum = dot_product(right_str, right_str);
    pair_sum = dot_product(left_str, right_str);
    return pair_sum / square_root(left_sum * right_sum);
</pre>
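<p>The pseudocode above can be made concrete. What follows is a minimal Python sketch, not the code behind the results in this post: the helper names <code>bigram_counts</code> and <code>dot_product</code> are made up for illustration, and it assumes that case and whitespace are normalized away before the bigrams are counted.</p>

```python
from collections import Counter
from math import sqrt


def bigram_counts(text):
    """Count adjacent character pairs after dropping whitespace and case (an assumption)."""
    chars = "".join(text.lower().split())
    return Counter(chars[i:i + 2] for i in range(len(chars) - 1))


def dot_product(left, right):
    """Sum the products of the counts for every bigram in left; a missing bigram counts as 0."""
    return sum(count * right[bigram] for bigram, count in left.items())


def bigram_similarity(left_str, right_str):
    """Cosine similarity between the bigram count vectors of two strings."""
    left = bigram_counts(left_str)
    right = bigram_counts(right_str)
    denominator = sqrt(dot_product(left, left) * dot_product(right, right))
    if denominator == 0:
        # At least one string has fewer than two characters, so it has no bigrams.
        return 0.0
    return dot_product(left, right) / denominator


print(bigram_similarity("cabernet sauvignon", "cabermet sauvignon"))
print(bigram_similarity("cabernet sauvignon", "cabernet franc"))
```

<p>Identical strings score 1 and strings with no bigrams in common score 0, so a cutoff somewhere in between decides what counts as "similar". Where to put that cutoff is a judgment call this sketch does not try to answer.</p>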

<p>The moment of truth -- did it outperform soundex?

</p><pre class="prettyprint">bigrams=# select * from bigram_similarities;
      term_a       |      term_b       |    similarity     
-------------------+-------------------+-------------------
 cabernetsauvignon | cabernetsauvignon |                 1
 cabernetsauvignon | cabermetsauvignon |             0.875
 cabernetsauvignon | caberetsauvignon  | 0.903696114115064
 cabernetsauvignon | cabernetfranc     | 0.505181485540923
</pre>

<p>In this contrived case it worked out better. No clue further than that. It is easy to use like soundex:
</p><pre class="prettyprint">bigrams=# select bigram_similarity('cabernet sauvignon', 'cabernet franc');
 bigram_similarity 
-------------------
 0.505181485540923
</pre>
<p>The infrastructure behind this is pretty gory. The code's not really polished and I have some concerns about performance. But here's a really neat way to generate a dot product using SQL:

</p><pre class="prettyprint">CREATE TABLE bigrams ( term TEXT, bigram CHAR(2), instances INT );

SELECT
    SUM(lhs.instances * rhs.instances) AS dot_product
FROM
    bigrams lhs, bigrams rhs
WHERE
    lhs.bigram = rhs.bigram
        AND
    lhs.term = 'cabernet sauvignon'
        AND
    rhs.term = 'cabernet franc';
</pre>]]></description>
            <link>http://www.nearinfinity.com/blogs/seth_schroeder/finding_similar_strings_using_character.html</link>
            <guid>http://www.nearinfinity.com/blogs/seth_schroeder/finding_similar_strings_using_character.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
            <pubDate>Mon, 08 Oct 2007 23:49:30 -0500</pubDate>
        </item>
        
        <item>
            <title>Views keep your SQL queries DRY</title>
            <description><![CDATA[<p>A SQL view is a SELECT statement stored in the database which can be used like a table. So... what? Well, some of the benefits include:
</p><p>
  </p><ol>
    <li>Syntax verified before deployment -- avoids errors from queries built at runtime.
    </li><li>The RDBMS knows to expect it and could prepare an execution plan.
    </li><li>Force users to use a view which excludes sensitive table data.
    </li><li>Centralize important, common logic.
  </li></ol>
  
  Number 4 is the strongest argument. It is the Don't Repeat Yourself argument for data. Duplicate code <a href="http://martinfowler.com/bliki/CodeSmell.html">stinks</a>! The fix is defining and calling <cite>"a single, unambiguous, authoritative representation within a system"</cite> (<a href="http://www.pragmaticprogrammer.com/ppbook/extracts/rule_list.html">src</a>). So why not treat data that well? Here's a silly example (in PostgreSQLish):
<p>

</p><pre class="prettyprint">CREATE TABLE pantry (food TEXT, course TEXT, expiry DATE);
INSERT INTO pantry (food, course, expiry)
    VALUES ('crumpets', 'breakfast', now() + interval '2 weeks');
INSERT INTO pantry (food, course, expiry)
    VALUES ('canned ravioli', 'lunch', now() + interval '2 years');
INSERT INTO pantry (food, course, expiry)
    VALUES ('instant noodles', 'dinner', now() + interval '2 years');
INSERT INTO pantry (food, course, expiry)
    VALUES ('fruitcake', 'snack', now() + interval '2000 years');
</pre>
<p>Imagine these meal planning queries:
</p><ul>
  <li><code class="prettyprint">SELECT food FROM pantry WHERE course = 'breakfast' AND expiry > now();</code>
  </li><li><code class="prettyprint">SELECT food FROM pantry WHERE course = 'lunch' AND expiry > now();</code>
  </li><li><code class="prettyprint">SELECT food FROM pantry WHERE course = 'dinner' AND expiry > now();</code>
  </li><li><code class="prettyprint">SELECT food FROM pantry WHERE course = 'snack' AND expiry > now();</code>
</li></ul>
<p>Well, that's pretty gory. No abstraction at all -- <a href="http://www.webpagesthatsuck.com/mysterymeatnavigation.html">mystery meat</a> queries which need unnecessary work to be understood. Two things need to be abstracted:  course and expiry. No one wants stale food, so build that logic first:
</p><pre class="prettyprint">CREATE VIEW fresh_food AS
    SELECT *
    FROM pantry
    WHERE expiry &lt; now();
</pre>
<p>Voici! The per-course queries look like: <code class="prettyprint">SELECT food FROM fresh_food where course = 'foo'</code>. But, why stop abstracting now? Why not this?
</p><pre class="prettyprint">CREATE VIEW breakfast_menu AS
    SELECT *
    FROM fresh_food
    WHERE course = 'breakfast';
</pre>
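<p>To see the layering in action, here is a minimal sketch using Python's built-in <code>sqlite3</code> module. Several things in it are assumptions for illustration only: SQLite stands in for PostgreSQL, dates are stored as ISO text and compared against <code>date('now')</code> instead of <code>now()</code>, and the 'stale toast' row is made up. It creates <code>fresh_food</code> exactly as written above, then redefines it with the comparison reversed, and queries <code>breakfast_menu</code> before and after.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pantry (food TEXT, course TEXT, expiry TEXT);
    INSERT INTO pantry VALUES ('crumpets', 'breakfast', date('now', '+14 days'));
    -- Hypothetical expired item, added so both comparisons return something.
    INSERT INTO pantry VALUES ('stale toast', 'breakfast', date('now', '-14 days'));
    INSERT INTO pantry VALUES ('canned ravioli', 'lunch', date('now', '+2 years'));

    -- Comparison as written in the post: selects food whose expiry has passed.
    CREATE VIEW fresh_food AS
        SELECT * FROM pantry WHERE expiry < date('now');

    CREATE VIEW breakfast_menu AS
        SELECT * FROM fresh_food WHERE course = 'breakfast';
""")


def breakfast():
    """Return the foods currently visible through the layered breakfast_menu view."""
    rows = conn.execute("SELECT food FROM breakfast_menu ORDER BY food")
    return [row[0] for row in rows]


before_fix = breakfast()

# SQLite has no CREATE OR REPLACE VIEW, so drop and recreate the inner view.
conn.executescript("""
    DROP VIEW fresh_food;
    CREATE VIEW fresh_food AS
        SELECT * FROM pantry WHERE expiry > date('now');
""")

after_fix = breakfast()
print(before_fix, after_fix)
```

<p>Only <code>fresh_food</code> is redefined; <code>breakfast_menu</code> and the query against it are untouched, yet the result switches from the expired item to the fresh one.</p>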
<p>This example is almost too silly, but one more good point can be squeezed out yet. A key benefit of centralized logic is having one place to make bug fixes. Take the typo in <code class="prettyprint">fresh_food</code>, which returns <i>stale</i> food: fix the view definition and all client code immediately benefits without change.
</p><p>Unfortunately, in practice performance can easily suffer when using (and especially layering) views. 
</p><p>Coming soon -- part 2: Views made my queries SLOW!</p>]]></description>
            <link>http://www.nearinfinity.com/blogs/seth_schroeder/views_keep_your_sql_queries.html</link>
            <guid>http://www.nearinfinity.com/blogs/seth_schroeder/views_keep_your_sql_queries.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
            <pubDate>Sat, 22 Sep 2007 02:46:02 -0500</pubDate>
        </item>
        
        <item>
            <title>Create Data Disaster: Avoid Unique Indexes (Mistake 3 of 10)</title>
            <description><![CDATA[<p>I really enjoyed <a href="http://www.nearinfinity.com/blogs/page/seths" dtid="1125899906842629">Seth Schroeder</a>'s <a href="http://www.nearinfinity.com/blogs/page/lrichard?entry=surrogate_keys_data_modeling_mistake#comment1" dtid="1125899906842630">critique</a> of the last post in my ten-part data modeling mistake series: <a href="http://rapidapplicationdevelopment.blogspot.com/2007/08/in-case-youre-new-to-series-ive.html" dtid="1125899906842631">Surrogate vs Natural Primary Keys</a>. His argument regarding data migration in particular sheds light on a major shortcoming of using surrogate keys: they lead data modelers to a false sense of security regarding the uniqueness of data. Specifically, if modelers ignore uniqueness constraints, they allow duplicate data. And as Seth points out, this has the nasty side effect of leaving no clear way to compare data between systems. But there are other problems too.</p>
<p dtid="1125899906842632">So, in this post I'll address the uniqueness problem introduced with surrogate keys by way of an example; provide two how-to's, one implementing uniqueness in Visio and one in NHibernate; explain the difference between unique indexes and unique constraints; and finally offer reasons why unique indexes might be overlooked, including a critique of ORM tools.</p>
<p dtid="1125899906842633"><b dtid="1125899906842634">Surrogate Keys = Data Disaster?</b></p>
<p dtid="1125899906842635">So as mentioned above the biggest problem with surrogate keys is they lull junior data modelers or lazy developers into thinking they don't need to worry about indexes. But they do; and it's as vital as implementing <a href="http://rapidapplicationdevelopment.blogspot.com/2007/07/referential-integrity-data-modeling.html" dtid="1125899906842636">referential integrity</a>. And for the same reason: data integrity.</p>
<p dtid="1125899906842637">As an example, imagine you're modeling a simple <i dtid="1125899906842638">Country</i> table. You could of course use CountryName as the primary key, but as you know from my post on <a href="http://rapidapplicationdevelopment.blogspot.com/2007/08/in-case-youre-new-to-series-ive.html" dtid="1125899906842639">surrogate keys</a>, you would have problems with varchar join speed (assuming you disagree with Seth that it's a premature optimization) and to a lesser extent cascading updates (since country names do occasionally change).</p>
<p dtid="1125899906842640"><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_gez10dNhuPk/RtdLjeB8eiI/AAAAAAAAAKE/dQr8OWfacPQ/s1600-h/01+-+Still+Bad.jpg" dtid="1125899906842641"><img id="BLOGGER_PHOTO_ID_5104631775376472610" alt="" src="http://bp2.blogger.com/_gez10dNhuPk/RtdLjeB8eiI/AAAAAAAAAKE/dQr8OWfacPQ/s400/01+-+Still+Bad.jpg" border="0" dtid="1125899906842642" /></a></p>
<p dtid="1125899906842643">Introducing a surrogate key (CountryId) resolves these issues, but you also remove an inherent advantage that natural keys have: they require uniqueness in country names. In other words, you can now have two New Zealands and the system wouldn't stop you.</p>
<p dtid="1125899906842644">What's the big deal? <i dtid="1125899906842645">Country</i> seems like a pretty benign table to have duplicates, right? Your users from New Zealand simply have an extra list item in their drop down to pick from and some pick one and some pick the other. </p>
<p dtid="1125899906842646">For <i dtid="1125899906842647">Country</i>, one problem comes in reporting. Consider delivering a revenue by <i dtid="1125899906842648">Country</i> report. Your report probably lists New Zealand twice, and an exec quickly scanning it sees only about half of that country's actual revenue on either line. And as a result numerous innocent sheep are slaughtered&nbsp;... uh, or something.</p>
<p dtid="1125899906842649">Another major problem could come in syncing data with other systems. How do those systems know which record to use?</p>
<p dtid="1125899906842650">As you can imagine the problem is even worse with major entities like Customer, Order, Product, or something more scary like Airline Flights. And the longer the system stays in production, the more production data the system collects, the more duplicates rack up, and the more time and money that will be required to clean up the data when the problem is finally identified. In short the bigger the data disaster.</p>
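<p>The reporting problem is easy to reproduce. Here is a minimal sketch using Python's built-in <code>sqlite3</code> module as a stand-in DBMS; the <code>Sale</code> table and the amounts are hypothetical and exist only to give the report something to add up.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate key only: nothing stops a second 'New Zealand'.
    CREATE TABLE Country (
        CountryId INTEGER PRIMARY KEY,
        CountryName TEXT NOT NULL
    );
    -- Hypothetical revenue table, not part of the original example.
    CREATE TABLE Sale (
        SaleId INTEGER PRIMARY KEY,
        CountryId INTEGER NOT NULL REFERENCES Country (CountryId),
        Amount INTEGER NOT NULL
    );
    INSERT INTO Country (CountryId, CountryName) VALUES (1, 'New Zealand');
    INSERT INTO Country (CountryId, CountryName) VALUES (2, 'New Zealand');
    -- Users picked whichever drop-down entry they happened to see.
    INSERT INTO Sale (CountryId, Amount) VALUES (1, 500);
    INSERT INTO Sale (CountryId, Amount) VALUES (2, 500);
""")

# Revenue by country, grouped on the surrogate key as a report typically would be.
report = conn.execute("""
    SELECT c.CountryName, SUM(s.Amount)
    FROM Sale s
    JOIN Country c ON c.CountryId = s.CountryId
    GROUP BY c.CountryId
    ORDER BY c.CountryId
""").fetchall()

print(report)
```

<p>The report comes back with two New Zealand rows of 500 each instead of one row of 1000, and nothing in the schema flags it as a problem.</p>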
<p dtid="1125899906842651"><b dtid="1125899906842652">How To #1: Visio</b></p>
<p dtid="1125899906842653">So the solution is to add at least one unique constraint (or index) to every single table. In other words, if you have a table without a uniqueness constraint, chances are very good you've done something wrong.</p>
<p dtid="1125899906842654">The good news is that it's pretty easy to implement once you agree it's necessary. If you're modeling with Microsoft Visio this is a six step process:</p>
<p dtid="1125899906842655"><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_gez10dNhuPk/RtdLjeB8ejI/AAAAAAAAAKM/KWTH3-60ixY/s1600-h/02+-+Visio+How+To.jpg" dtid="1125899906842656"><img id="Img1" alt="" src="http://bp2.blogger.com/_gez10dNhuPk/RtdLjeB8ejI/AAAAAAAAAKM/KWTH3-60ixY/s400/02+-+Visio+How+To.jpg" border="0" dtid="1125899906842657" /></a></p>
<ol dtid="1125899906842658">
<li dtid="1125899906842659">Select the table. </li>
<li dtid="1125899906842660">Select the "Indexes" category. </li>
<li dtid="1125899906842661">Click New. </li>
<li dtid="1125899906842662">No need to enter a name, just click OK. </li>
<li dtid="1125899906842663">Select either "Unique constraint only" or "Unique index only" (more on this decision later). </li>
<li dtid="1125899906842664">Double click the column(s) to add. </li></ol>
<p dtid="1125899906842665">Then when you generate or update your database Visio puts in DBMS specific uniqueness constraints. And voila, problem solved.</p>
<p dtid="1125899906842666"><b dtid="1125899906842667">Unique Constraints vs Unique Indexes</b></p>
<p dtid="1125899906842668">Whether you're using Visio or working directly in a DBMS such as SQL Server, the question will come up whether to use a unique constraint or a unique index. The short answer is that most people use unique constraints, but ultimately they end up as the same thing, so it doesn't matter. </p>
<p dtid="1125899906842669">In case you're interested in the details though here's a quick rundown of the differences:</p>
<p dtid="1125899906842670">Unique Constraint</p>
<ul dtid="1125899906842671">
<li dtid="1125899906842672">A logical construct. </li>
<li dtid="1125899906842673">Defined in the ANSI SQL standard. </li>
<li dtid="1125899906842674">Intent: data integrity. </li>
<li dtid="1125899906842675">Usually part of a table definition. </li></ul>
<p dtid="1125899906842676">Unique Index</p>
<ul dtid="1125899906842677">
<li dtid="1125899906842678">A physical DBMS implementation. </li>
<li dtid="1125899906842679">Not specified in ANSI SQL. </li>
<li dtid="1125899906842680">Intent: performance. </li>
<li dtid="1125899906842681">Usually external to a table definition. </li></ul>
<p dtid="1125899906842682">But since most DBMS's implement unique constraints as unique indexes, it doesn't really matter which you choose.</p>
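<p>A quick way to convince yourself is to declare uniqueness both ways and try to insert a duplicate. The sketch below uses Python's built-in <code>sqlite3</code> module as a stand-in DBMS; the table and index names are made up for illustration, and other DBMS's may report the violation differently.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Uniqueness declared as a constraint inside the table definition.
    CREATE TABLE CountryWithConstraint (
        CountryId INTEGER PRIMARY KEY,
        CountryName TEXT NOT NULL UNIQUE
    );
    -- Uniqueness declared as a separate index, external to the table definition.
    CREATE TABLE CountryWithIndex (
        CountryId INTEGER PRIMARY KEY,
        CountryName TEXT NOT NULL
    );
    CREATE UNIQUE INDEX UX_CountryName ON CountryWithIndex (CountryName);
""")


def insert_twice(table):
    """Insert the same country name twice; return True if the duplicate was rejected."""
    conn.execute(f"INSERT INTO {table} (CountryName) VALUES ('New Zealand')")
    try:
        conn.execute(f"INSERT INTO {table} (CountryName) VALUES ('New Zealand')")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = {table: insert_twice(table)
            for table in ("CountryWithConstraint", "CountryWithIndex")}
print(rejected)
```

<p>Both tables refuse the second New Zealand, which is the only behavior that matters for data integrity.</p>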
<p dtid="1125899906842683"><b dtid="1125899906842684">How To #2: NHibernate</b></p>
<p dtid="1125899906842685">Since I have the pleasure of learning the NHibernate ORM tool on my current project, I thought I'd also describe the same technique with a different tool. Basically you can either set the Unique attribute to true to obtain uniqueness in one column, or set the <a href="http://blog.benday.com/archive/2006/04/02/4007.aspx" dtid="1125899906842686">unique-key attribute</a> to obtain uniqueness among multiple columns. If you use NHibernate <a href="http://www.hibernate.org/hib_docs/nhibernate/html/mapping-attributes.html" dtid="1125899906842687">mapping attributes</a> you write:</p>
<pre>[Property(NotNull = true, Length = 100, Unique = true)]
public virtual string CountryName {
    get { return _strCountryName; }
    set { _strCountryName = value; }
}</pre>
<p dtid="1125899906842718">Which generates the following hbm:</p>
<pre>&lt;class name="Country"&gt;
    &lt;id name="CountryId"&gt;&lt;generator class="sequence" /&gt;&lt;/id&gt;
    &lt;property name="CountryName" length="100" not-null="true" unique="true" /&gt;
&lt;/class&gt;</pre>
<p dtid="1125899906842786">Which NHibernate turns into the following DDL:</p>
<pre>create table Country (
   CountryId NUMBER(10,0) not null,
   CountryName NVARCHAR2(100) not null <b>unique</b>,
   primary key (CountryId)
)</pre>
<p>So quick quiz: was that a unique index or a unique constraint it generated? If you answered "who cares," you're right. However, if you answered "a unique constraint," you're also right.</p>
<p dtid="1125899906842848"><b dtid="1125899906842849">The Problem with ORM</b></p>
<p dtid="1125899906842850">Obviously ignorance of the problem and shortsightedness are two causes for systems going into production without unique indexes, but I'd like to point out a third. While Object Relational Mapping (ORM) tools like NHibernate are extremely convenient for generating database schemas, modeling database tables with classes and generating DDL can lead developers to a false sense of purpose.</p>
<p dtid="1125899906842851">This can occur because ORM tools focus entirely on the world of objects and classes. In this world data's persistence is irrelevant. It exists for the purposes of a single operation, and consequently long term data persistence issues like data integrity are deemphasized. In fact, it would be easy to lose perspective of the fact that there is a database at all.</p>
<p dtid="1125899906842852">Don't get me wrong, the benefits you get like mandatory surrogate keys, DBMS neutrality, lazy loading, and minimal data access code are wonderful. Just don't forget that tags like NHibernate's <i dtid="1125899906842853">unique</i> and <i dtid="1125899906842854">unique-key</i> exist. And are very necessary.</p>
<p dtid="1125899906842855"><b dtid="1125899906842856">Conclusion</b></p>
<p dtid="1125899906842857">To sum it up don't allow the convenience of surrogate keys to lull you into a false sense of security regarding the importance of keys. It's a trap. And it will be a disastrous one if you aren't careful.</p>]]></description>
            <link>http://www.nearinfinity.com/blogs/lee_richardson/create_data_disaster_avoid_unique.html</link>
            <guid>http://www.nearinfinity.com/blogs/lee_richardson/create_data_disaster_avoid_unique.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">General</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">database</category>
            
            <pubDate>Thu, 30 Aug 2007 19:54:53 -0500</pubDate>
        </item>
        
        <item>
            <title>Surrogate Keys - Data Modeling Mistake 2 of 10</title>
            <description><![CDATA[<p>In case you're new to the series, I've compiled a list of ten data modeling mistakes that I see over and over, and I'm tackling them one by one. I'll be speaking about these topics at the upcoming <a href="http://www.iasahome.org/web/capitalarea/itarc2007">IASA conference</a> in October, so I'm hoping to generate some discussion to at least confirm I have well-founded arguments.</p>
<p>The last post in this series, <a href="http://rapidapplicationdevelopment.blogspot.com/2007/07/referential-integrity-data-modeling.html">Referential Integrity</a>, was probably less controversial than this one. After all, who can argue against enforcing referential integrity? But as obvious as surrogate keys may be to some, there is a good deal of diversity of opinion, as evidenced by the fact that people continue not to use them.</p>
<p>I intend to address this topic by way of a fairly ubiquitous example that should draw out all of the arguments. I'll investigate the options for primary keys in a Person table. I'll provide four possible options and explain why each of them is a bad choice. I'll then give four arguments against surrogate keys, which I will then shoot down. So without further ado:</p>
<p><b>Contender 1: Name and Location</b></p>
<p>Of course you can't use <i>name</i> as a primary key for a person, because two people can have the same name and primary keys must be unique. But all too often I've seen databases whose primary keys span multiple, sometimes numerous, natural (or business-related) columns. These databases combine a field like name with enough other fields that the likelihood of uniqueness approaches certainty.</p>
<p>In the case of person this would be equivalent to First and Last Name (wouldn't want to violate <a href="http://en.wikipedia.org/wiki/First_normal_form">first normal form</a> by combining those into one field, but that's a whole other topic), zip code, and we ought to throw address line 1 in to be safe. This is known as a compound, composite, or multicolumn key.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_gez10dNhuPk/RsQ5nuB8egI/AAAAAAAAAJ0/1QmW1CiIq_w/s1600-h/1.+Compound+Natural+Keys.jpg"><img id="BLOGGER_PHOTO_ID_5099264032624114178" style="CURSOR: hand" alt="" src="http://bp0.blogger.com/_gez10dNhuPk/RsQ5nuB8egI/AAAAAAAAAJ0/1QmW1CiIq_w/s400/1.+Compound+Natural+Keys.jpg" border="0" /></a></p>
<p>Now our chances of uniqueness are close enough to certain to not warrant discussion, so let's jump right to space and performance. There are three major problems with this approach.</p>
<p>Con 1: Primary key size. The primary key index for person becomes enormous. The database must now catalog four large (probably all varchar) fields. This increases the size of the index, which <a href="http://www.mssqlcity.com/Tips/tipInd.htm">increases overhead</a> for insert, delete and update operations and can even decrease read speed because of the <a href="http://www.sql-server-performance.com/tips/composite_indexes_p1.aspx">increased disk I/O</a>.</p>
<p>Con 2: Foreign key size. If you have a child table like PhoneNumber then, as the diagram above shows, the foreign key becomes four columns. Those four columns take up a lot of space again. And now a common query like "Get all phone numbers for a person" involves a full table scan or, if you throw an index on them, you end up with another huge index. In fact, you most likely end up propagating huge indexes and vast amounts of data all over the place like some evil data-cancer.</p>
<p>Con 3: Aesthetics. It just isn't pretty. Having four-column foreign keys all over the place increases the amount of code you need to write in stored procedures, middle tier, and presentation tier. Even IntelliSense won't help you with this one.</p>
<p><b>Contender 2: Social Security Number</b></p>
<p>The most obvious choice for a natural key for a person object is social security number, right? Obviously it depends on what type of data person is, but regardless you'll probably face the following four problems with this primary key candidate:</p>
<p>Con 4: Optionality. The Social Security Administration specifies that U.S. citizens <a href="http://ssa-custhelp.ssa.gov/cgi-bin/ssa.cfg/php/enduser/std_adp.php?p_faqid=78">are not required to provide social security numbers</a> in many circumstances. Employment is one circumstance where the number is required, but consumers of non-governmental services are definitely not obliged to provide one. You can deny service if your consumer won't provide the number, but is your CEO prepared to turn away business based on a data modeling decision you make?</p>
<p>Con 5: Applicability. Only people inside the U.S. system (citizens and noncitizens authorized to work there) have a social security number. Your system might only cater to them now, but will it always?</p>
<p>Con 6: Uniqueness. The Social Security Administration "<a href="http://www.slate.com/id/2081843/">is adamant</a>" that the numbers are not recycled, even after someone dies. But eventually the numbers will run out. The Slate article cited above calculates that this will happen sometime in the next century. But the math fails to include the fact that location information is encoded in the number, which significantly limits the permutations. I don't know what the real number is, but the point is: you're gambling with how long until a conflict occurs. And even if the time argument fails to sway you, just think of who is assigning the numbers. How much do you trust a government office to not make an occasional mistake?</p>
<p>Con 7: Privacy. Does your application use primary keys in the user interface tier to uniquely identify records? Does it pass primary keys between pages or use them to identify rows in a combo box? You certainly wouldn't store such a thing in a cookie or pass it across the wire unencrypted, right? Social security information is sensitive information and privacy zealots care very much how you handle their data. Primary keys are necessarily closer to end users and harder to hide than regular fields. It just isn't the type of data to take a chance on.</p>
<p><b>Contender 3: E-mail</b></p>
<p>So e-mail is a pretty likely choice right? It's a relatively safe assumption that no two people share an e-mail (maybe). And anyone with a computer has one right? So there should be no uniqueness, privacy or optionality/applicability problems. But how about this:</p>
<p>Con 8: Accidental Denormalization. Do you have more than one e-mail address? Doesn't everyone? Imagine what a pain it would be if Evite only allowed you one e-mail address per person (ok, well if you didn't know it does allow you to consolidate accounts for those of us with multiple e-mail addresses). Even if your system only stores one e-mail address per person now, just think what a pain it would be to change the database to allow N e-mail addresses per person. </p>
<p>No. Wait. Really. Think about it...</p>
<p>Yea&nbsp;... yuck.</p>
<p><b>Contender 4: Username</b></p>
<p>If your users log in with a username, that's a likely candidate for a primary key, right? But what if they want to update their username (perhaps it was based on a last name that changed)? This leads us to:</p>
<p>Con 9: Cascading Updates. If you have a natural key that might change you'll need to implement some type of cascading updates (whether your DBMS supports it or you write code by hand). In other words, change the username in the person table and you have to change the username foreign key in all child records of the invoices, comments, sales, certifications, defects, or whatever other tables you track. It may not happen often, but when it does it sure will wreak havoc on your indexes. Imagine rebuilding even 10% of your indexes at once because of one operation. It's just unnecessary.</p>
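<p>Here is a minimal sketch of what that cascade looks like when the DBMS does support it. I'm using SQLite through Python's sqlite3 module purely as a stand-in; the <i>Person</i> and <i>Invoice</i> tables and the usernames are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Person (
        Username TEXT PRIMARY KEY          -- natural key
    );
    CREATE TABLE Invoice (
        InvoiceId INTEGER PRIMARY KEY,
        Username  TEXT NOT NULL
                  REFERENCES Person (Username) ON UPDATE CASCADE
    );
    INSERT INTO Person (Username) VALUES ('jsmith');
    INSERT INTO Invoice (Username) VALUES ('jsmith'), ('jsmith'), ('jsmith');
    """
)

# One logical change to the parent row...
conn.execute("UPDATE Person SET Username = 'jjones' WHERE Username = 'jsmith'")

# ...rewrites the foreign key in every child row that pointed at it.
rewritten = conn.execute(
    "SELECT COUNT(*) FROM Invoice WHERE Username = 'jjones'"
).fetchone()[0]
stale = conn.execute(
    "SELECT COUNT(*) FROM Invoice WHERE Username = 'jsmith'"
).fetchone()[0]
print(rewritten, stale)
```

<p>Three child rows are touched here for one rename. With a surrogate key the same rename would touch exactly one row, in Person.</p>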
<p>Con 10: Varchar join speed. I left this to last because it applies to all of the contenders thus far and is by far the most compelling argument against natural keys. Nine out of ten natural keys are varchar fields. Even an employee number as generated by another system may have a significant leading zero. It's a fact: joining across tables with varchars is always slower than joining across tables with integers. How much? According to Peter Zaitsev, who runs a MySQL performance blog, it's <a href="http://www.mysqlperformanceblog.com/2007/06/18/using-char-keys-for-joins-how-much-is-the-overhead/">20% to 600% slower</a>. And that's for one join. How many joins do you think comprise an average user interaction? Five? Ten? Twenty? It could very likely make a significant difference to your end user.</p>
<p><b>And The Winner Is </b></p>
<p>So surrogate keys win, right? Well, let's review and see if any of the cons of natural keys apply to surrogate keys:</p>
<ul>
<li>Con 1: Primary key size - Surrogate keys generally don't have problems with index size since they're usually a single column of type int. That's about as small as it gets.</li>
<li>Con 2: Foreign key size - They don't have foreign key or foreign index size problems either for the same reason as Con 1.</li>
<li>Con 3: Aesthetics - Well, it's an eye of the beholder type thing, but they certainly don't involve writing as much code as with compound natural keys.</li>
<li>Con 4 &amp; 5: Optionality &amp; Applicability - Surrogate keys have no problems with people or things not wanting to or not being able to provide the data.</li>
<li>Con 6: Uniqueness - They are 100% guaranteed to be unique. That's a relief.</li>
<li>Con 7: Privacy - They have no privacy concerns should an unscrupulous person obtain them.</li>
<li>Con 8: Accidental Denormalization&nbsp;- You can't accidentally denormalize non-business data.</li>
<li>Con 9: Cascading Updates - Surrogate keys don't change, so no worries about how to cascade them on update.</li>
<li>Con 10: Varchar join speed - They're generally ints, so they're generally as fast to join over as you can get.</li></ul>
<p>For every natural key con I see a surrogate key pro. But not everyone agrees. Here are some arguments against them.</p>
<p><b>Disadvantage 1: Getting The Next Value</b></p>
<p>Some have argued that getting the next value for a surrogate key is a pain. Perhaps that's true in Oracle with its sequences, but generally it just takes a couple of minutes of research, or you can use ORM tools to hide the details for you.</p>
<p><b>Disadvantage 2: Users Don't Understand Them</b></p>
<p>One argument I uncovered is if users were to perform ad-hoc queries on the database they wouldn't be able to understand how to use surrogate keys. </p>
<p>Bunk. Bunk, bunk, bunk. End users shouldn't be fiddling in databases any more than airline customers should be fiddling in airplane engines. And if they are savvy enough to try, then let them learn to perform joins like the pros do.</p>
<p><b>Disadvantage 3: Extra Joins</b></p>
<p>Suppose you have a users table with a social security number natural primary key, and a phone number child table with social security number as a foreign key.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_gez10dNhuPk/RsQ5n-B8ehI/AAAAAAAAAJ8/QWuvaB0l86g/s1600-h/2.+Social+Security+Number.jpg"><img id="Img1" style="CURSOR: hand" alt="" src="http://bp1.blogger.com/_gez10dNhuPk/RsQ5n-B8ehI/AAAAAAAAAJ8/QWuvaB0l86g/s400/2.+Social+Security+Number.jpg" border="0" /></a></p>
<p>If your user enters a social security number on a log in screen you could theoretically get their phone numbers without accessing the users table. In a surrogate key world you would have to look up the surrogate key in the person table before getting their phone numbers.</p>
<p>While this is true, I have found that with most <a href="http://en.wikipedia.org/wiki/Create%2C_read%2C_update_and_delete">CRUD</a> applications there are few times when this scenario comes up. The vast majority of queries involve already known surrogate keys. So while this argument may be true in some situations, it just isn't true enough of the time to be significant.</p>
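<p>For what it's worth, the "extra" lookup is a single join. The sketch below assumes a hypothetical <i>Person</i> table with a surrogate <i>PersonId</i> and a unique <i>Ssn</i> column, plus a <i>PhoneNumber</i> child table; it runs on SQLite via Python's sqlite3 module, and the numbers are deliberately fake.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (
        PersonId INTEGER PRIMARY KEY,   -- surrogate key
        Ssn      TEXT NOT NULL UNIQUE   -- natural key, still kept unique
    );
    CREATE TABLE PhoneNumber (
        PhoneNumberId INTEGER PRIMARY KEY,
        PersonId      INTEGER NOT NULL REFERENCES Person (PersonId),
        Number        TEXT NOT NULL
    );
    INSERT INTO Person (PersonId, Ssn)
        VALUES (1, '000-00-0001'), (2, '000-00-0002');
    INSERT INTO PhoneNumber (PersonId, Number)
        VALUES (1, '555-0100'), (1, '555-0101'), (2, '555-0199');
    """
)


def phones_by_ssn(conn, ssn):
    """Log-in screen case: only the natural key is known, so join through Person."""
    rows = conn.execute(
        """
        SELECT pn.Number
        FROM Person AS p
        JOIN PhoneNumber AS pn ON pn.PersonId = p.PersonId
        WHERE p.Ssn = ?
        ORDER BY pn.Number
        """,
        (ssn,),
    ).fetchall()
    return [row[0] for row in rows]


def phones_by_person_id(conn, person_id):
    """Usual CRUD case: the surrogate key is already known, so no join is needed."""
    rows = conn.execute(
        "SELECT Number FROM PhoneNumber WHERE PersonId = ? ORDER BY Number",
        (person_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(phones_by_ssn(conn, "000-00-0001"))
print(phones_by_person_id(conn, 1))
```

<p>Both calls return the same phone numbers; the first simply pays for one indexed lookup on Ssn before reaching the child table.</p>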
<p><b>Disadvantage 4: Extra Indexes</b></p>
<p>I find this to be the most persuasive argument against surrogate keys. If your person object would normally have a natural key on social security number, then in surrogate-world you should have a unique index on social security number in addition to your primary key index on the surrogate key. In other words, you now have two indexes instead of one. In fact, if you have N indexes per table in natural key world, you'll always have N + 1 indexes in surrogate key world.</p>
<p>While the additional index does indeed increase database size and slow insert and update performance, you could offset some of that expense by converting your old natural key, social security number for example, to a <a href="http://en.wikipedia.org/wiki/Index_%28database%29">clustered index</a>.</p>
<p>Or you could just relax in the knowledge that there are pros and cons to every architectural decision and for surrogate keys the pros outweigh the cons.</p>
<p><b>Summary</b></p>
<p>So now if some well meaning DBA argues to use natural keys on your next project you should have ten arguments against them, which will double as ten arguments for surrogate keys, and you should be prepared with rebuttals for four arguments against surrogate keys. Whew, that was a lot. But I assure you, if you use surrogate keys today it will definitely make your life easier in the long run.</p>]]></description>
            <link>http://www.nearinfinity.com/blogs/lee_richardson/surrogate_keys_data_modeling_mistake.html</link>
            <guid>http://www.nearinfinity.com/blogs/lee_richardson/surrogate_keys_data_modeling_mistake.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">General</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">database</category>
            
            <pubDate>Thu, 16 Aug 2007 08:08:45 -0500</pubDate>
        </item>
        
        <item>
            <title>Referential Integrity - Data Modeling Mistake 1 of 10</title>
            <description><![CDATA[<p>In my mind data models are like the foundations of a house. Whether you use ORM or a more traditional modeling tool, they form the base of the entire rest of your project. Consequently, every decision you make (or don't make) regarding your data model during the design phase(s) of your project will significantly affect the duration of your project and the maintainability and performance of your application. </p>
<p>You could de-emphasize up-front planning, but every correction you make to the data model once code has been written on top of it will introduce significant delays to the project as developers refactor data access, business logic, and user interface tiers. That's why mistakes made during design are expensive, and it would behoove any architect (or project manager) to be well aware of the repercussions of data model decisions and minimize mistakes before construction begins.</p>
<p>After years of working with or maintaining applications based on poorly designed data models, and after years of modeling my own databases from scratch I've seen and made a lot of mistakes. So, I've compiled ten of the most common ones and the arguments for and against them. </p>
<p>I'll be speaking on this topic in the upcoming <a href="http://www.iasahome.org/web/capitalarea/itarc2007">IASA conference</a> in October, and so I wanted to vet these ideas with the community. I know there are strong feelings on these topics, so please help me out by commenting if you feel I've missed something or am off base. </p>
<p>I'll start with Mistake #1: Not Using Referential Integrity in this post. I'll give four common reasons for avoiding referential integrity and then rebut them. I'll then cover the more controversial Mistake #2: Not Using Surrogate Keys in my next post.</p>
<p><b>Mistake #1 - Not using referential integrity</b></p>
<p>I've heard a lot of excuses for not using referential integrity, but I've never been swayed by one of them. If you have a record with a foreign key field you should be 100% certain that it will <i>always</i> refer to the primary key of an <i>existing</i> record in <i>one and only one</i> foreign table. The last thing you want to do is write large amounts of conditional logic because you aren't 100% certain that you aren't dealing with orphaned data. Nonetheless, here are some almost compelling arguments I've heard for not using it:</p>
<p><b>Reason #1: Project Too Small</b></p>
<p>If your project or database is only a few tables and a couple lines of code then you don't need referential integrity, right? Wrong. Numerous projects start small, get big, and have major problems because of it. It doesn't take much extra time to put in constraints. Avoid the urge to be lazy.</p>
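<p>To show how little extra work a constraint is, here is a minimal sketch: one REFERENCES clause, and the database refuses to create an orphan. It assumes SQLite through Python's sqlite3 module, and the <i>Company</i> and <i>Employee</i> tables and their rows are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ships with foreign key enforcement off; turn it on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Company (
        CompanyId INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL
    );
    CREATE TABLE Employee (
        EmployeeId INTEGER PRIMARY KEY,
        CompanyId  INTEGER NOT NULL REFERENCES Company (CompanyId),
        Name       TEXT NOT NULL
    );
    INSERT INTO Company (CompanyId, Name) VALUES (1, 'Acme');
    """
)


def add_employee(conn, company_id, name):
    """Return True if the row was stored, False if it would have been orphaned."""
    try:
        conn.execute(
            "INSERT INTO Employee (CompanyId, Name) VALUES (?, ?)",
            (company_id, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid = add_employee(conn, 1, "Pat")    # Company 1 exists
orphan = add_employee(conn, 99, "Sam")  # Company 99 does not
print(valid, orphan)
```

<p>The conditional logic for "what if the parent is missing" now lives in one place, the database, instead of in every query that reads Employee.</p>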
<p><b>Reason #2: Accidental Oversight</b></p>
<p>Numerous applications I've seen forget a relationship or two. This is born of writing and executing database creation statements by hand and is the reason that data modeling tools exist. When you visualize your database in a model it's hard to miss a relationship. So use a modeling tool and keep it in sync with your database. You won't regret it.</p>
<p>Incidentally I like Microsoft Visio for data modeling because you can change your schema during development and Visio won't delete your data. This enables you to keep your data model in sync with the database for the entire lifetime of the database. There are other benefits too, if you're interested see my article on <a href="http://blueink.biz/DataModelingVisio.aspx">data modeling in Microsoft Visio</a>.</p>
<p><b>Reason #3: Maximize Insert Speed</b></p>
<p>It's a fact: indexes and constraints slow down insert and update operations. If your application is heavy on writing and light on reading, then you could argue referential integrity isn't for you. This argument is often combined with the "Only one application ever uses my database" argument.</p>
<p>There are two problems with this. One problem comes when either a well meaning DBA modifies data by hand and messes up the state of the database, or more realistically when there's a bug in the application that accidentally orphans data. Orphaned data may not affect your application, but a well designed solution should plan for the future. When that data warehouse project finally gets around to importing data from your database, what do they do with the orphaned data? Ignore it? Try to integrate it? Who knows? If you've been in this position, you'll know what I mean when I say the responsible architect's name (or their app) will be synonymous with a curse word.</p>
<p>The second problem is that even if a database without referential integrity doesn't end up with orphaned data, a second application that might want to integrate can still never be 100% certain that foreign keys refer to existing records. It comes down to designing for the future.</p>
<p>The answer to speed is to build your database with referential integrity, drop or disable your constraints and indexes before a bulk load, and re-enable them after the bulk load. It will increase the duration of your bulk load operation over not using constraints at all, but it will be much faster than leaving them enabled and checking them for each insert. So use referential integrity: the pros outweigh the cons.</p>
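<p>The disable, load, re-enable pattern can be sketched as follows. The syntax for switching constraints off differs from one DBMS to the next; this version assumes SQLite through Python's sqlite3 module, where a pragma toggles enforcement and a separate <i>foreign_key_check</i> pragma reports rows that slipped through, because turning enforcement back on does not re-validate existing rows there. The tables and rows are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Company (
        CompanyId INTEGER PRIMARY KEY
    );
    CREATE TABLE Employee (
        EmployeeId INTEGER PRIMARY KEY,
        CompanyId  INTEGER NOT NULL REFERENCES Company (CompanyId)
    );
    INSERT INTO Company (CompanyId) VALUES (1), (2);
    """
)


def bulk_load_employees(conn, rows):
    """Load rows with enforcement off, then report any orphans that were loaded."""
    conn.commit()  # the pragma below is ignored inside an open transaction
    conn.execute("PRAGMA foreign_keys = OFF")
    conn.executemany(
        "INSERT INTO Employee (EmployeeId, CompanyId) VALUES (?, ?)", rows
    )
    conn.commit()
    # Validate once for the whole batch instead of once per inserted row.
    violations = conn.execute("PRAGMA foreign_key_check(Employee)").fetchall()
    conn.execute("PRAGMA foreign_keys = ON")
    return violations


# The third row points at a company that does not exist.
violations = bulk_load_employees(conn, [(1, 1), (2, 2), (3, 7)])
print(len(violations))
```

<p>Each violation row names the child table, the offending rowid, and the parent table it failed to match, so the load job can fix or reject those rows before the data is considered live.</p>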
<p><b>Reason #4: Mutually exclusive relationships</b></p>
<p>Too often I've seen databases with a foreign key that relates to one of five tables based on the value of a char(1) field. The space conserving mindset that comes up with this implementation is admirable, but it produces far too many negative side effects.</p>
<p>What happens when the char(1) field gets out of sync with the foreign key field? What happens when someone deletes the foreign record or changes its primary key? More orphaned data happens.</p>
<p>The solution is to use five fields that each refer to a single table. You may have more nullable fields that take up more space in the database, but it's worth it in the long run.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_gez10dNhuPk/Rp6YOlrrHgI/AAAAAAAAAI0/Nxao5WbYlFM/s1600-h/Mutually+Exclusive+Problem.gif"><img id="BLOGGER_PHOTO_ID_5088672005376122370" style="CURSOR: hand" alt="" src="http://bp1.blogger.com/_gez10dNhuPk/Rp6YOlrrHgI/AAAAAAAAAI0/Nxao5WbYlFM/s400/Mutually+Exclusive+Problem.gif" border="0" /></a></p>
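<p>As a minimal sketch of the one-column-per-table solution, here it is with two parent tables instead of five to keep it short. The <i>Customer</i>, <i>Vendor</i>, and <i>Note</i> tables are hypothetical, and the CHECK constraint requiring exactly one of the foreign keys to be filled in is my own addition to the approach described above, not something every schema will want. It runs on SQLite via Python's sqlite3 module.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY);
    CREATE TABLE Vendor   (VendorId   INTEGER PRIMARY KEY);
    CREATE TABLE Note (
        NoteId     INTEGER PRIMARY KEY,
        CustomerId INTEGER REFERENCES Customer (CustomerId),  -- nullable
        VendorId   INTEGER REFERENCES Vendor (VendorId),      -- nullable
        Body       TEXT NOT NULL,
        -- Each comparison yields 0 or 1, so the sum counts the filled-in keys.
        CHECK ((CustomerId IS NOT NULL) + (VendorId IS NOT NULL) = 1)
    );
    INSERT INTO Customer (CustomerId) VALUES (1);
    INSERT INTO Vendor (VendorId) VALUES (1);
    """
)


def add_note(conn, body, customer_id=None, vendor_id=None):
    """Return True if the note was stored, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO Note (CustomerId, VendorId, Body) VALUES (?, ?, ?)",
            (customer_id, vendor_id, body),
        )
        return True
    except sqlite3.IntegrityError:
        return False


ok_customer = add_note(conn, "call back", customer_id=1)
ok_vendor = add_note(conn, "late shipment", vendor_id=1)
both_parents = add_note(conn, "ambiguous", customer_id=1, vendor_id=1)
no_parent = add_note(conn, "floating")
missing_parent = add_note(conn, "orphan", customer_id=42)
print(ok_customer, ok_vendor, both_parents, no_parent, missing_parent)
```

<p>Every foreign key now points at exactly one table, so the database can enforce it, and there is no char(1) discriminator left to drift out of sync.</p>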
<p><b>Conclusion</b></p>
<p>Well, hopefully I've convinced you to avoid the urge to be a lazy data modeler, design for the future, use a data modeling tool, and drop constraints during bulk load operations. In short, always use referential integrity. But if not, hopefully you'll at least understand when people curse your name several years from now. :)</p>]]></description>
            <link>http://www.nearinfinity.com/blogs/lee_richardson/referential_integrity_data_modeling_mistake.html</link>
            <guid>http://www.nearinfinity.com/blogs/lee_richardson/referential_integrity_data_modeling_mistake.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">General</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">database</category>
            
            <pubDate>Wed, 18 Jul 2007 19:02:03 -0500</pubDate>
        </item>
        
        <item>
            <title>An Entity Relationship Diagram Example</title>
            <description><![CDATA[<p>It seems like a dying art, but I still strongly feel that Entity Relationship Diagrams (ERD) should be the starting point of all software development projects. Since they are for me anyway, I wanted a place to refer colleagues to for how to read these diagrams, and an Entity Relationship Diagram Example seemed like a great place to start.</p>
<p><b>The Example: A Resource Management Application</b></p>
<p>Consider that we're writing a resource management application. The first step to creating an ERD is always to identify the nouns (entities). In this case let's start with:</p>
<ul>
<li>Company</li>
<li>Employee</li>
<li>Project; and</li>
<li>Technology Project (which is a specific type of Project that perhaps requires special fields like "number of entities")</li></ul>
<p>Here's the Example Entity Relationship Diagram I'll decipher piece by piece in this article (click to enlarge):</p>
<p><a href="http://bp3.blogger.com/_gez10dNhuPk/Rmdg-XjZyjI/AAAAAAAAAH0/sn94SwOFWO4/s1600-h/01-ErdExample.gif"><img id="BLOGGER_PHOTO_ID_5073130129846815282" style="CURSOR: hand" alt="" src="http://bp3.blogger.com/_gez10dNhuPk/Rmdg-XjZyjI/AAAAAAAAAH0/sn94SwOFWO4/s400/01-ErdExample.gif" border="0" /></a></p>
<p>(note that I'm now using singular names since my somewhat controversial decision to switch to <a href="http://rapidapplicationdevelopment.blogspot.com/2007/03/entity-naming-conventions.html" target="">naming entities in the singular</a>)</p>
<p><b>To read the notations of an Entity Relationship Diagram:</b></p>
<p>An Entity Relationship Diagram conveys a lot of information with a very concise notation. The important part to keep in mind is to limit what you're reading using the following technique:</p>
<ol>
<li>Choose two entities (e.g. Company and Employee)</li>
<li>Pick one that you're interested in (e.g. how <b>a single Company</b> relates to employees)</li>
<li>Read the notation on the second entity (e.g. the crow's feet with the O above it next to the Employee entity).</li></ol>
<p>The set of symbols consists of crow's feet (which <a href="http://en.wikipedia.org/wiki/Entity-relationship_model#Crow.27s_Feet" target="_blank">Wikipedia</a> describes as looking like the forward digits of a bird's claw), O, and dash, and they can be combined in four distinct ways. Here are the four combinations:</p>
<ul>
<li>Zero through Many (crow's feet, O)</li>
<li>One through Many (crow's feet, dash)</li>
<li>One and Only One (dash, dash)</li>
<li>Zero or One (dash, O)</li></ul>
<p><b>Zero through Many</b></p>
<p><a href="http://bp1.blogger.com/_gez10dNhuPk/RmeNA3jZyqI/AAAAAAAAAIs/2qnhtwSfqDc/s1600-h/02-ZeroThroughMany.gif"><img id="BLOGGER_PHOTO_ID_5073178551308110498" style="CURSOR: hand" alt="" src="http://bp1.blogger.com/_gez10dNhuPk/RmeNA3jZyqI/AAAAAAAAAIs/2qnhtwSfqDc/s400/02-ZeroThroughMany.gif" border="0" /></a></p>
<p>If, as in the diagram above, the notation closest to the second entity is a crow's feet with an O next to it, then the first entity can have zero, one, or many of the second entity. Consequently the diagram above would read: "A company can have zero, one, or many employees".</p>
<p>This is the most common relationship type, and consequently many people ignore the O. While you can consider the O optional, I consider it a best practice to be explicit to differentiate it from the less common one through many relationship.</p>
<p><b>One through Many</b></p>
<p><a href="http://bp0.blogger.com/_gez10dNhuPk/Rmdg-njZylI/AAAAAAAAAIE/xLc99gOFKvQ/s1600-h/03-OneThroughMany.gif"><img id="Img2" style="CURSOR: hand" alt="" src="http://bp0.blogger.com/_gez10dNhuPk/Rmdg-njZylI/AAAAAAAAAIE/xLc99gOFKvQ/s400/03-OneThroughMany.gif" border="0" /></a></p>
<p>If, as the diagram above shows, the notation closest to the second entity is a crow's feet with a dash, then the first entity can have one through many of the second entity. More specifically it may not contain zero of the second entity. The example above would thus read (read bottom to top): "A Project can have one through many Employees working on it."</p>
<p>This is an interesting combination because it can't (and for various reasons probably shouldn't if it could) be enforced by a database. Thus, you will only see these in logical, but not physical, data models. It is still useful to distinguish, but your application will need to enforce the relationship in business rules.</p>
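<p>What "enforce the relationship in business rules" might look like is sketched below in plain Python. The <i>Project</i> class, <i>BusinessRuleError</i>, and both functions are hypothetical names for illustration; a real application would put the same check wherever it creates projects or unassigns employees.</p>

```python
from dataclasses import dataclass, field


class BusinessRuleError(Exception):
    """Raised when a change would break a rule the database does not enforce."""


@dataclass
class Project:
    name: str
    employee_ids: set = field(default_factory=set)


def create_project(name, employee_ids):
    """A Project must start out with at least one Employee."""
    ids = set(employee_ids)
    if not ids:
        raise BusinessRuleError("a project needs at least one employee")
    return Project(name, ids)


def remove_employee(project, employee_id):
    """Refuse to remove the last Employee, keeping the relationship one through many."""
    remaining = project.employee_ids - {employee_id}
    if not remaining:
        raise BusinessRuleError("cannot remove the last employee from a project")
    project.employee_ids = remaining


project = create_project("Intranet redesign", [1, 2])
remove_employee(project, 1)
print(sorted(project.employee_ids))
```

<p>Removing employee 2 at this point would raise BusinessRuleError, which is the application standing in for the constraint the physical model cannot express.</p>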
<p><b>One and Only One (onne)</b></p>
<p><a href="http://bp1.blogger.com/_gez10dNhuPk/Rmdg-3jZymI/AAAAAAAAAIM/v6ISuqqp3R8/s1600-h/04-OneAndOnlyOne.gif"><img id="Img3" style="CURSOR: hand" alt="" src="http://bp1.blogger.com/_gez10dNhuPk/Rmdg-3jZymI/AAAAAAAAAIM/v6ISuqqp3R8/s400/04-OneAndOnlyOne.gif" border="0" /></a></p>
<p>If the notation closest to the second entity contains two dashes it indicates that the first entity can have one and only one of the second. More specifically it cannot have zero, and it cannot have more than one. The example would thus read: "An Employee can have one and only one Company."</p>
<p>This combination is the most common after zero through many, and so frequently people consider the second dash optional. In fact, some ignore both dashes, but I would highly recommend at least using one for clarity so as not to confuse the notation with "I'll fill in the relationship details later".</p>
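<p>In a physical model, "one and only one" usually ends up as a mandatory foreign key. The following is a minimal sketch with assumed column names (company_id, employee_id, name); it is not the DDL behind the diagram.</p>
<pre class="prettyprint">-- Hypothetical tables, for illustration only.
CREATE TABLE Company (
    company_id INTEGER PRIMARY KEY,
    name       VARCHAR(100) NOT NULL
);

CREATE TABLE Employee (
    employee_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    -- NOT NULL rules out "zero"; a single column rules out "more than one".
    company_id  INTEGER NOT NULL,
    FOREIGN KEY (company_id) REFERENCES Company (company_id)
);</pre>
<p>Nothing here forces a Company to have any Employee rows, which is why the other end of the same relationship reads as zero through many.</p>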
<p><b>Zero or One</b></p>
<p><a href="http://bp1.blogger.com/_gez10dNhuPk/Rmdg-3jZynI/AAAAAAAAAIU/e3XGZU-69Uk/s1600-h/05-ZeroOrOne.gif"><img id="Img4" style="CURSOR: hand" alt="" src="http://bp1.blogger.com/_gez10dNhuPk/Rmdg-3jZynI/AAAAAAAAAIU/e3XGZU-69Uk/s400/05-ZeroOrOne.gif" border="0" /></a></p>
<p>A zero or one relationship is indicated by a dash and an O. It indicates that the first entity can have zero or one of the second, but not more than one. The relationship in the example above would thus read: "A Project can have zero or one Technology Project."</p>
<p>The zero or one relationship is quite common and is frequently abbreviated with just an O (however it is most commonly seen in a many-to-many relationship rather than the one-to-one above, more on this later).</p>
<p><b>Relationship Types</b></p>
<p>Having examined the four types of notation, the discussion wouldn't be complete without a quick overview of the three relationship types. These are:</p>
<p>
<ul>
<li>One to Many</li>
<li>Many to Many</li>
<li>One to One</li></ul>
<p></p>
<p><b>One-to-Many</b></p>
<p>A one-to-many (1:N) is by far the most common relationship type. It consists of either a <i>one through many</i> or a <i>zero through many</i> notation on one side of a relationship and a <i>one and only one</i> or <i>zero or one</i> notation on the other. The relationship between Company and Employee in the example is a one-to-many relationship.</p>
<p><b>Many-to-Many</b></p>
<p>The next most common relationship is a many-to-many (N:M). It consists of a zero through many or one through many notation on both sides of a relationship. This construct only exists in logical data models because databases can't implement the relationship directly. Physical data models implement a many-to-many relationship by using an associative (or link or resolving) table via two one-to-many relationships.</p>
<p>The relationship between Employee and Project in the example is a many-to-many relationship. It would exist in logical and physical data models as follows:</p>
<p><a href="http://bp3.blogger.com/_gez10dNhuPk/RmdhGXjZyoI/AAAAAAAAAIc/MheEUX8GMDc/s1600-h/06-Many-To-Many.gif"><img id="Img5" style="CURSOR: hand" alt="" src="http://bp3.blogger.com/_gez10dNhuPk/RmdhGXjZyoI/AAAAAAAAAIc/MheEUX8GMDc/s400/06-Many-To-Many.gif" border="0" /></a></p>
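<p>In DDL terms, the physical side of that diagram might look roughly like the sketch below. The table and column names are assumptions chosen for this example; the point is only that the associative table carries two foreign keys, one for each one-to-many relationship.</p>
<pre class="prettyprint">-- Hypothetical tables, for illustration only.
CREATE TABLE Employee (
    employee_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL
);

CREATE TABLE Project (
    project_id INTEGER PRIMARY KEY,
    name       VARCHAR(100) NOT NULL
);

-- Associative (link) table: each row pairs one Employee with one Project.
CREATE TABLE EmployeeProject (
    employee_id INTEGER NOT NULL,
    project_id  INTEGER NOT NULL,
    -- The composite key prevents the same pairing from being stored twice.
    PRIMARY KEY (employee_id, project_id),
    FOREIGN KEY (employee_id) REFERENCES Employee (employee_id),
    FOREIGN KEY (project_id)  REFERENCES Project (project_id)
);</pre>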
<p><b>One-to-One</b></p>
<p>Probably the least common and most misunderstood relationship is the one-to-one. It consists of a one and only one notation on one side of a relationship and a zero or one on the other. It warrants a discussion unto itself, but for now the Project to Technology Project relationship in the example is a one-to-one. Because these relationships are easy to mistake for traditional one-to-many relationships, I have taken to drawing a red dashed line around them. The red dashed line is not standard at all (although a colleague, Steve Dempsey, uses a similar notation), but in my experience it can help eliminate confusion.</p>
<p><a href="http://bp1.blogger.com/_gez10dNhuPk/Rmdms3jZypI/AAAAAAAAAIk/XLcplhldx3M/s1600-h/07-One-To-One.gif"><img id="BLOGGER_PHOTO_ID_5073136426268871314" style="CURSOR: hand" alt="" src="http://bp1.blogger.com/_gez10dNhuPk/Rmdms3jZypI/AAAAAAAAAIk/XLcplhldx3M/s400/07-One-To-One.gif" border="0" /></a> </p>
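<p>One common way to implement a one-to-one physically is to let the dependent table reuse the parent's primary key. This is a sketch under assumed names (project_id, technology), not necessarily how the model in the diagram is built.</p>
<pre class="prettyprint">-- Hypothetical tables, for illustration only.
CREATE TABLE Project (
    project_id INTEGER PRIMARY KEY,
    name       VARCHAR(100) NOT NULL
);

CREATE TABLE TechnologyProject (
    -- Primary key and foreign key at once: at most one row per Project,
    -- and every row must belong to exactly one existing Project.
    project_id INTEGER PRIMARY KEY,
    technology VARCHAR(100) NOT NULL,
    FOREIGN KEY (project_id) REFERENCES Project (project_id)
);</pre>
<p>A Project without a TechnologyProject row is the "zero" case; the primary key on the child table rules out more than one.</p>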
<p><b>Conclusion</b></p>
<p>I hope you've found this a useful example for deciphering and verifying entity relationship diagrams. As always please add any comments, disagreements, thoughts or related resources.</p>]]></description>
            <link>http://www.nearinfinity.com/blogs/lee_richardson/an_entity_relationship_diagram_example.html</link>
            <guid>http://www.nearinfinity.com/blogs/lee_richardson/an_entity_relationship_diagram_example.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">General</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">database</category>
            
            <pubDate>Thu, 07 Jun 2007 00:38:24 -0500</pubDate>
        </item>
        
        <item>
            <title>Quick and dirty SQL histogram</title>
            <description><![CDATA[<p>Sometimes you really want a quick &amp; dirty histogram while looking through a database:
</p><ul>
<li>when you suspect the mean value is misleading
</li><li>when you want to understand how the values are distributed
</li><li>... and easily switch between different sources of values
</li><li>... without exporting data &amp; switching applications
</li></ul>
<p>Here are the story scores from a recent front page of reddit.com: 175, 456, 140, 191, 230, 186, 134, 215, 171, 83, 102, 171, 182, 322, 193, 310, 338, 345, 174, 134, 92, 109, 241, 256, 132

</p><p>A basic statistical query returns:

</p><pre class="prettyprint">+-----+-----+-----+--------+
| max | min | avg | stddev |
+-----+-----+-----+--------+
| 456 |  83 | 203 |   90   |
+-----+-----+-----+--------+
</pre>
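<p>A query along these lines would produce that kind of summary. The table name scores and its column score are assumptions for this sketch; the values are the reddit scores listed above.</p>
<pre class="prettyprint">-- Hypothetical table holding the story scores.
CREATE TABLE scores (score INTEGER NOT NULL);

INSERT INTO scores (score) VALUES
    (175), (456), (140), (191), (230), (186), (134), (215), (171), (83),
    (102), (171), (182), (322), (193), (310), (338), (345), (174), (134),
    (92), (109), (241), (256), (132);

-- Basic summary statistics, rounded to whole numbers.
SELECT MAX(score)           AS max_score,
       MIN(score)           AS min_score,
       ROUND(AVG(score))    AS avg_score,
       ROUND(STDDEV(score)) AS stddev_score
FROM scores;</pre>
<p>Note that STDDEV is the sample standard deviation in PostgreSQL and the population standard deviation in MySQL, so the last column can differ slightly between the two.</p>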

<p>The standard deviation seems awfully large. Maybe not many of the scores are close to the mean score of 203? What if another query could show the distribution?

</p><pre id="dual" class="prettyprint">+--------+----------+--------+----------+         +--------+----------+--------+----------+
| bucket | contents | _floor | _ceiling |         | bucket | contents | _floor | _ceiling |
+--------+----------+--------+----------+         +--------+----------+--------+----------+
|      1 |        8 |     83 |      157 |         |      1 |        4 |     83 |      119 |
|      2 |       10 |    158 |      232 |         |      2 |        4 |    120 |      156 |
|      3 |        2 |    233 |      307 |         |      3 |        8 |    157 |      193 |
|      4 |        4 |    308 |      382 |         |      4 |        2 |    194 |      230 |
|      5 |        1 |    383 |      457 |         |      5 |        2 |    231 |      267 |
+--------+----------+--------+----------+         |      6 |        0 |    268 |      304 |
                                                  |      7 |        3 |    305 |      341 |
                                                  |      8 |        1 |    342 |      378 |
                                                  |      9 |        0 |    379 |      415 |
                                                  |     10 |        1 |    416 |      452 |
                                                  +--------+----------+--------+----------+
</pre>
<p>The histogram on the left has fewer, larger buckets. This is a lot more informative than the mean &amp; stddev. The histogram on the right uses more, smaller buckets. Maybe this is too verbose? What if you wanted seven buckets?
</p><pre class="prettyprint">        update dhg.bucket_count set num_buckets = 7;
        select * from dhg.results;

        +--------+----------+--------+----------+
        | bucket | contents | _floor | _ceiling |
        +--------+----------+--------+----------+
        |      1 |        7 |     83 |      135 |
        |      2 |        7 |    136 |      188 |
        |      3 |        5 |    189 |      241 |
        |      4 |        1 |    242 |      294 |
        |      5 |        4 |    295 |      347 |
        |      6 |        0 |    348 |      400 |
        |      7 |        1 |    401 |      453 |
        +--------+----------+--------+----------+
</pre>
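<p>For readers who want the core idea without installing the dhg schema, a similar histogram can be approximated in PostgreSQL with the built-in width_bucket function. This is a simplified sketch, not the linked DDL: the table scores(score) is an assumption, and unlike the results view above it omits empty buckets and does not report each bucket's nominal floor and ceiling.</p>
<pre class="prettyprint">-- Hypothetical source table; load it with the values to analyze.
CREATE TABLE scores (score INTEGER NOT NULL);

WITH bounds AS (
    SELECT MIN(score) AS lo, MAX(score) AS hi FROM scores
)
SELECT width_bucket(s.score::numeric,
                    b.lo::numeric,
                    (b.hi + 1)::numeric,  -- upper bound is exclusive, so add 1 to keep the maximum in the last bucket
                    7)                    -- number of buckets
           AS bucket,
       COUNT(*)     AS contents,
       MIN(s.score) AS lowest_value,      -- smallest value actually seen in the bucket
       MAX(s.score) AS highest_value      -- largest value actually seen in the bucket
FROM scores AS s
CROSS JOIN bounds AS b
GROUP BY 1
ORDER BY 1;</pre>
<p>Changing the 7 plays the same role as updating num_buckets in the dhg version.</p>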

<p>Instructions:
</p><ol>
<li>Insert numbers to be analyzed:</li>
<pre class="prettyprint">INSERT INTO dhg.source SELECT foo FROM bar;</pre>
<li>Choose how many buckets in the histogram</li>
<pre class="prettyprint">UPDATE dhg.bucket_count SET num_buckets = 10;</pre>
<li>Read the results!</li>
<pre class="prettyprint">SELECT * FROM dhg.results_full;</pre>
</ol>

<h2>Materials:</h2>
<p>All views, tables, and functions will live in a dynamic histogram (dhg) schema. The SQL is pretty minimal yet hopefully reasonably structured and commented. The MySQL flavor is larger due to an implementation of width_bucket.
</p><p><b>WARNING:</b> The implementation suffers from a variety of rounding errors and poor error handling. This is a quick and dirty solution for rough estimates only. 
</p><p><b>NOTE:</b> Using more than 20 buckets will need a small tweak. Grep the SQL for empty_buckets.
</p><ul>	
<li><a href="http://www.nearinfinity.com/blogs/resources/seths/postgres.txt">PostgreSQL DDL</a> (may work with Oracle 9+)
</li><li><a href="http://www.nearinfinity.com/blogs/resources/seths/mysql.txt">MySQL DDL</a>
</li><li><a href="http://www.nearinfinity.com/blogs/resources/seths/starter_data.txt">starter data</a>: scores from reddit, digg, and comment counts from slashdot
</li></ul>
<p>I'd love to hear criticisms, comments, &amp; suggestions!</p>]]></description>
            <link>http://www.nearinfinity.com/blogs/seth_schroeder/quick_and_dirty_sql_histogram.html</link>
            <guid>http://www.nearinfinity.com/blogs/seth_schroeder/quick_and_dirty_sql_histogram.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">histogram</category>
            
            <pubDate>Tue, 15 May 2007 14:57:27 -0500</pubDate>
        </item>
        
        <item>
            <title>Export Visio Database Table Names to Excel</title>
            <description><![CDATA[<p>If you use the Enterprise Architect edition of Microsoft Visio for data modeling regularly, then there is a good chance that at some point you've wanted to export just the table names into Excel. You might want to do this to map logical ERD entities to physical data model tables, track project status by entity, or track overlap between database versions.</p>
<p>Regardless, it turns out to be non-trivial to perform this export, particularly if you are unable to generate to a database to retrieve the table names. The trick is to use the reporting feature of Visio, but there are many reports and report options, and you will need one that is table-based to get the data into Excel easily.</p>
<p>Note: If you are unfamiliar with the capabilities of Microsoft Visio as a data modeling tool you may wish to take a look at my <a href="http://www.blueink.biz/DataModelingVisio.aspx" target="_blank">Data Modeling in Microsoft Visio Tutorial</a>.</p>
<p><b>Export Procedure</b></p>
<p>1. This may seem a little unusual, but if you don't have any comments in any of your tables (which really shouldn't be the case), you will need to add comments for at least one of your tables. Without this step Visio will not display tables in a grid format in the report.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp2.blogger.com/_gez10dNhuPk/RkOVRcasq2I/AAAAAAAAAG8/WOaiqX7aBpw/s1600-h/00+Table+Notes.jpg"><img id="BLOGGER_PHOTO_ID_5063054533013056354" style="CURSOR: hand" alt="" src="http://bp2.blogger.com/_gez10dNhuPk/RkOVRcasq2I/AAAAAAAAAG8/WOaiqX7aBpw/s400/00+Table+Notes.jpg" border="0" /></a></p>
<p>2. Now select the somewhat obscure "Report" item off of the "Database" menu.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_gez10dNhuPk/RkOVRsasq3I/AAAAAAAAAHE/JQA2hQSj84c/s1600-h/01+Report+Menu.jpg"><img id="Img1" style="CURSOR: hand" alt="" src="http://bp3.blogger.com/_gez10dNhuPk/RkOVRsasq3I/AAAAAAAAAHE/JQA2hQSj84c/s400/01+Report+Menu.jpg" border="0" /></a></p>
<p>3. Only the "Table Report" provides the ability to lay out database tables in a grid. Select it and click Finish.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_gez10dNhuPk/RkOVRsasq4I/AAAAAAAAAHM/l3hew4Itvdw/s1600-h/02+Table+Report.jpg"><img id="Img2" style="CURSOR: hand" alt="" src="http://bp3.blogger.com/_gez10dNhuPk/RkOVRsasq4I/AAAAAAAAAHM/l3hew4Itvdw/s400/02+Table+Report.jpg" border="0" /></a></p>
<p>4. Under "Predefined logical/physical reports," click the button labeled "Default To: General Report" and change it to "Default To: Database Report." This will remove tables formatted per page from the end of the report.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_gez10dNhuPk/RkOVR8asq5I/AAAAAAAAAHU/qvlawWZhedQ/s1600-h/03+Database+Report+Type.jpg"><img id="Img3" style="CURSOR: hand" alt="" src="http://bp0.blogger.com/_gez10dNhuPk/RkOVR8asq5I/AAAAAAAAAHU/qvlawWZhedQ/s400/03+Database+Report+Type.jpg" border="0" /></a></p>
<p>5. Under the "Attributes" tab select "Deselect All" then select the "Table stats summary" option.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_gez10dNhuPk/RkOVR8asq6I/AAAAAAAAAHc/4435zlyq1Wk/s1600-h/04+Table+Stats+Summary.jpg"><img id="Img4" style="CURSOR: hand" alt="" src="http://bp0.blogger.com/_gez10dNhuPk/RkOVR8asq6I/AAAAAAAAAHc/4435zlyq1Wk/s400/04+Table+Stats+Summary.jpg" border="0" /></a></p>
<p>6. Click "Export to RTF," save the file somewhere, and open it with Microsoft Word.</p>
<p>7. (optional) If you have any new lines in the notes field you may have to replace them with spaces. Just do a search and replace for "^l" and replace with " ".</p>
<p>8. Now you're ready to copy and paste.</p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_gez10dNhuPk/RkOVY8asq7I/AAAAAAAAAHk/0M_u7Bh6aIE/s1600-h/05+Word+Copy.jpg"><img id="Img5" style="CURSOR: hand" alt="" src="http://bp0.blogger.com/_gez10dNhuPk/RkOVY8asq7I/AAAAAAAAAHk/0M_u7Bh6aIE/s400/05+Word+Copy.jpg" border="0" /></a></p>
<p><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp0.blogger.com/_gez10dNhuPk/RkOVY8asq8I/AAAAAAAAAHs/mJetOKvLlUE/s1600-h/06+Excel+Paste.jpg"><img id="Img6" style="CURSOR: hand" alt="" src="http://bp0.blogger.com/_gez10dNhuPk/RkOVY8asq8I/AAAAAAAAAHs/mJetOKvLlUE/s400/06+Excel+Paste.jpg" border="0" /></a></p>
<p>And you're done! Hopefully this tutorial will make life easier for you next time you need to export table names from Visio to Microsoft Excel.</p>]]></description>
            <link>http://www.nearinfinity.com/blogs/lee_richardson/export_visio_database_table_names.html</link>
            <guid>http://www.nearinfinity.com/blogs/lee_richardson/export_visio_database_table_names.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">General</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">Visio</category>
            
            <pubDate>Thu, 10 May 2007 18:23:07 -0500</pubDate>
        </item>
        
        <item>
            <title>Entity Naming Conventions</title>
            <description><![CDATA[<p>It seems that as software developers mature, they develop consistency in their approach to just about every aspect of their work, regardless of whether there is a good reason for adopting a particular practice.</p>
<p>For instance, in data modeling I developed the habit of always naming my tables in the plural&nbsp;- Employees instead of Employee, and such. There's no reason for this convention, other than perhaps I copied what I saw from the Northwind database.</p>
<p>But it's important to question these practices from time to time, and after over seven years of doing things the same way I have decided to make a change. And for the second time now (see my post <a href="http://rapidapplicationdevelopment.blogspot.com/2007/01/importance-of-logical-data-model.html">The Importance of a Logical Data Model</a>), it was a colleague, Steve Dempsey, who initiated the change. So why would one opt for singular names over plural ones?</p>
<p>Developers might choose singular names because they are shorter and require less typing, but this argument never held for me because of tools like IntelliSense and code generation (not to mention touch typing). But Steve is extremely adamant about singular names for a different reason: relationship readability.</p>
<p>For instance, in SharePoint, workflows relate to events. Specifically, a workflow (singular) is initiated by one and only one event, and an event (singular) can initiate multiple workflows, as is expressed below:</p>
<p><img src="http://www.nearinfinity.com/blogs/resources/lrichard/Singular.gif" border="0" /></p>
<p>The objective of modeling is thus to express the relationship of a single entity (a workflow, an event, or whatever) to zero or one or many of another entity. So why not just name your entities appropriately in the first place: by making them singular?</p>
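<p>Carried through to the physical model, the singular convention might look like the sketch below. The column names are assumptions for illustration; only the entity names Event and Workflow come from the diagram.</p>
<pre class="prettyprint">-- Hypothetical DDL using singular table names.
CREATE TABLE Event (
    event_id INTEGER PRIMARY KEY,
    name     VARCHAR(100) NOT NULL
);

CREATE TABLE Workflow (
    workflow_id INTEGER PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    -- A Workflow is initiated by one and only one Event.
    event_id    INTEGER NOT NULL,
    FOREIGN KEY (event_id) REFERENCES Event (event_id)
);</pre>
<p>Each foreign key then reads the same way as the relationship line: a Workflow references an Event, not "a Workflows references an Events".</p>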
<p>Of course now the problem is getting an old dog to remember his new trick. Or is it tricks?</p>
<p>---</p>
<p><small>note: I am now double posting my blog entries, this post is also available on <a href="http://rapidapplicationdevelopment.blogspot.com/2007/03/entity-naming-conventions.html">Blogspot</a>.</small></p>]]></description>
            <link>http://www.nearinfinity.com/blogs/lee_richardson/entity_naming_conventions.html</link>
            <guid>http://www.nearinfinity.com/blogs/lee_richardson/entity_naming_conventions.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">database</category>
            
            <pubDate>Mon, 26 Mar 2007 10:14:17 -0500</pubDate>
        </item>
        
    </channel>
</rss>
