<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>Hadoop - Blogs at Near Infinity</title>


        <link>http://www.nearinfinity.com/blogs/</link>
        <description>Employee Blogs</description>
        <language>en</language>
        <copyright>Copyright 2010</copyright>
        <lastBuildDate>Tue, 05 Jan 2010 22:34:20 -0500</lastBuildDate>
        <generator>http://www.sixapart.com/movabletype/</generator>
        <docs>http://www.rssboard.org/rss-specification</docs>
        
        <item>
            <title>Using HBase-dsl</title>
            <description><![CDATA[At the beginning of last month I started prototyping various solutions for a customer using HBase. &nbsp;However, I found myself writing tons of code to perform some fairly simple tasks. &nbsp;So I set out to simplify my HBase code and ended up writing a Java <a href="http://wiki.github.com/nearinfinity/hbase-dsl" target="_blank">HBase DSL</a>. &nbsp;It's still fairly rough around the edges, but it lets you work with standard Java types instead of raw byte arrays, and it's extensible.<div><br /><font class="Apple-style-span" style="font-size: 1.25em; "><font class="Apple-style-span" style="font-size: 1.25em; ">

Simple Put and Get Example</font></font><br /><br /><b>

Direct HBase API:</b><br />

<br />
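<pre class="prettyprint">import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PutAndGet {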
   public static void main(String[] args) throws IOException {
      HTable hTable = new HTable("test");

      byte[] rowId = Bytes.toBytes("abcd");
      byte[] famA = Bytes.toBytes("famA");
      byte[] col1 = Bytes.toBytes("col1");
      Put put = new Put(rowId).
         add(famA, col1, Bytes.toBytes("hello world!"));
      hTable.put(put);
      Get get = new Get(rowId);
      Result result = hTable.get(get);
      byte[] value = result.getValue(famA, col1);
      System.out.println(Bytes.toString(value));
   }
}
</pre><b>HBase-dsl API:</b><br /><br />
<pre class="prettyprint">public class PutAndGetWithDsl {
   public static void main(String[] args) throws IOException {
      HBase&lt;QueryOps&lt;String&gt;, String&gt; hBase = new HBase&lt;QueryOps&lt;String&gt;, String&gt;(String.class);

      hBase.save("test").
         row("abcd").
            family("famA").
               col("col1", "hello world!");
      String value = hBase.fetch("test").
         row("abcd").
            family("famA").
               value("col1", String.class);
      System.out.println(value);
   }
}</pre>
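<br />To give a feel for how a chained call like save("test").row(...).family(...).col(...) can hide the byte-array plumbing, here is a minimal, self-contained sketch. It is <i>not</i> how hbase-dsl is implemented: the class FluentPutSketch, its step classes, and its toBytes converter are hypothetical, and it only builds the cell that a real DSL would hand to HTable.put, so it runs without an HBase cluster. Each step returns an object that offers only the next legal call, and all type conversion happens in one place, using the same encodings as Bytes.toBytes for String, long, and int.<br /><br />
<pre class="prettyprint">import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, NOT hbase-dsl's implementation: an in-memory stand-in
// showing how a chained save(table).row(id).family(name).col(qualifier, value)
// call can collect the same cell coordinates the direct API builds by hand.
public class FluentPutSketch {

   // One fully addressed cell, with every coordinate already converted to bytes.
   public static final class Cell {
      public final String table;
      public final byte[] row, family, qualifier, value;

      Cell(String table, byte[] row, byte[] family, byte[] qualifier, byte[] value) {
         this.table = table;
         this.row = row;
         this.family = family;
         this.qualifier = qualifier;
         this.value = value;
      }
   }

   // Converts a few standard Java types to bytes. The encodings are chosen to
   // match HBase's Bytes.toBytes: UTF-8 for String, big-endian for long and int.
   static byte[] toBytes(Object o) {
      if (o instanceof byte[]) return (byte[]) o;
      if (o instanceof String) return ((String) o).getBytes(StandardCharsets.UTF_8);
      if (o instanceof Long) return ByteBuffer.allocate(8).putLong((Long) o).array();
      if (o instanceof Integer) return ByteBuffer.allocate(4).putInt((Integer) o).array();
      throw new IllegalArgumentException("Unsupported type: " + o.getClass().getName());
   }

   // Entry point of the chain: each step returns an object that only offers
   // the next legal call, so the compiler enforces table, row, family, column order.
   public static TableStep save(String table) {
      return new TableStep(table);
   }

   public static final class TableStep {
      private final String table;
      TableStep(String table) { this.table = table; }
      public RowStep row(Object rowId) { return new RowStep(table, toBytes(rowId)); }
   }

   public static final class RowStep {
      private final String table;
      private final byte[] row;
      RowStep(String table, byte[] row) { this.table = table; this.row = row; }
      public FamilyStep family(String family) { return new FamilyStep(table, row, toBytes(family)); }
   }

   public static final class FamilyStep {
      private final String table;
      private final byte[] row, family;
      FamilyStep(String table, byte[] row, byte[] family) {
         this.table = table;
         this.row = row;
         this.family = family;
      }
      // Terminal step: a real DSL would pass this cell on to HTable.put(...).
      public Cell col(String qualifier, Object value) {
         return new Cell(table, row, family, toBytes(qualifier), toBytes(value));
      }
   }

   public static void main(String[] args) {
      Cell cell = save("test").row("abcd").family("famA").col("col1", "hello world!");
      System.out.println(cell.table + " " + new String(cell.row, StandardCharsets.UTF_8)
            + " " + new String(cell.value, StandardCharsets.UTF_8));
   }
}</pre>
Running main prints test abcd hello world!. In this sketch, supporting another standard type would only mean adding one more branch to toBytes, which is one way a DSL like this can stay extensible.<br /><br />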

Now this is where the DSL becomes really powerful!<div><br /><font class="Apple-style-span" style="font-size: 1.25em; "><font class="Apple-style-span" style="font-size: 1.25em; ">

Scanner Example</font></font><br /><br /><b>

Direct HBase API:</b><br /><br />

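<pre class="prettyprint">import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class Scanner {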
   public static void main(String[] args) throws IOException {
      byte[] famA = Bytes.toBytes("famA");
      byte[] col1 = Bytes.toBytes("col1");  

      HTable hTable = new HTable("test");  

      Scan scan = new Scan(Bytes.toBytes("a"), Bytes.toBytes("z"));
      scan.addColumn(famA, col1);  

      SingleColumnValueFilter singleColumnValueFilterA = new SingleColumnValueFilter(
           famA, col1, CompareOp.EQUAL, Bytes.toBytes("hello world!"));
      singleColumnValueFilterA.setFilterIfMissing(true);  

      SingleColumnValueFilter singleColumnValueFilterB = new SingleColumnValueFilter(
           famA, col1, CompareOp.EQUAL, Bytes.toBytes("hello hbase!"));
      singleColumnValueFilterB.setFilterIfMissing(true);  

      FilterList filter = new FilterList(Operator.MUST_PASS_ONE, Arrays
           .asList((Filter) singleColumnValueFilterA,
                singleColumnValueFilterB));  

      scan.setFilter(filter);  

      ResultScanner scanner = hTable.getScanner(scan);
      try {
         for (Result result : scanner) {
            System.out.println(Bytes.toString(result.getValue(famA, col1)));
         }
      } finally {
         // Release the scanner's server-side resources
         scanner.close();
      }
   }
}</pre>
<b>HBase-dsl API:</b><br /><br />

<pre class="prettyprint">public class ScannerWithDsl {
   public static void main(String[] args) throws IOException {
      HBase&lt;QueryOps&lt;String&gt;, String&gt; hBase = new HBase&lt;QueryOps&lt;String&gt;, String&gt;(String.class);

      hBase.scan("test", "a", "z").
         select().
            family("famA").
               col("col1").
         where().
            family("famA").
               col("col1").eq("hello world!", "hello hbase!").
         foreach(new ForEach&lt;Row&gt;() {
            @Override
            public void process(Row row) {
               System.out.println(row.value("famA", "col1", String.class));
            }
         });
   }
}</pre><br />
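Comparing the two scanner versions, the single eq("hello world!", "hello hbase!") call appears to stand in for the whole FilterList in the direct API: a MUST_PASS_ONE list of EQUAL SingleColumnValueFilters on famA:col1, each with setFilterIfMissing(true). The sketch below models only that row-selection rule in plain Java so it runs without HBase; EqAnyPredicateSketch and accepts are hypothetical names, and the sample rows are made up for illustration. A row passes if it has a value for the column and that value equals any of the candidates; rows missing the column are dropped.<br /><br />
<pre class="prettyprint">import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, NOT hbase-dsl's or HBase's code: models the row-selection
// rule of a MUST_PASS_ONE FilterList of SingleColumnValueFilters (EQUAL) on one
// column, each with setFilterIfMissing(true).
public class EqAnyPredicateSketch {
   private final byte[][] candidates;

   public EqAnyPredicateSketch(String... values) {
      candidates = new byte[values.length][];
      int i = 0;
      for (String v : values) {
         candidates[i++] = v.getBytes(StandardCharsets.UTF_8);
      }
   }

   // cellValue is the row's famA:col1 value, or null if the row lacks that column.
   public boolean accepts(byte[] cellValue) {
      if (cellValue == null) {
         return false; // setFilterIfMissing(true): rows without the column are skipped
      }
      for (byte[] candidate : candidates) {
         if (Arrays.equals(candidate, cellValue)) {
            return true; // MUST_PASS_ONE: one matching filter is enough
         }
      }
      return false;
   }

   public static void main(String[] args) {
      EqAnyPredicateSketch where = new EqAnyPredicateSketch("hello world!", "hello hbase!");
      // Made-up sample rows: row keys and their famA:col1 values (null = missing).
      String[] rowKeys = {"a1", "b2", "c3", "d4"};
      String[] values = {"hello world!", "goodbye", null, "hello hbase!"};
      int i = 0;
      for (String key : rowKeys) {
         String v = values[i++];
         byte[] cell = (v == null) ? null : v.getBytes(StandardCharsets.UTF_8);
         if (where.accepts(cell)) {
            System.out.println(key + " => " + v);
         }
      }
   }
}</pre>
Running main prints a1 => hello world! and then d4 => hello hbase!; row b2 fails both equality checks and row c3 has no value for the column.<br /><br />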
See the unit tests in the project for more examples.<br /><br /></div></div>]]></description>
            <link>http://www.nearinfinity.com/blogs/aaron_mccurry/using_hbase-dsl.html</link>
            <guid>http://www.nearinfinity.com/blogs/aaron_mccurry/using_hbase-dsl.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Database</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">Hadoop</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">Java</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">Persistence</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">hadoop</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">hbase</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">hbase-dsl</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">java</category>
            
            <pubDate>Tue, 05 Jan 2010 22:34:20 -0500</pubDate>
        </item>
        
        <item>
            <title>Hive - The next great data warehouse</title>
            <description><![CDATA[In the past few weeks I have been spending more and more time working with Hadoop and Hive.&nbsp; For those of you who don't know what Hadoop is, check out what <a href="http://en.wikipedia.org/wiki/Hadoop">Wikipedia</a> has to say.&nbsp; Hive is built on top of Hadoop; simply stated, it is a SQL-like query engine that turns your queries into <a href="http://en.wikipedia.org/wiki/Map_Reduce">map/reduce</a> jobs and submits them to Hadoop for execution.<br /><br />
So next you may be asking yourself, "Why do I care?"&nbsp; Well, because Hive uses Hadoop for all the heavy lifting, the amount of data you can process is limited only by the amount of hardware in your cluster.&nbsp; Hive is used for data warehousing, which means it is designed to work on huge datasets, huge joins, huge data loads, huge query results, etc.&nbsp; However, before you start thinking about getting rid of that MySQL database, think again.&nbsp; Hive is not, and never will be, low latency.&nbsp; Queries are executed as map/reduce jobs on Hadoop, which then operate on files stored in HDFS, so even a small query pays the startup cost of those jobs.<br /><br />
Hive has a lot of nice features built in, such as:<br /><ul>
<li>It can operate on <i>raw</i> files located in HDFS, such as logs from your application or CSV files exported from your database(s).&nbsp; This can reduce your load time, because you don't have to load the data into a database before you can query it.</li>
<li>It can operate on compressed files.&nbsp; I started using this feature last week, and I am getting a 4 to 1 compression ratio with no difference in performance (I am using sequence files with block compression).</li>
<li>In your queries you can use Hadoop Streaming to plug in your own mappers and reducers, and they don't even have to be written in Java!</li>
<li>You can also create your own user-defined functions (UDFs), so when you have to do something crazy with the data, you can!</li>
</ul><br />And there are lots more, so go check it out!<br /><br /><a href="http://wiki.apache.org/hadoop/Hive">Hive</a>, the real Netezza killer.<br />]]></description>
            <link>http://www.nearinfinity.com/blogs/aaron_mccurry/hive_-_the_next_great_data_war.html</link>
            <guid>http://www.nearinfinity.com/blogs/aaron_mccurry/hive_-_the_next_great_data_war.html</guid>
            
                <category domain="http://www.sixapart.com/ns/types#category">Hadoop</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">Java</category>
            
                <category domain="http://www.sixapart.com/ns/types#category">SQL</category>
            
            
                <category domain="http://www.sixapart.com/ns/types#tag">hadoop</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">hdfs</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">hive</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">java</category>
            
                <category domain="http://www.sixapart.com/ns/types#tag">sql</category>
            
            <pubDate>Sun, 04 Oct 2009 13:18:55 -0500</pubDate>
        </item>
        
    </channel>
</rss>
