Lee Richardson

Tuesday May 27, 2008
A Major Silverlight PITA and Two Annoying 3.0 Limitations

Pardon my rant, but the thing I currently hate most about Silverlight (besides copious XML) is the Visibility property. Any sane framework would implement Visibility as a Boolean. Not Silverlight though. Its creators, in their undoubted infinite wisdom, implemented it as an enumeration. The values of the enumeration? There are two: Visible and Collapsed. Hmmm.

Of course this causes superfluous verbosity in common everyday code:

button1.Visibility = makeVisible ? Visibility.Visible : Visibility.Collapsed;

Or worse when things get a little more complex:

// don't display the panel if its buttons aren't visible
panel1.Visibility = (button1.Visibility == Visibility.Visible || button2.Visibility == Visibility.Visible) ? Visibility.Visible : Visibility.Collapsed;

Clearly this was done to keep Silverlight compatible with the Windows Presentation Foundation (WPF) which has three values in its enumeration property (Visible, Hidden, and Collapsed). But that’s just as ridiculous. Why WPF couldn’t use two properties, Visible (Boolean) and NotVisibleBehavior (enumeration) is beyond me.

It’s ok though, because C# 3.0 gave me a cure for any Framework shortcomings: Extension Methods. A syntactic sugar cure for all my bitterness:

public static void SetVisible(this FrameworkElement element, bool visible) {
    element.Visibility = visible ? Visibility.Visible : Visibility.Collapsed;
}

public static bool IsVisible(this Visibility visibility) {
    return visibility == Visibility.Visible;
}

Fantastic, now my "complex" example becomes:

// don't display the panel if its buttons aren't visible
panel1.SetVisible(button1.Visibility.IsVisible() || button2.Visibility.IsVisible());

Still not quite as nice as a Boolean visible property, but certainly doable.

3.0 Limitation #1, By Ref Extension Methods

But wait. Isn’t it best practice in Silverlight to use binding for these types of things? Separation of logic from presentation and all. So I should do:

<StackPanel Visibility="{Binding IsPanelVisible}">

And then:

public class DisplayStuff : INotifyPropertyChanged {
    // required by INotifyPropertyChanged
    public event PropertyChangedEventHandler PropertyChanged;

    public Visibility IsPanelVisible { get; private set; }

    public void UpdateStatus(bool makeVisible) {
        IsPanelVisible = makeVisible ? Visibility.Visible : Visibility.Collapsed;
        // make sure to notify the control that the property has changed
        if (PropertyChanged != null) {
            PropertyChanged(this, new PropertyChangedEventArgs("IsPanelVisible"));
        }
    }
}

And we can set the DataContext of some parent element to an instance of DisplayStuff and all the children including our panel magically databind. That’s cool, but the ugliness is back (well, not as bad since I removed the buttons to simplify the example, but you can pretend). This is because we extended FrameworkElement, not Visibility. No problem, just extend Visibility, right?

public static void SetVisible(this Visibility visibility, bool visible) {
    visibility = visible ? Visibility.Visible : Visibility.Collapsed;
}

Except this doesn’t work. Can you spot the problem?

It compiles. It runs. But the value of IsPanelVisible never changes. Oh yeah, C# is pass by value by default. And now the C# 3.0 limitation. This isn’t possible:

public static void SetVisible(this ref Visibility visibility, bool visible) {

You get "The parameter modifier 'ref' cannot be used with 'this'." Grr.
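To see the pass-by-value behavior in isolation, here is a minimal sketch. It assumes a stand-in Visibility enum (so it compiles without Silverlight) and a hypothetical DisplayStuffDemo class; neither name comes from the framework.

```csharp
using System;

// Stand-in for System.Windows.Visibility so the sketch compiles on its own.
public enum Visibility { Visible, Collapsed }

public static class VisibilityExtensions {
    // 'visibility' is a copy of the caller's value; assigning to it
    // changes only the copy, never the caller's property or field.
    public static void SetVisible(this Visibility visibility, bool visible) {
        visibility = visible ? Visibility.Visible : Visibility.Collapsed;
    }
}

// Hypothetical class mirroring DisplayStuff, minus the data binding.
public class DisplayStuffDemo {
    public Visibility IsPanelVisible { get; private set; }

    public DisplayStuffDemo() {
        IsPanelVisible = Visibility.Collapsed;
    }

    public void UpdateStatus(bool makeVisible) {
        // Compiles and runs, but only a temporary copy is modified.
        IsPanelVisible.SetVisible(makeVisible);
    }
}
```

After new DisplayStuffDemo().UpdateStatus(true), IsPanelVisible is still Collapsed.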

Limitation #2, By Ref Automatic Properties

Ok, so remove “this”, and go back to C# 2.0 helper functions which extension methods are syntactic sugar for anyway:

public static void SetVisible(ref Visibility visibility, bool visible) {

And now our class can do:

ExtensionMethods.SetVisible(ref IsPanelVisible, makePanelVisible);

Right? Not so fast I’m afraid. Compile error. “A property or indexer may not be passed as an out or ref parameter”. And I guess this is reasonable. You can’t pass the address of a function, which is what a property is in the background. So you should pass the private variable that backs the property.

Except that I don’t have one! I used an automatic property. And .Net doesn’t let me access the private variable backing the automatic property. So I’m stuck!

And this is C# 3.0 limitation #2. Automatic properties are wonderful until you try to do much with them. Why couldn’t the compiler notice that I’m using an automatic property and pass the variable that I can’t access by ref to my function?
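For comparison, this is roughly what the fallback looks like once the automatic property is traded for an explicit backing field. It is a sketch only: the Visibility enum is a stand-in for the Silverlight one, and VisibilityHelper and DisplayStuffWithField are hypothetical names.

```csharp
using System;

// Stand-in for System.Windows.Visibility so the sketch compiles on its own.
public enum Visibility { Visible, Collapsed }

public static class VisibilityHelper {
    // A plain C# 2.0 style helper: 'ref' lets it write to the caller's variable.
    public static void SetVisible(ref Visibility visibility, bool visible) {
        visibility = visible ? Visibility.Visible : Visibility.Collapsed;
    }
}

public class DisplayStuffWithField {
    // Explicit backing field; this is what an automatic property hides.
    private Visibility _isPanelVisible = Visibility.Collapsed;

    public Visibility IsPanelVisible {
        get { return _isPanelVisible; }
    }

    public void UpdateStatus(bool makeVisible) {
        // A field, unlike a property, can be passed by reference.
        VisibilityHelper.SetVisible(ref _isPanelVisible, makeVisible);
    }
}
```

The cost is exactly the boilerplate that automatic properties were supposed to remove.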

And now I find myself back in a C# 2.0 world because all the features I like so much in 3.0 are more sugar than substance.

Conclusion

Allowing automatic properties to pass by reference or allowing access to the private member behind them would be nice. Allowing extension methods to change the instance they extend would be nice. But ultimately none of this would be a problem if Visible had been implemented as a Boolean. The way every other framework in the world does. </Complaining>

Posted by lrichard May 27 2008, 08:52:54 AM EDT
Thursday March 27, 2008
Expression Trees: Why LINQ to SQL is Better than NHibernate

In my last post I described how the Where() function works for LINQ to Objects via extension methods and the yield statement. That was interesting. But where things get crazy is how the other LINQ technologies, like LINQ to SQL use extension methods. In particular it’s their use of a new C# 3 feature called expression trees that makes them extremely powerful. And it’s an advantage that more traditional technologies like NHibernate will never touch until they branch out from being a simple port of a Java technology. In this post I’ll explain the inherent advantage conferred on LINQ technologies by expression trees and attempt to describe how the magic works.

What’s so Magic about LINQ to SQL?

LINQ to SQL (and its more powerful unreleased cousin LINQ to Entities) is a new Object Relational Mapping (ORM) technology from Microsoft. It allows you to write something like the following:

IEnumerable<Product> products = northwindDataContext.Products.Where(
      p => p.Category.CategoryName == "Beverages"
      );

Which, as you’d expect, returns products from the database whose category is Beverages. But wait, aren’t you impressed? If not, read over that code again; you should be very impressed. In the background that C# code is converted into the following SQL:

SELECT [t0].[ProductID], [t0].[ProductName], ...
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN [dbo].[Categories] AS [t1]
ON [t1].[CategoryID] = [t0].[CategoryID]
WHERE [t1].[CategoryName] = @p0

In other words it’s pretty smart. It isn’t just returning all products and filtering them in memory using the LINQ to Objects version of Where() I discussed previously.

Doing something like that using NHibernate Criteria would require something like this:

ICriteria c = session.CreateCriteria(typeof(Product));
c.Add(Expression.Eq("Category.CategoryName", "Beverages"));
IEnumerable<Product> products = c.List<Product>();

You could use HQL too, but both NHibernate options suffer from the same problem. Did you spot it?

The LINQ to SQL version is taking actual strongly typed C# code and somehow smartly converting it to useful SQL. The NHibernate version does the same thing, but always using a weakly typed alternative. In other words the column “CategoryName” in NHibernate is a string. If it or its data type change in NHibernate you won’t find out until runtime. And that is the beauty of LINQ to SQL: you’ll find more errors at compile time. And if you’re like me you want the compiler to find your mistakes before the unit tests that you (or your fellow developers) may or may not have written do.

So you’re probably now wondering if you can put strongly typed C# in your where clause and it somehow magically gets converted to SQL, what’s the limit? If you put in a String.ToLower() or StartsWith() will it get converted to equivalent SQL? What about a loop or conditional? A function call? A recursive function call? At some point it has to break down and either return all products and filter them in memory or just fail right? Before answering those questions we need to understand what’s going on.

Understanding the Magic

The Magic happens in a class called Expression<T>. Expression takes a generic argument that must be a delegate type and is usually one of the built in Func delegates. However, only a lambda expression can be assigned to it. That’s right, not a delegate or anonymous method, only a Lambda expression. So in my deferred execution post where I explained what Lambda expressions are, I said they were essentially syntactic sugar for anonymous methods. Well, the emphasis is on the essentially, because here they really aren’t sugar at all. When you assign a lambda expression to an Expression, the compiler, rather than generating the IL to evaluate the expression, generates IL that constructs an abstract syntax tree (AST) for the expression! You can then parse the tree and perform actions based on the code in the lambda expression.

Below is an example adapted from the .Net Developer’s guide on MSDN that shows how this works:

// convert the lambda expression to an abstract syntax tree
Expression<Func<int, bool>> expression = i => i < 5;

ParameterExpression param = (ParameterExpression)expression.Parameters[0];
// this next line would fail if we change the Lambda expression much
BinaryExpression operation = (BinaryExpression)expression.Body;
ParameterExpression left = (ParameterExpression)operation.Left;
ConstantExpression right = (ConstantExpression)operation.Right;

Console.WriteLine("Decomposed expression: {0} => {1} {2} {3}",
      param.Name,
      left.Name,
      operation.NodeType,
      right.Value
      );

This outputs “Decomposed expression: i => i LessThan 5”. The first line is the most important. It defines an Expression that takes a delegate with a single int parameter and a return type of bool. It then instantiates the Expression to a simple lambda expression.  Incidentally this would also work if we defined our own Delegate:

public delegate bool LessThanFive(int i);

public static void DoStuff() {
      Expression<LessThanFive> expression = i => i < 5;
}

It would, however, not work if we used an anonymous method:

Expression<Func<int, bool>> expression = delegate(int i) { return i < 5; };

While that looks legal it actually results in the compile time error “An anonymous method expression cannot be converted to an expression tree.”

There is a lot of complexity in parsing the AST, far beyond the scope of this article. However, MSDN does have a nice diagram that helps explain how the following slightly more complicated Lambda expression, which determines whether a number is greater than the length of a string, is represented:

Expression<Func<string, int, bool>> expression =
    (str, num) => num > str.Length;
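In lieu of the diagram, the shape of that tree can be sketched by walking it by hand, in the same style as the earlier decomposition. The class and method names (ExpressionWalk, Describe) are made up for illustration, and the casts are only valid for this particular lambda.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionWalk {
    public static string Describe() {
        Expression<Func<string, int, bool>> expression =
            (str, num) => num > str.Length;

        // Root of the body: a binary node for the > comparison
        BinaryExpression body = (BinaryExpression)expression.Body;
        // Left operand: the 'num' parameter
        ParameterExpression left = (ParameterExpression)body.Left;
        // Right operand: a member access (Length) ...
        MemberExpression right = (MemberExpression)body.Right;
        // ... whose target is the 'str' parameter
        ParameterExpression target = (ParameterExpression)right.Expression;

        return string.Format("{0} {1} {2}.{3}",
            left.Name, body.NodeType, target.Name, right.Member.Name);
    }
}
```

Describe() returns "num GreaterThan str.Length": one binary node with a parameter on the left and a member access hanging off the other parameter on the right.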

How Deep Does The Rabbit Hole Go?

So LINQ to SQL uses this Expression Tree technique to parse a plethora of possible code that you could throw at it and turn it into smart SQL. For instance check out a couple of the following conversions that LINQ to SQL will (or will not) perform:

p => p.Category.CategoryName.ToLower() == "beverages"

Results In:

SELECT [t0].[ProductID], ...
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN [dbo].[Categories] AS [t1] ON [t1].[CategoryID] = [t0].[CategoryID]
WHERE LOWER([t1].[CategoryName]) = @p0

Not bad, huh? How about:

p => p.Category.CategoryName.Contains("everage")

That results in the following SQL snippet:

WHERE [t1].[CategoryName] LIKE @p0

And it sets @p0 to “%everage%”. Pretty cool. Ok this will get it to fail though, right?

public static string GetCat() {
    return "Beverages";
}

IEnumerable<Product> products = northwindDataContext.Products.Where(
      p => p.Category.CategoryName == GetCat()
      );

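It turns out this works too! Not because LINQ to SQL looks inside GetCat(), though: the call doesn’t depend on p, so it is evaluated locally and its result is sent to the database as a parameter. Alright, there’s no way it can do complicated conditionals: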
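Whatever a provider does internally, what it receives for GetCat() is just a method call node in the tree. The sketch below shows that node and one way a sub-expression that doesn't reference the lambda parameter could be evaluated locally. This illustrates the general technique only; it is not LINQ to SQL's actual code, and MethodCallInTree and Inspect are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;

public static class MethodCallInTree {
    public static string GetCat() {
        return "Beverages";
    }

    // Returns the name of the method found on the right of ==, followed by
    // the value obtained by evaluating that call locally.
    public static string[] Inspect() {
        Expression<Func<string, bool>> expression = name => name == GetCat();

        BinaryExpression body = (BinaryExpression)expression.Body;
        // The tree holds a call node, not the body of GetCat()
        MethodCallExpression call = (MethodCallExpression)body.Right;

        // The call doesn't reference the 'name' parameter, so it can be wrapped
        // in a parameterless lambda, compiled, and executed right here.
        Func<string> evaluate = Expression.Lambda<Func<string>>(call).Compile();

        return new[] { call.Method.Name, evaluate() };
    }
}
```

Inspect() returns "GetCat" and "Beverages"; the second value is the kind of thing that can then be handed to the database as @p0.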

p => p.Category.CategoryName ==
    "Beverages" ? p.UnitsInStock < 5 : !p.Discontinued

This should only pick up Beverages that have fewer than 5 items in stock regardless of whether they are discontinued and any other products that aren’t discontinued. Would you believe that it runs a single SQL statement:

SELECT [t0].[ProductID], ...
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN [dbo].[Categories] AS [t1] ON [t1].[CategoryID] = [t0].[CategoryID]
WHERE (
    (CASE
        WHEN [t1].[CategoryName] = @p0 THEN
            (CASE
                WHEN [t0].[UnitsInStock] < @p1 THEN 1
                WHEN NOT ([t0].[UnitsInStock] < @p1) THEN 0
                ELSE NULL
             END)
        ELSE CONVERT(Int,
            (CASE
                WHEN NOT ([t0].[Discontinued] = 1) THEN 1
                WHEN NOT NOT ([t0].[Discontinued] = 1) THEN 0
                ELSE NULL
             END))
     END)) = 1

Wow, it sure isn’t pretty, but it scales to multiple conditionals, and most importantly it didn’t return all products and process them in memory. Not bad.

Conclusion

I asserted up front that using expression trees and the strong typing that comes with them is the reason LINQ to SQL is inherently better than NHibernate. I really can’t make that claim without admitting one of LINQ to SQL’s biggest shortcomings: It currently does not support multiple table inheritance. Ultimately, however, it’s a short term fault since the forthcoming LINQ to Entities does. And I stand by my claim because from a long term perspective as long as technologies like NHibernate remain pure ports of Java code they will never realize the full benefits of equivalent LINQ technologies that take advantage of .Net's native strengths, like expression trees.


Note: Please post comments to my blogspot blog

Posted by lrichard Mar 27 2008, 04:59:40 PM EST
Friday March 14, 2008
How System.Linq.Where() Really Works

After writing my last blog entry on Deferred Execution in LINQ I had a conversation with Seth Schroeder who rightly pointed out among other things that I really didn't show how LINQ's deferred execution works internally. So in this post I wanted to implement my own LINQ Where() extension method based off of the one in the System.Linq namespace. So I'll show you the code, explain interesting parts of how it works including collection initializers and extension methods, and then explain where the deferred execution behavior comes from (i.e. the yield statement). I will only explain in the context of LINQ to Objects since that's far simpler than the other LINQ flavors. I will implement a Where() like LINQ to SQL does in a later blog post (that's where things get really crazy).

Implementing MyWhere()

Let's start out with some code. The first question is does this compile?

using System;
using System.Collections.Generic;
using MyExtensionMethods;

namespace PlayingWithLinq {
    public class LinqToObjects {
        public static void DoStuff() {
            IList<int> ints = new List<int>() {9,8,7,6,5,4,3,2,1};

            IEnumerable<int> result = ints.MyWhere(i => i < 5);

            foreach (int i in result) {
                Console.WriteLine(i);
            }
        }
    }
}

namespace MyExtensionMethods {
    public static class ExtensionMethods {
        public static IEnumerable<TSource> MyWhere<TSource>(
            this IEnumerable<TSource> source,
            Func<TSource, bool> predicate
            ) {

            foreach (TSource element in source) {
                if (predicate(element)) {
                    yield return element;
                }
            }
        }
    }
}

Side note: putting two namespaces in one file is far from a best practice, but yes that is allowed.

Lambdas and Collection Initializers

If you're new to C# 3.0 then your first thought may be that:

IList<int> ints = new List<int>() {9,8,7,6,5,4,3,2,1};

is not allowed. Actually it is. It's the collection initializer syntax that I initially whined about in my post C# 3.0: The Sweet and Sour of Syntactic Sugar (ironically I actually like this syntax the more I use it.)

Your next thought may be that:

i => i < 5

is not legitimate. This is in fact a Lambda Expression, and as I explained in Deferred Execution, The Elegance of LINQ it conceptually compiles down to an anonymous method. Incidentally those that know Groovy (myself not included) or Lisp may know this as a closure since as we'll see later it can access local variables.

Extension Methods

Ok, the .Net Framework certainly has no MyWhere() function on the List object so this certainly wouldn't compile in C# 2. But that's where C# 3's Extension Methods come in. The "this" in:

MyWhere<TSource>(this IEnumerable<TSource> source,

says that MyWhere() can be applied to any generic IEnumerable. If you want to, you can still call MyWhere() normally:

IList<int> ints = new List<int>() {9,8,7,6,5,4,3,2,1};
ExtensionMethods.MyWhere(ints, i => i < 5);

And in fact this is what the compiler does in the background when you call MyWhere() off of an IEnumerable. But now with extension methods you don't have to.

But does MyWhere() now exist on all IEnumerable objects everywhere? No, it turns out you only get MyWhere() when you import the namespace it exists in (MyExtensionMethods). Incidentally unlike Groovy and Ruby there is no way to add an extension method to a class itself, only to instances.

Who's got the Func()?

The last two questionable parts of the code are the Func<TSource, bool> and the yield. Func is pretty easy. It's simply one of several new predefined delegates (method signatures) that come with the .Net framework in the System namespace. The two generic argument one above will match any function that returns the second generic argument and takes the first generic argument as a parameter. It looks like this:

delegate TResult Func<T, TResult>(T arg1);

So rather than using a Lambda expression in my initial example I could have been very explicit about the delegate instance (myFunc):

public static void DoStuff() {
      IList<int> ints = new List<int>() {9,8,7,6,5,4,3,2,1};

      Func<int, bool> myFunc = IsSmall;
      IEnumerable<int> result = ints.MyWhere<int>(myFunc);

      foreach (int i in result) {
            Console.WriteLine(i);
      }
}

public static bool IsSmall(int i) {
      return i < 5;
}

And that would have done the same thing. Notice I specified the generic type explicitly on the call to MyWhere(). That's optional here: the compiler can infer TSource from the arguments, just as it did in the lambda version.

Yield

Now the really interesting part: yield. Yield is what makes deferred execution work. It actually was introduced with C# 2.0, but I don't think anyone really used it (I didn't know about it until recently). So because MyWhere() returns an IEnumerable (and because it isn't anonymous and doesn't have ref or out parameters) it is allowed to use the yield statement. When a method has a yield return (or yield break) statement, then execution of the method doesn't even begin until a calling method first iterates over the resulting IEnumerable. Execution then begins in the method and runs to the first yield statement, returns a result, and passes execution back to the caller. When the calling method iterates to the next value execution continues in the method where it left off until it gets to the next yield statement and then it passes execution back to the caller again and so on. Weird huh? Joshua Flanagan has a nice article that explains this in more detail along with some of the nice benefits like a smaller memory footprint.
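The back-and-forth between caller and iterator is easier to see with a trace. In this sketch (YieldDemo, Numbers and Trace are hypothetical names) the iterator and the caller both write to a shared log, so the log records who ran when.

```csharp
using System;
using System.Collections.Generic;

public static class YieldDemo {
    // An iterator method: its body does not start running when it is called.
    public static IEnumerable<int> Numbers(List<string> log) {
        log.Add("started");
        yield return 1;
        log.Add("resumed");
        yield return 2;
    }

    public static List<string> Trace() {
        List<string> log = new List<string>();

        IEnumerable<int> numbers = Numbers(log);
        log.Add("called");   // nothing inside Numbers() has run yet

        foreach (int n in numbers) {
            // Each iteration runs Numbers() up to its next yield return.
            log.Add("got " + n);
        }
        return log;
    }
}
```

Trace() produces "called", "started", "got 1", "resumed", "got 2": the call to Numbers() returns before "started" is logged, and the iterator picks up after its first yield only when the loop asks for the next value.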

So here's a quiz. What happens when you execute the following code?

IList<int> ints = new List<int>() {9,8,7,6,5,4,3,2,1};

IEnumerable<int> result = ints.MyWhere<int>(i => i < 4);

ints.Add(0);

foreach (int i in result) {
      Console.WriteLine(i);
}

Without the yield you'd get the numbers 3 through 1 since you added 0 after the call to MyWhere(). But since the yield in MyWhere() (and the Where() in System.Linq) defers execution until the foreach statement, you actually get 3 through 0. Ready for a little more mind bending? How about this:

IList<int> ints = new List<int>() {9,8,7,6,5,4,3,2,1};

int j = 4;

IEnumerable<int> result = ints.MyWhere<int>(i => i < j);

ints.Add(0);
j = 3;

foreach (int i in result) {
      Console.WriteLine(i);
}

Does the state of j get captured? My intuition would say yes. If so you'd expect 3 through 0. Well, the closure part of anonymous methods and lambdas works by capturing the variable itself rather than its value: the compiler hoists j into a hidden class that the method and the lambda share. So consequently they always get the most up to date value of a variable. So if your intuition works like mine you'd be wrong. You actually get the numbers 2 through 0. Crazy huh? And definitely something I hope I won't run into in someone's code (JetBrains ReSharper actually warns you if you do something crazy like this).
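A hand-written approximation of what the compiler does with a captured local makes the result less surprising. This is a sketch under that assumption: the real generated class has an unpronounceable name, and Captured, IsLess and ClosureDemo are invented here.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureDemo {
    // Stands in for the hidden class that holds the captured local 'j'.
    private class Captured {
        public int j;
        public bool IsLess(int i) { return i < j; }
    }

    public static List<int> Run() {
        List<int> ints = new List<int>() { 9, 8, 7, 6, 5, 4, 3, 2, 1 };

        Captured captured = new Captured();
        captured.j = 4;
        // The "lambda" is a delegate to a method on the shared object.
        Func<int, bool> predicate = captured.IsLess;

        ints.Add(0);
        captured.j = 3;   // the delegate sees this later write

        List<int> result = new List<int>();
        foreach (int i in ints) {
            if (predicate(i)) {
                result.Add(i);
            }
        }
        return result;
    }
}
```

Run() returns 2, 1, 0, matching the quiz: the predicate reads j at the time it is invoked, not at the time it was created.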

Conclusion

If this made sense then you should have a pretty solid grasp of how most of Linq to Objects works. Understanding extension methods, Func delegates, and yield statements should form the majority of what Linq does. Well, except for expression trees. But that's a topic for another post. Please post if this doesn't make sense or if I got it all wrong, I'd love to hear from you.


P.S. To comment on this article please use my public Blog.

Posted by lrichard Mar 14 2008, 02:53:47 PM EST
Wednesday February 20, 2008
Deferred Execution, the Elegance of LINQ

One of the things I love about LINQ is its deferred execution model. It's the type of thing that makes sense academically when you first read about it (e.g. in Part Three of Scott Guthrie's LINQ to SQL series), but for me anyway, it took some time to understand enough to use effectively.

For instance the Daily RSS Download open source application that I wrote about last week needs to download entries (posts) that are newly published since the last download. While it isn't a complicated problem, my first attempt at a solution didn't use the power of LINQ correctly. I'll explain my naïve solution in this post, describe how LINQ's deferred execution works (i.e. Lambda expressions), explain the problems with my solution, then give the elegant solution that is only possible because of LINQ's deferred execution model. See if you can spot my error along the way.

Downloading the Latest Entries

Downloading the latest entries would be a ridiculously simple problem if there weren't multiple formats for RSS. But since the solution needs to support Atom and RSS 2.0 and 1.0 and potentially other future formats, the class structure should be set up appropriately:

The newspaper class primarily exists to enumerate feeds:

public class Newspaper {
    public void DownloadNow() {
        foreach (Feed objFeed in Settings.Feeds) {
            objFeed.DownloadRecentEntries(...);
        }
    }
}

The Feed class is abstract and during runtime is either an RssFeed or an AtomFeed. The relevant function Feed.DownloadRecentEntries() calls the abstract Feed.GetEntries() method, which returns a group of Entry objects.

public abstract class Feed {
    public abstract IEnumerable<Entry> GetEntries(XDocument rssfeed);

    public void DownloadRecentEntries(...) {
        XDocument xdocFeed = XDocument.Load(Url);
        IEnumerable<Entry> lstRecentPosts = GetEntries(xdocFeed);

        foreach (Entry objEntry in lstRecentPosts) {
            objEntry.Download(...);
        }
    }
}

The Feed classes, RssFeed and AtomFeed then implement GetEntries as follows:

public class RssFeed : Feed {
    public override IEnumerable<Entry> GetEntries(XDocument rssfeed) {
        return from item in rssfeed.Descendants("item")
               where (DateParser.ParseDateTime(item.Element("pubDate").Value)
                   >= this.LastDownloaded)
                   || this.LastDownloaded == null
               select (Entry)new RssEntry(item, this);
    }
}

public class AtomFeed : Feed {
    public override IEnumerable<Entry> GetEntries(XDocument rssfeed) {
        return from item in rssfeed.Descendants(_atomNamespace + "entry")
               where (DateParser.ParseDateTime(
                   item.Element(_atomNamespace + "published").Value)
                    >= this.LastDownloaded)
                    || this.LastDownloaded == null
               select (Entry)new AtomEntry(item, this);
    }
}

Yes, that's all LINQ to XML in there. It looks a lot like SQL, but as you'll see in a second it's really just glorified syntactic sugar. Expressive though, isn't it? While the astute reader may have already spotted the inelegance of my solution, for those unfamiliar with LINQ, let's first describe what AtomFeed.GetEntries() does.

What is this Deferred Execution Stuff?

If you already understand LINQ and how delayed execution works feel free to skip this section. For everyone else it's important to understand that the following line:

from item in rssfeed.Descendants(_atomNamespace + "entry")
               where (DateParser.ParseDateTime(
                   item.Element(_atomNamespace + "published").Value)
                    >= this.LastDownloaded)
                    || this.LastDownloaded == null
               select (Entry)new AtomEntry(item, this);

Is actually just syntactic sugar for the following set of statements:

rssfeed
    .Descendants(_atomNamespace + "entry")
    .Where( item => (DateParser.ParseDateTime(
        item.Element(_atomNamespace + "published").Value)
        >= this.LastDownloaded)
        || this.LastDownloaded == null)
    .Select( item => (Entry)new AtomEntry(item, this));

Now XDocument.Descendants() returns an IEnumerable<XElement>, which most definitely does not have a Where() method on it. And if you look at the return type of Where() in this context, it returns an IEnumerable<XElement>, which definitely does not have a Select() method on it. That's because Where() and Select() are extension methods, meaning you can attach them on to just about anything. They're new to C# 3.0 and are beyond the scope of this article.
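The equivalence is easy to check on a small in-memory collection. The sketch below (QuerySyntaxDemo is a made-up name, and an int array stands in for the feed) writes the same query three ways: query syntax, chained extension methods, and plain static calls on System.Linq.Enumerable, which is where Where() and Select() actually live.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QuerySyntaxDemo {
    public static bool SameResults() {
        int[] numbers = { 9, 8, 7, 6, 5, 4, 3, 2, 1 };

        // Query syntax
        IEnumerable<string> query =
            from n in numbers
            where n < 4
            select "#" + n;

        // What the compiler translates it to: chained extension methods
        IEnumerable<string> chained = numbers
            .Where(n => n < 4)
            .Select(n => "#" + n);

        // Extension methods are ordinary static methods underneath
        IEnumerable<string> statics = Enumerable.Select(
            Enumerable.Where(numbers, n => n < 4),
            n => "#" + n);

        return query.SequenceEqual(chained) && chained.SequenceEqual(statics);
    }
}
```

SameResults() returns true; all three forms yield "#3", "#2", "#1".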

But more important for the topic of deferred execution is the => operator, which denotes a Lambda expression and is also new to C# 3.0. The best way to understand Lambda expressions is that they are essentially syntactic sugar for an anonymous method (e.g. a type safe function pointer to code). So we could again rewrite our code as follows:

rssfeed
    .Descendants(_atomNamespace + "entry")
    .Where(delegate(XElement item) {
        return (DateParser.ParseDateTime(
            item.Element(_atomNamespace + "published").Value)
            >= this.LastDownloaded) || this.LastDownloaded == null; }
)
    .Select(delegate(XElement item) {
        return (Entry)new AtomEntry(item, this); }
);

Back in familiar territory yet? If not you probably aren't familiar with C# 2.0. In the background the compiler takes the anonymous methods above and turns them into methods on the current class, instantiates new delegates of the correct type that point to them, and passes them to the Select() and Where() methods.

The key thing to note is that the arguments for select and where are delegates, and so when those delegates are executed is beyond our control. In fact if you put a Console.WriteLine or a breakpoint inside of the AtomEntry constructor, it won't get called until the resulting IEnumerable is enumerated, specifically the following line in the first code sample:

foreach (Entry objEntry in lstRecentPosts) {

So that's delayed execution. But understanding how it works and how to use it are completely different things.
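The constructor experiment can be reproduced without any feed at all. In this sketch CountedEntry is a hypothetical stand-in for AtomEntry that simply counts how many times its constructor has run.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for AtomEntry: counts constructor calls.
public class CountedEntry {
    public static int Created;
    public readonly int Id;

    public CountedEntry(int id) {
        Id = id;
        Created++;
    }
}

public static class DeferredSelectDemo {
    public static int[] Run() {
        CountedEntry.Created = 0;
        int[] ids = { 1, 2, 3 };

        // Builds the query only; the lambda has not been invoked yet.
        IEnumerable<CountedEntry> entries = ids.Select(id => new CountedEntry(id));
        int beforeEnumeration = CountedEntry.Created;

        // Enumerating is what finally runs the constructor, once per element.
        foreach (CountedEntry entry in entries) { }
        int afterEnumeration = CountedEntry.Created;

        return new[] { beforeEnumeration, afterEnumeration };
    }
}
```

Run() returns 0 and 3: no entries exist after the Select() call, and three exist once the foreach has finished.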

The Inelegant Solution

Getting back to my code sample you may have picked up that my where clause is the mistake. I implemented it like this because RSS and Atom have different field names for the published date. But the way I wrote it I'd have to make two changes if I wanted to change which entries to download. Ok, big deal, I'm extremely unlikely to make changes to that where clause right? Or I wasn't until I wanted functionality to set some defaults based on the average length of posts prior to downloading posts. Basically:

public static Feed CreateFeed(string strUrl, int intDisplayOrder) {
    IEnumerable<Entry> lstRecentEntries = feed.GetEntries(rssfeed);
    double intAveragePostSize = lstRecentEntries.Average(
        i => i.Description.Length);
    // if the feeds posts are typically small then include the
    // description field in the summary and download the content
    // for the main article from the link
    if (intAveragePostSize < 1000) {
        ...
    } else {
        ...
    }
}

Except this now ties me to the where clause, when what I'd really like to do is just get the average post size for the last couple of posts. The problem is that GetEntries() isn't generic enough.

The Elegant Solution

The solution is then to normalize out (excuse the database terminology) the where clause into the two methods that use GetEntries(). So GetEntries() becomes simple:

public override IEnumerable<Entry> GetEntries(XDocument rssfeed) {
      return from item in rssfeed.Descendants(_atomNamespace + "entry")
               select (Entry)new AtomEntry(item, this);
}

And then Feed.CreateFeed() and Feed.DownloadRecentEntries() become more complicated:

public abstract class Feed {
    public abstract IEnumerable<Entry> GetEntries(XDocument rssfeed);

    public static Feed CreateFeed(string strUrl, int intDisplayOrder) {
        IEnumerable<Entry> lstEntries = feed.GetEntries(rssfeed);
        // get the five most recent posts
        IEnumerable<Entry> lstRecentEntries =
            from entry in lstEntries.Take(5)
            select entry;

        double intAveragePostSize = lstRecentEntries.Average(
            i => i.Description.Length);
        if (intAveragePostSize < 1000) {
            ...
        } else {
            ...
        }
    }

    public void DownloadRecentEntries(...) {
        XDocument xdocFeed = XDocument.Load(Url);
        IEnumerable<Entry> lstEntries = GetEntries(xdocFeed);
        // get newly published posts
        IEnumerable<Entry> lstRecentPosts = from entry in lstEntries
            where (entry.Published >= this.LastDownloaded)
                || this.LastDownloaded == null
            select entry;

        foreach (Entry objEntry in lstRecentPosts) {
            objEntry.Download(...);
        }
    }
}

Note that we now have a second LINQ statement that runs against the results of the LINQ statement in GetEntries(). But since nothing's been executed yet we're just building out the statement that we will eventually run when the resulting IEnumerable is enumerated. So we've now spread our LINQ statements across an inheriting and a base class, and in the process we've made GetEntries() extremely generic.
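Here is a stripped-down sketch of that layering, with no XML involved. Everything in it is a stand-in: GetEntries() projects month numbers to dates instead of building Entry objects, and the Built counter exists only to show when work actually happens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComposedQueryDemo {
    public static int Built;   // counts how many entries were actually produced

    private static DateTime Build(int month) {
        Built++;
        return new DateTime(2008, month, 1);
    }

    // Hypothetical stand-in for GetEntries(): projects, but does not filter.
    public static IEnumerable<DateTime> GetEntries(IEnumerable<int> months) {
        return from month in months
               select Build(month);
    }

    public static int[] Run() {
        Built = 0;
        DateTime lastDownloaded = new DateTime(2008, 2, 1);

        IEnumerable<DateTime> all = GetEntries(new[] { 3, 2, 1 });
        // A second query layered on the first; still nothing has executed.
        IEnumerable<DateTime> recent = from published in all
                                       where published >= lastDownloaded
                                       select published;
        int builtBefore = Built;

        int recentCount = recent.Count();   // enumeration happens here
        return new[] { builtBefore, recentCount, Built };
    }
}
```

Run() returns 0, 2 and 3: nothing is built while the two queries are being composed, and the single enumeration in Count() builds the three entries and lets two of them through the filter.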

Conclusion

So what's the big deal? The big deal is that we can spread our data access statements across multiple classes and because of deferred execution we don't need to worry about the performance of generic methods that are closer to the data that don't contain a "where" clause. This may not be a huge deal in this example, but it becomes extremely powerful when the user interface tier can tack on "order by" statements or "filters" BEFORE anything is executed against your data store. And that, for me, is at the heart of the beauty of LINQ.

Posted by lrichard Feb 20 2008, 09:45:58 AM EST
Thursday February 14, 2008
Daily RSS Download

I published Daily RSS Download, my first open source project on CodePlex* today. It's not going to change the world, but if you have a need for it there is a decided lack of decent products that perform this functionality. In this post I'll give a little background about why I wrote it and explain what it does and how to use it. Besides needing this functionality I also wrote it to learn LINQ to Objects and LINQ to XML, but I'll cover the more interesting implementation details in a later post.

Why I Wrote It

For Christmas I received an IRex Iliad which is an e-book reader combined with a Wacom tablet. It's an awesome product that allows reading PDF's (among other formats) and writing on them. It's pricey, but the ability to jot notes on technical documents (in addition to recipes and guitar tablature, etc) as you read is invaluable for me. I now read about twice as much as I did before. It supports Wi-Fi, and in particular can connect to a computer on a regular basis to download files you put in a specific directory.

So theoretically it could download a customized newspaper every morning for me, right? I could have today's world news, national news, local news, technical news, weather, and my RSS feeds like Scott Guthrie's all in one place while I eat my cereal! And then I could cancel my Washington Post subscription and after about 7.5 years I would have recouped the costs of the Iliad. Sweet.

The problem is that the product doesn't come with any way to download RSS feeds. Well, you can use software from MobiPocket, but it's a pain to set up and use, and I couldn't figure out how to have it run automatically on a daily basis. Furthermore, it can't grab the real content from the website if the RSS feed only contains an abstract (e.g. washingtonpost.com). I searched and there was some software out there, but none of it did what I wanted. And of course none of it generated a manifest.xml file, which is an Iliad-specific file that links HTML pages together and gives names to groups of content (i.e. grouping the files in a directory to make a “book” called “My Daily News for February 13”).

So what a great opportunity to write it myself and learn LINQ to XML and LINQ to Objects in the process.

What It Does

The end result (or the index page anyway) looks something like this:

The images are local, the links go to a full page of content, and on the Iliad, because Daily RSS Download generates a manifest.xml, the next and previous buttons can move you to the next or previous article and you can see at a glance how many articles there are.

If you want to recreate the screenshot above, first head over to the Releases page of Daily RSS Download, where you can download the msi and install the application. When you open “Daily RSS Download Config” you can view a home page like this:

You can type in an RSS or Atom URL and click Add Feed. The application will try to connect to the website, download the title, and set some configuration options based on the average length of posts (specifically if you put in a feed from the washingtonpost.com website it will detect that the average post size is small and determine that it should download the main content from the website).

You can click on any of the feeds you've added and you'll get a Feed Settings page like below:

The fields are mostly self-explanatory, but here are three of the more interesting settings:

Summary Source Values:

This setting determines where the abstract (summary) on the index page should come from. There are three options:

No Summary – Does not display a summary on the index.html page. This is what Scott Guthrie's feed was set to in the first screenshot.

Extract from the content – This takes the first 300 characters from the main content as the summary. This was set for the washingtonpost.com feed in the first screenshot (although Use the RSS description field would actually have been more appropriate).

Use the RSS description field – This uses the entire description field from the RSS (or Atom) feed. This is what the weatherbug feed was set to in the first screenshot. Obviously this is a bad choice for a Scott Guthrie type of RSS entry since he posts everything in the description field.

Content Source Values:

This setting determines where the main content page should get its value. There are three options:

No content, summary only – If you set a feed to this, then Daily RSS Download won't generate a content file. This would be a good choice for the weather feed in the example.

Use the RSS description field – The content file will be created from the RSS description field. This would be a good choice for a Scott Guthrie type of feed.

Download from the referenced web page – Daily RSS Download will download the page referenced by the RSS or Atom feed. This would be a good option for a washingtonpost.com type of feed.

Content Start/End Markers

These are regular expressions that are used if you set content source to download the referenced web page. You can leave them blank or you can set them if you want to try to strip out header, footer, navigation bars, etc. The content start marker in the screenshot:

\<div id=\"article_body\"[^\>]*\>

This says: match ‘<div id="article_body"’ up through the next ‘>’. Both markers are exclusive (the thing you're matching on won't be included in the results).
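
As a rough illustration of what exclusive start and end markers mean, here is a small sketch. The ExtractContent helper, the sample page, the "</div>" end marker, and the fallback when a marker isn't found are all assumptions for the example, not necessarily how Daily RSS Download implements it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerSketch
{
    // Hypothetical helper: returns the text between the first match of startPattern
    // and the next match of endPattern, excluding both matches.
    // A blank marker is treated as "start of page" or "end of page".
    public static string ExtractContent(string html, string startPattern, string endPattern)
    {
        int start = 0;
        if (!string.IsNullOrEmpty(startPattern))
        {
            Match startMatch = Regex.Match(html, startPattern);
            if (!startMatch.Success) return html; // assumption: keep the whole page if not found
            start = startMatch.Index + startMatch.Length; // skip past the marker itself
        }

        int end = html.Length;
        if (!string.IsNullOrEmpty(endPattern))
        {
            // Only look for the end marker after the start marker.
            Match endMatch = new Regex(endPattern).Match(html, start);
            if (endMatch.Success) end = endMatch.Index; // stop just before the marker
        }

        return html.Substring(start, end - start);
    }

    public static void Main()
    {
        string page = "<html><div id=\"nav\">menu</div>"
                    + "<div id=\"article_body\" class=\"story\">Hello world</div></html>";

        string body = ExtractContent(page, @"<div id=""article_body""[^>]*>", "</div>");
        Console.WriteLine(body); // Hello world
    }
}
```

Because both markers are excluded, neither the opening div tag nor the closing one ends up in the extracted content.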

Customizing the CSS

So that's it for the general settings and use. You can click “Download Now” on the main config page to download your feeds, and you can set it up to run on a recurring basis (it will only download new content) by setting a recurring task to run “DailyRssDownload.exe DownloadNow”. The only other thing of interest is making the content prettier.

The generated HTML is CSS customizable, so in order to get the two column look above (and/or make it look pretty on an Iliad) you can customize the CSS as below:

h1
{
      margin-top: 0px;
      /* A pretty linux script font since the Iliad has a linux kernel */
      font-family: Zapf Chancery;
      font-size: 30pt;
      margin-bottom: 0px;
}
h2
{
}
.NewsHeader
{
      border-bottom: solid 1px black;
      text-align: center;
}
.DailyRss_Date
{
      text-align: center;
}
.DailyRss_Feed
{
}
.DailyRss_Entry
{
}
.DailyRss_EvenEntry
{
}
.DailyRss_OddEntry
{
}

/* LEFT COLUMN */
#ScottGusBlog
{
      float: left;
      width: 49%;
      border-right: solid 1px gray;
}
#washingtonpostcom-TodaysHighlights
{
      clear: both;
      float: left;
      width: 49%;
      border-right: solid 1px gray;
}

/* RIGHT COLUMN */
#WeatherBugLocalWeatherfor20190
{
      margin-left: 50%;
}

So basically just use the old float left, width 50%, margin-left 50% trick to get the pretty two-column look (without tables).

Conclusion

I hope you find the Daily RSS Download open source project useful. Please feel free to submit suggestions, feature requests, defects or preferably defects AND patches on the project's CodePlex home page.

* In case you aren't aware, CodePlex is an open source project hosting website from Microsoft. It's similar to SourceForge, except there is no approval process for new projects and it integrates nicely with Visual Studio.

Posted by lrichard Feb 14 2008, 11:29:00 AM EST
20070830 Thursday August 30, 2007
Create Data Disaster: Avoid Unique Indexes – (Mistake 3 of 10)

I really enjoyed Seth Schroeder’s critique of the last post in my ten part data modeling mistake series: Surrogate vs Natural Primary Keys. His argument regarding data migration in particular sheds light on a major shortcoming of using surrogate keys: they lead data modelers to a false sense of security regarding the uniqueness of data. Specifically if modelers ignore uniqueness constraints they allow duplicate data. And as Seth points out this has a nasty side effect of disallowing any clear way to compare data between systems. But there are other problems too.