Near Infinity

Performance: LINQ to XML vs XmlDocument vs XmlReader

By Joe Ferner

May 01, 2008

I recently had a project where I needed to ingest large XML documents using C#, so I was curious which XML reader technology would be the fastest. I coded up a quick benchmark comparing LINQ to XML, XmlDocument.Load, and XmlReader against each other.

The Test Data

I generated a very simple XML file before each run of a test. The ids were random, and the number of "child" nodes varied from run to run. The following is an example of the test data I used:
<root>
  <child id='123'/>
  <child id='234'/>
  ...
</root>
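
The generator itself isn't shown. Here is a minimal sketch of how such a file might be produced with XmlWriter. The TestData class, the WriteTestFile name, the seed parameter, and the id range are all hypothetical; this is not the code used for the benchmark.

```csharp
using System;
using System.Text;
using System.Xml;

public static class TestData
{
    // Hypothetical helper: writes <root> containing `childCount` <child id="..."/>
    // elements with random ids, saved in the requested encoding.
    public static void WriteTestFile(string path, int childCount, Encoding encoding, int seed)
    {
        var random = new Random(seed);
        var settings = new XmlWriterSettings
        {
            Encoding = encoding, // sets both the output bytes and the XML declaration
            Indent = true
        };
        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("root");
            for (int i = 0; i < childCount; i++)
            {
                writer.WriteStartElement("child");
                // Random id; the 1..999999 range is an assumption.
                writer.WriteAttributeString("id", random.Next(1, 1000000).ToString());
                writer.WriteEndElement(); // empty element, written as <child id="..." />
            }
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
    }
}
```

For example, `TestData.WriteTestFile("test.xml", 1000, Encoding.UTF32, 1)` would write a 1,000-child UTF-32 file. XmlWriter emits double-quoted attributes, which is equivalent XML to the single-quoted sample.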

The Test

As I said before, I wanted to compare LINQ to XML, XmlDocument.Load, and XmlReader against each other. I ran each of these technologies against documents containing 1, 10, 100, 1,000, 10,000, and 100,000 "child" nodes. I also ran each against XML documents in UTF-8, ASCII, and UTF-32 encodings. Each combination was run 100 times to reduce anomalies. Each test calls a method named "ProcessId", which simulates processing of the "id" attribute.
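
Neither ProcessId nor the timing loop is shown. Here is a minimal sketch of both. It assumes ProcessId only needs to consume the id, and that timing is wall-clock time measured with Stopwatch. The Benchmark class, the Checksum field, and the untimed warm-up call are hypothetical additions, not necessarily what the original benchmark did.

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Hypothetical stand-in for ProcessId: it consumes the id so the
    // parsing work has an observable effect.
    public static long Checksum;

    public static void ProcessId(string id)
    {
        Checksum += id.Length;
    }

    // Returns the mean wall-clock time per run in milliseconds:
    // total elapsed time for `runs` calls divided by `runs`.
    public static double AverageMilliseconds(Action action, int runs = 100)
    {
        if (runs <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(runs));
        }

        action(); // untimed warm-up so JIT compilation is not counted (an assumption)

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds / runs;
    }
}
```

A reader method would be timed by wrapping it in a lambda, e.g. `Benchmark.AverageMilliseconds(() => XmlReaderReader("test.xml"))`.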

XmlDocument.Load

I thought the code for XmlDocument.Load was the cleanest and easiest to understand, although I must admit I like XPath. XmlDocument does have some security concerns, but that's another post. Here is the code I used to load and search the document:
// Requires: using System; using System.Xml;
private static void XmlDocumentReader(string fileName) {
    // Load parses the whole file into an in-memory DOM.
    XmlDocument doc = new XmlDocument();
    doc.Load(fileName);
    // XPath: every <child> element at any depth.
    XmlNodeList nodes = doc.SelectNodes("//child");
    if (nodes == null) {
        throw new ApplicationException("invalid data");
    }
    foreach (XmlNode node in nodes) {
        // Guard against a <child> with no id instead of hitting a NullReferenceException.
        XmlAttribute attr = node.Attributes["id"];
        if (attr == null) {
            throw new ApplicationException("invalid data");
        }
        string id = attr.Value;
        ProcessId(id);
    }
}
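
Because this version relies on XPath, note that XPath can also select the attribute nodes directly. Here is a sketch of that variant, which was not part of the benchmark. The XPathIds class and ReadIds method are hypothetical names. Unlike the code above, it silently skips any <child> that has no id, and it takes an already-loaded XmlDocument (e.g. after `doc.Load(fileName)`).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathIds
{
    // Selects the id attributes themselves rather than the <child> elements.
    public static List<string> ReadIds(XmlDocument doc)
    {
        var ids = new List<string>();
        // "/root/child/@id" is anchored at the document element, whereas
        // "//child" asks for matching descendants at every depth.
        XmlNodeList attrs = doc.SelectNodes("/root/child/@id");
        foreach (XmlNode attr in attrs)
        {
            ids.Add(attr.Value);
        }
        return ids;
    }
}
```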

LINQ to XML

The LINQ to XML code was also very easy to read and understand. I did find that even though LINQ to XML is supposed to use XmlReaders under the covers, calling XDocument.Load reads the whole document into memory before returning. So if you are looking for data at the top or middle of a very large document, this could be a concern. Here is the code I used to load and search the document:
// Requires: using System; using System.Xml.Linq;
private static void XDocumentReader(string fileName) {
    // XDocument.Load builds the entire tree in memory before returning.
    XDocument doc = XDocument.Load(fileName);
    // Short-circuiting || so doc.Root is only read when doc is non-null.
    if (doc == null || doc.Root == null) {
        throw new ApplicationException("invalid data");
    }
    // Elements("child") returns only the direct children of <root>.
    foreach (XElement child in doc.Root.Elements("child")) {
        XAttribute attr = child.Attribute("id");
        if (attr == null) {
            throw new ApplicationException("invalid data");
        }
        string id = attr.Value;
        ProcessId(id);
    }
}
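
If memory use on very large files is the concern, one common pattern is to stream the file with an XmlReader and let XNode.ReadFrom build an XElement for only the current <child>. Here is a sketch of that approach, which was not benchmarked here. StreamingLinq and StreamChildren are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class StreamingLinq
{
    // Yields each <child> as a small XElement without building a whole XDocument.
    public static IEnumerable<XElement> StreamChildren(XmlReader reader)
    {
        reader.MoveToContent(); // skip the XML declaration and land on <root>
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "child")
            {
                // ReadFrom consumes the element and leaves the reader on the
                // following node, so Read() must not also be called here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

Usage would look like `using (XmlReader reader = XmlReader.Create(fileName)) { foreach (XElement child in StreamingLinq.StreamChildren(reader)) { ProcessId((string)child.Attribute("id")); } }`. The trade-off is that you keep LINQ to XML's element API per <child> but lose random access to the rest of the document.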

XmlReader

XmlReader (specifically XmlTextReader) was the hardest to write and understand. Because it is a forward-only reader, you need to take what you need while you have it; you can't rewind.
// Requires: using System; using System.Xml;
private static void XmlReaderReader(string fileName) {
    using (XmlReader reader = new XmlTextReader(fileName)) {
        // Forward-only: each Read() advances to the next node, with no way back.
        while (reader.Read()) {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "child") {
                // MoveToAttribute returns false when the attribute is missing;
                // reader.Value would then not be an id.
                if (!reader.MoveToAttribute("id")) {
                    throw new ApplicationException("invalid data");
                }
                string id = reader.Value;
                ProcessId(id);
            }
        }
    }
}
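
Much of the awkwardness of that loop comes from walking every node by hand. XmlReader also offers ReadToFollowing and GetAttribute, which keep the forward-only model but read more directly. Here is a sketch, not part of the measured results. ReaderIds and ReadIds are hypothetical names, and the method collects ids into a list where a real run would call ProcessId. It takes a reader created by the caller, for example with XmlReader.Create, which Microsoft's documentation recommends over constructing XmlTextReader directly.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ReaderIds
{
    public static List<string> ReadIds(XmlReader reader)
    {
        var ids = new List<string>();
        // ReadToFollowing skips ahead to the next <child> start tag, at any depth.
        while (reader.ReadToFollowing("child"))
        {
            // GetAttribute returns null if absent and does not move the reader.
            string id = reader.GetAttribute("id");
            if (id == null)
            {
                throw new ApplicationException("invalid data");
            }
            ids.Add(id);
        }
        return ids;
    }
}
```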

The Results

The following results are the average time per run, in milliseconds. I took the total time for the 100 runs and divided it by 100. Each column is the number of "child" nodes in the document.

UTF8Encoding

Child nodes            1           10          100        1,000       10,000      100,000
XmlDocument    0.1567800    0.1713450    0.3888620    1.9816480   22.8049260  459.8570340
XmlReader      0.1467460    0.1439580    0.2300500    0.8534400    7.5771640   76.8635690
LINQ to XML    0.1499530    0.1500640    0.2778720    1.4616730   15.7719020  208.9360300

ASCIIEncoding

Child nodes            1           10          100        1,000       10,000      100,000
XmlDocument    0.1659350    0.1922080    0.3433140    1.9846330   22.5484690  482.8699720
XmlReader      0.1376840    0.1453730    0.2199810    0.8768260    7.9187380   77.7760560
LINQ to XML    0.1345900    0.1573340    0.2848420    1.4889930   15.1504500  214.9338990

UTF32Encoding

Child nodes            1           10          100        1,000       10,000      100,000
XmlDocument    0.1672370    0.1799780    0.4156250    2.7188370   30.6423960  543.4604540
XmlReader      0.1386820    0.1503870    0.2867400    1.4981070   14.4428430  152.7660780
LINQ to XML    0.1317060    0.1866610    0.5385940    2.3631290   21.4566290  274.3280280

Conclusion

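XmlReader beats LINQ to XML in almost every run. The only exceptions are the single-node ASCII and UTF-32 documents. XmlDocument was the slowest in every run with 1,000 or more nodes. At 100,000 nodes in UTF-8, XmlReader averaged about 77 ms, LINQ to XML about 209 ms, and XmlDocument about 460 ms. What's interesting is how the numbers scale between the encodings. At 10,000 and 100,000 nodes, XmlReader is roughly twice as slow reading UTF-32 documents as it is reading UTF-8 or ASCII XML (152.8 ms versus 76.9 ms at 100,000 nodes in UTF-8). By contrast, LINQ to XML and XmlDocument slowed down by a much smaller amount: about 31% and 18% respectively, compared with UTF-8. If you need speed when reading XML documents, stick with XmlReader. If you need readable, maintainable code, go with LINQ to XML or XmlDocument.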