More on XML parsing using XmlSerializer

In my previous post, I talked about how we could use System.Xml.Serialization.XmlSerializer and System.Runtime.Serialization.DataContractSerializer to parse XML into an object tree. I pointed out that DataContractSerializer in my mind has some advantages in that types, properties, and members to use with the deserialization do not need to be public. On the downside, DataContractSerializer puts a quite limiting constraint on the XML format in that it cannot parse XML attributes. This is as far as I know, an absolute constraint that cannot easily be circumvented. Thus, regrettably, if we are not in control of the XML format, DataContractSerializer is sometimes useless.

In those situations, we can still use XmlSerializer. In order to achieve the same encapsulation with XmlSerializer, we have to adjust our model a bit. Here’s one suggestion on how to do this:

My approach to this is inspired by Josh Bloch‘s Builder pattern. The idea is changing the classes used for deserialization from being domain objects to being builder objects that build domain objects. This has another advantage in that our domain objects are not “polluted” with attributes and interfaces related to deserialization and are 100% plain old CLR objects (POCOs). So, lets first do a change our “deserialization” class (formerly ‘Country’) to a builder object, like so:

public class CountryBuilder
{
   [XmlElement(ElementName = "name")]
   public string Name;

   [XmlElement(ElementName = "iso-3166-alpha-2-code")]
   public string Code;

   public Country Build()
   {
      return new Country(Name, Code);
   }
}

Note here that our builder object has a Build() method which returns the domain object ‘Country’. This is the object that we will pass on to our clients. The Country class now represents our domain object:

public class Country
{
   private readonly string _name, _code;
            
   internal Country(string name, string code)
   {
      _name = name;
      _code = code;
   }

   public string Name { get { return _name; } }
   public string Code { get { return _code; } }
}

We have now restricted access to the creation of Country objects in that the constructor is internal, and it is also immutable (cannot change state once created) though making its fields readonly. It only exposes getters for its internal state. We can then do the same to our list of countries. The builder object for countries would look like this:

[XmlRoot("countries")]
public class Countries
{
   [XmlElement(ElementName="country")]
   public CountryBuilder[] countries;

   public IEnumerable<Country> Build()
   {
      return countries.Select<CountryBuilder , Country>(x => x.Build());
   }
}

The code for doing the serialization will then look like this:

string xml = ...;
XmlSerializer xmlSerializer = new XmlSerializer(typeof(CountriesBuilder));
var builder = xmlSerializer.Deserialize(new StringReader(inputXml)) as CountriesBuilder;
IEnumerable<Country> cs = builder.Build();

What we have achieved now is that we now can control the accessibility and encapsulation of our domain model. The builder objects are still public to anyone, but that really does not matter much in my mind. Thus, a relatively swift parsing of XML of various formats into a well designed object model.

Parsing XML using XmlSerializer or DataContractSerializer

If you want to parse XML in .NET, you have a lot of options to choose from. You can use XmlDocument to parse the XML into a DOM tree, you can use the XmlReader to write an efficient “pull” parser, or you can leverage some of the features provided with various serialization APIs.

Given the case where you have a fairly straightforward XML document (not too deep document tree, not too complex set of attributes and elements) that maps pretty well to your domain model, the serialization options is in my mind a good choice that requires little coding. Compared with this approach, using XmlDocument seems to be a bit of an overkill if you don’t need advanced traversal of the document, and writing a parser by hand using XmlReader seems to require quite a bit of coding.

So, given the following sample XML document, I will investigate the serialization options:

<countries>
   <country>
      <iso-3166-alpha-2-code>AF</iso-3166-alpha-2-code>
      <name>Afghanistan</name>
   </country>
   <country>
      <iso-3166-alpha-2-code>AX</iso-3166-alpha-2-code>
      <name>Åland Islands</name>
   </country>
   <country>
      <iso-3166-alpha-2-code>AL</iso-3166-alpha-2-code>
      <name>Albania</name>
   </country>
</countries>

Using System.Xml.XmlSerializer

The first option that came to mind, was to use the XmlSerializer object to deserialize the XML into C# (or VB for that matter) objects. It first requires that I annotate my object model in order to tell the serializer how to deserialize the XML:

[XmlRoot("countries")]
public class Countries
{
   [XmlElement(ElementName="country")]
   public Country[] countries;
}

public class Country
{
   [XmlElement(ElementName = "name")]
   public string Name;

   [XmlElement(ElementName = "iso-3166-alpha-2-code")]
   public string Code;
}

Then, I can use the serializer to deserialize the code:

string xml = ...;

XmlSerializer xmlSerializer = new XmlSerializer(typeof(Countries));
Countries c = xmlSerializer.Deserialize(new StringReader(xml)) as Countries;

Pretty sweet,  heh? Definitely. However, this has some drawbacks. If I want my Country class to be a well designed domain object that follows good OO design principles, I probably would like to encapsulate my data. Furthermore, I might want to restrict the creation of such objects from other parts of the code. In order for XmlSerializer to create my object, it requires that my types are public and that all properties or fields to set are public as well. What to do if I want to enforce my objects to be immutable once handed off to other parts of the code?

Using System.Runtime.Serialization.DataContractSerializer

Luckily, the serialization API that come with Windows Communication Framework has some neat features that fit like a glove. When defining my data model, it does not require that the types, neither the properties nor fields to set are public. Actually, I can restrict access to the type, its default constructor, and any of the properties or fields that I want to be deserialized! w00t!

So, this is what the Country class will looks like:

[DataContract(Name="country", Namespace="")]
internal class Country : IExtensibleDataObject
{
   private Country() { }

   [DataMember(Name="name")]
   public string Name { get; private set; }

   [DataMember(Name = "iso-3166-alpha-2-code")]
   public string Code { get; private set; }

   public ExtensionDataObject ExtensionData { get; set; }
}

The XML file contains a list of countries, and luckily, we have the CollectionDataContractAttribute to denote an element that is a list of elements. It supports generics, so that we can define our class as a strongly typed list:

[CollectionDataContract(Name="countries", Namespace="")]
internal class Countries : List<Country>, IExtensibleDataObject
{
   public ExtensionDataObject ExtensionData { get; set; }
}

And that’s it. Now we can deserialize:

string xml = ...;

DataContractSerializer ser = new DataContractSerializer(typeof(Countries));
using (StringReader stringReader = new StringReader(xml))
{
   using (XmlReader xmlReader = XmlReader.Create(stringReader))
   {
      Countries countries = (Countries)ser.ReadObject(xmlReader);
   }
}

Alternatively, our result could be typed as a list of countries:

IList<Country> countries = (IList<Country>)ser.ReadObject(xmlReader);

Note that there is a limitation in the latter method in that deserializing XML attributes is not supported. Thus, an XML document like the following would not work:

   <country iso-3166-alpha-2-code="AF">
      <name>Afghanistan</name>
   </country>
   <country iso-3166-alpha-2-code="AX">
      <name>Åland Islands</name>
   </country>
   <country iso-3166-alpha-2-code="AL">
      <name>Albania</name>
   </country>
</countries>

This will, however, work using the XmlSerializer.