Tag Archives: datasets

DataSets – thanks, but no thanks

For reasons previously unclear to me, I have not really felt comfortable with ADO.NET DataSets. With regards to topics like testability, object orientation, and encapsulation they always left a bitter taste in my mouth. Furthermore, I have not come across any really good use for them, which nourished my mistrust even more. (I am not saying that there aren’t any good uses, though). So, the other day I started to look deeper into the matter to try to find some more solid arguments.

The first clue I got from David Veeneman‘s article “ADO.NET for the Object-Oriented Programmer – Part One“, where he claims that “ADO.NET doesn’t work with object designs because it’s not supposed to work with objects!” and that the best way to use ADO.NET in an object-oriented design is not to use it. Basically, using ADO.NET “all the way” – including DataSets – will result in a data-driven application rather than an object-oriented application. But I want object-oriented…

In my application, I would like to have my data in business objects, and not in DataSets. Basic concept of encapsulation. I want to place data and operations on that data in my class, hiding the nitty-gritty details from the outside world. As Jeremy D. Miller points out, you cannot embed any real logic in a DataSet, and you have to be careful about duplication of logic. Another point he makes, which I think is very important, is that DataSets are clumsy to use inside an automated tests in terms of test setup. This is exactly the same as I have experienced. Easy testability is something you should look for in your application/library/technology/gizmo.

So, is there any time DataSet should be used? According to Scott Mitchell, (in his article ‘Why I Don’t Use DataSets in My ASP.NET Applications‘) Data sets should only be used in

  1. In a desktop, WinForms application
  2. For sending/receiving remote database information or for allowing communication between disparate platforms

Scott then goes on to conclude that he generally recommends using DataReaders in web applications rather than DataSets. As he points out, you might be tempted to use DataSets to cache data from the database, but argues that you’d probably be better off storing custom objects instead as it is more efficient, and removes the tight coupling to database tables.

Lo, the conclusion. If I were you, I would hesitate using DataSets for anything than small, data driven applications. Here’s why:

  • Results in a non-object oriented application, which in turn hurts lowers cohesion, tightens coupling, and hurts encapsulation
  • Testability suffers. DataSets are very awkward to test in unit tests. Forget about TDD.
  • Tight coupling between database design and the rest of the applications. Makes change cumbersome.
  • YAGNI – DataSets offer a lot of functionality that you probably don’t need. It boils down to design/develop-time efficiency vs. run-time efficiency (pointed out in More On Why I Don’t Use DataSets in My ASP.NET Applications (also by Scott Mitchell). Personally, I think that when the application grows larger, the design/develop-time efficiency is more or lest lost, and maintainability problems with DataSets set in.

You are a Web application developer for a company…

I sometimes wonder how software certification test questions are created. In many occations, the questions seem like they are first and foremost made to show off new and fancy functionality in the framework. In other cases, the questions seem to have been thrown in there just to make the test harder to pass, rather than making sure the candidate has actual insight. Funny thing is that when trying the latter, the question may reveal really poor API design. Take, for instance the following example.

You are a Web application developer for a company. Which of the following would you use to define that column2 has a foregin key relationship with column1:

  1. DataColumn column2 = DataSet1.Tables["Orders"].Columns["CustID"];
    DataColumn column1 = DataSet1.Tables["Customers"].Columns["CustID"];
    DataRelation relCustOrder = new DataRelation("CustomersOrders", column1, column2);
  2. DataColumn column1 = DataSet1.Tables["Customers"].Columns["CustID"];
    DataColumn column2 = DataSet1.Tables["Orders"].Columns["CustID"];
    DataRelation relCustOrder = new DataRelation("CustomersOrders", column2, column1);

So, the difficulty here lies in remembering the correct parameter order for the DataRelation constructor. First of all, the question of remembering this aspect of the API is not a question of insight. Secondly, when considering the user-friendlyness of the API, this reveals a rediciously poor design. Why? If I, for instance, were to review the code above, the code gives me no clue about the direction of the relationship. Which is the primary key column, and which is the foreign key column? Either I would have to remember the constructor parameter order, look at the database design, or look at some other documentation. Preferably, the code should be able to tell me. Consider the following (fantasy) code:

DataColumn column1 = DataSet1.Tables["Customers"].Columns["CustID"];
DataColumn column2 = DataSet1.Tables["Orders"].Columns["CustID"];
DataRelation relCustOrder = column2.isForeginKeyFor(column1);

Much better. Now, the DataColum class has an instance method I can use to define the relationship in a manner that is both much easier to rembember, and easier to read. But still, this has potential for improvement. Why do I need the last line of code? Columns column1 and column2 belong to a DataTable, and that DataTable belong to the DataSet, then the isForeginKeyFor method should be able to propagate the new relation to the DataSet on its own, like so:


Then, we would get rid of an extra line of boilerplate code, and the code snippet would be like so:

DataColumn column1 = DataSet1.Tables["Customers"].Columns["CustID"];
DataColumn column2 = DataSet1.Tables["Orders"].Columns["CustID"];