  1. #1
    Join Date
    May 2006
    Posts
    6

    When to use XML instead of database tables for detailed structures

    I'm looking for links that discuss using XML instead of multiple tables for detailed, key-value-specific data structures.

    I'm seeing certain control structures in my DB design that form little hierarchies of values. These could be rendered in the DB using several "small" tables, or in an XML document that's stored in a single field in the parent records of these structures.

    I'd like to know if there are any general guidelines or lessons learned about when to stop building smaller table structures and when to use XML in these cases.
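
    To make the question concrete, here's a rough sketch of the two alternatives as I picture them (table, column, and element names are all just made up for illustration):

        -- Alternative 1: a couple of "small" tables hanging off the parent
        CREATE TABLE Parent (
            ParentID   int IDENTITY PRIMARY KEY,
            Name       varchar(50) NOT NULL
        );

        CREATE TABLE ControlGroup (
            GroupID    int IDENTITY PRIMARY KEY,
            ParentID   int NOT NULL REFERENCES Parent (ParentID),
            GroupName  varchar(50) NOT NULL
        );

        CREATE TABLE ControlValue (
            ValueID    int IDENTITY PRIMARY KEY,
            GroupID    int NOT NULL REFERENCES ControlGroup (GroupID),
            ValueKey   varchar(50) NOT NULL,
            ValueData  varchar(200) NULL
        );

        -- Alternative 2: the same little hierarchy folded into one XML column
        -- on the parent, e.g. <groups><group name="..."><value key="...">...</value></group></groups>
        CREATE TABLE ParentWithXml (
            ParentID    int IDENTITY PRIMARY KEY,
            Name        varchar(50) NOT NULL,
            ControlData xml NULL
        );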

    Any guidance on this would be appreciated.

    Thanks!

  2. #2
    Join Date
    Sep 2003
    Location
    The extremely Royal borough of Kensington, London
    Posts
    778
    Yuck!

    http://thedailywtf.com/forums/60879/ShowPost.aspx

    Not only do you run into the problems discussed in the article the above link points to, but most likely your database will end up without a single unique constraint or any foreign key constraints.

  3. #3
    Join Date
    May 2006
    Posts
    6
    Thanks for the input. The URL definitely landed right on the problem (though it seems to have revealed some "religious" sentiments on the subject).

    I've concluded that the question is largely motivated by my inexperience (and thus trepidation) about the relational model, and that in the detailed use cases it's going to fall out in favor of tables most of the time.

    From a pure SQL-processing standpoint, there's no question that XPath is a headache. I'm planning to use a lot of DataTables in memory, though (perhaps another theological discussion :0), where I could encapsulate the XPath processing in subroutines...but even then, I'd be making the DB harder to access if there are other uses in the future.

    Maybe that's a strong determining factor. If XML looks "easier" now, would it be "easier" for extensions of the application in the future? Sure, "those programmers" could learn and use XPath, but isn't that just another barrier to extensibility of the app?

    I'd say that insight right there weighs heavily against the XML alternative, unless there's a particularly strong reason to choose it.
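
    Just to show what I mean by the headache, here's roughly how reading the same hierarchy looks both ways, using the made-up tables from my first post (SQL Server 2005 syntax, as far as I understand it):

        -- Relational version: an ordinary join
        SELECT p.Name, g.GroupName, v.ValueKey, v.ValueData
        FROM Parent p
        JOIN ControlGroup g ON g.ParentID = p.ParentID
        JOIN ControlValue v ON v.GroupID  = g.GroupID
        WHERE p.ParentID = 42;

        -- XML-column version: shredding the document with XPath/XQuery
        SELECT p.Name,
               grp.node.value('@name', 'varchar(50)')  AS GroupName,
               val.node.value('@key',  'varchar(50)')  AS ValueKey,
               val.node.value('.',     'varchar(200)') AS ValueData
        FROM ParentWithXml p
        CROSS APPLY p.ControlData.nodes('/groups/group') AS grp(node)
        CROSS APPLY grp.node.nodes('value') AS val(node)
        WHERE p.ParentID = 42;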

    Thanks again for the prompting on this.

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    I've been down that path many times. Sometimes things go in favor of the database, sometimes in favor of application-based processing (where the code "ensures" the data integrity).

    I actually did a comparative study three years ago, across several systems that had similar lifespans but drastically different data volumes. The system with the highest data volume used tables almost exclusively and had referential integrity. We estimated that it cost about 13% more per function to code using RI, but there was almost zero cost for maintaining data integrity (there was some cost due to having to figure out how to match other systems that did not have DRI and so had "bad data"). The applications that had application-enforced integrity cost 300 to 5000 times as much for ongoing data maintenance (note that is times, not percent!).

    The short answer is that if you're going to have very small numbers of transactions (a few hundred per year), by a very small group of users (mid-teens), then having the application handle the integrity can be worthwhile to save coding costs. If you are going to have any significant number of transactions (a thousand or more per year) or a significant number of users (more than twenty), you will probably save money in the long run by using tables with declared referential integrity.
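
    For what it's worth, "declared referential integrity" just means letting the database itself enforce the relationships, along these lines (the table names are purely illustrative):

        -- The database simply refuses orphan rows, so no application code
        -- ever has to police this relationship
        ALTER TABLE OrderDetail
            ADD CONSTRAINT FK_OrderDetail_OrderHeader
            FOREIGN KEY (OrderID) REFERENCES OrderHeader (OrderID);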

    -PatP

  5. #5
    Join Date
    May 2006
    Posts
    6
    Thanks for the input. That's a very useful perspective. I'm going to be doing the "application processing" using VS2005 Datasets, which offer to provide data integrity...I'll be taking that offer much more seriously, based on your comments.

  6. #6
    Join Date
    Sep 2003
    Location
    The extremely Royal borough of Kensington, London
    Posts
    778
    A very wise move would be to develop all of the data processes in the database, given that data processes interact with data that is stored in the database.

    It still amazes me that people believe applications should handle data processes. Funnily enough, I've seen this happen in reality, and the resulting systems are horrendous: extremely slow, extremely difficult to maintain, and full of corrupt data.

    Further, a change to a data process defined by someone non-technical (which is how it should work) can be compared in terms of development cost by considering the following:

    1 line amendment to a stored procedure

    vs

    1000 line amendment to some C# code

    The stored procedure can be tested very easily and quickly, whereas the application change would take days to test, at a minimum.
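
    As a trivial, made-up example: if the business rule "orders of more than 10 items need approval" becomes "more than 25", the change is confined to a single line of a single procedure:

        CREATE PROCEDURE dbo.SubmitOrder
            @OrderID int
        AS
        BEGIN
            UPDATE OrderHeader
            SET    Status = CASE WHEN ItemCount > 10   -- the one line that changes
                                 THEN 'PendingApproval'
                                 ELSE 'Accepted' END
            WHERE  OrderID = @OrderID;
        END;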

    Regarding DataSets, I fail to see how these can add any value to your application at all. Simply return a cursor to the result set from the database and use a web framework like ASP.NET for what it is actually good at, which is certainly not data processing.

  7. #7
    Join Date
    May 2006
    Posts
    6
    Re: Datasets

    I'm a newbie to contemporary DB design. Perhaps the biggest uncertainty I face is actual performance metrics (response time to the user), which always seem to come down to "it depends".

    The overall scenario for my app is described here:

    http://www.dbforums.com/showthread.php?t=1217910

    Datasets are presented in the literature on ASP.NET 2.0 as being a representation of SQL Server tables, with relationships, in memory, in a tool set that includes TableAdapters (DataAdapters, at the DataSet level only, in ASP.NET 1.x) and relationships (which again can offer referential integrity). Their purpose (again from the literature, and this is my interpretation) is to provide an "intelligent" caching layer. TableAdapters also (for me) provide an "interface" mechanism between code and stored procedures. For me, this means that the SQL commands can be encapsulated, and that the application writing can focus on business logic. TableAdapters also automagically generate update, delete and insert commands, which is an unqualified plus for me, not being that familiar with SQL.
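
    For what it's worth, the commands a TableAdapter generates for me look like ordinary parameterized SQL, roughly of this shape (table and column names made up, and this is just my reading of what the designer produces):

        -- Generated-style INSERT
        INSERT INTO ControlValue (GroupID, ValueKey, ValueData)
        VALUES (@GroupID, @ValueKey, @ValueData);

        -- Generated-style UPDATE, with an optimistic-concurrency check
        -- against the originally loaded values
        UPDATE ControlValue
        SET    ValueKey = @ValueKey, ValueData = @ValueData
        WHERE  ValueID   = @Original_ValueID
          AND  ValueKey  = @Original_ValueKey
          AND  ValueData = @Original_ValueData;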

    The I/O requirements for my app are mostly 1) present a selection of rows in a GridView, and 2) insert rows back into the table. In the first release there won't be a lot of need for unions, joins, distincts and so on ("and so on" = "I don't know this stuff very well"). However, I do want to allow for complex operations on the DB in the future...that was the deciding reason to adopt the principle of "relational" over "XML" for the control tables.

    I've made two major assumptions here about what Microsoft is doing with DataSets: 1) they've anticipated the problems of referential integrity and "data processing" code maintenance, and 2) they think there is some performance benefit to using cached DataTables (a benefit strong enough to outweigh any performance lost to the TableAdapter interface). I haven't been able to find anything on the net about performance issues with TableAdapters...maybe they're too new, but I would think that would have come out.

    Since (from the overall scenario description above) the app is going to serve "slices" of the overall SQL table set to groups of users, with lots of refreshing of the row-set views (in the ASP.NET 2.0 GridView UI control), using DataTables seemed like a natural fit. From an ease-of-use standpoint, being a relative newbie to this type of coding, it's an attractive choice.

    Going back to performance, one point I haven't validated one way or the other is whether I could get the caching provided by the DataTable/TableAdapter approach through built-in, internal features of SQL Server. I'm assuming the DB engine is doing a lot of intelligent internal caching and indexing based on "what the users are asking for", so there might be some redundancy in using DataTables. On the other hand, since DataTables and TableAdapters are very closely integrated with SQL Server 2005, I'm assuming that MS has anticipated that and offers DataTables etc. as a genuine additional benefit on top of internal SQL Server caching.

    There are lots of reasons to question Microsoft technology, and plenty of examples of MS products that have issues, but MS put a whole lot of work into the DataTable/TableAdapter/SQL Server technology (and I'm sure I've barely scratched the surface in terms of using it), and they know they're competing head to head with J2EE plus every other DB on the planet. I assume they tried to make it work as well as they could.

    So, when an internet distributed app requires HTML, JS, CSS, VB or C-something, XML, SQL, and what-did-I-leave-out, for a non-professional programmer like myself, datasets (meaning DataTables etc.) and the integration they provide look like a pretty good deal.
