  1. #1

    How do I normalize this without resorting to dynamic SQL?

    I have an existing database design that I would like to normalize. The table causing me problems is one with about 150 columns. It holds live data for a natural gas well, so there are columns for the various temperatures, pressures, etc. The data types of these columns vary, which is what is causing me grief, but normalizing the table would allow additional data points to be added dynamically.

    My initial thought was to create a table with (among other columns) a single column to hold the values for the temperatures, pressures, and so on. The problem, of course, is the variety of data types: I would have to store them all in a varchar or something and cast them on the way out with dynamic SQL. Given the possible performance issues of casting everything and the drawbacks inherent in dynamic SQL, I was hoping to find a better solution. Has anyone come across a similar situation and come up with a better approach?
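    To make the problem concrete, here is a rough sketch of the single-value-column design described above (all table and column names here are hypothetical, not taken from the poster's schema). The cast on the way out is where the trouble starts:

        -- One generic value column; every reading is stored as text.
        CREATE TABLE WellDataPoint (
            WellID      int          NOT NULL,
            PointName   varchar(50)  NOT NULL,   -- e.g. 'TubingPressure'
            ReadingTime datetime     NOT NULL,
            PointValue  varchar(100) NOT NULL,   -- all types squeezed into varchar
            PRIMARY KEY (WellID, PointName, ReadingTime)
        );

        -- Reading a numeric point back out requires a cast:
        SELECT WellID, ReadingTime,
               CAST(PointValue AS real) AS TubingPressure
        FROM   WellDataPoint
        WHERE  PointName = 'TubingPressure';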

  2. #2
    Sounds like you are referring to the infamous OTLT (One True Lookup Table) anti-pattern.
    Not a good idea.
    Create separate tables for your lookups.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Thanks for pointing me in the right direction. It looks like what I'm describing is specifically EAV (entity-attribute-value).

    Essentially, our existing table holding live data for each well has roughly 150 columns. The issue we run into is that each customer's data-gathering needs are a bit different: a typical customer will only require 20 of those columns, but will need 5 more that we don't have. So our predicament is that we must add new columns for each customer we take on, while each customer leaves the majority of the existing columns unused. We are trying to come up with a design that does not require modifications to the database and supporting code for each new customer.

    I've done some reading on EAV, and there seem to be mixed opinions. Many people say to avoid it, but in some circumstances it seems to be the only way to accomplish the goal I've stated above. I've also read that a 6NF database is the way to truly do EAV correctly, but that seems pretty complicated, and information on it is scarce.
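    For what it's worth, one common reading of the "6NF" idea (a hedged sketch with invented names, not a definitive treatment) is to decompose so far that each data point gets its own narrow table with its natural data type, so there is no generic value column and no casting:

        -- One table per data point; a new point is a new table, not a new column.
        CREATE TABLE TubingPressure (
            WellID      int      NOT NULL,
            ReadingTime datetime NOT NULL,
            Value       real     NOT NULL,
            PRIMARY KEY (WellID, ReadingTime)
        );

        CREATE TABLE SampleDate (
            WellID      int      NOT NULL,
            ReadingTime datetime NOT NULL,
            Value       datetime NOT NULL,   -- each table keeps its natural type
            PRIMARY KEY (WellID, ReadingTime)
        );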

  4. #4
    As long as you don't care about performance or code stability, either EAV or XML will make it very easy to code for this kind of problem. Both tools have their place (I can and do use them when appropriate), but they play havoc with support, stability, and performance when you use them.

    If you think of it from another perspective, it might help you to see and understand the issue. Suppose you have customers that buy/sell/trade puppies, cars, food, and hair care services. Those customers have to decide either to use a common structure (such as SAP, PeopleSoft, QuickBooks, Great Plains, etc.) and get the benefits of well-tested and well-supported code, or to write their own system that will do exactly what they need for their business, but at a much higher cost than an off-the-shelf system.

    If you can create a basic schema that will serve the needs of most of your customers, then you can write, test, and maintain one system. If not, you need at the very least to add EAV or XML, then support the chaos that creates. Either solution is viable, depending on how many man-hours you have and what your customers are willing to pay.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    IMO, XML is a better solution than EAV.

    blindman

  6. #6
    Thank you guys for the replies. Are EAV and XML the only viable options for something like this?

    Also, which would perform better in terms of querying the data? In this application, queries like the following would be common:
    - select 25 data points per well for 300 wells and display them in tabular format
    - aggregate (sum, avg, etc.) 15 data points for the 10-1000 wells in a given geographical area
    - pull the values for 3 or 4 data points for the past month for trending

    Would these queries be possible, let alone performant, in either EAV or XML? Or is there another approach?
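    To make the concern concrete, here is roughly what the second query would look like against a generic value-column table such as the hypothetical WellDataPoint sketched earlier (the Well table and its Area column are also assumptions). Every value must be cast before it can be aggregated:

        SELECT   m.PointName,
                 AVG(CAST(m.PointValue AS real)) AS AvgValue
        FROM     WellDataPoint AS m
        JOIN     Well AS w ON w.WellID = m.WellID
        WHERE    w.Area = 'Some Basin'               -- hypothetical area name
          AND    m.PointName IN ('TubingPressure',
                                 'CasingPressure',
                                 'GasTemperature')   -- ...up to 15 points
        GROUP BY m.PointName;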

  7. #7
    EAV or XML should only be used when the data elements you need to store are not defined, and that should be due to the dynamic nature of the data, not to a lack of thorough requirements analysis. ONLY the undefinable elements should be stored in EAV/XML; all fixed and common attributes should be stored in standard relational format.
    My preference would still be XML. Though there is a learning curve involved in using the Microsoft proprietary commands, I suspect that performance would be better. EAV is notoriously difficult in terms of coding, maintaining referential integrity, and performance.

    blindman
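    As a hedged illustration of the hybrid approach described above (a sketch under invented names, not a confirmed design): the fixed, common data points live in ordinary columns, and only the customer-specific leftovers go into a single xml column, queried with SQL Server's xml value() method:

        CREATE TABLE WellReading (
            WellID         int      NOT NULL,
            ReadingTime    datetime NOT NULL,
            TubingPressure real     NULL,    -- fixed, common data points
            CasingPressure real     NULL,
            ExtraPoints    xml      NULL,    -- customer-specific points
            PRIMARY KEY (WellID, ReadingTime)
        );

        -- ExtraPoints might hold: <points><point name="H2SLevel">3.2</point></points>
        SELECT WellID, ReadingTime,
               ExtraPoints.value('(/points/point[@name="H2SLevel"])[1]', 'real') AS H2SLevel
        FROM   WellReading;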

  8. #8
    Thank you both. I appreciate the advice. Looks like I'll go learn more about working with XML.

  9. #9
    IMO, you would want to start with a Well table containing data pertaining only to the well as a whole: physical data (ID, description, location, depth, diameter, casing info, etc.). One of these fields might contain the number of instrument readings associated with the well.

    A child table of the Well table could then contain instrument-specific data: Well ID (FK to the Well table), Instrument Tag, Instrument Type (Pressure, Temperature, Flow, etc.), Recording Interval, Signal Type, upper/lower limits, and upper/lower alarm limits (Hi/Lo/HiHi/LoLo). This table contains INSTRUMENT-specific data, not measurements, and should contain one instrument 'header' record for each instrument associated with a well.

    Finally, a child table of the Instrument Header would hold the actual instrument readings: the measurements. Its fields would include the Instrument Tag (FK to the Instrument Header), TimeStamp, Measurement Validity, and Measured Value.

    If you need a system with non-unique Instrument tags, then your instrument header would need a surrogate PK, and the Measured Data table would reference the header's surrogate key instead of the instrument tag. (A surrogate key would undoubtedly save much disk space over a CHAR or VARCHAR instrument tag, as well.)

    The validity flag is important; modern instrumentation is often bus-based, and the measurement validity is available. (A defective temperature sensor can be indicated as a reading at the upper end of the temp scale or at the lower end of the scale, depending upon the transmitter...)
    Lou
    "To astonish greatly"
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  10. #10
    Lou, thanks for the input! Sounds like you've done some work in Oil & Gas.

    I agree the DB design is certainly more complicated than a single table holding all data points. Sometimes the data-to-well ratio isn't 1 to 1, either; sites with multiple wells per RTU, or multiple RTUs per site, make life difficult as well. But for this post I wanted to focus on a database design that would accommodate the addition of new data points without alterations to the database itself, i.e. adding columns.

    It looks like your example accomplishes that by storing measurement values in the "Measured Value" column of the "Measurements" table (using your example). Correct me if I'm wrong, but it appears that your solution fits in the EAV category: at some point when pulling the data out, it would have to be pivoted, and data types would need to be cast, because the Measured Value column would have to be generic enough (varchar or something) to handle strings, dates, reals, ints, etc.

    My database design so far is actually very similar to yours; that is the predicament I found myself in when I started this thread. Pivoting and casting everything as it comes out is slow, and it requires dynamic SQL, which is difficult to troubleshoot. I have seen other approaches to EAV which do not require casting, but the queries are equally complicated and slow. The issue with both XML and EAV seems to be that the queries become much more complicated and difficult to maintain. But I have yet to find a better solution, so maybe it just comes with the territory.
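    One partial mitigation, for what it's worth: when the data points a given report needs are known up front, the pivot can at least be written statically with conditional aggregation, so that report avoids dynamic SQL. A sketch against the hypothetical InstrumentHeader/Measurement tables above (the instrument tags are invented):

        SELECT   i.WellID,
                 m.ReadingTime,
                 MAX(CASE WHEN i.InstrumentTag = 'PT-101' THEN m.Value END) AS TubingPressure,
                 MAX(CASE WHEN i.InstrumentTag = 'TT-101' THEN m.Value END) AS GasTemperature
        FROM     Measurement AS m
        JOIN     InstrumentHeader AS i ON i.InstrumentID = m.InstrumentID
        WHERE    m.IsValid = 1
        GROUP BY i.WellID, m.ReadingTime;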
