Results 1 to 15 of 15
  1. #1
    Join Date
    Jan 2005
    Location
    Green Bay
    Posts
    201

    Unhappy Unanswered: Database Design Theory

    I have been tasked with creating a Data Warehouse.

    Problem is that old storage vs reporting debate.

    I have determined that the data that I will recieve and store will be like follows (simplified form) for expandability

    KEY FldKEy FldData DateTime AuditTrail

    Daily I will use this data based on use input process this data into the following format and say
    if fldkey/ flddata open a cycle.
    populate row with null close date
    if fldkey/ flddata closes cycle
    update row with date

    If fldkey/ flddata changes a cutable value
    update row

    if fldkey/ flddata changes a cutable value (type 2 table)
    insert a row into detail update value and obsolete previous row.


    KEY DateStart DateEnd FLDDATA1 FLDDATE2 Op_Cl_IND HEADER Record

    KEY EFFdate OBSDATE FLDdata3 FLDData4 Detail Records
    KEY EFFdate OBSDATE FLDdata3 FLDData4
    KEY EFFdate OBSDATE FLDdata3 FLDData4

    Problem: FLDKey is a finite count however the max is undefined.

    IS there any way to solve the problem of not being able to nail down users to tell you what they want to cut by. What I have been instructed by mgr (old IDMS) is that they wish to see all on the FldData and have the ability to cut by all of it. However the Flddata could be anything (cannot be indexed).

    400,000,000 rows at least.

    Do I need to nail the users down or am I am missing something.

    Sorry if so cryptic


  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Too cryptic, and too much jargon. (Cut?, Cycle?)

    But this sure doesn't look like a data warehouse project to me.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Jan 2005
    Location
    Green Bay
    Posts
    201

    Post

    Thought so.

    User requirement store a undetermined amount of data about an item

    Cannot create a structure with
    Key Make Model Year Cost Sold Junked
    1 Ford Taurus 97 10000 1/1/98 2/28/06
    2 Ford Ranger 97 20000 1/12/99 null


    Need to create
    Key Fieldname data
    1 Make Ford
    1 Model Taurus
    1 Year 97
    1 Cost 10000
    1 Sold 1/1/98
    1 Junked 2/28/06

    As the Number of fields captured is not fixed.

    Now I need to sum the that in the data field by a cut of that data.
    Total Cost of All 1997 Ford Cars and grouped by the age in months of the auto.

    Work of a small table to do this however 400 m rows of data over 18 months Tring to find the data 1997 and Ford in the data field then determining the total dollars of those

    Well.

    Question is there some thing that I am missing or do I need to get the user to define a more static structure. And transform this into a star schema.

    The table structure to store the data (Audit trail)

    perhaps datamart is a more appropriate term.

  4. #4
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    So you're trying to do the logical vertical table thingee....

    Why?

    There are sooo many bad reasons for doing this...

    The concept is that you want to be able to add new "columns" on the fly

    I'll tell you this, if you do go this route, you better make sure to use a partitioned view where the tables are in their own file groups and distributed across many physical disks...

    I wonder what you would partition on though
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  5. #5
    Join Date
    Jan 2005
    Location
    Green Bay
    Posts
    201
    That is my thought that this a bad idea.

    I need to convince my MGR and Users to give me more Defined specs.

    Was just asking if there is something that I am missing.

    The Goal to the Audit Trail is to capture every poteitial detail of a thing. And to get that info If you have that things ID you can do this. However any and look at every detail about this. the other reason for the data structure of the audit trail is scalability of data storage.

    If the users can define the key of a thing and details about that I can use the daily audit file to transpose this information.

    I just wanted to make sure that my thoughts were not clouded by preconcieved notions or by lack of knowledge about new techniques.

    Thank you

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    What you are talking about is known as an EAV (Entity/Attribute/Value) model, and while it is occasionally appropriate, it is very difficult to write code around and I would NEVER recommend it for a "data warehouse".

    If you need an unstructured data schema, consider developing in SQL Server 2005 and using the new XML datatype instead of an EAV design.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Jan 2005
    Location
    Green Bay
    Posts
    201
    Thank You BlindMan for putting a name to my problem. (EAV)
    Thank You Brett

    I am only using this struture to store an audit trail of a dynamic data structure.

    I will persuade the users better to define there business needs so that I may make a true Data Warehouse.

    I will use the Audit trail data and evaluate the rows of data that I recieve nighlty to populate the Warehouse. From predifined specs.

    I cannot force a change in the data entering the Database.

    But once here it is mine all mine.. AH HA HA HA

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    I think you need to rethink your audit trail. If you are trying to use a single table to store all data changes from any other table, you are going to find the resulting dataset quite unwieldy. Either just store a description of the change that occured, or create separate archive tables for each production to table to store a history of modifications.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by rbackmann
    Thank You BlindMan for putting a name to my problem. (EAV)
    have a look at tony andrews' article OTLT and EAV: the two big design mistakes all beginners make

    tony's another dbforums stallwart
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  10. #10
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    To tell the truth...for me, doing audits...simpler is better

    Just have an exact copy of the table.

    Add 3 columns, HIST_ADD_BY, HIST_ADD_DT, HIST_ADD_TYPE

    Create a trigger that inserts into this table for all UPDATE and DELETE Operations and move the entire row

    Put no indexes on this table

    Boom, audit done

    If you want performance on analysis, I'd reccomend that you bcp the data out and load it to a table with indexes (applied after the load)

    I would not want to interfere with the triggers at all

    MOO
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  11. #11
    Join Date
    Jan 2005
    Location
    Green Bay
    Posts
    201
    Thanks all for the reference to the column.

    The EAV I am forced to use is a decision of My Management and cannot push them off of that concept. Oh the fate of a peon. I have explained the pros and cons of this decision but this is what has been decided.

    The other option that they suggested was to dynamically evaluate the key fields and if a new one appears generate the DDL to add a field to a table of 800+ fields and transpose this data.

    They wish to make the data flexible for any forseeable senario without further IT involvment.

    If I had total control .....

  12. #12
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by rbackmann
    They wish to make the data flexible for any forseeable senario without further IT involvment.
    lol
    I've been enjoying the evolution of this thread. I love it when threads like this end with the management concluding that they can conceive of a model that will require no "further IT involvment".

    I've never had the misfortune to run into a full blown EAV model (appropriately implemented or not) in the flesh but it's pretty clear with only a little reading and even a substandard imagination like mine that this is precisly where they do not lead you.

    Pro - can stick any data in the db
    Con - can stick any data in the db
    Con - Try getting it out again

    "for any forseeable senario" lol.

    Ah well - you tried. Best of luck
    Testimonial:
    pootle flump
    ur codings are working excelent.

  13. #13
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Put it in writing/e-mail right now that you think XML is the way to go with this. Then save a copy of the document in your CYA file.
    Some very complex coding is ahead of you.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    committing to writing that XML is the way to go may backfire on ya...

    not that i've got any XML experience, but i've heard horror stories about performance, and since XML has such cachet with management, they may take you up on it and then you could be cooked
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Yes, the performance sucks compared with a standard normalized database, but is probably no worse than an EAV design and with a helluva lot less programming. I would not place data in an XML column unnecessarily, but would reserve it only for data that could not be predefined in the business model.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •