Results 1 to 7 of 7
  1. #1
    Join Date
    Mar 2009
    Posts
    14

    Balancing Normalization And Immutable Data

    Hi All,

    Just wanted to get some people's thoughts on how to handle immutable data and what level they tend to take it too?

    Say you have a database with a Customers, Addresses, Orders, OrderLines and Products tables.

    Customers (CustomerId, Name, Description, AddressId etc...)
    Addresses (AddressId, Number, Line1, Line2, etc...)
    Products (ProductId, Code, Description etc...)
    Orders (OrderId, CustomerId, etc...)
    OrderLines (LineId, OrderId, ProductId etc...)

    Now the design above is fully normalized. But you run into a problem if a customer asks for a copy of the old order. ie. If the customer has changed their name or address... if the product's description has changed etc.

    The more normalized you keep things the more you lose history. If you want to keep this data it leads to a lot of data being stored in the orders and orderLines tables (you could say that this history data is dependent on the order and not the customer or product anymore so everything is still normalized.) This leads to the order and orderLines having a lot more fields though. I was interested to find what level people take this too in there own systems.

    1) Does your invoice and order tables etc end up with fields for almost every bit of history data... ie. They don't really have any relation to the original customer or product tables anymore?

    2) Would you make the address table immutable in the system so that when a customer changes address they get a completely new addressId and the old one always remains. This will help having to store all the individual address fields in the order itself. But would also lead to a address table that contains a lot of data is not being used anymore?

    3) Where do you draw the line? Obviously the product price would be stored in the orderLines table. We want the price at that point in time. But what about the customer name in the order table? Would you want the order to reflect the customer's name at that point in time or do you care that the order loses that information when the customer changes it?

    Obviously a lot of this is subjective and dependent on the customers business and system design. I was interested to know what other people currently find themselves doing so that I can make the best recommendations?

    Cheers

  2. #2
    Join Date
    Mar 2009
    Posts
    14
    Quick addendum: It occurs to me this question is really about normalization but rather to what level people go to store temporal data and how they go about it. Sorry for the mislead...

  3. #3
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    the problem lies in the way you model the current situation and a specific point in time.
    there is an argument that your invoice is a separate section of data. it has its own address
    it shas its own order items (including prices, taxes and so on) and so on. its a separate legal document. ie if requested you should be able to reproduce that document as proof of sale, whether that request comes from your auditors, customers or tax collectors. you shouldn't be using the price from the product fle on a referenced basis. when the sale is conducted you should copy the price from the file in my books. yes you can store price histories if you so wish but that gets complicated and easy to screw up. the cost of storing the data is bobbins so why not store the data as a separate invoice.

    as to whether it breaks normalisation rules.. that's up to you, it depends on how you view the model.
    I'd rather be riding on the Tiger 800 or the Norton

  4. #4
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Or change your model, so that info is retained in another way. Such as:

    Customers (CustomerId, Description, etc...)
    Cust Name (CustomerId, NameId(make a smallint and increment by 1 for each customer), Name, startdate, enddate, etc...)
    Cust_Addresses (CustomerId, AddressId(make a smallint and increment by 1 for each customer), Number, Line1, Line2, Startdate, Enddate etc...)
    Healdem, makes a good point about your invoice though.

    Dave

  5. #5
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    The address you're wanting to store is the address the product was shipped to and should be treated entirely separately to the customers address even if the data is just copied from there. I'd do the same for any customer facing data or pricing info.

  6. #6
    Join Date
    Mar 2009
    Posts
    14
    I agree with everyone here. It's not really a normalization issue as it is determining which data needs to be kept for historical purposes for certain transactions such as invoices.

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Purchase Order, Invoices, legal contracts, etc, are snapshots of the data at the time the agreement was finalized. Therefore, the elements of these entities are no longer related to the original parent elements, and becomes exclusively owned by the inheriting entity.
    Long story short, these elements should not be stored as foreign keys in your PO, Invoices, etc records. Updates to the original referenced records should never cascade to a record that represents a point in time, and it is not redundant or a violation of normalization to store them separately.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •