Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Jan 2004
    Posts
    145

    combining year based tables into one table

    Before getting into too much detail I have a question about how to go about storing yearly data. I need to be able to go back in time and see a snapshot of the data at that time. I was using Access where table size was an issue, but now I'm going to use SQL Server 2005 Express where it is not, to an extent.

    In Access I used separate tables for each year. It sounds like the best practice would be to combine the tables into one and add a year field which makes sense for my purchases table. However it doesn't seem so simple for my products table. Each year our product table changes, not by much, but a little none the less. If I go back to 2003 I need only products available that year to be listed. This is easy with separate tables and I guess would be easy too if I combined the product tables and used a year field.

    My problem with this is there would be so much duplicate data except for the year field. For example Jim's Candy might be a staple in our product line and doesn't change. There would be several fields, mostly text, that wouldn't change. So for the next 10 years Jim's Candy will be in the table 10 times with nothing changing except the year field. To fix this I thought I'd just make a table that had the product ID and then a year field. It would still contain 10 entries for Jim's Candy, but only two fields instead all the fields in the product table. In reality this intermediate table would hold several other fields that could change on a yearly basis.

    Does this sound like the best method? My overall problem is a bit more complicated, but I'd like to start here.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    disk space is cheap, properly indexed tables are extremely efficient, go ahead and store the products for every year, it will make your queries a lot simpler too
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Rudy's right. As an alternative you could give your products a year range using a start year column and an end year column. You then select all the records where the start year is less than or equal to the snapshot year and where the end year is greater than or equal to the snapshot year.

  4. #4
    Join Date
    Jan 2004
    Posts
    145
    Thanks for the replies. Certus I can't use your suggestion because it is not guarenteed that years a product is available will be contiguous.

    So now for the complicated part. We have suppliers that make multiple products and some that only make one. We need to be able to view data based on the supplier and individual products both. So my thought is to have parent products and child product(s). There would be a parent table and a child table linked by an intermediate table. For each parent there must be at least one child product. Technically the parent would not be a product it would be the supplier. All the products would be in the child table. So using the 'years' table idea I figure I need four tables. One to hold the static info for the parents, one to hold the years and non-static info for the parents, one to hold the static info for the children, one to hold the 'years' and non-static info for the child. The parent would be linked to the child in the last table mentioned. I'll diagram the relationships below.

    Parents:
    + +-ParentID
    | | ParentDesc
    | |
    | | ParentsYears:
    | |_ParentID
    | ParentYear
    | NonStaticField1
    | NonStaticField2
    |
    | +-Child:
    | | ChildID
    | | ChildDesc
    | |
    | | ChildYears:
    | |_ChildID
    | ChildYear
    | NonStaticField1
    | NonStaticField2
    |__ParentID


    Please note I simplified the tables a bit as far as the amount of fields. Basically I'm looking for opinions. I have this setup and it works based on the little testing I've done. However now I have four tables replacing the one table per year used in the old structure. It makes queries a bit more complicated. But I can do other tasks much easier using the new structure. I can't/won't go back to the old method of one table per year so I guess I'm looking to see if there is a better method similar to the new structure above.

    Thanks,
    GG

  5. #5
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    gwgeller, it doesn't matter if the product availability is contiguous. What you can do in that case is have the following:

    Year(YearID, YearName, ...)
    Product(ProductID, ProductName, ...)
    ProductYear(YearID, ProductID)

    This is an intersection. A year can have multiple products and a product can have multiple years.

    You can also create an intersection for your products and suppliers.

    Product(ProductID, ProductName, ...)
    Supplier(SupplierID, SupplierName, ...)
    ProductSupplier(ProductID, SupplierID)

    In this way a supplier can have multiple products and a product can have multiple suppliers.

    In this way there is no redundancy in years, in products or in suppliers.

    The work would then be not in the database, which is normalized, but in the application where you would have to come up with a simple means of maintaining the year, product and supplier tables and a simple means of relating years to products and products to suppliers. And that is not rocket science either.

  6. #6
    Join Date
    Jan 2004
    Posts
    145
    Certus your other examples seem to be pretty close to what I posted except you use a years table where as I use the year directly. You need an intersection between the suppliers and years as well.

    My Parent table, your Supplier table, intersects my Child table, your Product table, via my ChildYears table, your ProductYear table.

    When you say there will be no redundancy are you excluding the ProductYear and ProductSupplier tables? That is where all the redundancy is right?

    Regardless I have tested my structure and it isn't as confusing as I thought. Unless I'm missing something in Certus' example I am going to go with my posted structure.

  7. #7
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    The intersections only contain surrogate keys. The actual data is not duplicated. This is considered good normalization.

    You can find suppliers for the year with the following query

    SELECT A.SupplierName
    FROM Supplier A, SupplierProduct B, ProductYear C, Year D
    WHERE A.SupplierID = B.SupplierID
    AND B.ProductID = C.ProductID
    AND C.YearID = D.YearID
    And D.YearName = "1999";
    Attached Thumbnails Attached Thumbnails YearProductSupplier.JPG  
    Last edited by certus; 10-05-06 at 16:57.

  8. #8
    Join Date
    Jan 2004
    Posts
    145
    Certus - thanks for the graphic. I believe what complicates this is that the supplier has data that will change from year to year. So the suppliers need their own years table. While looking at my original diagram I noticed that there will not be any non-static data in the products table other than the supplier ID. I've redone my diagram using your naming scheme.

    Suppliers(table):
    + +-SupplierID
    | | SupplierDesc
    | |
    | | SupplierYears(table):
    | |_SupplierID
    | SupplierYear
    | NonStaticField1
    | NonStaticField2
    |
    | Product(table):
    | +- ProductID
    | | ProductDesc
    | |
    | | ProductYears(table):
    | |_ProductID
    | ProductYear
    |__SupplierID
    GG
    Code:
    On Error Goto Hell
    
    Hell:
        Msgbox("An error occurred, but was sent to Hell. You may continue.")

  9. #9
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    In that case see the attached diagram.

    (like your signature by the way.)
    Attached Thumbnails Attached Thumbnails YearProductSupplierYear.JPG  

  10. #10
    Join Date
    Jan 2004
    Posts
    145
    Certus I think we are on the same page on the structure. Again the only difference is the years table which I'd like to expand upon.

    You use a yearID where I just use the year itself. Is this merely conceptual on your part? The way I see it is that the yearID is a LONG INT and a year is only an INT or even could be a LONG INT. So as far as performance or structure goes there wouldn't be any significant difference would there? I mean everywhere you have a yearID I would have a physical year.

    To add to this my purchase table, right now, has a field for a 4 character distributor code and a 10 character product code. This done because the text codes are easier to work with than ID numbers. Reports are entered into the DB using these codes. But really I should convert the code to the proper ID when entering it into purchase table right? Each distributor and product has an ID and a text code.


    PS. thanks on the signature. I saw another member who had an if/else block and I thought it was funny. I've been using the Goto Hell in my code for a couple years.
    GG
    Code:
    On Error Goto Hell
    
    Hell:
        Msgbox("An error occurred, but was sent to Hell. You may continue.")

  11. #11
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    gwgeller, I have a confession to make. I am partial to surrogate keys. What you are using are natural keys. A surrogate key is hidden by the application and never seen by the user. The application uses it to maintain referential integrity because the surrogate key values once assigned never change. In the case of Year, SupplierCode and ProductCode, you are dealing with natural keys. Natural keys are seen by the user and the user knows what those Codes represent. If a natural key used as a primary key and is changed you have to cascade the change to all the child tables that use the natural key as a foreign key. This is acceptable in most cases.

    I should also point out that if you use a surrogate key you can still use the natural key, but you have to create a unique index on the natural key to protect against duplicates.
    Last edited by certus; 10-06-06 at 15:27.

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    i have to be honest and admit i haven't been following this thread too closely, but i did perk up when the subject turned to natural keys

    did i understand this correctly, certus -- you used a surrogate key for the year?

    under which circumstances do you foresee the powers that be deciding to change the name of 2006 from 2006 to something else?

    a surrogate key for the year, in my humble opinion, is ludicrous
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    I'm a ludicrous guy, rudy. My example is purist. I'm not insisting it be used for a real application.

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    i would rather not post that flim-flammery with a surrogate key for the year, even if you think that surrogate keys are the cat's pajamas, because the recipient of your largess isn't up to speed enough to determine that you are actually making a sly joke

    be clear, be open, be frank, and please don't use a surrogate key for the year
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Code:
    INSERT INTO Year_2(YearID, YearName)
    VALUES (2006, 'Two Thousand and Six')
    Testimonial:
    pootle flump
    ur codings are working excelent.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •