Results 1 to 14 of 14
  1. #1
    Join Date
    Oct 2005
    Posts
    15

    DB design - Rows with different fields.

    I wondered if anyone could give me some advice on a database design.

    The database is going to store product information which is essentially the same for each record but the records will be imported from different sources so the number of columns and column names will be different.

    For example one batch of records may have these columns:

    prod_id
    prod_name
    sku
    price
    style

    but another batch may have:

    id
    name
    product_code
    sales price
    design
    size
    option
    cost_price

    So I'm looking at ways to combine this data into one schema but have a few issues. Some columns such as 'name' and 'prod_name' are simply different names for effectively the same column data.

    I could create one column and map data from each of these differently named columns into that column so I create a column named 'product_id' and import the data from 'name' and 'prod_name' into that one column.

    This would make querying by product_name easy as I'd just need to search one column but I also need to output the data in the form it was originally imported so if the record returned originally had 8 columns and the product name was in the 'prod_name' then I want to output all 8 columns with their original column names.

    I could create a hybrid table with all the columns that the incoming data has so the table would have both 'prod_name' and 'name' columns for example. This seems like a bad idea though as I'd then have to search two columns when doing a product name search and each record would have redundant columns which only apply to certain batches of records.

    Another option seem to be importing each batch of records into their own tables with their own unique columns but that means querying for product name would mean querying multiple tables and there's so many different sources of data that I'd end up with a lot of tables and need to add a new table each time a different format of record comes along.

    So the way I'm thinking at the moment is I need to create a table with my own unique columns (so only one 'product_name' field) and relate this to a table of column labels so I can search just one column for product name but output it labelled as 'prod_name' or 'name' depending on the originally imported record's format.

    I know it sounds complicated and hope I've explained it well enough!

    Thanks,

    Steve.

  2. #2
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    I would first import the data just as it is into tables that look like the original source. Then I would create a table that has all of the columns that you want to search by (the ones they have in common) and replicate the data to this table along with a pointer to the original source.

    My table would look something like this:

    SourceCode -- what batch the data came from
    SourceID -- the id within that batch
    ProductCode -- the sku/product code field
    ProductName -- the name of the product

    The first two columns would tell you how to find the original data, the remaining columns would be the columns that all or most of the sources have in common. This table could have several subtables based on the product type (cars might have a different common fields than craft items).

    Hope that made sense, it is early morning and I haven't had my caffeine yet.

  3. #3
    Join Date
    Oct 2005
    Posts
    15
    Thanks Mark,

    I'll have a go at that - my only concern with that approach was that I'd probably end up with 8 or 9 tables and if a new format of data to be imported came along I'd need another table and I'd then have to brush up my PHP skills to cope with selects, updates and inserts of data in different tables!

    I guess rather than the fixed SQL queries that create recordsets to display/update the data I'd have to create a dynamic query which looked up the source code and then looked up the table and column names to generate a query specific to each different table. A bit more complicated but should be do-able - I think that might be a problem for another forum and another cup of coffee though :-)

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by infimp
    ... I want to output all 8 columns with their original column names.
    there's the source of all your trouble

    get rid of this requirement and you will be a happy camper

    keep it, and you will be coding, coding, coding...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Oct 2005
    Posts
    15
    Unfortunately showing the data in it's original format is something we need to do.

    I'm thinking now that go through the different data formats I'll be using and create a table which has a column for every unique column in the batches - that's to say if one batch has 'prod_name' and one just has 'name' then provided they're actually the same field I can just have one column in my table for product_name.

    If a particular batch has 8 columns and another batch has 10 columns then my table is going to have 10 columns - yes there'll be a some redundant columns for each record but maybe I can work that out later through normalisation.

    Then instead of relating this key table to sub tables with the batch data I just relate it to a single table which stores a mapping of key table columns to original column names. Then the data can be searched easily through one table but when it is displayed I can look up the column mappings and show the correct label for each column and hide any columns that that record shouldn't contain (which would be null anyway).

    This way I think I can simplify the coding as I'm not adding a new table every time a different batch comes along - I just need to add a new column mapping record in the mappings table. Only problem I can see is if a new batch has extra columns not in my key table then I would have to alter the table and possibly some coding.

    I've not really thought this through completely so I'm going to have to test it out - I could be completely wrong!

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by infimp
    Unfortunately showing the data in it's original format is something we need to do.

    I'm thinking now that go through the different data formats I'll be using and create a table which has a column for every unique column in the batches - that's to say if one batch has 'prod_name' and one just has 'name' then provided they're actually the same field I can just have one column in my table for product_name.

    If a particular batch has 8 columns and another batch has 10 columns then my table is going to have 10 columns - yes there'll be a some redundant columns for each record but maybe I can work that out later through normalisation.

    Then instead of relating this key table to sub tables with the batch data I just relate it to a single table which stores a mapping of key table columns to original column names. Then the data can be searched easily through one table but when it is displayed I can look up the column mappings and show the correct label for each column and hide any columns that that record shouldn't contain (which would be null anyway).

    This way I think I can simplify the coding as I'm not adding a new table every time a different batch comes along - I just need to add a new column mapping record in the mappings table. Only problem I can see is if a new batch has extra columns not in my key table then I would have to alter the table and possibly some coding.

    I've not really thought this through completely so I'm going to have to test it out - I could be completely wrong!
    coding, coding, coding...

    perhaps you haven't thought it through enough

    a strategy that involves coding, coding, coding is weaker than a strategy that doesn't

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Oct 2005
    Posts
    15
    Well there's less coding needed this way and possibly no extra coding at all provided I cover all possible columns in my design to start with.

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Sorry Dude, but I have to agree with Rudy. This is one of the more ridiculous requirements I have ever seen.
    What is the reason for it?
    What is the business problem you are trying to solve?
    Coding, coding, coding....
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  9. #9
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    Quote Originally Posted by infimp
    Unfortunately showing the data in it's original format is something we need to do.

    I'm thinking now that go through the different data formats I'll be using and create a table which has a column for every unique column in the batches - that's to say if one batch has 'prod_name' and one just has 'name' then provided they're actually the same field I can just have one column in my table for product_name.
    This might work, but the problem I see coming eventually is where one batch keeps size as "2x4" and another keeps it as 4.75. Now you need two size columns, one a string and one a numeric. All sorts of conversion issues are possible with multiple batches. This may or may not happen, but if it does you will have to change your design and I am a big believer in designing things once.

    Instead of having a table to map the original names to the columns I would go with multiple views to recreate the original tables, one view per batch.

  10. #10
    Join Date
    Oct 2005
    Posts
    15
    Quote Originally Posted by blindman
    Sorry Dude, but I have to agree with Rudy. This is one of the more ridiculous requirements I have ever seen.
    What is the reason for it?
    What is the business problem you are trying to solve?
    Coding, coding, coding....
    The problem is we're dealing with historic data which over the years and depending on the source has changed in format.

    I know if we didn't have to do this it would be a breeze but it is vital to the integrity of the information that the data is presented with the same number of columns and column names as it originally had.

    It's not good me knowing that the data that's now in the 'style' column was in the 'design' column if the data came from one source 10 years ago, was the 'type' column if the data came from a different source 10 years ago and would have been the 'opt' column if it came from a third source 7 years ago - The end user isn't going to know this so I need to present the data exactly as was so that it can be viewed and compared to paper copies and there's no confusion over how it was originally entered.

    I'll try the different approaches with some sample data tomorrow and report back!

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by infimp
    Another option seem to be importing each batch of records into their own tables with their own unique columns but that means querying for product name would mean querying multiple tables
    this is starting to look like a very attractive option, given that the "need to add a new table each time a different format of record comes along" should be limited since we're talking about historic tables that shouldn't actually "be coming along" since you can uncover them all now
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Then is sounds as if you problem is not how to store the data, but how to report on it. In which case, Trombley's suggestion to use views is probably your best bet.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  13. #13
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by infimp View Post
    it is vital to the integrity of the information that the data is presented with the same number of columns and column names as it originally had.
    Sure, but it doesn't need to be stored that way. Store all the data in a single, conformed schema. Output it in whatever combination of columns and column names you want. You just need to preserve enough information for each row so that you know which output format to use for that product (an "origin type" attribute).

  14. #14
    Join Date
    Oct 2005
    Posts
    15
    That's the conclusion I've come to too - I'm going to create a single table and store a source_id in each row then have a table of sources with their column names so I basically just look up the labels to use whenever I display a record.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •