If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > DB design - Rows with different fields.

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-28-09, 06:44
infimp infimp is offline
Registered User
 
Join Date: Oct 2005
Posts: 10
DB design - Rows with different fields.

I wondered if anyone could give me some advice on a database design.

The database is going to store product information which is essentially the same for each record but the records will be imported from different sources so the number of columns and column names will be different.

For example one batch of records may have these columns:

prod_id
prod_name
sku
price
style

but another batch may have:

id
name
product_code
sales price
design
size
option
cost_price

So I'm looking at ways to combine this data into one schema but have a few issues. Some columns such as 'name' and 'prod_name' are simply different names for effectively the same column data.

I could create one column and map data from each of these differently named columns into that column so I create a column named 'product_id' and import the data from 'name' and 'prod_name' into that one column.

This would make querying by product_name easy as I'd just need to search one column but I also need to output the data in the form it was originally imported so if the record returned originally had 8 columns and the product name was in the 'prod_name' then I want to output all 8 columns with their original column names.

I could create a hybrid table with all the columns that the incoming data has so the table would have both 'prod_name' and 'name' columns for example. This seems like a bad idea though as I'd then have to search two columns when doing a product name search and each record would have redundant columns which only apply to certain batches of records.

Another option seem to be importing each batch of records into their own tables with their own unique columns but that means querying for product name would mean querying multiple tables and there's so many different sources of data that I'd end up with a lot of tables and need to add a new table each time a different format of record comes along.

So the way I'm thinking at the moment is I need to create a table with my own unique columns (so only one 'product_name' field) and relate this to a table of column labels so I can search just one column for product name but output it labelled as 'prod_name' or 'name' depending on the originally imported record's format.

I know it sounds complicated and hope I've explained it well enough!

Thanks,

Steve.
Reply With Quote
  #2 (permalink)  
Old 10-28-09, 07:19
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
I would first import the data just as it is into tables that look like the original source. Then I would create a table that has all of the columns that you want to search by (the ones they have in common) and replicate the data to this table along with a pointer to the original source.

My table would look something like this:

SourceCode -- what batch the data came from
SourceID -- the id within that batch
ProductCode -- the sku/product code field
ProductName -- the name of the product

The first two columns would tell you how to find the original data, the remaining columns would be the columns that all or most of the sources have in common. This table could have several subtables based on the product type (cars might have a different common fields than craft items).

Hope that made sense, it is early morning and I haven't had my caffeine yet.
Reply With Quote
  #3 (permalink)  
Old 10-28-09, 07:37
infimp infimp is offline
Registered User
 
Join Date: Oct 2005
Posts: 10
Thanks Mark,

I'll have a go at that - my only concern with that approach was that I'd probably end up with 8 or 9 tables and if a new format of data to be imported came along I'd need another table and I'd then have to brush up my PHP skills to cope with selects, updates and inserts of data in different tables!

I guess rather than the fixed SQL queries that create recordsets to display/update the data I'd have to create a dynamic query which looked up the source code and then looked up the table and column names to generate a query specific to each different table. A bit more complicated but should be do-able - I think that might be a problem for another forum and another cup of coffee though :-)
Reply With Quote
  #4 (permalink)  
Old 10-28-09, 08:37
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by infimp
... I want to output all 8 columns with their original column names.
there's the source of all your trouble

get rid of this requirement and you will be a happy camper

keep it, and you will be coding, coding, coding...
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 10-28-09, 11:27
infimp infimp is offline
Registered User
 
Join Date: Oct 2005
Posts: 10
Unfortunately showing the data in it's original format is something we need to do.

I'm thinking now that go through the different data formats I'll be using and create a table which has a column for every unique column in the batches - that's to say if one batch has 'prod_name' and one just has 'name' then provided they're actually the same field I can just have one column in my table for product_name.

If a particular batch has 8 columns and another batch has 10 columns then my table is going to have 10 columns - yes there'll be a some redundant columns for each record but maybe I can work that out later through normalisation.

Then instead of relating this key table to sub tables with the batch data I just relate it to a single table which stores a mapping of key table columns to original column names. Then the data can be searched easily through one table but when it is displayed I can look up the column mappings and show the correct label for each column and hide any columns that that record shouldn't contain (which would be null anyway).

This way I think I can simplify the coding as I'm not adding a new table every time a different batch comes along - I just need to add a new column mapping record in the mappings table. Only problem I can see is if a new batch has extra columns not in my key table then I would have to alter the table and possibly some coding.

I've not really thought this through completely so I'm going to have to test it out - I could be completely wrong!
Reply With Quote
  #6 (permalink)  
Old 10-28-09, 11:56
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by infimp
Unfortunately showing the data in it's original format is something we need to do.

I'm thinking now that go through the different data formats I'll be using and create a table which has a column for every unique column in the batches - that's to say if one batch has 'prod_name' and one just has 'name' then provided they're actually the same field I can just have one column in my table for product_name.

If a particular batch has 8 columns and another batch has 10 columns then my table is going to have 10 columns - yes there'll be a some redundant columns for each record but maybe I can work that out later through normalisation.

Then instead of relating this key table to sub tables with the batch data I just relate it to a single table which stores a mapping of key table columns to original column names. Then the data can be searched easily through one table but when it is displayed I can look up the column mappings and show the correct label for each column and hide any columns that that record shouldn't contain (which would be null anyway).

This way I think I can simplify the coding as I'm not adding a new table every time a different batch comes along - I just need to add a new column mapping record in the mappings table. Only problem I can see is if a new batch has extra columns not in my key table then I would have to alter the table and possibly some coding.

I've not really thought this through completely so I'm going to have to test it out - I could be completely wrong!
coding, coding, coding...

perhaps you haven't thought it through enough

a strategy that involves coding, coding, coding is weaker than a strategy that doesn't

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #7 (permalink)  
Old 10-28-09, 12:03
infimp infimp is offline
Registered User
 
Join Date: Oct 2005
Posts: 10
Well there's less coding needed this way and possibly no extra coding at all provided I cover all possible columns in my design to start with.
Reply With Quote
  #8 (permalink)  
Old 10-28-09, 12:37
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Sorry Dude, but I have to agree with Rudy. This is one of the more ridiculous requirements I have ever seen.
What is the reason for it?
What is the business problem you are trying to solve?
Coding, coding, coding....
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #9 (permalink)  
Old 10-28-09, 12:59
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
Quote:
Originally Posted by infimp
Unfortunately showing the data in it's original format is something we need to do.

I'm thinking now that go through the different data formats I'll be using and create a table which has a column for every unique column in the batches - that's to say if one batch has 'prod_name' and one just has 'name' then provided they're actually the same field I can just have one column in my table for product_name.
This might work, but the problem I see coming eventually is where one batch keeps size as "2x4" and another keeps it as 4.75. Now you need two size columns, one a string and one a numeric. All sorts of conversion issues are possible with multiple batches. This may or may not happen, but if it does you will have to change your design and I am a big believer in designing things once.

Instead of having a table to map the original names to the columns I would go with multiple views to recreate the original tables, one view per batch.
Reply With Quote
  #10 (permalink)  
Old 10-28-09, 15:12
infimp infimp is offline
Registered User
 
Join Date: Oct 2005
Posts: 10
Quote:
Originally Posted by blindman
Sorry Dude, but I have to agree with Rudy. This is one of the more ridiculous requirements I have ever seen.
What is the reason for it?
What is the business problem you are trying to solve?
Coding, coding, coding....
The problem is we're dealing with historic data which over the years and depending on the source has changed in format.

I know if we didn't have to do this it would be a breeze but it is vital to the integrity of the information that the data is presented with the same number of columns and column names as it originally had.

It's not good me knowing that the data that's now in the 'style' column was in the 'design' column if the data came from one source 10 years ago, was the 'type' column if the data came from a different source 10 years ago and would have been the 'opt' column if it came from a third source 7 years ago - The end user isn't going to know this so I need to present the data exactly as was so that it can be viewed and compared to paper copies and there's no confusion over how it was originally entered.

I'll try the different approaches with some sample data tomorrow and report back!
Reply With Quote
  #11 (permalink)  
Old 10-28-09, 15:19
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by infimp
Another option seem to be importing each batch of records into their own tables with their own unique columns but that means querying for product name would mean querying multiple tables
this is starting to look like a very attractive option, given that the "need to add a new table each time a different format of record comes along" should be limited since we're talking about historic tables that shouldn't actually "be coming along" since you can uncover them all now
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #12 (permalink)  
Old 10-28-09, 15:19
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Then is sounds as if you problem is not how to store the data, but how to report on it. In which case, Trombley's suggestion to use views is probably your best bet.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #13 (permalink)  
Old 10-29-09, 06:03
dportas dportas is offline
Registered User
 
Join Date: Dec 2007
Location: London, UK
Posts: 732
Quote:
Originally Posted by infimp View Post
it is vital to the integrity of the information that the data is presented with the same number of columns and column names as it originally had.
Sure, but it doesn't need to be stored that way. Store all the data in a single, conformed schema. Output it in whatever combination of columns and column names you want. You just need to preserve enough information for each row so that you know which output format to use for that product (an "origin type" attribute).
Reply With Quote
  #14 (permalink)  
Old 10-29-09, 07:16
infimp infimp is offline
Registered User
 
Join Date: Oct 2005
Posts: 10
That's the conclusion I've come to too - I'm going to create a single table and store a source_id in each row then have a table of sources with their column names so I basically just look up the labels to use whenever I display a record.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On