If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > To have a separate table with attributes for each entity?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-23-04, 00:32
roman2 roman2 is offline
Registered User
 
Join Date: Aug 2004
Posts: 17
To have a separate table with attributes for each entity?

This question may seem similar to my first question on this forum, but it's not. Here's what it is. Suppose that I have N customers and M products, where N and M grow with time and hopefully will reach millions. I want to record preferences of each customer for different products. Theoretically, the number of data points can be N * M, but it will probably be N * (M - X), where X is probably large. Still, the number of data points can be millions.

I will need to access the preference data for each customer from time to time. My goal is to speed up query. I can think of two methods of storing this information: in a table(customerID, productID, preferenceCode) or in a separate table(productID, preferenceCode) for each customer. I don't know if modern DBMSs can handle that many tables (thousands or millions) in one database. Is the latter approach rather insane?

With expected N and M, would I experience long query execution times? I've never worked with databases containing this many records and have no idea how much time it will take on a relatively fast machine (whatever machines are currently used for relatively cheap hosting -- in sub $10/month range; http://www.10-cheapwebhosting.com/). Answering my question is complicated by the fact that these machines are shared among potentially many people concurrently accessing different sites hosted on these machines.

Thanks in advance

Last edited by roman2; 08-23-04 at 14:57.
Reply With Quote
  #2 (permalink)  
Old 08-23-04, 05:04
jwab jwab is offline
Registered User
 
Join Date: Jul 2004
Location: UK
Posts: 43
Quote:
where N and M grow with time and hopefully will reach millions
More of N than M I hope.

Very bad to have a table for each customer, this way you treat each customer as a seperate entity. Not good imo. This is the direction of OODB systems.

What is the preference? Are we talking Customer buys X with X having colour preference R and size prefernse K?
Reply With Quote
  #3 (permalink)  
Old 08-23-04, 05:40
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
I agree with Jwab. A table per customer is never the correct solution for somethign like this. Imagine if you wanted to answer questions like "what percentage of customers expressed a preference for product P?" You couldn't, realistically.

Good modern DBMSs have various strategies for handling large quantities of data, such as partitioning.
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
  #4 (permalink)  
Old 08-23-04, 10:46
roman2 roman2 is offline
Registered User
 
Join Date: Aug 2004
Posts: 17
Quote:
Originally Posted by jwab
What is the preference?
Just whether customer X likes product Y.

Quote:
Originally Posted by jwab
this way you treat each customer as a seperate entity
I only recently started (restarted is more correct) the database theory, so this question may appear silly to you. I thought that each customer WAS a separate entity. Is that wrong? What would be an entity then? A generic customer, analagous to a customer class as opposed to a customer instance, speaking in OO terms?

Quote:
Originally Posted by andrewst
Imagine if you wanted to answer questions like "what percentage of customers expressed a preference for product P?" You couldn't, realistically.
I guess not. However, if I created a preference table for each customer upon adding the customer to a customer table, I could add an indicator whether a preference has been expressed, and this would allow me to answer the question you said because I could always count on presence of a preference table for each customer and could refer to it.

Are there more reasons why this approach isn't good?

Thanks
Reply With Quote
  #5 (permalink)  
Old 08-23-04, 11:01
jwab jwab is offline
Registered User
 
Join Date: Jul 2004
Location: UK
Posts: 43
Quote:
I thought that each customer WAS a separate entity. Is that wrong? What would be an entity then? A generic customer, analagous to a customer class as opposed to a customer instance, speaking in OO terms?
A customer entity would be a description of a customer, eg Name, Address... If you seperate customers into their own entities you are effectivly saying Customer A is very different to Customer B in the same sense that a Product entity is different from a transaction entity.

Quote:
Just whether customer X likes product Y.
Probably a good approach would be customer_product_preference(customer_id, product_id, preference_id)

Then you can have your millions of products and customers with not too many tables.
Reply With Quote
  #6 (permalink)  
Old 08-23-04, 11:05
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
Quote:
Originally Posted by roman2
I thought that each customer WAS a separate entity. Is that wrong? What would be an entity then?
Actually, the term "entity" wasn't quite correct; it should have been "entity class" or "entity type". One table per "entity type", e.g. customers. Not 1 table per entity instance, like customer_jones, customer_smith.

Quote:
Originally Posted by roman2
I guess not. However, if I created a preference table for each customer upon adding the customer to a customer table, I could add an indicator whether a preference has been expressed, and this would allow me to answer the question you said because I could always count on presence of a preference table for each customer and could refer to it.
No, that would tell you only that customer X has expressed some preference(s). It would not tell you whether customer X has expressed a prreference for product P in particular. To answer my question, you would have to do something like:
Code:
select count(*) from 
(select 1 from cust1_prefs where product_code = 'P'
 UNION
 select 1 from cust2_prefs where product_code = 'P'

 ... 

 UNION
 select 1 from cust945364_prefs where product_code = 'P'
)
Quote:
Originally Posted by roman2
Are there more reasons why this approach isn't good?
1) It's an administrative headache: every time you get a new customer, you have to create a new table, create primary/referential constraints on it, grant privileges on it to users, etc.
2) Programs have to use "dynamic SQL" to select from these tables.

And there are no compensating advantages to this approach!
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
  #7 (permalink)  
Old 08-23-04, 11:20
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
Going back to your original post:
Quote:
Originally Posted by roman2
With expected N and M, would I experience long query execution times?
No, because you would use indexes to enable the DBMS to query efficiently:
1) (customerID, productID) would be indexed, and in fact are the primary key. This means queries by customerID will be fast.
2) (productID) should probably also be indexed separately, and this will make queries by productID fast - e.g. my "what percentage of customers expressed a preference for product P" query.
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On