Results 1 to 7 of 7
  1. #1
    Join Date
    Aug 2004
    Posts
    17

    To have a separate table with attributes for each entity?

    This question may seem similar to my first question on this forum, but it's not. Here's what it is. Suppose that I have N customers and M products, where N and M grow with time and hopefully will reach millions. I want to record preferences of each customer for different products. Theoretically, the number of data points can be N * M, but it will probably be N * (M - X), where X is probably large. Still, the number of data points can be millions.

    I will need to access the preference data for each customer from time to time. My goal is to speed up query. I can think of two methods of storing this information: in a table(customerID, productID, preferenceCode) or in a separate table(productID, preferenceCode) for each customer. I don't know if modern DBMSs can handle that many tables (thousands or millions) in one database. Is the latter approach rather insane?

    With expected N and M, would I experience long query execution times? I've never worked with databases containing this many records and have no idea how much time it will take on a relatively fast machine (whatever machines are currently used for relatively cheap hosting -- in sub $10/month range; http://www.10-cheapwebhosting.com/). Answering my question is complicated by the fact that these machines are shared among potentially many people concurrently accessing different sites hosted on these machines.

    Thanks in advance
    Last edited by roman2; 08-23-04 at 15:57.

  2. #2
    Join Date
    Jul 2004
    Location
    UK
    Posts
    43
    where N and M grow with time and hopefully will reach millions
    More of N than M I hope.

    Very bad to have a table for each customer, this way you treat each customer as a seperate entity. Not good imo. This is the direction of OODB systems.

    What is the preference? Are we talking Customer buys X with X having colour preference R and size prefernse K?

  3. #3
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    I agree with Jwab. A table per customer is never the correct solution for somethign like this. Imagine if you wanted to answer questions like "what percentage of customers expressed a preference for product P?" You couldn't, realistically.

    Good modern DBMSs have various strategies for handling large quantities of data, such as partitioning.

  4. #4
    Join Date
    Aug 2004
    Posts
    17
    Quote Originally Posted by jwab
    What is the preference?
    Just whether customer X likes product Y.

    Quote Originally Posted by jwab
    this way you treat each customer as a seperate entity
    I only recently started (restarted is more correct) the database theory, so this question may appear silly to you. I thought that each customer WAS a separate entity. Is that wrong? What would be an entity then? A generic customer, analagous to a customer class as opposed to a customer instance, speaking in OO terms?

    Quote Originally Posted by andrewst
    Imagine if you wanted to answer questions like "what percentage of customers expressed a preference for product P?" You couldn't, realistically.
    I guess not. However, if I created a preference table for each customer upon adding the customer to a customer table, I could add an indicator whether a preference has been expressed, and this would allow me to answer the question you said because I could always count on presence of a preference table for each customer and could refer to it.

    Are there more reasons why this approach isn't good?

    Thanks

  5. #5
    Join Date
    Jul 2004
    Location
    UK
    Posts
    43
    I thought that each customer WAS a separate entity. Is that wrong? What would be an entity then? A generic customer, analagous to a customer class as opposed to a customer instance, speaking in OO terms?
    A customer entity would be a description of a customer, eg Name, Address... If you seperate customers into their own entities you are effectivly saying Customer A is very different to Customer B in the same sense that a Product entity is different from a transaction entity.

    Just whether customer X likes product Y.
    Probably a good approach would be customer_product_preference(customer_id, product_id, preference_id)

    Then you can have your millions of products and customers with not too many tables.

  6. #6
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Quote Originally Posted by roman2
    I thought that each customer WAS a separate entity. Is that wrong? What would be an entity then?
    Actually, the term "entity" wasn't quite correct; it should have been "entity class" or "entity type". One table per "entity type", e.g. customers. Not 1 table per entity instance, like customer_jones, customer_smith.

    Quote Originally Posted by roman2
    I guess not. However, if I created a preference table for each customer upon adding the customer to a customer table, I could add an indicator whether a preference has been expressed, and this would allow me to answer the question you said because I could always count on presence of a preference table for each customer and could refer to it.
    No, that would tell you only that customer X has expressed some preference(s). It would not tell you whether customer X has expressed a prreference for product P in particular. To answer my question, you would have to do something like:
    Code:
    select count(*) from 
    (select 1 from cust1_prefs where product_code = 'P'
     UNION
     select 1 from cust2_prefs where product_code = 'P'
    
     ... 
    
     UNION
     select 1 from cust945364_prefs where product_code = 'P'
    )
    Quote Originally Posted by roman2
    Are there more reasons why this approach isn't good?
    1) It's an administrative headache: every time you get a new customer, you have to create a new table, create primary/referential constraints on it, grant privileges on it to users, etc.
    2) Programs have to use "dynamic SQL" to select from these tables.

    And there are no compensating advantages to this approach!

  7. #7
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Going back to your original post:
    Quote Originally Posted by roman2
    With expected N and M, would I experience long query execution times?
    No, because you would use indexes to enable the DBMS to query efficiently:
    1) (customerID, productID) would be indexed, and in fact are the primary key. This means queries by customerID will be fast.
    2) (productID) should probably also be indexed separately, and this will make queries by productID fast - e.g. my "what percentage of customers expressed a preference for product P" query.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •