Page 1 of 2 12 LastLast
Results 1 to 15 of 16
  1. #1
    Join Date
    Apr 2012
    Posts
    9

    Smile I need help to design of a history table

    Hello,

    I need some advises in an task regarding design and modeling of a history entity. the table / tables will be in MySql.
    I have the following task for my company: to keep an history of all customers regarding their scoring. The scoring is calculated monthly. The fields needed to keep it are the following:
    - country_code
    - customer_id
    - scoring_amount
    - scoring_code
    - scoring_profile
    - scoring_month

    The first 2 fields will be PK in the new table.

    My question is: in which form is recommended to keep this history?
    1. one hugeee table. There are around 4 millions records to insert / month, multiply by year will become 48 millions.
    moreover each month i'll have to do some comparisons between the customers. What scoring code has now the customer with the scoring = X from past month. (what scoring became the scoring X from past month). I hope I'm enough clear.
    In this case the query would take to much time (the same huge table will be interrogate 2 times). Because of that I'm thinking at the next solution (solution 2).
    2. one table for each month, named history_2012_03, history_2012_04, etc...
    the query will be more faster in this case:
    Code:
    select past.customers_scoring as scoring_past
      , count(curr.customers_id) nb_customers_current
      , curr.customers_scoring scoring_current
    from temp_history_2012_04 curr
      , temp_history_2012_03 past
    where curr.customers_id = past.customers_id  
    group by past.customers_scoring
      , curr.customers_scoring
    The physical size of all tables will be too high? Could be that an issue for 4 millions records monthly?

    3.One table per year with the columns:
    - country_code
    - customer_id
    - 2011_01_amount
    - 2011_01_profile
    - 2011_01_code
    - 2011_02_amount
    - 2011_02_profile
    - 2011_02_code
    ...
    and each month I'll add 3 new fields.
    In this case the time needed for insert is longer (must be verified if the customers exists or not in the history table, upsert, but as i know there is no command for upasert on MySql, or?)

    What solution is recommended of the 3 mentioned by me?

    Thank you in advance!

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Option 1 for sure.

    You may wish to consider creating a second table within which you can store the aggregated/calculated data at the end of each month (assuming the historic data won't be changed?).

    I would also consider changing the scoring_month column as it doesn't take in to account any year values and if you were to group by month only then, for example, January 2010 would be aggregated with January 2011 & 2012! Consider using a simple datetime data type.
    George
    Home | Blog

  3. #3
    Join Date
    Apr 2012
    Posts
    9
    Quote Originally Posted by gvee View Post
    Option 1 for sure.

    You may wish to consider creating a second table within which you can store the aggregated/calculated data at the end of each month (assuming the historic data won't be changed?).

    I would also consider changing the scoring_month column as it doesn't take in to account any year values and if you were to group by month only then, for example, January 2010 would be aggregated with January 2011 & 2012! Consider using a simple datetime data type.
    Thank you for the answer. Of course the historic data won't be updated, just the new data will be inserted.
    Initially i thought at the same solution. But for my query that doesn't help. It's not about aggregations. In fact is a join with the same table on different month (and for 4 millions rows * 12) it takes more time. Because of this I would have opted for the solution 2.

    select past.scoring_code as scoring_past
    , count(curr.customers_id) nb_customers_current
    , curr.scoring_code scoring_current
    , curr.country_code
    from scoring_history curr
    , scoring_history past
    where curr.customers_id = past.customers_id
    and past.scoring_month = '2012-03'
    and curr.scoring_month = '2012-04'
    and curr.country_code= past.country_code
    group by past.scoring_code
    , curr.scoring_code
    , curr.country_code

    And which is the maximum number of rows allowed on mysql. there is any limitation?

  4. #4
    Join Date
    Aug 2004
    Location
    Dallas, Texas
    Posts
    831
    This depends on physical disk space. Although, once you are around 25% remaining disk capacity problems start to occur.
    Quote Originally Posted by maria_b View Post

    And which is the maximum number of rows allowed on mysql. there is any limitation?

  5. #5
    Join Date
    Apr 2012
    Location
    near Dallas, Tx
    Posts
    7

    Indexes.

    Maria, the first strategy is always best. There are a couple things IO have learned from tinkering with MySQL.

    1 joins seem to run faster than using where statements. Try restructuring your SQL TO:


    select past.scoring_code as scoring_past,
    count(curr.customers_id) nb_customers_current,
    curr.scoring_code scoring_current,
    curr.country_code
    from scoring_history curr
    join scoring_history past
    on curr.customers_id= past.customers_id
    and curr.country_code= past.country_code
    where curr.scoring_month = '2012-04'
    and past.scoring_month = '2012-03'
    order by past.scoring_code ,
    curr.scoring_code , curr.country_code;

    For this to be efficient, you need to add indexes:

    alter table scoring_code add index (scoring_month);

    Make darn sure the primary key is defined with both fields. That will make the PK an index.

    2. I believe it's better to search on the primary table before the second. It may not matter and just be a quirk I have developed myself.

  6. #6
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    I'm surprised noone has mentioned range partitioning yet -- I believe partitioning the table by scoring_month is the most effective approach in this situation. You will avoid the overhead of maintaining multiple tables while taking the advantage of smaller physical structures for each month's worth of data.

  7. #7
    Join Date
    Apr 2012
    Location
    near Dallas, Tx
    Posts
    7

    Range Partitioning

    Educate me. would that help in queries at all? It may make the file sizes more manageable, but if the queries are not optimized, my understanding is it won't help.

  8. #8
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Quote Originally Posted by russthom View Post
    Educate me.
    Search for "partition elimination" or "partition pruning".

  9. #9
    Join Date
    Apr 2012
    Location
    near Dallas, Tx
    Posts
    7

    Thanks

    Thanks -- rwt

  10. #10
    Join Date
    Apr 2012
    Posts
    9
    Quote Originally Posted by russthom View Post
    Maria, the first strategy is always best. There are a couple things IO have learned from tinkering with MySQL.

    1 joins seem to run faster than using where statements. Try restructuring your SQL TO:


    select past.scoring_code as scoring_past,
    count(curr.customers_id) nb_customers_current,
    curr.scoring_code scoring_current,
    curr.country_code
    from scoring_history curr
    join scoring_history past
    on curr.customers_id= past.customers_id
    and curr.country_code= past.country_code
    where curr.scoring_month = '2012-04'
    and past.scoring_month = '2012-03'
    order by past.scoring_code ,
    curr.scoring_code , curr.country_code;

    For this to be efficient, you need to add indexes:

    alter table scoring_code add index (scoring_month);

    Make darn sure the primary key is defined with both fields. That will make the PK an index.

    2. I believe it's better to search on the primary table before the second. It may not matter and just be a quirk I have developed myself.
    thank you for the hint with the joins (1.)
    Of course indexes on scoring_month and the PK are very important, I thought already at them.

  11. #11
    Join Date
    May 2012
    Location
    Armenia
    Posts
    1
    Maria I want to help you , but don't know how to do it .

  12. #12
    Join Date
    Apr 2012
    Posts
    9
    I've chosen solution no. 1. My next question is: how could I optimize this query? It must return all new customers (customers whom exist in March of 2012 but didn't exist in February 2012 )

    Code:
    select 'new customers' new_customers
       , curr.scoring_code current_scoring
       , count(curr.customers_id) nb_customers
    from scoring_history
    where curr.scoring_month = '2012-03' -- current month
      and scoring_code = 'XY'
      and curr.customers_id not in (select past.customers_id from scoring_history past 
      							 where past.scoring_month = '2012-02'
      							    and past.country_code = curr.country_code)                                                           
    group by curr.scoring_code

  13. #13
    Join Date
    Apr 2012
    Location
    near Dallas, Tx
    Posts
    7

    History

    I don't think you can make a significant improvement on this. However. let me make a few of observations. I am assuming you did not design this system and I am going to be frank.

    Who in their right mind (A: nobody) designed a system that doesn't make customer number unique on the planet? Unless there has been a corporate merger or a particular customer can exist in two countries with different scores, the non-uniqueness of the customer number is. frankly, inane. I can just hear the conversation between customer service and a customer (CS: Can I have your customer number and country code? C: My costumer number is {blah} and I haven't a clue my country code! CS: Well that customer number exists in America, England and Liberia.). IMHO, this is not a database problem this is a business logic problem, and the Sorry Old Businessperson that designed this should be drawn and quartered, or put on the Customer Service lines, the latter being much meaner.

    if the customer number is unique, drop the country code check.

    The way you have written your sub-select allows for old customers that didn't have a reporting score in Feb to be considered a new customer. This may not be logically feasible from the business standpoint, but I noticed that.

    Meddling a bit more, what is the environment this is being run in. Are there other reports of new customers being run in the same database session? If there are, I would seriously consider pulling the new customer list to a temporary table and using that for a join, e.g.:


    create temporary table newcust as
    Select customer_id (, country_id ???)
    max(scoring_month) as month,
    count(*) as count
    from scoring_history
    group by customer_id (, country_id ???)
    having month = '2012-04-01'
    and count = 1;


    Barring typos, that should give you a temporary table with all the new customers in a temp file you could use for multiple reports. I use this technique for my client contact file on single reports. I highly recommend if you group new customer reports in a single session. If you do multiple reports over multi sessions, plead with the DBA for space to keep that temp file and regen it each month.

    RWT, the opinions expressed are my own and do not reflect rationality

  14. #14
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Quote Originally Posted by maria_b View Post
    how could I optimize this query?
    Optimize according to what criteria?

    The following might perform a bit better:

    Code:
    select 'new customers' new_customers
       , curr.scoring_code current_scoring
       , count(curr.customers_id) nb_customers
    from scoring_history
    where curr.scoring_month = '2012-03' -- current month
      and scoring_code = 'XY'
      and not exists (
         select 1 from scoring_history past 
         where past.scoring_month = '2012-02'
         and past.country_code = curr.country_code
         and past.customer_id = curr.customer_id
    )                                                           
    group by curr.scoring_code

  15. #15
    Join Date
    Apr 2012
    Location
    near Dallas, Tx
    Posts
    7

    Learn something new every day

    Thanks n__i, you taught me something new (exists). The quick, rudimentary I did showed little difference between exists and using an in clause. If fact, it tended to favor the in clause,

    It doesn't seem to like me posting a link in here - hmmm. It gives me **** in the url. I wonder if that is against rules
    Attached Files Attached Files

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •