Results 1 to 15 of 15
  1. #1
    Join Date
    Feb 2009
    Posts
    8

    Table partitioning to improve performancce

    Hi all,
    firstly, let me introduce myself: I am 2nd year Software Engineering student from London and this year I have been given the task to develop a system which records the processes in a photo lab: the receptionists accept new orders by recording them into a db, the lab technicians query the db about the next task which needs to be processed and record the start and the completion event, etc.

    question 1:
    Because the db will hold all tasks, the ones in process and the ones completed, to me it makes sense to have 2 tables - one, to hold the tasks which havent been completed and once they are completed they will be moved into a other table. This should have some effect on the performance because most of the queries will be about uncompleted tasks. My question is, does this make sense to you and does it worth the effort if about 1000 tasks are precessed per day?

    question 2:
    In the db I have a table that holds info about customer accounts:
    account (accountNo{pk}, balance, creditLimit)
    Additionally, every account can have one of the three types of discount plans. Every type has different attributes:
    discPlan1 (accountNo{fk,pk}, rate)
    dicsPlan2 ((accountNo{fk}, productId){pk}, rate)
    discPlan3 (accountNo{fk,pk}, ordersFrom, ordersTo, rate)

    My thoughts are that there is a one-to-one relationship between account and each type of discount. However, an account cannot have 2 discount plans at the same time. With my current design this is possible, moreover, in order to find the discount plan of an account, I need to join 4 tables, which is not very efficient. Any ideas on how I could improve this?

    Best,
    Peter

  2. #2
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by glarus
    to me it makes sense to have 2 tables - one, to hold the tasks which havent been completed and once they are completed they will be moved into a other table
    This is a poor way of doing things. It's better to just have a single table and use the completed time to distinguish between the two types of record. You're doubling the amount of code required to access this data, increasing the complexity and you're not really improving the performance.

    Quote Originally Posted by glarus
    My thoughts are that there is a one-to-one relationship between account and each type of discount. However, an account cannot have 2 discount plans at the same time. With my current design this is possible, moreover, in order to find the discount plan of an account, I need to join 4 tables, which is not very efficient. Any ideas on how I could improve this?
    I'd not have the discount tables but just have a few nullable extra fields in the account table ie discountType, rate, ordersFrom*, ordersTo*, productId*. The fields with * would be the nullable fields. Must admit I'm unclear how the productId comes into this. For completeness you'd want a lookup table for the type values.

    Quote Originally Posted by glarus
    Any ideas on how I could improve this?
    I believe the above should lead to a simpler processing, easier code and better performance. I suppose the easiest way to test this is by running through your design by hand and seeing how easy it is to find any given job and how to process it. The only slight danger is that the resulting design might look a bit "simple" for a project - in which case I'd document your original design and point out the small issues and how you come to a simpler design.

  3. #3
    Join Date
    Feb 2009
    Posts
    8
    Hey Mike,
    thanks for your reply.
    ProductId is a fk from another table called...product. The idea is that if given a type of discount, the account holder gets different discount rate for each product.
    OrdersFrom and OrdersTo is a range of worth of orders placed and the account holder receives different discount for each range (e.g. 1-1000 - 0%, 1001-2000 - 2%, etc.). That's why I cannot put these into the account table. Sorry for not mentioning these before

    About the partitioning, I read that this can be done by using views. No need to mention that I dont know exactly how to do it but I was wondering if you can tell me if this would be an efficient solution?

  4. #4
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by glarus
    Hey Mike,
    thanks for your reply.
    ProductId is a fk from another table called...product. The idea is that if given a type of discount, the account holder gets different discount rate for each product.
    OrdersFrom and OrdersTo is a range of worth of orders placed and the account holder receives different discount for each range (e.g. 1-1000 - 0%, 1001-2000 - 2%, etc.). That's why I cannot put these into the account table. Sorry for not mentioning these before
    That changes things so you'll need to disregard my points about the discount tables. I'll have to think again.

  5. #5
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by glarus
    About the partitioning, I read that this can be done by using views. No need to mention that I dont know exactly how to do it but I was wondering if you can tell me if this would be an efficient solution?
    Why do you think it would be helpful to have your tables partitioned? is it to make the SQL easier to write? Depending on the RDBMS you use it might be more efficient to use a fixed date in the future (ie '3000-01-01') rather than null. Example SQL :
    Code:
    # to get next task to do (MySQL)
    select * 
    from Tasks
    where completed_time is null 
    order by requested_time
    limit 1
    The SQL to provide the discount rates is slightly more complex but if you give me the table names and fields you have then I can give some examples for you.

  6. #6
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by glarus
    My thoughts are that there is a one-to-one relationship between account and each type of discount. However, an account cannot have 2 discount plans at the same time. With my current design this is possible
    See if this example helps:

    David Portas' Blog : Distributed Keys and Disjoint Subtypes

  7. #7
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Quote Originally Posted by glarus
    About the partitioning, I read that this can be done by using views.
    I imagine what you read was that if you partition a table into smaller tables like TASKS_IN_PROGRESS and TASKS_COMPLETED with the same columns, then you can create a view to present the whole "logical" table like this:
    Code:
    CREATE VIEW all_tasks AS
    SELECT ... FROM tasks_in_progress
    UNION ALL
    SELECT ... FROM tasks_completed
    However, as Mike is telling you already, you are very unlikely to gain any advantage from doing this partitioning. I have only ever worked on one project where table partitioning was seriously contemplated, and this was for a major energy supplier handling nearly a billion rows of historic data, of which it usually only needed to process the latest month's worth.

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by dportas
    *bookmarked*

    very nice example

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Feb 2009
    Posts
    8
    Quote Originally Posted by andrewst
    I imagine what you read was that if you partition a table into smaller tables like TASKS_IN_PROGRESS and TASKS_COMPLETED with the same columns, then you can create a view to present the whole "logical" table like this:
    Code:
    CREATE VIEW all_tasks AS
    SELECT ... FROM tasks_in_progress
    UNION ALL
    SELECT ... FROM tasks_completed
    The other way around Using view to partition one table into manageable chunks:
    Code:
     CREATE VIEW tasks_in_progress AS
    SELECT * FROM all_tasks WHERE time_completed IS null
    However, after further reading on the topic I discarded the idea of using views. I want to partition the tables physically into the design phase. This is, have one table called Tasks_In_progress and another one called Completed_task.
    The columns are as follows:
    Task_In_progress (TaskId{pk}, TaskTypeId{fk}, Location(where task is carried out), timeStarted, timeCompleted, UserId (employee who has carried out the task).
    The idea is to keep a task into the task_in_process table only until its value in timeCompleted IS null. Once the user marks a task completed the record will be moved into the completedTasks table, possibly, with a stored procedure. This way, the performance with the most frequent queries (few every second) like "give me all tasks awaiting processing in Room 1" will be far better because they will query a table which contains only the tasks in process and not the tasks completed in the period between 1994-2009.

    However, it turns to be quite complicated.

    However, as Mike is telling you already, you are very unlikely to gain any advantage from doing this partitioning. I have only ever worked on one project where table partitioning was seriously contemplated, and this was for a major energy supplier handling nearly a billion rows of historic data, of which it usually only needed to process the latest month's worth.
    Yes, I know but how boring life would be if I had a single table called Tasks!

  10. #10
    Join Date
    Feb 2009
    Posts
    8
    Quote Originally Posted by dportas
    Spot on! Thank you very much!

  11. #11
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Quote Originally Posted by glarus
    The other way around Using view to partition one table into manageable chunks:
    Code:
     CREATE VIEW tasks_in_progress AS
    SELECT * FROM all_tasks WHERE time_completed IS null
    Which is perfectly sensible as a convenience to querying, but has no bearing on performance whatsoever.

    ... This way, the performance with the most frequent queries (few every second) like "give me all tasks awaiting processing in Room 1" will be far better because they will query a table which contains only the tasks in process and not the tasks completed in the period between 1994-2009.
    Are you sure about that; have you measured the difference in performance?

    However, it turns to be quite complicated.
    Complicated, and quite possibly of no significant benefit!

    Yes, I know but how boring life would be if I had a single table called Tasks!
    Enjoy yourself

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Quote Originally Posted by glarus
    Yes, I know but how boring life would be if I had a single table called Tasks!
    While you are in this sort of mood, may I suggest a nice hair shirt and tawse to increase your particular data architecting pleasure? Maybe you can work your way up to a full EAV (Entity-Atribute-Value) implimentation if that hair shirt isn't adequate.

    -PatP

  13. #13
    Join Date
    Feb 2009
    Posts
    8
    Quote Originally Posted by andrewst
    Are you sure about that; have you measured the difference in performance?
    No, I haven't measured the performance. Actually, I am searching for a book or article on sql complexity but have found nothing so far. My thoughts about performance are intuitive. Table task_in_process will grow and shrink so the issue with the gradually decreasing performance due to growth of the data is eliminated. I know that horizontal partitioning is nothing new, but the difference in partitioning, lets say, sales to salesJanuary, salesFebruary, etc. is that the data stay in these tables because it is sorted according to a static condition, i.e. sale will always be made in January no matter what, while I want to move my tasks across the tables according to a dynamic condition. The question is: what is the cost of inserting and deleteding a raw?

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    if your tables are properly indexed, the difference in performance in retrieving specific rows from a table, as compared with retrieving same rows from a table with ten times the data, is counter-intuitively not very much different

    that's because performance of properly indexed queries does not increase linearly with table size
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Feb 2009
    Posts
    8
    Quote Originally Posted by r937
    if your tables are properly indexed, the difference in performance in retrieving specific rows from a table, as compared with retrieving same rows from a table with ten times the data, is counter-intuitively not very much different

    that's because performance of properly indexed queries does not increase linearly with table size
    OK! I will just use one table then. Could you recommend me a good book on complexity of database operations? Thank you!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •