If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Table partitioning to improve performancce

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 02-11-09, 08:09
glarus glarus is offline
Registered User
 
Join Date: Feb 2009
Posts: 8
Table partitioning to improve performancce

Hi all,
firstly, let me introduce myself: I am 2nd year Software Engineering student from London and this year I have been given the task to develop a system which records the processes in a photo lab: the receptionists accept new orders by recording them into a db, the lab technicians query the db about the next task which needs to be processed and record the start and the completion event, etc.

question 1:
Because the db will hold all tasks, the ones in process and the ones completed, to me it makes sense to have 2 tables - one, to hold the tasks which havent been completed and once they are completed they will be moved into a other table. This should have some effect on the performance because most of the queries will be about uncompleted tasks. My question is, does this make sense to you and does it worth the effort if about 1000 tasks are precessed per day?

question 2:
In the db I have a table that holds info about customer accounts:
account (accountNo{pk}, balance, creditLimit)
Additionally, every account can have one of the three types of discount plans. Every type has different attributes:
discPlan1 (accountNo{fk,pk}, rate)
dicsPlan2 ((accountNo{fk}, productId){pk}, rate)
discPlan3 (accountNo{fk,pk}, ordersFrom, ordersTo, rate)

My thoughts are that there is a one-to-one relationship between account and each type of discount. However, an account cannot have 2 discount plans at the same time. With my current design this is possible, moreover, in order to find the discount plan of an account, I need to join 4 tables, which is not very efficient. Any ideas on how I could improve this?

Best,
Peter
Reply With Quote
  #2 (permalink)  
Old 02-11-09, 08:35
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by glarus
to me it makes sense to have 2 tables - one, to hold the tasks which havent been completed and once they are completed they will be moved into a other table
This is a poor way of doing things. It's better to just have a single table and use the completed time to distinguish between the two types of record. You're doubling the amount of code required to access this data, increasing the complexity and you're not really improving the performance.
Quote:
Originally Posted by glarus
My thoughts are that there is a one-to-one relationship between account and each type of discount. However, an account cannot have 2 discount plans at the same time. With my current design this is possible, moreover, in order to find the discount plan of an account, I need to join 4 tables, which is not very efficient. Any ideas on how I could improve this?
I'd not have the discount tables but just have a few nullable extra fields in the account table ie discountType, rate, ordersFrom*, ordersTo*, productId*. The fields with * would be the nullable fields. Must admit I'm unclear how the productId comes into this. For completeness you'd want a lookup table for the type values.
Quote:
Originally Posted by glarus
Any ideas on how I could improve this?
I believe the above should lead to a simpler processing, easier code and better performance. I suppose the easiest way to test this is by running through your design by hand and seeing how easy it is to find any given job and how to process it. The only slight danger is that the resulting design might look a bit "simple" for a project - in which case I'd document your original design and point out the small issues and how you come to a simpler design.
Reply With Quote
  #3 (permalink)  
Old 02-11-09, 09:02
glarus glarus is offline
Registered User
 
Join Date: Feb 2009
Posts: 8
Hey Mike,
thanks for your reply.
ProductId is a fk from another table called...product. The idea is that if given a type of discount, the account holder gets different discount rate for each product.
OrdersFrom and OrdersTo is a range of worth of orders placed and the account holder receives different discount for each range (e.g. 1-1000 - 0%, 1001-2000 - 2%, etc.). That's why I cannot put these into the account table. Sorry for not mentioning these before

About the partitioning, I read that this can be done by using views. No need to mention that I dont know exactly how to do it but I was wondering if you can tell me if this would be an efficient solution?
Reply With Quote
  #4 (permalink)  
Old 02-11-09, 09:07
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by glarus
Hey Mike,
thanks for your reply.
ProductId is a fk from another table called...product. The idea is that if given a type of discount, the account holder gets different discount rate for each product.
OrdersFrom and OrdersTo is a range of worth of orders placed and the account holder receives different discount for each range (e.g. 1-1000 - 0%, 1001-2000 - 2%, etc.). That's why I cannot put these into the account table. Sorry for not mentioning these before
That changes things so you'll need to disregard my points about the discount tables. I'll have to think again.
Reply With Quote
  #5 (permalink)  
Old 02-12-09, 09:54
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by glarus
About the partitioning, I read that this can be done by using views. No need to mention that I dont know exactly how to do it but I was wondering if you can tell me if this would be an efficient solution?
Why do you think it would be helpful to have your tables partitioned? is it to make the SQL easier to write? Depending on the RDBMS you use it might be more efficient to use a fixed date in the future (ie '3000-01-01') rather than null. Example SQL :
Code:
# to get next task to do (MySQL)
select * 
from Tasks
where completed_time is null 
order by requested_time
limit 1
The SQL to provide the discount rates is slightly more complex but if you give me the table names and fields you have then I can give some examples for you.
Reply With Quote
  #6 (permalink)  
Old 02-12-09, 10:33
dportas dportas is offline
Registered User
 
Join Date: Dec 2007
Location: London, UK
Posts: 732
Quote:
Originally Posted by glarus
My thoughts are that there is a one-to-one relationship between account and each type of discount. However, an account cannot have 2 discount plans at the same time. With my current design this is possible
See if this example helps:

David Portas' Blog : Distributed Keys and Disjoint Subtypes
Reply With Quote
  #7 (permalink)  
Old 02-12-09, 10:36
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
Quote:
Originally Posted by glarus
About the partitioning, I read that this can be done by using views.
I imagine what you read was that if you partition a table into smaller tables like TASKS_IN_PROGRESS and TASKS_COMPLETED with the same columns, then you can create a view to present the whole "logical" table like this:
Code:
CREATE VIEW all_tasks AS
SELECT ... FROM tasks_in_progress
UNION ALL
SELECT ... FROM tasks_completed
However, as Mike is telling you already, you are very unlikely to gain any advantage from doing this partitioning. I have only ever worked on one project where table partitioning was seriously contemplated, and this was for a major energy supplier handling nearly a billion rows of historic data, of which it usually only needed to process the latest month's worth.
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
  #8 (permalink)  
Old 02-12-09, 11:30
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by dportas
*bookmarked*

very nice example

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #9 (permalink)  
Old 02-14-09, 01:46
glarus glarus is offline
Registered User
 
Join Date: Feb 2009
Posts: 8
Quote:
Originally Posted by andrewst
I imagine what you read was that if you partition a table into smaller tables like TASKS_IN_PROGRESS and TASKS_COMPLETED with the same columns, then you can create a view to present the whole "logical" table like this:
Code:
CREATE VIEW all_tasks AS
SELECT ... FROM tasks_in_progress
UNION ALL
SELECT ... FROM tasks_completed
The other way around Using view to partition one table into manageable chunks:
Code:
 CREATE VIEW tasks_in_progress AS
SELECT * FROM all_tasks WHERE time_completed IS null
However, after further reading on the topic I discarded the idea of using views. I want to partition the tables physically into the design phase. This is, have one table called Tasks_In_progress and another one called Completed_task.
The columns are as follows:
Task_In_progress (TaskId{pk}, TaskTypeId{fk}, Location(where task is carried out), timeStarted, timeCompleted, UserId (employee who has carried out the task).
The idea is to keep a task into the task_in_process table only until its value in timeCompleted IS null. Once the user marks a task completed the record will be moved into the completedTasks table, possibly, with a stored procedure. This way, the performance with the most frequent queries (few every second) like "give me all tasks awaiting processing in Room 1" will be far better because they will query a table which contains only the tasks in process and not the tasks completed in the period between 1994-2009.

However, it turns to be quite complicated.

Quote:
However, as Mike is telling you already, you are very unlikely to gain any advantage from doing this partitioning. I have only ever worked on one project where table partitioning was seriously contemplated, and this was for a major energy supplier handling nearly a billion rows of historic data, of which it usually only needed to process the latest month's worth.
Yes, I know but how boring life would be if I had a single table called Tasks!
Reply With Quote
  #10 (permalink)  
Old 02-14-09, 02:16
glarus glarus is offline
Registered User
 
Join Date: Feb 2009
Posts: 8
Quote:
Originally Posted by dportas
Spot on! Thank you very much!
Reply With Quote
  #11 (permalink)  
Old 02-14-09, 06:31
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
Quote:
Originally Posted by glarus
The other way around Using view to partition one table into manageable chunks:
Code:
 CREATE VIEW tasks_in_progress AS
SELECT * FROM all_tasks WHERE time_completed IS null
Which is perfectly sensible as a convenience to querying, but has no bearing on performance whatsoever.

Quote:
... This way, the performance with the most frequent queries (few every second) like "give me all tasks awaiting processing in Room 1" will be far better because they will query a table which contains only the tasks in process and not the tasks completed in the period between 1994-2009.
Are you sure about that; have you measured the difference in performance?

Quote:
However, it turns to be quite complicated.
Complicated, and quite possibly of no significant benefit!

Quote:
Yes, I know but how boring life would be if I had a single table called Tasks!
Enjoy yourself
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
  #12 (permalink)  
Old 02-14-09, 07:25
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
Quote:
Originally Posted by glarus
Yes, I know but how boring life would be if I had a single table called Tasks!
While you are in this sort of mood, may I suggest a nice hair shirt and tawse to increase your particular data architecting pleasure? Maybe you can work your way up to a full EAV (Entity-Atribute-Value) implimentation if that hair shirt isn't adequate.

-PatP
Reply With Quote
  #13 (permalink)  
Old 02-14-09, 07:31
glarus glarus is offline
Registered User
 
Join Date: Feb 2009
Posts: 8
Quote:
Originally Posted by andrewst
Are you sure about that; have you measured the difference in performance?
No, I haven't measured the performance. Actually, I am searching for a book or article on sql complexity but have found nothing so far. My thoughts about performance are intuitive. Table task_in_process will grow and shrink so the issue with the gradually decreasing performance due to growth of the data is eliminated. I know that horizontal partitioning is nothing new, but the difference in partitioning, lets say, sales to salesJanuary, salesFebruary, etc. is that the data stay in these tables because it is sorted according to a static condition, i.e. sale will always be made in January no matter what, while I want to move my tasks across the tables according to a dynamic condition. The question is: what is the cost of inserting and deleteding a raw?
Reply With Quote
  #14 (permalink)  
Old 02-14-09, 07:46
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
if your tables are properly indexed, the difference in performance in retrieving specific rows from a table, as compared with retrieving same rows from a table with ten times the data, is counter-intuitively not very much different

that's because performance of properly indexed queries does not increase linearly with table size
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #15 (permalink)  
Old 02-15-09, 01:05
glarus glarus is offline
Registered User
 
Join Date: Feb 2009
Posts: 8
Quote:
Originally Posted by r937
if your tables are properly indexed, the difference in performance in retrieving specific rows from a table, as compared with retrieving same rows from a table with ten times the data, is counter-intuitively not very much different

that's because performance of properly indexed queries does not increase linearly with table size
OK! I will just use one table then. Could you recommend me a good book on complexity of database operations? Thank you!
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On