| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

02-11-09, 08:09
|
|
Registered User
|
|
Join Date: Feb 2009
Posts: 8
|
|
|
Table partitioning to improve performancce
|
|
Hi all,
firstly, let me introduce myself: I am 2nd year Software Engineering student from London and this year I have been given the task to develop a system which records the processes in a photo lab: the receptionists accept new orders by recording them into a db, the lab technicians query the db about the next task which needs to be processed and record the start and the completion event, etc.
question 1:
Because the db will hold all tasks, the ones in process and the ones completed, to me it makes sense to have 2 tables - one, to hold the tasks which havent been completed and once they are completed they will be moved into a other table. This should have some effect on the performance because most of the queries will be about uncompleted tasks. My question is, does this make sense to you and does it worth the effort if about 1000 tasks are precessed per day?
question 2:
In the db I have a table that holds info about customer accounts:
account (accountNo{pk}, balance, creditLimit)
Additionally, every account can have one of the three types of discount plans. Every type has different attributes:
discPlan1 (accountNo{fk,pk}, rate)
dicsPlan2 ((accountNo{fk}, productId){pk}, rate)
discPlan3 (accountNo{fk,pk}, ordersFrom, ordersTo, rate)
My thoughts are that there is a one-to-one relationship between account and each type of discount. However, an account cannot have 2 discount plans at the same time. With my current design this is possible, moreover, in order to find the discount plan of an account, I need to join 4 tables, which is not very efficient. Any ideas on how I could improve this?
Best,
Peter
|
|

02-11-09, 08:35
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Quote:
|
Originally Posted by glarus
to me it makes sense to have 2 tables - one, to hold the tasks which havent been completed and once they are completed they will be moved into a other table
|
This is a poor way of doing things. It's better to just have a single table and use the completed time to distinguish between the two types of record. You're doubling the amount of code required to access this data, increasing the complexity and you're not really improving the performance.
Quote:
|
Originally Posted by glarus
My thoughts are that there is a one-to-one relationship between account and each type of discount. However, an account cannot have 2 discount plans at the same time. With my current design this is possible, moreover, in order to find the discount plan of an account, I need to join 4 tables, which is not very efficient. Any ideas on how I could improve this?
|
I'd not have the discount tables but just have a few nullable extra fields in the account table ie discountType, rate, ordersFrom*, ordersTo*, productId*. The fields with * would be the nullable fields. Must admit I'm unclear how the productId comes into this. For completeness you'd want a lookup table for the type values.
Quote:
|
Originally Posted by glarus
Any ideas on how I could improve this?
|
I believe the above should lead to a simpler processing, easier code and better performance. I suppose the easiest way to test this is by running through your design by hand and seeing how easy it is to find any given job and how to process it. The only slight danger is that the resulting design might look a bit "simple" for a project - in which case I'd document your original design and point out the small issues and how you come to a simpler design.
|
|

02-11-09, 09:02
|
|
Registered User
|
|
Join Date: Feb 2009
Posts: 8
|
|
|
|
Hey Mike,
thanks for your reply.
ProductId is a fk from another table called...product. The idea is that if given a type of discount, the account holder gets different discount rate for each product.
OrdersFrom and OrdersTo is a range of worth of orders placed and the account holder receives different discount for each range (e.g. 1-1000 - 0%, 1001-2000 - 2%, etc.). That's why I cannot put these into the account table. Sorry for not mentioning these before
About the partitioning, I read that this can be done by using views. No need to mention that I dont know exactly how to do it but I was wondering if you can tell me if this would be an efficient solution?
|
|

02-11-09, 09:07
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Quote:
|
Originally Posted by glarus
Hey Mike,
thanks for your reply.
ProductId is a fk from another table called...product. The idea is that if given a type of discount, the account holder gets different discount rate for each product.
OrdersFrom and OrdersTo is a range of worth of orders placed and the account holder receives different discount for each range (e.g. 1-1000 - 0%, 1001-2000 - 2%, etc.). That's why I cannot put these into the account table. Sorry for not mentioning these before 
|
That changes things so you'll need to disregard my points about the discount tables. I'll have to think again.
|
|

02-12-09, 09:54
|
|
vaguely human
|
|
Join Date: Jun 2007
Location: London
Posts: 2,519
|
|
Quote:
|
Originally Posted by glarus
About the partitioning, I read that this can be done by using views. No need to mention that I dont know exactly how to do it but I was wondering if you can tell me if this would be an efficient solution?
|
Why do you think it would be helpful to have your tables partitioned? is it to make the SQL easier to write? Depending on the RDBMS you use it might be more efficient to use a fixed date in the future (ie '3000-01-01') rather than null. Example SQL :
Code:
# to get next task to do (MySQL)
select *
from Tasks
where completed_time is null
order by requested_time
limit 1
The SQL to provide the discount rates is slightly more complex but if you give me the table names and fields you have then I can give some examples for you.
|
|

02-12-09, 10:33
|
|
Registered User
|
|
Join Date: Dec 2007
Location: London, UK
Posts: 732
|
|
Quote:
|
Originally Posted by glarus
My thoughts are that there is a one-to-one relationship between account and each type of discount. However, an account cannot have 2 discount plans at the same time. With my current design this is possible
|
See if this example helps:
David Portas' Blog : Distributed Keys and Disjoint Subtypes
|
|

02-12-09, 10:36
|
|
Moderator.
|
|
Join Date: Sep 2002
Location: UK
Posts: 5,171
|
|
Quote:
|
Originally Posted by glarus
About the partitioning, I read that this can be done by using views.
|
I imagine what you read was that if you partition a table into smaller tables like TASKS_IN_PROGRESS and TASKS_COMPLETED with the same columns, then you can create a view to present the whole "logical" table like this:
Code:
CREATE VIEW all_tasks AS
SELECT ... FROM tasks_in_progress
UNION ALL
SELECT ... FROM tasks_completed
However, as Mike is telling you already, you are very unlikely to gain any advantage from doing this partitioning. I have only ever worked on one project where table partitioning was seriously contemplated, and this was for a major energy supplier handling nearly a billion rows of historic data, of which it usually only needed to process the latest month's worth.
|
|

02-12-09, 11:30
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
Quote:
|
Originally Posted by dportas
|
*bookmarked*
very nice example

|
|

02-14-09, 01:46
|
|
Registered User
|
|
Join Date: Feb 2009
Posts: 8
|
|
Quote:
|
Originally Posted by andrewst
I imagine what you read was that if you partition a table into smaller tables like TASKS_IN_PROGRESS and TASKS_COMPLETED with the same columns, then you can create a view to present the whole "logical" table like this:
Code:
CREATE VIEW all_tasks AS
SELECT ... FROM tasks_in_progress
UNION ALL
SELECT ... FROM tasks_completed
|
The other way around  Using view to partition one table into manageable chunks:
Code:
CREATE VIEW tasks_in_progress AS
SELECT * FROM all_tasks WHERE time_completed IS null
However, after further reading on the topic I discarded the idea of using views. I want to partition the tables physically into the design phase. This is, have one table called Tasks_In_progress and another one called Completed_task.
The columns are as follows:
Task_In_progress (TaskId{pk}, TaskTypeId{fk}, Location(where task is carried out), timeStarted, timeCompleted, UserId (employee who has carried out the task).
The idea is to keep a task into the task_in_process table only until its value in timeCompleted IS null. Once the user marks a task completed the record will be moved into the completedTasks table, possibly, with a stored procedure. This way, the performance with the most frequent queries (few every second) like "give me all tasks awaiting processing in Room 1" will be far better because they will query a table which contains only the tasks in process and not the tasks completed in the period between 1994-2009.
However, it turns to be quite complicated.
Quote:
|
However, as Mike is telling you already, you are very unlikely to gain any advantage from doing this partitioning. I have only ever worked on one project where table partitioning was seriously contemplated, and this was for a major energy supplier handling nearly a billion rows of historic data, of which it usually only needed to process the latest month's worth.
|
Yes, I know but how boring life would be if I had a single table called Tasks! 
|
|

02-14-09, 02:16
|
|
Registered User
|
|
Join Date: Feb 2009
Posts: 8
|
|
Quote:
|
Originally Posted by dportas
|
Spot on! Thank you very much!
|
|

02-14-09, 06:31
|
|
Moderator.
|
|
Join Date: Sep 2002
Location: UK
Posts: 5,171
|
|
Quote:
|
Originally Posted by glarus
The other way around  Using view to partition one table into manageable chunks:
Code:
CREATE VIEW tasks_in_progress AS
SELECT * FROM all_tasks WHERE time_completed IS null
|
Which is perfectly sensible as a convenience to querying, but has no bearing on performance whatsoever.
Quote:
|
... This way, the performance with the most frequent queries (few every second) like "give me all tasks awaiting processing in Room 1" will be far better because they will query a table which contains only the tasks in process and not the tasks completed in the period between 1994-2009.
|
Are you sure about that; have you measured the difference in performance?
Quote:
|
However, it turns to be quite complicated.
|
Complicated, and quite possibly of no significant benefit!
Quote:
Yes, I know but how boring life would be if I had a single table called Tasks!
|
Enjoy yourself 
|
|

02-14-09, 07:25
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
|
|
Quote:
|
Originally Posted by glarus
Yes, I know but how boring life would be if I had a single table called Tasks! 
|
While you are in this sort of mood, may I suggest a nice hair shirt and tawse to increase your particular data architecting pleasure? Maybe you can work your way up to a full EAV (Entity-Atribute-Value) implimentation if that hair shirt isn't adequate.
-PatP
|
|

02-14-09, 07:31
|
|
Registered User
|
|
Join Date: Feb 2009
Posts: 8
|
|
Quote:
|
Originally Posted by andrewst
Are you sure about that; have you measured the difference in performance?
|
No, I haven't measured the performance. Actually, I am searching for a book or article on sql complexity but have found nothing so far. My thoughts about performance are intuitive. Table task_in_process will grow and shrink so the issue with the gradually decreasing performance due to growth of the data is eliminated. I know that horizontal partitioning is nothing new, but the difference in partitioning, lets say, sales to salesJanuary, salesFebruary, etc. is that the data stay in these tables because it is sorted according to a static condition, i.e. sale will always be made in January no matter what, while I want to move my tasks across the tables according to a dynamic condition. The question is: what is the cost of inserting and deleteding a raw?
|
|

02-14-09, 07:46
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
if your tables are properly indexed, the difference in performance in retrieving specific rows from a table, as compared with retrieving same rows from a table with ten times the data, is counter-intuitively not very much different
that's because performance of properly indexed queries does not increase linearly with table size
|
|

02-15-09, 01:05
|
|
Registered User
|
|
Join Date: Feb 2009
Posts: 8
|
|
Quote:
|
Originally Posted by r937
if your tables are properly indexed, the difference in performance in retrieving specific rows from a table, as compared with retrieving same rows from a table with ten times the data, is counter-intuitively not very much different
that's because performance of properly indexed queries does not increase linearly with table size
|
OK! I will just use one table then. Could you recommend me a good book on complexity of database operations? Thank you!
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|