If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > Database Novice – Improve performance by splitting database?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-17-11, 04:27
Spadez Spadez is offline
Registered User
 
Join Date: Nov 2011
Posts: 9
Database Novice – Improve performance by splitting database?

Hi,

This is my first post so go easy . I am looking to create a system which will be very database and search dependant.

When items are posted, they become “searchable” for one month, and then they are achieved. In their achieved state, their information can still be retrieved by the creator of the item, but it cannot be found in the search results. A bit like ebay. The number of “current” items compared to “achieved items will be tiny.

Now, it is my understanding that, if the database is smaller, I can complete my queries quicker. So, in that vain, is there a way to exclude the achieved items from being queried in the search?

What would be the best way of achieving this, to use a prefix for the achieved items, move the achieved items so a separate database, or is there some kind of “flag” system within the database to exclude them?

Any input would be much appreciated!

James
Reply With Quote
  #2 (permalink)  
Old 11-17-11, 04:44
healdem healdem is online now
Jaded Developer
 
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
Quote:
Now, it is my understanding that, if the database is smaller, I can complete my queries quicker
...based on?

unless you know you have a performance problem I wouldn't be making such decisions just yet. the only way you are going to know you have a problem is if you have already experienced such issues or have been advised by someone its a problem. I would expect that with good design your table should eb able to handle multi million rows without significant issues. there's no harm in looking at ways of designing out problems but I wouldn't split data off into an archive table unless I knew it was an issue or there was a good design reason to do (such as security / privacy)

yes you could use a flag or prefix
mind you could also index the date an article was posted and exclude results that are more than n whatevers from posting. look at the MySQL date time functions for assitance

if you want to use a flag then use a boolean datatype. The where clause is
where myflagcolumn = somestate ' where somestate is true or false
you could call your flag column IsArchived or IsActive, and the value to use in the where clause depends on what you have called the column and what you are looking for

say it was called IsArchived so true would mean the record is no longer active, false means active
then
WHERE IsArchived = false 'for current records
OR
WHERE IsArchived = true 'for inactive records

I wouldnt' move items to another db, unless you have so many items that it swamps the server. or there is no need for the data to remain visible to all. it would be perfectly legit in my mind to shift dead records to another table after a period of time, assuming that there is no further need to see that 'stuff' adjacent to other stuff. ferinstance if you were running a an auction site
you could argue that items shouldn't be on the main auction section after a period of time, say six months. after the six month period you could prune these values from the db.
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton

Last edited by healdem; 11-17-11 at 04:54.
Reply With Quote
  #3 (permalink)  
Old 11-17-11, 04:55
Spadez Spadez is offline
Registered User
 
Join Date: Nov 2011
Posts: 9
Please excuse my ignorance, this really isn’t my field but im trying to get by. Anyway, the database is going to get potentially quite massive, so how about this for a nightly cron job:

If item entry is more than 6 months old, remove. (I shouldn’t need it at all after 6 months)
If item entry is more than 1 month only, set IsArchived = true (like you suggested).

My concern is this. If there is a million rows and 900,000 of them are set to “Inactive”, then when a search is conducted will the entire 1M rows still not be checked for their “IsArchieved” state, and will this not take time? Is there an advantage in terms of performance of checking IsArchieved = True rather than checking IsDate < 1 month?

Thank you for the input.
Reply With Quote
  #4 (permalink)  
Old 11-17-11, 05:27
healdem healdem is online now
Jaded Developer
 
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
careful selection of indexes should avoid the SQL engine having to sacn through all 1m, 10m, 10billion rows.

if you index the posting / completion date you don't need a flag at all
just exclude in the where clause rows which don't match your criteria

if you have performance concerns then do some volumetric tests. insert as many rows as you think you need then try to see if their is a performance bottleneck.

if you index on IsArchived then the SQL engine will use the index to exclude rows not matching the criteira, so you will not be doing a tablescan. if you don't use an index then the SQL engione will have to do a tablescan.
http://dev.mysql.com/doc/refman/5.0/...able-scan.html

look at the explain / explain extended to see how MySQL will try to use the indexes at its disposal
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
Reply With Quote
  #5 (permalink)  
Old 11-17-11, 06:39
it-iss.com it-iss.com is offline
Registered User
 
Join Date: Sep 2009
Location: San Sebastian, Spain
Posts: 620
If you want to manage large amounts of data have a look at partitioning. This allows you to partition up the data in "buckets" which is still part of the same table. The partitions can be based on any criteria but in your case if you partition on a monthly basis all records for a particular month go into this "bucket". When you want to remove them simply drop the "bucket" or partition.

But as healdem has said databases are designed for managing huge amounts of data so long as you data model design is good.
__________________
Ronan Cashell
Senior Oracle/MySQL DBA
http://www.it-iss.com
Reply With Quote
  #6 (permalink)  
Old 11-17-11, 06:41
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,407
Quote:
Originally Posted by Spadez View Post
If there is a million rows and 900,000 of them are set to “Inactive”, then when a search is conducted will the entire 1M rows still not be checked for their “IsArchieved” state
A million rows is not considered "massive" nowadays.
Reply With Quote
  #7 (permalink)  
Old 11-17-11, 10:19
Spadez Spadez is offline
Registered User
 
Join Date: Nov 2011
Posts: 9
Im interested by partitioning. In this case would I need horizontal partitioning (one for “active”, one for “archive”)? Does this currently exist in MYSQL as a lot of information I find online seems to redirect me to sharding.

I guess what I really need to ask in order to get my head around the best option is, how much time would it add to my overall query time, to search what is effectively 1M rows and discount rows with the “Inactive” flag, rather than simply searching the 100,000 “active” rows for the right answer.

Its not about the database being large, but it is about being able to return results quickly, as my business depends on it.
Reply With Quote
  #8 (permalink)  
Old 11-20-11, 16:25
Spadez Spadez is offline
Registered User
 
Join Date: Nov 2011
Posts: 9
Bump if I may.
Reply With Quote
  #9 (permalink)  
Old 11-20-11, 16:54
healdem healdem is online now
Jaded Developer
 
Join Date: Nov 2004
Location: out on a limb
Posts: 9,250
do you have any evidence to suggest that you need to split any/everything.

granted if data is dead / redundant then by all means archive it off.. perhaps to a data warehouse design.
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On