Results 1 to 9 of 9
  1. #1
    Join Date
    Nov 2011
    Posts
    13

    Unanswered: Database Novice – Improve performance by splitting database?

    Hi,

    This is my first post so go easy . I am looking to create a system which will be very database and search dependant.

    When items are posted, they become “searchable” for one month, and then they are achieved. In their achieved state, their information can still be retrieved by the creator of the item, but it cannot be found in the search results. A bit like ebay. The number of “current” items compared to “achieved items will be tiny.

    Now, it is my understanding that, if the database is smaller, I can complete my queries quicker. So, in that vain, is there a way to exclude the achieved items from being queried in the search?

    What would be the best way of achieving this, to use a prefix for the achieved items, move the achieved items so a separate database, or is there some kind of “flag” system within the database to exclude them?

    Any input would be much appreciated!

    James

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Now, it is my understanding that, if the database is smaller, I can complete my queries quicker
    ...based on?

    unless you know you have a performance problem I wouldn't be making such decisions just yet. the only way you are going to know you have a problem is if you have already experienced such issues or have been advised by someone its a problem. I would expect that with good design your table should eb able to handle multi million rows without significant issues. there's no harm in looking at ways of designing out problems but I wouldn't split data off into an archive table unless I knew it was an issue or there was a good design reason to do (such as security / privacy)

    yes you could use a flag or prefix
    mind you could also index the date an article was posted and exclude results that are more than n whatevers from posting. look at the MySQL date time functions for assitance

    if you want to use a flag then use a boolean datatype. The where clause is
    where myflagcolumn = somestate ' where somestate is true or false
    you could call your flag column IsArchived or IsActive, and the value to use in the where clause depends on what you have called the column and what you are looking for

    say it was called IsArchived so true would mean the record is no longer active, false means active
    then
    WHERE IsArchived = false 'for current records
    OR
    WHERE IsArchived = true 'for inactive records

    I wouldnt' move items to another db, unless you have so many items that it swamps the server. or there is no need for the data to remain visible to all. it would be perfectly legit in my mind to shift dead records to another table after a period of time, assuming that there is no further need to see that 'stuff' adjacent to other stuff. ferinstance if you were running a an auction site
    you could argue that items shouldn't be on the main auction section after a period of time, say six months. after the six month period you could prune these values from the db.
    Last edited by healdem; 11-17-11 at 05:54.
    I'd rather be riding on the Tiger 800 or the Norton

  3. #3
    Join Date
    Nov 2011
    Posts
    13
    Please excuse my ignorance, this really isn’t my field but im trying to get by. Anyway, the database is going to get potentially quite massive, so how about this for a nightly cron job:

    If item entry is more than 6 months old, remove. (I shouldn’t need it at all after 6 months)
    If item entry is more than 1 month only, set IsArchived = true (like you suggested).

    My concern is this. If there is a million rows and 900,000 of them are set to “Inactive”, then when a search is conducted will the entire 1M rows still not be checked for their “IsArchieved” state, and will this not take time? Is there an advantage in terms of performance of checking IsArchieved = True rather than checking IsDate < 1 month?

    Thank you for the input.

  4. #4
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    careful selection of indexes should avoid the SQL engine having to sacn through all 1m, 10m, 10billion rows.

    if you index the posting / completion date you don't need a flag at all
    just exclude in the where clause rows which don't match your criteria

    if you have performance concerns then do some volumetric tests. insert as many rows as you think you need then try to see if their is a performance bottleneck.

    if you index on IsArchived then the SQL engine will use the index to exclude rows not matching the criteira, so you will not be doing a tablescan. if you don't use an index then the SQL engione will have to do a tablescan.
    http://dev.mysql.com/doc/refman/5.0/...able-scan.html

    look at the explain / explain extended to see how MySQL will try to use the indexes at its disposal
    I'd rather be riding on the Tiger 800 or the Norton

  5. #5
    Join Date
    Sep 2009
    Location
    San Sebastian, Spain
    Posts
    880
    If you want to manage large amounts of data have a look at partitioning. This allows you to partition up the data in "buckets" which is still part of the same table. The partitions can be based on any criteria but in your case if you partition on a monthly basis all records for a particular month go into this "bucket". When you want to remove them simply drop the "bucket" or partition.

    But as healdem has said databases are designed for managing huge amounts of data so long as you data model design is good.
    Ronan Cashell
    Certified Oracle DBA/Certified MySQL Expert (DBA & Cluster DBA)
    http://www.it-iss.com
    Follow me on Twitter

  6. #6
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by Spadez View Post
    If there is a million rows and 900,000 of them are set to “Inactive”, then when a search is conducted will the entire 1M rows still not be checked for their “IsArchieved” state
    A million rows is not considered "massive" nowadays.

  7. #7
    Join Date
    Nov 2011
    Posts
    13
    Im interested by partitioning. In this case would I need horizontal partitioning (one for “active”, one for “archive”)? Does this currently exist in MYSQL as a lot of information I find online seems to redirect me to sharding.

    I guess what I really need to ask in order to get my head around the best option is, how much time would it add to my overall query time, to search what is effectively 1M rows and discount rows with the “Inactive” flag, rather than simply searching the 100,000 “active” rows for the right answer.

    Its not about the database being large, but it is about being able to return results quickly, as my business depends on it.

  8. #8
    Join Date
    Nov 2011
    Posts
    13
    Bump if I may.

  9. #9
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    do you have any evidence to suggest that you need to split any/everything.

    granted if data is dead / redundant then by all means archive it off.. perhaps to a data warehouse design.
    I'd rather be riding on the Tiger 800 or the Norton

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •