  1. #1
    Join Date
    May 2005
    Posts
    30

    Unanswered: Table Partitioning Candidate Thoughts Please...

    Hi folks,
    First lets clear one thing up: I'm no DBA.

    Now, with that out of the way, I have a table that is basically a repository for raw data that comes into and out of my system from various upstream and downstream systems. The incoming data is what I'm mostly interested in here.

    The table really acts as both a staging area and a proof-of-activity repository. As the former, I hold the data there until some component takes responsibility for turning it into meaningful business data.
    As the latter, there can be no arguments over which systems sent what data and at what time, because I record everything needed to identify the source, delivery mechanism, times, etc., but most importantly the content of the incoming messages themselves.

    The workflow would involve a core component whose job is something like this:
    message arrives -> insert to the repository -> loop forever.
    and other components whose jobs are something like:
    got notification of message -> read contents -> process, etc. -> update status on message to reflect processing result
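    For reference, the workflow above implies a repository table shaped roughly like this (a sketch only; the column names are my illustration, not the actual schema):

    ```sql
    -- Hypothetical shape of the repository table (names are illustrative).
    CREATE TABLE dbo.MessageRepository (
        MessageId     bigint IDENTITY(1,1) NOT NULL,
        Source        varchar(50)  NOT NULL,              -- which upstream system sent it
        DeliveryMech  varchar(20)  NOT NULL,              -- e.g. queue, file, web service
        ReceivedAt    datetime     NOT NULL DEFAULT (GETDATE()),
        StatusId      tinyint      NOT NULL DEFAULT (0),  -- 0 = unprocessed
        FailureReason varchar(255) NULL,
        Content       varchar(max) NOT NULL,              -- the raw message itself
        CONSTRAINT PK_MessageRepository PRIMARY KEY CLUSTERED (MessageId)
    );
    ```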


    Now, this is a high-activity area with millions of entries. Its performance is fine now, but I want to ensure the design is scalable to a great degree. To this end, I'm thinking that as soon as a message has been flagged as processed, I really don't need it in the active dataset any more. The hope is that the components that query and update would then have to deal with a much smaller dataset.

    so, my questions to you real database folk:

    .the inserts: do these slow down as the table grows? I only have an index on the identity field at present, but I would need some more to improve a search facility for raw messages (not necessarily the contents, but the source, for example).

    .my thought was to change the update sproc to actually move any processed records into a separate table. But would this insert into the new table be just as costly as, or more costly than, the update on the existing table?

    .would table partitioning help me, and if so, can you elaborate?

    .would holding the status of a message in a completely separate (messageId, statusId) table improve life, or would updating that be just as costly, given that it will have exactly as many records as the repository table?


    Any other thoughts, ideas, etc. are most welcome. Thing is, I only have a short window to do something with this and don't want to do anything stupid.

    tnx

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    How many transactions per second and how long does this processing you do take?

    Without further info I personally would:
    Get row from staging table (deleting at the same time - use the OUTPUT clause)
    Process
    Dump into "processed messages" table
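    Roughly, in T-SQL (a sketch only; the table and column names are made up):

    ```sql
    -- Step 1: claim a message, removing it from staging in the same statement.
    -- DELETE TOP picks an arbitrary row; fine for queue-style pickup.
    DECLARE @work TABLE (MessageId bigint, Content varchar(max));

    DELETE TOP (1)
    FROM dbo.MessageRepository
    OUTPUT deleted.MessageId, deleted.Content INTO @work;

    -- Step 2: process the row(s) captured in @work ...

    -- Step 3: record the outcome in the processed-messages table.
    INSERT INTO dbo.ProcessedMessages (MessageId, Content, ProcessedAt)
    SELECT MessageId, Content, GETDATE()
    FROM @work;
    ```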

    Table partitioning is, IMHO, overkill.
    If your clustered index is the identity then this is optimal for inserts and no, it will not get worse over time.
    Any further indexes will slow down inserting but could aid retrieval, however if you remove rows from this table as you process them then there should be no need for further indexes.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  3. #3
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by pootle flump
    Table partitioning is, IMHO, overkill.
    Well that depends....

    With the majority of apps that may be true

    That would not be true for EBay, Amazon, etc
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  4. #4
    Join Date
    Jun 2004
    Location
    Long Island
    Posts
    696
    I see most of the benefit loading data with table partitioning, ie:

    switch partition out
    drop indexes
    load data
    rebuild indexes
    switch partition back in

    Especially when you rebuild the index, it can be only at the partition level, as long as the index(es) are aligned with the partition.
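    The pattern looks something like this (a sketch; table names and the partition number are illustrative, and the staging table must match the partitioned table's structure and sit on the right filegroup):

    ```sql
    -- Switch the target partition out into an empty staging table.
    -- This is a metadata-only operation, so it is near-instant.
    ALTER TABLE dbo.BigTable SWITCH PARTITION 5 TO dbo.BigTable_Stage;

    -- Drop indexes on dbo.BigTable_Stage, bulk load the data,
    -- then rebuild the indexes (only for this one partition's worth of data).

    -- Switch the loaded data back in (again, metadata only).
    ALTER TABLE dbo.BigTable_Stage SWITCH TO dbo.BigTable PARTITION 5;
    ```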

    I used this with a company called LoanPerformance which was loan level data which amounted to a 1 billion row table.

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by Brett Kaiser
    Well that depends....

    With the majority of apps that may be true

    That would not be true for EBay, Amazon, etc
    Um...I was tailoring my answer to the question. I have a sneaky suspicion this isn't a question about EBay.

  6. #6
    Join Date
    May 2005
    Posts
    30
    How many transactions per second and how long does this processing you do take?
    Hi folks, thanks for taking the time to reply (different timezone so apologies for appearing disappeared).

    Now, with regards to transactions per second, we're talking reasonably few at the moment (thousands), but there is a major change on the way which will hammer us with an unknown number, probably in the hundreds of thousands.

    So, what I'm getting is that inserts and updates based on the indexed Id column will never degrade in performance no matter how many records are in the single repository table, right? Of course, the faster the operations arrive the slower things get, but that has nothing to do with table size?

    About the idea of deleting from the core repository and inserting into a Processed repository in real time (although I'd do both those steps together, after the processing is deemed successful): wouldn't this be more costly than a single update statement on the original repository? The major goal is not to introduce anything that degrades performance, even by a millisecond.


    So my only remaining problems are 1. disk space, etc., and 2. query times when, for example, searching for all failed or unprocessed messages. This doesn't happen often, but when it does the response has to be good.

    So, partitioning on Processed field, for example, wouldn't be a good solution?

  7. #7
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Ok - a rate of hundreds of thousands of inserts per second is beyond what I expected and beyond my experience.

    To address some specifics:
    Quote Originally Posted by jhml
    So, what I'm getting is that the inserts and updates based on the indexed Id column will never degrade in performance no matter how many records are in the single repository table right?
    Insert on a monotonically increasing clustered index (e.g. an identity column) is the very fastest way of inserting, faster even than having no clustered index at all. Now, there was no mention of updates in the original post, IIRC. This is a serious consideration: the IO on a large table is significant, plus if you are updating variable-length columns then you could introduce page splits, which you really don't want. As far as possible I would want to restrict operations on this table to insert and delete only.

    Quote Originally Posted by jhml
    About the idea of deleting from the core repository and inserting into a Processed Repository (although I'd do both those steps together after the processing is deemed successful) in real-time. Wouldn't this be more costly than a single update statement on the original repository? Major goal is not to introduce anything that degrades performance even by a millisecond.
    Not necessarily. Note the above re page splits. Remember also that a select and an update are not "free": there is the IO cost of traversing the clustered and non-clustered B-trees to consider which, in a table that is forever growing as yours is, will get more expensive over time.

    Quote Originally Posted by jhml
    So, partitioning on Processed field, for example, wouldn't be a good solution?
    When this gets updated, the row moves from one partition to another. That's pretty close to a delete and insert. However, you are more likely to introduce splits. In effect, it would really be the same as what I am suggesting, with higher admin overhead and, I would suspect, less efficiency.

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Other points - if you decide to update in situ you will have to maintain non-clustered index(es) which will increase the insert cost.

  9. #9
    Join Date
    May 2005
    Posts
    30
    Thanks again Pootle Flump. This is really helping me out.

    I did mention the update in the original post:

    and other components whose jobs are something like:
    got notification of message -> read contents -> process, etc. -> update status on message to reflect processing result
    The updates mostly affect only the Processed Status field, although if there is a failure I record the failure reason too. Other than that, it's just a couple of last-modified-type fields.


    I'm struggling with the partitioning idea. In the SQL Server documentation there is an example (ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/e572b696-3e07-411c-ba65-5bd192bbe891.htm) which states:

    Because the tables are partitioned, the transfer of monthly "chunks" of data between the tables typically takes just seconds, instead of the minutes or hours it took in previous releases. This is because it is a metadata operation only, instead of a physical relocation of the data.
    See, in my (old-world) mind, I'd be inserting processed messages into a different table and deleting them from the main table as part of a transaction. This quote seems to suggest that partitioning would be a faster, better approach.
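    For what it's worth, the setup that documentation example implies looks something like this in my case (a sketch only; partitioning on a status flag, with all names made up):

    ```sql
    -- Two partitions: StatusId < 1 (unprocessed) and StatusId >= 1 (processed).
    CREATE PARTITION FUNCTION pfStatus (tinyint)
    AS RANGE RIGHT FOR VALUES (1);

    CREATE PARTITION SCHEME psStatus
    AS PARTITION pfStatus ALL TO ([PRIMARY]);

    -- The repository table would then be created ON psStatus(StatusId),
    -- and updating StatusId from 0 to 1 moves the row between partitions.
    ```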

    Can you fill me in on what I'm missing about partitioning?

    Again, I really appreciate you taking the time out to respond.

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    You are missing nothing - my mistake. All of the partitioned tables I've worked with have fixed partitioning keys so although I would have once read that I confess I didn't know it when I responded!

    My apologies - partitioning looks much more attractive.

    Also, if the update is to a fixed-width code, there should be no page splits on update either.

  11. #11
    Join Date
    May 2005
    Posts
    30
    Thanks again. I think I need to seek out a DBA but this has been so useful in me getting a handle on what that conversation will be.

  12. #12
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Wait a minute....

    Hundreds of thousands....a second

    I don't think so


    What? Did Michael Jackson die or something?

    Are you giving out free money?

    What is it that you do?

  13. #13
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    my personal benchmarking tests found little increase in read performance.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Ernest Hemingway, A Farewell To Arms.
