  1. #1
    Join Date
    Nov 2009
    Posts
    10

    Unanswered: Primary Key on GUID/UUID column in MDC table

    Hi all,

    We use DB2 9.7 on Windows XP.

    The table design:

    It's an MDC table with 2 dimensions (ORGANIZE BY DIMENSIONS (Country, Year)) and 10 further columns.
    The table is marked as volatile.

    Each record has an ID column that holds a GUID (a 128-bit value).

    Each ID value is generated by an external application.

    The ID column is defined as CHAR(16) FOR BIT DATA to save storage.
    We want to use the ID column as the PK.

    Thus, we have the MDC block indexes and the PK index.
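
    Roughly, the DDL looks like this (simplified sketch - the real names, column types, and the 10 further columns differ):

        CREATE TABLE mdctable (
            id       CHAR(16) FOR BIT DATA NOT NULL,  -- GUID from the external application
            country  CHAR(2)  NOT NULL,
            year     SMALLINT NOT NULL,
            -- ... 10 further columns ...
            CONSTRAINT pk_mdctable PRIMARY KEY (id)
        )
        ORGANIZE BY DIMENSIONS (country, year);

        ALTER TABLE mdctable VOLATILE;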

    Now, here is what happens when we enter data into that table.

    Starting with an empty table:

    Inserting the first 100K to 600K rows into the table goes very well - no problem with speed.

    But from then on, each INSERT of further new rows becomes slower.

    When we reach 900K rows in the table, each new INSERT operation takes about twice as long.

    When we reach 1300K rows in the table, each new INSERT operation takes about four times as long.

    As more and more rows are entered, the waiting time becomes very high.

    Our current approaches:

    A) We drop the index, rebuild it, and start the bulk insert again.

    Pro: It helps a bit, but in general the insert speed of the first 600K rows is never reached again.
    Con: We have to account for the drop/rebuild time.


    B) We drop the PK index, do the bulk insert, and then rebuild the index.

    Pro: The INSERT commands are as fast as for the first 600K rows, even when the table already holds 2000K rows.
    Con: We have to account for the drop/rebuild time.

    Once the DB goes into production, dropping/rebuilding is not very practical.
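
    For approach B, what we do is roughly this (sketch; the constraint name is made up):

        ALTER TABLE mdctable DROP PRIMARY KEY;

        -- ... run the bulk INSERTs ...

        ALTER TABLE mdctable ADD CONSTRAINT pk_mdctable PRIMARY KEY (id);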

    The following 2 queries are the main focus:

    A) SELECT xyz FROM mdctable WHERE Country = ? AND Year = ?
    B) SELECT xyz FROM mdctable WHERE ID = ?


    Our thinking: the PK index on the GUID is not easy for DB2 to handle, perhaps because the random keys land all over the index. We also tried to play with PCTFREE, but with results we cannot accept.

    Maybe you have some good recommendations or hints.

    Thank you.

    Cheers,
    2qwerty

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    I guess there are some reasons that prevent you from using LOAD instead...

    If the new PK values are increasing and not random, I don't think increasing PCTFREE will do much good. If the keys are random, try increasing both LEVEL2 PCTFREE and leaf-level PCTFREE as much as possible (see the sketch below). Also, you may want to try pausing the insert process from time to time and updating the PK index statistics. Based on what you're saying, doing that after every 500K records would make sense.
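
    A sketch of what I mean (the index, constraint, and schema names are made up; the PK index is normally created implicitly by the constraint, so to control PCTFREE, create a unique index yourself first and then add the constraint, which will reuse it):

        CREATE UNIQUE INDEX pk_mdctable_ix ON mdctable (id)
            PCTFREE 90 LEVEL2 PCTFREE 90;

        ALTER TABLE mdctable ADD CONSTRAINT pk_mdctable PRIMARY KEY (id);

        -- and between batches, refresh the index statistics:
        RUNSTATS ON TABLE myschema.mdctable FOR INDEXES ALL;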
    ---
    "It does not work" is not a valid problem statement.

  3. #3
    Join Date
    Nov 2009
    Posts
    10
    LOAD ... is not an option for us.

    The PK values are always random and unique in time/space - at least until the year 3000.

    Remember, the PK values are uniformly distributed over the 128-bit value range ...

    If the keys are random, try increasing both LEVEL2 PCTFREE and leaf-level PCTFREE as much as possible.
    What is "as much as possible"? Is there a way to guess or calculate the value (0 ... 99)?


    All I know is that each index is a B-tree with several levels.

    How is that handled for CHAR(16) FOR BIT DATA?

    How many keys can be stored in one leaf page, and what does that depend on?

    Is a B-tree the only way, or maybe hashing?

    What if we sort the data on the PK keys before executing the INSERT commands?

    Also, you may want to try pausing the insert process from time to time and updating the PK index statistics.
    We did not find any INSERT-relevant difference between updated and non-updated statistics. That is also why we marked the table as volatile.

    Based on what you're saying, doing that after every 500K records would make sense.
    Imagine we had a table of 25000K rows or more - the drop/rebuild procedure could take a long time. And even then, the INSERTs were not really faster, as seen in our approach A.

    Approach B would mean that even when I want to insert as few as 1000 new rows, I would need to guess/profile the drop/rebuild time (based on the last run?) and then decide, depending on the number of rows to be inserted, whether to do the drop procedure or not.

    This sounds very strange to me.

    What if we go for a "partial" index - maybe using a GENERATED ALWAYS column a la LEFT(PKID, X) - and dismiss the PK? A rough sketch of what I mean is below.
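
    Something like this, perhaps (just a thought, not tested; the prefix length 4 and all names are arbitrary):

        ALTER TABLE mdctable
            ADD COLUMN id_prefix CHAR(4) FOR BIT DATA
                GENERATED ALWAYS AS (LEFT(id, 4));

        -- the table may end up in set integrity pending state; then:
        SET INTEGRITY FOR mdctable IMMEDIATE CHECKED FORCE GENERATED;

        CREATE INDEX ix_id_prefix ON mdctable (id_prefix);

        -- lookups would then become something like:
        -- SELECT xyz FROM mdctable WHERE id_prefix = LEFT(?, 4) AND id = ?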

    Or should we use a second table just for the GUID keys?

    Only open questions from my side - sorry.

    2qwerty

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    INSERTing 25 million rows is just asking for trouble. I think it is obvious that the problem is in having to update the index for each inserted record. If you cannot eliminate that by using LOAD or by dropping and re-creating the index, I'd say your only other option is to try to speed it up. Minimize page splits by setting both PCTFREE values to something like 90 (the maximum is 99). Maybe put the PK index into a separate tablespace with its own bufferpool, large enough to keep the entire index in memory - a sketch is below.
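
    A sketch of the tablespace/bufferpool idea (all names and sizes are made up, ts_data is assumed to exist, and for a non-partitioned table the index tablespace can only be chosen when the table is created):

        CREATE BUFFERPOOL bp_pk IMMEDIATE SIZE 250000 PAGESIZE 4K;

        CREATE TABLESPACE ts_pk PAGESIZE 4K BUFFERPOOL bp_pk;

        -- the INDEX IN clause is only valid at CREATE TABLE time:
        CREATE TABLE mdctable (
            id       CHAR(16) FOR BIT DATA NOT NULL,
            country  CHAR(2)  NOT NULL,
            year     SMALLINT NOT NULL,
            CONSTRAINT pk_mdctable PRIMARY KEY (id)
        )
        IN ts_data INDEX IN ts_pk
        ORGANIZE BY DIMENSIONS (country, year);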

    Quote Originally Posted by 2.q.w.e.r.t.y
    How is that handled for CHAR(16) FOR BIT DATA?

    How many keys can be stored in one leaf page, and what does that depend on?
    It depends on the page size, as determined by the tablespace, and the key length (plus some overhead).
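
    Back-of-the-envelope, with your CHAR(16) FOR BIT DATA key on a 4K page: each leaf entry is roughly the 16-byte key plus a RID and a few bytes of overhead, say about 25 bytes, and a page has roughly 4000 usable bytes after its header - so on the order of 150 keys per full leaf. With PCTFREE 90, a freshly built leaf starts with only about 15 keys, which is the point: plenty of room before the random inserts force a page split.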


    Quote Originally Posted by 2.q.w.e.r.t.y

    What if we sort the data on the PK keys before executing the INSERT commands?
    I don't think it will make any difference.
    ---
    "It does not work" is not a valid problem statement.
