Results 1 to 5 of 5
  1. #1
    Join Date
    Aug 2014
    Posts
    3

    Question Unanswered: HELP! How to use Identity on non identity column (with concurrence)

    Hi folks, I need some help here.

    I'm working with a third-party database (SQL Server 2005) and the problem here is the following:
    - There are a bunch of ETL processes that needs to insert rows on a table (let's call this table T) and at the same time, an ERP (owner of T) is up and running (reading, updating and inserting on T).
    - The PK of T is an Integer.

    Today all ETL processes uses (select max(ID) + 1 from T) to insert new rows, so just picture the scenario. It is a mess! Everyday they get duplicate key error when 2 or more concurrent processes are trying to insert a row (with the max) at the same time.

    Considering that I can't change the PK, what is the best approach to solve this problem?

    To sum up:
    * I need to have processes in parallel inserting on T
    * I can't change anything on T
    * The PK is NOT an Identity

    Someone, please??

    Thanks A LOT!!!

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Unfortunately I have a lot more questions than answers for you!

    A lot depends on your ETL processes and the ERP system and how tolerant it can be of locking/blocking issues and if it takes optimistic or pessimistic locks. Can your ETL processes "marshal" or are they synchronous? Can the ETL processes use Row_Number() and CTEs?

    The answers to these questions can send this discussion in dozens of different directions... Give us some breadcrumbs and we'll try to give more specific help!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    Aug 2014
    Posts
    3

    Question

    PatP, I have some answers for you but not all of them.

    "A lot depends on your ETL processes and the ERP system and how tolerant it can be of locking/blocking issues and if it takes optimistic or pessimistic locks"

    I actually don't know anything about the ERP running here (it is not one of the market leaders like SAP). But what they have here is about 50 people using it simultaneously. So, if the ERP is prepared for waiting some locks I think that is not a big problem. My concern is: if I use lock on the table, will the ERP broke? I'm not sure of that.

    "Can your ETL processes "marshal" or are they synchronous?"

    If I understood your "marshal" concept, yes it can be done. Actually I'm thinking of serializing the processes (with some optimization it can be done). So in this scenario the concurrency should be restricted to one process and the ERP, this will at least minimize concurrency

    "Can the ETL processes use Row_Number() and CTEs?"

    Man, I have no idea. I don't have much experience on SQL Server, so I don't know what CTEs are.
    And the Row_Number() I can't see how it is useful in my scenario, can you give me an idea?

    Thanks for your attention pal!

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    BEWARE: There are a lot of assumptions in this idea. You need to confirm that it will work in your environment before you go very far! Without knowing a great deal more I can't give you any "iron clad" guidance.

    Assuming that the ERP system is using pessimistic locks, then I'd probably create a staging table for the inserts into T, then use the a CTE with Row_Number() and the TABLOCKX table hint. The CTE will allow you to batch the whole insert in one operation. The Row_Number() will allow that operation to sequentially number the rows (adding the previous Max() will give you fresh sequential PK values). The TABLOCKX is ugly, but it will guarantee that your ETL and the ERP don't fight by causing your ETL to wait until it can exclusively lock the table for the duration of the INSERT operation.

    You need to research the ideas, then test them in your development environment BEFORE you try them in production! Note that there are clickable links that ought to speed your research in the previous paragraph.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Aug 2014
    Posts
    3
    Hey Pat,

    I got the concepts, but still can't visualize how to use it all together. I understood the following:
    1) First import the rows without the PKs on the staging table ST.
    2) Insert the rows on T (my target table) using an insert-select on a CTE that query ST and calculate the PK inside the CTE.

    So It would be like this:

    After populating the staging table ST:

    WITH CTE_T AS
    (SELECT <PK>,*... FROM ST)

    <PK> = Expression that gives me the PK (here is my doubt)

    BEGIN TRAN;

    INSERT INTO T
    SELECT * FROM CTE_T;

    COMMIT;

    Is it correct? If so, what do I use on the <PK> expression?

    Thanks for your attention!

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •