  1. #1
    Join Date
    Mar 2003
    Posts
    97

    Unanswered: Do I really need a cursor?

    I've built an application to import transactions into the database. Bad transactions go into a separate table and dupe transactions get updated. Currently it takes about 2 hours to import ~40K records using the code below. Obviously I'd like this to run as fast as possible, and since cursors are a real drag, I was wondering if there is a more efficient way to accomplish this.

    DECLARE
    @contact_id int,
    @product_code char(9),
    @status_date datetime,
    @business_code char(4),
    @expire_date datetime,
    @prod_status char(4),
    @transaction_id int,
    @emailAddress varchar(50),
    @journal_id int



    BEGIN TRAN
    DECLARE transaction_import_cursor CURSOR
    FOR SELECT transaction_id, product_code, emailAddress, status_date, business_code, expire_date, prod_status from transactions_batch_tmp
    OPEN transaction_import_cursor
    FETCH NEXT FROM transaction_import_cursor INTO @transaction_id, @product_code, @emailAddress, @status_date, @business_code, @expire_date, @prod_status
    WHILE (@@FETCH_STATUS = 0)
    BEGIN
    SELECT top 1 contacts.contact_id AS contact_id, transactions_batch_tmp.status_date AS status_date, transactions_batch_tmp.product_code AS product_code,
    transactions_batch_tmp.business_code AS business_code, transactions_batch_tmp.expire_date AS expire_date,
    transactions_batch_tmp.prod_status AS product_status
    FROM transactions_batch_tmp INNER JOIN
    journal INNER JOIN
    contacts ON journal.contact_id = contacts.contact_id ON transactions_batch_tmp.emailAddress = contacts.emailAddress AND
    transactions_batch_tmp.product_code = journal.product_code INNER JOIN
    products ON transactions_batch_tmp.product_code = products.product_code
    WHERE rtrim(ltrim(contacts.emailAddress)) = @emailAddress AND journal.product_code = @product_code
    ORDER BY transactions_batch_tmp.status_date desc
    IF @@ROWCOUNT = 0
    BEGIN
    print 'NEW transaction! ' + @product_code + @emailAddress
    insert into journal (contact_id, product_code, status_date, business_code, expire_date, entryTypeID, product_status, date_entered)
    SELECT distinct rtrim(ltrim(contacts.contact_id)) as cid, rtrim(ltrim(products.product_code)), transactions_batch_tmp.status_date,
    rtrim(ltrim(transactions_batch_tmp.business_code)) , transactions_batch_tmp.expire_date, 21, rtrim(ltrim(transactions_batch_tmp.prod_status)), getDate()
    FROM contacts INNER JOIN (transactions_batch_tmp INNER JOIN products ON transactions_batch_tmp.product_code=products.product_code) ON contacts.emailAddress=transactions_batch_tmp.emailAddress
    WHERE transactions_batch_tmp.transaction_id=@transaction_id
    END
    ELSE
    BEGIN
    --print 'UPDATE transaction! ' + @product_code + @emailAddress
    UPDATE journal
    SET status_date =
    (SELECT max(tmp.status_date)
    FROM transactions_batch_tmp tmp, contacts c, products p, journal j
    WHERE tmp.emailaddress = @emailAddress
    AND tmp.emailaddress = rtrim(c.emailaddress)
    AND c.contact_id = j.contact_id
    AND j.product_code = @product_code
    AND j.product_code = tmp.product_code)
    FROM transactions_batch_tmp tmp, contacts c, products p, journal j
    WHERE tmp.emailaddress = @emailAddress
    AND tmp.emailaddress = rtrim(c.emailaddress)
    AND c.contact_id = j.contact_id
    AND j.product_code = @product_code
    AND j.product_code = tmp.product_code
    END
    FETCH NEXT FROM transaction_import_cursor INTO @transaction_id, @product_code, @emailAddress, @status_date, @business_code, @expire_date, @prod_status
    END
    CLOSE transaction_import_cursor
    DEALLOCATE transaction_import_cursor
    COMMIT TRAN

    /** purge data from temp error table before writing bad records for this batch **/
    truncate table tran_import_error;

    /** write bad records (missing product code or email address) to temp_error table **/
    insert into tran_import_error (transaction_id, product_code, emailAddress, date_entered)
    SELECT DISTINCT transactions_batch_tmp.transaction_id, transactions_batch_tmp.product_code, transactions_batch_tmp.emailAddress, getDate()
    FROM transactions_batch_tmp
    where transactions_batch_tmp.emailaddress not in (select emailaddress from contacts)
    OR
    transactions_batch_tmp.product_code not in (select product_code from products)


    TIA

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    I don't see anything in your code that requires a cursor. It would run much faster as set-based INSERT and UPDATE statements.
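    Roughly (untested, using the table and column names from your post, and setting aside duplicate handling for the moment), the pattern would look something like this:

    -- Set-based UPDATE: refresh status_date for contact/product pairs
    -- that already exist in journal.
    UPDATE j
    SET j.status_date = t.status_date
    FROM journal j
    INNER JOIN contacts c ON c.contact_id = j.contact_id
    INNER JOIN transactions_batch_tmp t
    ON rtrim(ltrim(c.emailAddress)) = t.emailAddress
    AND j.product_code = t.product_code

    -- Set-based INSERT: add batch rows whose contact/product pair
    -- is not in journal yet.
    INSERT INTO journal (contact_id, product_code, status_date, business_code, expire_date, entryTypeID, product_status, date_entered)
    SELECT c.contact_id, p.product_code, t.status_date, rtrim(ltrim(t.business_code)), t.expire_date, 21, rtrim(ltrim(t.prod_status)), getDate()
    FROM transactions_batch_tmp t
    INNER JOIN contacts c ON c.emailAddress = t.emailAddress
    INNER JOIN products p ON p.product_code = t.product_code
    WHERE NOT EXISTS (SELECT 1 FROM journal j
    WHERE j.contact_id = c.contact_id
    AND j.product_code = t.product_code)

    -- Duplicates within the batch itself would still need to be dealt with.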
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Mar 2003
    Posts
    97
    Well, how would I handle the update part without a cursor? I need to make sure that *only* unique contact_id-product_code values exist in the journal table.


    Thanks.

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Add some bit flag and notes columns to your import table. Then you can run data checks against the records prior to importing them. Flag any duplicates or bad records and add a note as to why they were flagged. Then import only the non-flagged records. Delete the non-flagged records when you are done, and you are left with a list of bad records that you can review or discard.
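    Something along these lines, just as a sketch -- bad_flag and notes are example column names, and you would tailor the checks to your own rules:

    -- Add flag and note columns to the staging table (example names).
    ALTER TABLE transactions_batch_tmp ADD bad_flag bit NOT NULL DEFAULT 0, notes varchar(200) NULL

    -- Flag records with no matching contact.
    UPDATE transactions_batch_tmp
    SET bad_flag = 1, notes = 'No matching email address in contacts'
    WHERE emailAddress NOT IN (SELECT emailAddress FROM contacts)

    -- Flag records with an unknown product code.
    UPDATE transactions_batch_tmp
    SET bad_flag = 1, notes = 'Unknown product code'
    WHERE product_code NOT IN (SELECT product_code FROM products)

    -- Flag older duplicates within the batch, keeping the latest
    -- status_date per emailAddress/product_code.
    UPDATE t
    SET bad_flag = 1, notes = 'Duplicate; newer transaction in batch'
    FROM transactions_batch_tmp t
    INNER JOIN (SELECT emailAddress, product_code, MAX(status_date) AS max_status_date
    FROM transactions_batch_tmp
    GROUP BY emailAddress, product_code) AS d
    ON t.emailAddress = d.emailAddress AND t.product_code = d.product_code
    WHERE t.status_date < d.max_status_date

    -- Import only the rows WHERE bad_flag = 0, then delete them, and review whatever is left.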
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Mar 2003
    Posts
    97
    blindman - Thanks for your help.

    I'm almost there (I hope), but I was wondering if there is a more efficient way to delete the dupe records than writing two separate queries. I need to keep the most recent product_code/status_date transaction for *each* person. This runs after I insert ALL the records into the journal table.

    --delete dupe trans with status_date as the flag
    DELETE journal FROM journal, contacts
    JOIN
    (select product_code, contact_id, max(status_date) as max_status_date
    from journal
    group by product_code, contact_id) AS G
    ON G.[contact_id] = contacts.[contact_id]
    WHERE journal.[status_date] < G.[max_status_date]
    AND G.[product_code] = journal.[product_code];

    --delete dupe trans with journal_id as the flag
    DELETE journal FROM journal, contacts
    JOIN
    (select product_code, contact_id, max(journal_id) as maxID
    from journal
    group by product_code, contact_id) AS G
    ON G.[contact_id] = contacts.[contact_id]
    WHERE journal.[journal_id] < G.[MaxID]
    AND G.[product_code] = journal.[product_code];


    Thanks again.

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    The contacts table has nothing to do with your delete, except to limit the deleted records to those that have a contact_id. I assume that contact_id is part of journal's natural key and that all records have a valid contact_id, so drop it from both your queries. (If you do need it for filtering, join it in the subquery.)

    --delete dupe trans with status_date as the flag
    DELETE journal
    FROM journal
    INNER JOIN
    (select product_code, contact_id, max(status_date) as max_status_date
    from journal
    group by product_code, contact_id) AS G
    ON journal.[product_code] = G.[product_code]
    and journal.[contact_id] = G.[contact_id]
    and journal.[status_date] < G.[max_status_date]

    --delete dupe trans with journal_id as the flag
    DELETE journal
    FROM journal
    INNER JOIN
    (select product_code, contact_id, max(journal_id) as maxID
    from journal
    group by product_code, contact_id) AS G
    ON journal.[product_code] = G.[product_code]
    and journal.[contact_id] = G.[contact_id]
    and journal.[journal_id] < G.[MaxID]

    It also appears that the first query should handle all product_code/contact_id duplicates except those that share exactly the same status_date. If status_date stores only whole-date values, then I can see the point of the second delete statement, but otherwise I wouldn't expect you to get a high rowcount from it.

    Now to your question: can this be done as a single SQL statement? Yes, but it would essentially require two nested subqueries, so I don't think you would get a big performance boost from it, and you would certainly sacrifice code clarity. I recommend that you leave it as two separate deletes.
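    Just to illustrate, it would look something like this, with journal_id as the tiebreaker among rows that share the latest status_date (untested):

    DELETE journal
    FROM journal
    INNER JOIN
    (SELECT j.product_code, j.contact_id, MAX(j.journal_id) AS keepID
    FROM journal j
    INNER JOIN
    (select product_code, contact_id, max(status_date) as max_status_date
    from journal
    group by product_code, contact_id) AS M
    ON j.product_code = M.product_code
    and j.contact_id = M.contact_id
    and j.status_date = M.max_status_date
    GROUP BY j.product_code, j.contact_id) AS G
    ON journal.product_code = G.product_code
    and journal.contact_id = G.contact_id
    WHERE journal.journal_id <> G.keepID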
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com
