Results 1 to 13 of 13
  1. #1
    Join Date
    Mar 2007
    Posts
    212

    Unanswered: How to Retrieve IDs of all inserted rows after using Load Infile

    Is there any way to retrieve the IDs of all rows inserted after importing bulk data using the LOAD DATA INFILE technique?

    I need the IDs of all rows that have been inserted so that they can be inserted into another table as foreign keys.

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    you can't, or at least you can't in a meaningful manner, unless you retain another piece of information (say an old unique key); you can then read that value to find the new PK.

    failing that you are going to have to do a data take-on: a quick and dirty program that writes each value individually.

    you may be able to assume that the values will be sequential, so you may be able to take a flyer, find the first new value and use that as your FK. it may work, it may not; there is no way of knowing definitively.
    I'd rather be riding on the Tiger 800 or the Norton

  3. #3
    Join Date
    May 2007
    Posts
    4
    Perhaps you could use a flag: a boolean column in your table stating whether you have already inserted the PK into the other table. After you load the file, that column would be false for the new records, so you could process them and then turn the flag on.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    let's walk through an example

    suppose your table already has N rows

    now you add P additional rows using LOAD DATA INFILE

    the only circumstance under which you could possibly be asking this question involves an auto_increment primary key

    so the primary key values for the new rows go from N+1 to N+P or something similar

    now, suppose you found a way to get what you're asking for, e.g. the new primary key values went from 4632 to 7683

    can you show how you would use these values? surely this other table that you're going to insert values into does not consist of just one column? there are other columns, right? so, how do you match to the correct row?

    in other words, how is the row for 5784 going to be any different from the row for 5874 and how would you know the difference?

    that's the danger with using auto_increments -- they blind you to the real issues
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Mar 2007
    Posts
    212
    Quote Originally Posted by r937
    let's walk through an example

    suppose your table already has N rows

    now you add P additional rows using LOAD DATA INFILE

    the only circumstance under which you could possibly be asking this question involves an auto_increment primary key

    so the primary key values for the new rows go from N+1 to N+P or something similar

    now, suppose you found a way to get what you're asking for, e.g. the new primary key values went from 4632 to 7683

    can you show how you would use these values? surely this other table that you're going to insert values into does not consist of just one column? there are other columns, right? so, how do you match to the correct row?

    in other words, how is the row for 5784 going to be any different from the row for 5874 and how would you know the difference?

    that's the danger with using auto_increments -- they blind you to the real issues
    I've figured out how to do it! I can add two additional temporary data fields to the table I am bulk inserting into: date_updated and supplier_id. The file I am loading will always have a unique supplier_id.

    After the bulk insert I can select id, supplier_id from tbl_products WHERE date_updated = today AND supplier_id = 1, and insert the result into tbl_suppliers.

    Simple!
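    For what it's worth, in MySQL that lookup-and-copy step after the bulk load would be an INSERT ... SELECT; a minimal sketch using the column names above, with CURDATE() standing in for "today" (the tbl_suppliers column list is an assumption):

    Code:
    INSERT INTO tbl_suppliers (pd_id, supplier_id)
    SELECT id, supplier_id
    FROM tbl_products
    WHERE date_updated = CURDATE()
      AND supplier_id = 1;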

  6. #6
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    Maybe I haven't thought this through properly; however, can't you use a transaction to get the information you're after?
    e.g.

    Code:
    START TRANSACTION;
      SELECT MAX(id) INTO @cur_id FROM first_table;
      LOAD DATA INFILE ...ETC... ;
      INSERT INTO second_table
      SELECT id FROM first_table WHERE id > @cur_id;
    COMMIT;

  7. #7
    Join Date
    Mar 2007
    Posts
    212
    Quote Originally Posted by aschk
    Maybe I haven't thought this through properly; however, can't you use a transaction to get the information you're after?
    e.g.

    Code:
    START TRANSACTION;
      SELECT MAX(id) INTO @cur_id FROM first_table;
      LOAD DATA INFILE ...ETC... ;
      INSERT INTO second_table
      SELECT id FROM first_table WHERE id > @cur_id;
    COMMIT;
    Your idea would work if only new records were being inserted; however, I want to use the REPLACE option with LOAD DATA INFILE to replace any rows which are duplicates, and I require the IDs for these rows as well. In other words, I require the IDs of all rows inserted/updated with LOAD DATA INFILE.


    This brings me to another question. The second table I am inserting into has the ID of the first table as a foreign key constraint. If I do a LOAD DATA INFILE with REPLACE, will this delete the entire row that has a duplicate value and then create a new one, or does it simply replace the field which has the duplicate value? Because if it's the former, all IDs of duplicate rows will disappear as well, and this will cause a cascade on all related rows in the second table, which is not what I want. Hmmm... this is more confusing than I thought.

  8. #8
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    Ah, interesting: I didn't know you were replacing. Use an update instead (LOAD DATA INFILE itself only supports REPLACE and IGNORE, so the UPDATE would be run separately); it will give you the same result, and I can guarantee that it will cascade correctly, although theoretically this shouldn't matter, because you're not updating the auto_increment number and thus it will remain the same. Why are you replacing rows and not just ignoring them instead? Think of all the overhead you don't need.
    Also, if you're using IGNORE instead, then you don't need to insert the foreign keys into your second table because they already exist; you still only need to put in the new ones (from the old auto_increment value onwards).
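    For reference, the IGNORE variant of the load would look something like this (the file path, field terminator, and column list are placeholders, not taken from the thread):

    Code:
    LOAD DATA INFILE '/path/to/catalogue.csv'
    IGNORE INTO TABLE tbl_products
    FIELDS TERMINATED BY ','
    (pd_code, pd_name, pd_desc, supplier_id);

    Rows that collide on a unique key are simply skipped rather than deleted and re-inserted, so existing auto_increment IDs, and any rows referencing them, are left untouched.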

  9. #9
    Join Date
    Mar 2007
    Posts
    212
    Quote Originally Posted by aschk
    Why are you replacing rows and not just ignoring them instead? Think of all the overhead you don't need
    Also, if you're using IGNORE instead then you don't need to insert the foreign key into your second table because they already exist, and instead you still only need to put in the new ones (from auto_increment onwards).
    The reason why I am replacing rows is because I need some way to find out if a row already exists in the table. This table is called tbl_products, by the way; the second table is called tbl_suppliers. A supplier submits a product catalogue, and I need to import all the products in that catalogue into tbl_products. If a product is already in the table, then I need the ID for that product to insert into tbl_suppliers. If a product doesn't exist, then a new record needs to be created and the ID of this new product needs to be inserted into tbl_suppliers. Suppliers update their catalogues frequently, so the second table is updated by deleting all records for that supplier, and then I do the above!

    The above is the only way I can think of doing this with a bulk import, rather than writing a dirty procedure that checks each record, inserting where appropriate and retrieving the IDs of new records and duplicate records to insert into the second table.

    If you know of any better way I would be more than glad to hear of it.

  10. #10
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    first off
    there must be enough information to make each row unique. presumably you have a supplier ID and whatever makes their products unique (presumably their part/stock number).

    is anything stopping you doing a multi-stage process?
    first pass: delete any existing products that are not in the new supplier update
    then update any existing products that do exist in the supplier update, based on what makes the row unique
    then add any new products that don't already exist

    the fundamental approach is that you will need some mechanism which uniquely identifies each row, that both your internal system and the external supplier know about. it could be that you end up parameterising the update queries so that you supply it as part of the parameters or as part of the batch / job control file.
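    assuming the supplier file has first been loaded into a staging table, the three passes might be sketched like this (all table and column names here are illustrative, not taken from the actual schema):

    Code:
    -- pass 1: delete existing products that are not in the new supplier update
    DELETE p
    FROM products AS p
    LEFT JOIN supplier_update AS u
      ON u.part_no = p.part_no AND u.supplier_id = p.supplier_id
    WHERE p.supplier_id = @supplier AND u.part_no IS NULL;

    -- pass 2: update existing products that do exist in the update
    UPDATE products AS p
    JOIN supplier_update AS u
      ON u.part_no = p.part_no AND u.supplier_id = p.supplier_id
    SET p.pd_name = u.pd_name, p.pd_desc = u.pd_desc;

    -- pass 3: add any new products that don't already exist
    INSERT INTO products (supplier_id, part_no, pd_name, pd_desc)
    SELECT u.supplier_id, u.part_no, u.pd_name, u.pd_desc
    FROM supplier_update AS u
    LEFT JOIN products AS p
      ON p.part_no = u.part_no AND p.supplier_id = u.supplier_id
    WHERE p.part_no IS NULL;

    here the (supplier_id, part_no) pair plays the role of the unique key that both sides know about.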
    I'd rather be riding on the Tiger 800 or the Norton

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    separate tables

    bulk import the suppliers' catalogs, tagging each product with the supplier id

    maintain your own separate table of products

    then you need a relationship table to link your product rows with supplier product rows

    don't update your product table

    repopulate the relationship table after each supplier upload

    more straightforward than messing about with PKs and FKs and UPDATEs and INSERTs
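    a sketch of what that repopulation might look like, with made-up names (supplier_products holds the bulk-imported catalogues, products is your own table, and product_supplier is the relationship table):

    Code:
    -- wipe this supplier's links, then rebuild them from the fresh import
    DELETE FROM product_supplier WHERE supplier_id = 1;

    INSERT INTO product_supplier (product_id, supplier_id, supplier_pd_code)
    SELECT p.product_id, sp.supplier_id, sp.pd_code
    FROM supplier_products AS sp
    INNER JOIN products AS p
      ON p.pd_name = sp.pd_name
    WHERE sp.supplier_id = 1;

    the join condition (here on product name) is whatever lets you match a supplier's row to one of your own product rows.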
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Mar 2007
    Posts
    212
    Quote Originally Posted by r937
    separate tables

    bulk import the suppliers' catalogs, tagging each product with the supplier id

    maintain your own separate table of products

    then you need a relationship table to link your product rows with supplier product rows

    don't update your product table

    repopulate the relationship table after each supplier upload

    more straightforward than messing about with PKs and FKs and UPDATEs and INSERTs
    I think we are all getting confused here. Let me start again; this is what I have:

    tbl_products
    pd_id int autoincrement primary key
    pd_name varchar unique
    pd_desc text
    tmp_updated date
    tmp_pdcode varchar
    tmp_supplierid int


    tbl_suppliers
    supplier_id int foreign key references tbl_supplierdetail
    pd_id int foreign key references tbl_products
    pd_code varchar

    tbl_products has approx a million records. A supplier submits a product catalogue for update: approx 10,000 products.

    This is what I need to do. First, delete the existing catalogue; this I do by deleting all records in tbl_suppliers for this supplier. I cannot delete the referenced products in tbl_products, because these products may be supplied by several other suppliers. Secondly, I import the new supplier catalogue into tbl_products.

    The new supplier catalogue (CSV) contains the following fields:
    pd_code
    pd_name
    pd_desc
    supplier_id

    I now need to retrieve all the pd_id, tmp_pdcode, tmp_supplierid from tbl_products and insert them into tbl_suppliers so that I have an updated supplier catalogue.

    The only way I can think of doing this is by adding the additional temp fields to tbl_products, doing a LOAD DATA INFILE with REPLACE on the pd_name field of tbl_products, and then doing a SELECT statement to retrieve all pd_id, tmp_supplierid, tmp_pdcode WHERE tmp_supplierid = 1 AND tmp_updated = today, inserting the result into tbl_suppliers.

    I hope you guys understand what I am trying to do now. Any other solutions?
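    a sketch of that two-step approach (the file path and field terminator are placeholders, and the column list assumes the CSV order given above):

    Code:
    -- step 1: bulk load, replacing rows that collide on the unique pd_name
    LOAD DATA INFILE '/path/to/catalogue.csv'
    REPLACE INTO TABLE tbl_products
    FIELDS TERMINATED BY ','
    (tmp_pdcode, pd_name, pd_desc, tmp_supplierid)
    SET tmp_updated = CURDATE();

    -- step 2: copy the ids of everything touched today into tbl_suppliers
    INSERT INTO tbl_suppliers (supplier_id, pd_id, pd_code)
    SELECT tmp_supplierid, pd_id, tmp_pdcode
    FROM tbl_products
    WHERE tmp_supplierid = 1
      AND tmp_updated = CURDATE();

    note, though, that REPLACE deletes and re-inserts a colliding row, so pd_id changes for duplicates; per posts #7 and #8, that will cascade into tbl_suppliers.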

  13. #13
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    have a look at post #10

    if you import your CSV data into a temporary table and then run the 3 queries as suggested there, you should get what you wanted done. SQL experts may well be able to reduce that using some trickery (aka knowledge). arguably you may not want to delete old products, merely flag them as no longer supplied; in which case, rather than deleting the row, you set a flag in the product table. in essence your process is all controlled by the way the JOIN is specified: the JOIN defines how data in table A is related to table B/C/D or whatever. it's perfectly reasonable SQL to use a join that identifies rows / records which are in table A but not in table B, vice versa, or rows that do have matches in A & B.
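    the "in table A but not in table B" pattern mentioned above is the classic anti-join; a minimal illustrative example (table and column names are made up):

    Code:
    -- rows in the staging table that have no match yet in products
    SELECT s.*
    FROM staging AS s
    LEFT JOIN products AS p
      ON p.part_no = s.part_no
    WHERE p.part_no IS NULL;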

    To me, the columns in your product table identifying the supplier ID and their product code are not TMP at all. they are external pieces of data, not transient / temporary, and they are vital to maintaining the data integrity.

    so you have a mechanism of relating an external supplier's products to your products, i.e. where the supplier ID is the specified one, and the product code matches between the update file and your products table.

    your supplier ID may need to be tweaked to make it work. depending on your environment you could do it as part of loading the CSV into a temporary table, or it could be part of the batch file / job control; all that matters is that there is a way of identifying which supplier it is. of course, what you could do is not supply a supplier ID at all, and instead run a SQL query to find the closest match (effectively the supplier ID with the highest number of product code matches between the two tables). I wouldn't recommend that approach.. but hey, it's not my problem. it could be a useful belt-and-braces check to make sure that the supplier is a likely fit, say if the supplier ID you provided was not the same as the one with the highest number of matching product codes.

    I think your model has a potential problem if, say, your organisation buys products from more than one supplier but they are listed as one product.. examples of this could be generic items like fruit & vegetables, or screws / fasteners.
    I'd rather be riding on the Tiger 800 or the Norton
