| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

05-11-07, 09:32
|
|
Registered User
|
|
Join Date: Mar 2007
Posts: 194
|
|
|
How to Retrieve IDs of all inserted rows after using Load Infile
|
|
Is there any way to retrieve the ids of all rows inserted after importing bulk data using the load infile technique?
I need the ids of all rows that have been inserted so that they can be inserted in to another table as foriegn keys.
|
|

05-11-07, 12:42
|
|
Jaded Developer
|
|
Join Date: Nov 2004
Location: out on a limb
Posts: 9,262
|
|
you cant, or at least you cant in a meaningfull manner, unless you retain another piece of information (say an old unique key), you can then read that value to find the new PK.
failing that you are going to have to do a data take on.. a quick and dirty program that rites each value individually.
you may be able to make assumptions that the values will be sequential, so you may be able to take a flyer and say fidn the first new value and use that as your fk, it may work, it may not. there is no way of knowing definitively.
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
|
|

05-11-07, 13:58
|
|
Registered User
|
|
Join Date: May 2007
Posts: 4
|
|
|
|
Perhaps you could use a flag, a boolean column in your table, so it states if your have inserted the PK in the other table already... so after you load the file, that column would be false for the new records, and you could process them, and then turn the flag on.
|
|

05-11-07, 14:52
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
let's walk through an example
suppose your table already has N rows
now you add P additional rows using LOAD DATA INFILE
the only circumstance under which you could possibly be asking this question involves an auto_increment primary key
so the primary key values for the new rows go from N+1 to N+P or something similar
now, suppose you found a way to get what you're asking for, e.g. the new primary key values went from 4632 to 7683
can you show how you would use these values? surely this other table that you're going to insert values into does not consist of just one column? there are other columns, right? so, how do you match to the correct row?
in other words, how is the row for 5784 going to be any different from the row for 5874 and how would you know the difference?
that's the danger with using auto_increments -- they blind you to the real issues
|
|

05-12-07, 05:50
|
|
Registered User
|
|
Join Date: Mar 2007
Posts: 194
|
|
Quote:
|
Originally Posted by r937
let's walk through an example
suppose your table already has N rows
now you add P additional rows using LOAD DATA INFILE
the only circumstance under which you could possibly be asking this question involves an auto_increment primary key
so the primary key values for the new rows go from N+1 to N+P or something similar
now, suppose you found a way to get what you're asking for, e.g. the new primary key values went from 4632 to 7683
can you show how you would use these values? surely this other table that you're going to insert values into does not consist of just one column? there are other columns, right? so, how do you match to the correct row?
in other words, how is the row for 5784 going to be any different from the row for 5874 and how would you know the difference?
that's the danger with using auto_increments -- they blind you to the real issues
|
I've figured how to do it! I can add two additional temp data fields to the table am bulk inserting into. This could be date_updated, supplier_id. The file am loading will always have a unique supplier_id.
After the bulk insert i can do a Select id, supplier_id into tbl_suppliers From tbl_products WHERE date_updated = today AND supplier_id = 1.
Simple!
|
|

05-14-07, 03:41
|
|
Registered User
|
|
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
|
|
Maybe I haven't thought this through properly, however can't you use a transaction to get the information you're after.
e.g.
Code:
BEGIN
@cur_id = SELECT MAX(ID) FROM [table];
LOAD DATA INFILE ...ETC... ;
INSERT INTO [second_table]
SELECT ID FROM [table] WHERE ID > @cur_id;
END;
|
|

05-14-07, 11:01
|
|
Registered User
|
|
Join Date: Mar 2007
Posts: 194
|
|
Quote:
|
Originally Posted by aschk
Maybe I haven't thought this through properly, however can't you use a transaction to get the information you're after.
e.g.
Code:
BEGIN
@cur_id = SELECT MAX(ID) FROM [table];
LOAD DATA INFILE ...ETC... ;
INSERT INTO [second_table]
SELECT ID FROM [table] WHERE ID > @cur_id;
END;
|
Your idea would work if it was only new records being inserted however i want to use the REPLACE function with LOAD DATA INFILE to replace any rows which are duplicates and I require the ids for these rows aswell. In other words I require the ids of all rows inserted/updated with LOAD DATA INFILE.
This brings me to another question. The second table am inserting into has the id of the first table as a foreign key contraint. If i do a LOAD DATA INFILE with a REPLACE will this delete the entire row that has a duplicate value and then create a new one or does it simply replace the field which has the duplicate value. Becuase if its the former then all ids of duplicate rows will disappear as well this will casue a cascade on all related rows in the second table which is not what I want? hmmm... This is more confusing that i thought.
|
|

05-15-07, 04:26
|
|
Registered User
|
|
Join Date: Mar 2007
Location: 636f6d7075746572
Posts: 770
|
|
Ah interesting, didn't know you were replacing. Use update (in your load data infile) instead, it will give you the same result, and I can guarantee that it will cascade correctly, although theoretically this shouldn't matter because you're not updating the auto_increment number and thus it will remain the same. Why are you replacing rows and not just ignoring them instead? Think of all the overhead you don't need 
Also, if you're using IGNORE instead then you don't need to insert the foreign key into your second table because they already exist, and instead you still only need to put in the new ones (from auto_increment onwards).
|
|

05-15-07, 09:00
|
|
Registered User
|
|
Join Date: Mar 2007
Posts: 194
|
|
Quote:
|
Originally Posted by aschk
Why are you replacing rows and not just ignoring them instead? Think of all the overhead you don't need 
Also, if you're using IGNORE instead then you don't need to insert the foreign key into your second table because they already exist, and instead you still only need to put in the new ones (from auto_increment onwards).
|
The reason why am replacing rows is becuase i need some way to find out if the row already exist in the table - this table is called tbl_products by the way. The second table is called tbl_suppliers. A supplier submits a product catalogue i need to import all these products in that catalogue into tbl_products. If a product is already in the table then i need the id for that product and insert into tbl_suppliers. If a product doesnt exist then a new record needs to be created and the id of this new product needs to be inserted in to tbl_suppliers. suppliers update their catalogue frequently so the second table is updated by deleting all records for that supplier and then i do the above!
The above is the only way i can do this with a bulk import rather that writing a dirty procedure that checks each record, inserting where appropriate and retriveing ids of new records and duplicate records to insert into the second table.
If you know of any better way I would be more that glad to hear of it.
|
|

05-15-07, 09:15
|
|
Jaded Developer
|
|
Join Date: Nov 2004
Location: out on a limb
Posts: 9,262
|
|
first off
there must be enough to make this unique. presumably you must have a supplier ID and what ever makes their products unique (presumably their part/stock number)
is anything stopping you doing a multi stage process
first pass delete any existing products that are not in the new supplier update
then update any existing products that do exist in the supplier update based on what makes the row unique
then add any new products that don't already exist
the fundamental approach is you will need some mechanism which uniquely identifies each row, that both your internal system and the external supplier knows about. it could be that you end up parameterising the update queries so that you supply as part of the parameters or as part of the batch / job control file.
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
|
|

05-15-07, 10:05
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
separate tables
bulk import the suppliers' catalogs, tagging each product with the supplier id
maintain your own separate table of products
then you need a relationship table to link your product rows with supplier product rows
don't update your product table
repopulate the relationship table after each supplier upload
more straightforward than messing about with PKs and FKs and UPDATEs and INSERTs
|
|

05-15-07, 15:25
|
|
Registered User
|
|
Join Date: Mar 2007
Posts: 194
|
|
Quote:
|
Originally Posted by r937
separate tables
bulk import the suppliers' catalogs, tagging each product with the supplier id
maintain your own separate table of products
then you need a relationship table to link your product rows with supplier product rows
don't update your product table
repopulate the relationship table after each supplier upload
more straightforward than messing about with PKs and FKs and UPDATEs and INSERTs
|
I think we are all getting confused here. let me start again. this is what I have:
tbl_products
pd_id int autoincrement primary key
pd_name varchar unique
pd_desc text
tmp_updated date
tmp_pdcode varchar
tmp_supplierid int
tbl_suppliers
supplier_id int foriegn key references tbl_supplierdetail
pd_id int foriegn key references tbl_products
pd_code varchar
tbl_products has approx million records. supplier submits a product catalogue for update - approx 10000 products.
This is what I need to do: Delete exisiting catalogue. This i do by deleting all records in tbl_suppliers for this supplier. I cannot delete the referenced products in tbl_products becuase these products may be supplied by several other suppliers. secondly I import the new supplier catalogue into tbl_products
The new supplier catalogue (CSV) contains the following fields:
pd_code
pd_name
pd_desc
supplier_id
I now need to retrieve all the pd_id, tmp_pdcode, tmp_supplierid from tbl_products and insert into tbl_suppliers so that i have an updated supplier catalogue.
The only way i can think of doing this is by adding the additional temp feilds in tbl_products and then doing a LOAD DATA INFILE with REPLACE on the pd_name field on tbl_products. Then doing a select statemnt to retrieve all pd_id, tmp_supplierid, tmp_pdcode WHERE tmp_supplierid = 1 and tmp_updated = today then inserting into tbl_suppliers.
I hope you guys understand what am trying to do now. Any other solutions?
|
|

05-16-07, 03:19
|
|
Jaded Developer
|
|
Join Date: Nov 2004
Location: out on a limb
Posts: 9,262
|
|
have a look at post #10
if you import your CSV data into a temporary table then run 3 queries as suggested there you should get what you wanted done. SQL experts may well be able to reduce that using some trickery (aka knowledge). arguably you may not want to delete old products merely flag them as no longer supplied, in which case rather than deleting the row you set a flag in the product table. In essence your process is all controlled by the way the JOIN is specified.. the JOIN defines how data in table A is related to table B/C/D or whatever. Its perfectly reasonable SQL to use a join that identifies rows / records which are in table A but not in table B, vice versa or rows that do have matches in A & B
To me the columns in your product table identifying the supplier ID and their product code are not TMP at all they are external pieces of data and not transient /temporary at all.. they are vital to the maintinaint the data integrity
so you have a mechanism of relating an external suppliers products to your products, ie where the Supplier ID is the specified one, and the product code matches between the update file and your products table
your supplier ID may need to be tweaked to make it work, depending on your environment you could do it as part of your loading the CSV into a temproary table, it coudl be pat of the batch file / job control, all that matters is that there is a way of identifying what supplier it is. of course what you could do is not supply a supplier ID, run a SQL query to find the closest match (effectively the highest number of matches between a product code between the two tables and the supplier ID. I wouldn't recommend that approach.. but hey its not my problem, it could be a usefull belt and braces solution to make sure that the supplier is a likely fit. say if the supplier ID provided by you was not the same or similar as the highest number of records with the same product ID and matching supplier.
I think your model has a potential problem if say your organisation buys products from more than one supplier but are listed as one product.. examples of this could be generic items like say fruit & vegetables, screws / fasteners.
__________________
I'd rather be riding my Versys or my Tiger 800 let alone the Norton
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|