I have a C program (on a UNIX platform) that basically reads data from a text file. It builds up entities based on the data in the import file and then saves this information to 3-5 Sybase tables. The exact number of tables to which data gets saved depends on the complexity of the entity being built up.
At present, the program reads serially from the file, builds one record at a time, and saves/commits to the tables. All the tables involved essentially share a similar key, say columns A and B, whose values are determined by data entries in the import file. There are no stored procedures in use. The import file may contain data that already exists in the tables, so for every row a check is carried out on the key to see whether the corresponding row already exists.
If the data already exists, it is blindly deleted first and then re-inserted.
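For illustration, each record currently costs something like the following round trips (the table name, key values, and payload column here are placeholders, not my real schema):

declare @a int, @b int
select @a = 42, @b = 7    -- key values parsed from the import file
if exists (select 1 from table1 where A = @a and B = @b)
    delete from table1 where A = @a and B = @b
insert into table1 (A, B, payload) values (@a, @b, 'some value')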
The task now is to increase the performance of this program by a factor of at least 2 (i.e. make it two times faster) for large import files, say up to 1000 new records or so.
Can somebody point out what steps need to be taken to achieve this goal, or whether it is possible at all? How would I go about this? Can someone point me in the right direction, or suggest where I can get more information?
I have already determined that most of the time taken by this program is in DB activity: the selects, deletes, and inserts, which actually get fairly complicated. Is it worth batching up the SQL, so that I do N inserts at a time? If so, what kind of performance gain would that provide? How about using stored procedures to do the existence checks? Would that help?
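As a sketch of the stored-procedure idea, assuming a single table table1 with key columns A and B and a hypothetical payload column, something like this would fold the existence check, delete, and insert into one round trip per record:

create procedure upsert_entity
    @a int, @b int, @payload varchar(255)
as
begin
    -- a delete that matches no rows is harmless, so the separate
    -- existence check (select) is not needed at all
    delete from table1 where A = @a and B = @b
    insert into table1 (A, B, payload) values (@a, @b, @payload)
end

Note that the unconditional delete makes the existence-check select redundant, which by itself removes one of the three statements per record.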
I am also willing to pursue more radical approaches, such as using a shared memory segment or even a separate daemon process to handle the DB activity.
Well, why not use BCP to load the data into a table?
Then, once it is in an intermediate table, you can use SQL to perform one insert per table, greatly decreasing the commit times you are seeing, e.g.:
INSERT INTO table1 SELECT col1, col2 FROM BCPIN WHERE XYZ = 123
INSERT INTO table2 SELECT col1, col2 FROM BCPIN WHERE XYZ = 321
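Since your current logic deletes pre-existing rows before re-inserting them, the staging table can drive that step in a single set-based statement as well; a sketch, assuming the bcp-loaded table is called BCPIN and A/B are the shared key (Sybase Transact-SQL allows a delete through a join):

-- drop any rows whose keys reappear in the freshly loaded batch
delete table1
from table1, BCPIN
where table1.A = BCPIN.A
  and table1.B = BCPIN.B

-- BCPIN itself would be populated beforehand with something like:
--   bcp mydb..BCPIN in import.txt -c -Uuser -Sserver
-- (database, file, and login names are placeholders)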
Hmmm, this definitely sounds feasible. I need to think it over, but I am not sure how it will go down with the clients and our company policies.
This will also mean a great deal of change to the program.
Also, we have various foreign key constraints between the tables, and one of the tables has an identity column. Is BCP a viable/feasible means of loading data into such a collection of tables? Is BCP an acknowledged means of importing data?
Are there any other possible solutions I can put forth that would mean less change to the program?