Unanswered: performance differences between queries...
I am updating a db with data from a file, in this data we have new info, info that has been updated and info that is to be removed from the db.
Now I was wondering which approach results in better performance/shorter executin time:
1. first update excisting values, then insert new ones, and last delete cancelled data
2. delete cancelled data and data that will be updated, then insert new and updated info
I get all this data from a file, in that file all rows are similar and there is one column that defines if the data is new, updated or to be deleted (thus all the updates also include the information for the enty that has not been altered).
Pati, I think there are too many unknown variables to properly answer the question. For instance, how many columns are you updating compared to how many columns are not being updated? If you're updating one small column, and there are ten large columns, say Char(255), that are not being updated, then I believe an update would be faster than a "delete and insert all new rows". However, you would have to have the index to support finding the rows that need updating.
Personally, I always choose the way that makes the most sense logically. I would delete cancelled data, then update existing data and then insert the new rows.
If you're doing a bulk insert, then the two step process will usually be quicker. If you're inserting row by row, then it's probably a toss up. If you really want to know the exact answer for your particular situation, I would run a test and benchmark both methods. You can use one scripts with all of the necessary statements, and then not how long it took to run in query analyzer.