Hello,

I am relatively new to databases and PostgreSQL in general.

That being said, I have a project that involves a lot of dataloggers at different locations. For now, let's say 3 dataloggers (1, 2 and 3) at two different sites (A and B). The data from each datalogger is inserted into my database into its own table (table1 for datalogger 1, table2 for datalogger 2, etc.). Each logger generates an entry every 5 seconds, so the tables are growing fast and already hold more than 5 million entries.
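To give an idea of the structure, a table looks roughly like this (I'm simplifying; the column names and types here are just an example, but "IDData" is the unique identifier I get from the logger):

CREATE TABLE table1 (
    "IDData"    bigint PRIMARY KEY,    -- unique identifier coming from the datalogger
    "Timestamp" timestamp NOT NULL,    -- one entry every 5 seconds
    "Value"     double precision       -- the measured value
);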

To populate this database, I made a script that downloads each line from the datalogger and generates a single INSERT query per line. Although this works well, it takes time, and inserting the data is taking longer and longer. One thing I liked about this approach is that I never need to know which lines are already in the database and which are not: even though I get a warning for each line that already exists, by inserting the whole file this way I don't have to check beforehand what data is already there. The downside is speed and I/O; it is very slow.
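So for each downloaded line, the script ends up issuing its own statement, roughly like this (same simplified columns as above, values made up just to show the pattern):

INSERT INTO table1 ("IDData", "Timestamp", "Value") VALUES (123456, '2012-07-02 10:15:05', 21.4);
INSERT INTO table1 ("IDData", "Timestamp", "Value") VALUES (123457, '2012-07-02 10:15:10', 21.5);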

I have read this blog post: "how to insert data to database as fast as possible" on the select * from depesz; blog.

From this, I understand there are faster ways to populate a database, but the solutions presented there do not work for an already "partially" populated database. I cannot simply use something like a multi-row INSERT, because the whole statement is discarded if any of its rows is already in the database.
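For example, if I tried something like the statement below and the row with "IDData" 123456 were already in table1, the whole statement would fail and none of the rows would be inserted (again with the simplified columns and made-up values):

INSERT INTO table1 ("IDData", "Timestamp", "Value") VALUES
    (123456, '2012-07-02 10:15:05', 21.4),
    (123457, '2012-07-02 10:15:10', 21.5),
    (123458, '2012-07-02 10:15:15', 21.6);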

What I thought might be a good idea is to bulk insert my data into a temporary table (data_tmp) and then use that temporary table to insert the data into my main table (data), something like:

INSERT INTO data
SELECT * FROM data_tmp AS t
WHERE NOT EXISTS (
    SELECT * FROM data AS v WHERE v."IDData" = t."IDData"
);
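Put together, the whole load for one file would look roughly like this (I'm assuming the file is in CSV format and reachable from the server; the path is just a placeholder, and from a client machine I suppose I would use \copy in psql instead):

CREATE TEMP TABLE data_tmp (LIKE data);
COPY data_tmp FROM '/path/to/datalogger_file.csv' WITH CSV;
INSERT INTO data
SELECT * FROM data_tmp AS t
WHERE NOT EXISTS (
    SELECT * FROM data AS v WHERE v."IDData" = t."IDData"
);
DROP TABLE data_tmp;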

My question is: what would be the fastest, least disk-intensive and "neatest" solution for my "bulk insert" problem?

Here is my setup:
Linode 512 server running Debian,
PostgreSQL 8.4.12

Thank you very much for your help and/or hints.