Multiple insert or bulk insert into a partially loaded database
I am relatively new to databases and PostgreSQL in general.
That being said, I have a project that involves a lot of dataloggers at different locations. For now, let's take three dataloggers (1, 2, and 3) at two different sites (A and B). The data from every datalogger is inserted into my database, into three different tables (table1 for datalogger 1, table2 for datalogger 2, and so on). Each logger generates an entry every 5 seconds, so I now have growing tables with more than 5 million entries.
To populate this database, I made a script that downloads each line from the datalogger and issues a single INSERT query per line. Although this works well, it takes time, and it is taking more and more time as the tables grow. One thing I liked is that I do not need to know which lines are already in the database and which are not: even though I get a warning for each line that is already there, by inserting the whole file this way I never have to check what data is already in the database. The downside is speed and I/O; it is very slow.
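For illustration, each downloaded line currently becomes a statement roughly like the following ("LogTime" and "Value" are made-up column names for the example; only "IDData" comes from my actual schema):

INSERT INTO table1 ("IDData", "LogTime", "Value")
VALUES (123456, '2013-05-01 12:00:05', 21.7);
-- a duplicate "IDData" makes this one statement fail with the
-- duplicate-key warning I mentioned; the script then simply moves
-- on to the next line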
From this, I understand there are faster ways to populate a database, but the solutions I have found do not work for an already "partially" populated database. I cannot simply use something like a multi-line INSERT, because the whole insert is discarded if some of the data is already in the database.
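To make that concrete: a multi-row INSERT is a single statement, so one duplicate key rolls back all of its rows (column names again illustrative):

INSERT INTO data ("IDData", "LogTime", "Value") VALUES
    (1001, '2013-05-01 12:00:00', 21.7),
    (1002, '2013-05-01 12:00:05', 21.8),  -- suppose this row already exists
    (1003, '2013-05-01 12:00:10', 21.9);
-- ERROR: duplicate key value violates unique constraint
-- none of the three rows ends up inserted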
What I thought might be a good idea is to bulk insert my data into a temporary table (data_tmp) and then use that temporary table to insert the data into my main table (data), with something like:
INSERT INTO data
SELECT * FROM data_tmp AS t
WHERE NOT EXISTS (
    SELECT * FROM data AS v WHERE v."IDData" = t."IDData"
);
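Spelled out, the loading side of that idea might look like the sketch below. It is only a sketch: the CSV path is a placeholder, COPY assumes the file is readable by the server (psql's \copy does the same thing from the client side), and (LIKE data) assumes the dump has the same column layout as the main table.

BEGIN;

-- staging table with the same columns as the target; LIKE without
-- INCLUDING INDEXES copies no indexes, and ON COMMIT DROP removes
-- the table automatically at the end of the transaction
CREATE TEMPORARY TABLE data_tmp (LIKE data) ON COMMIT DROP;

-- bulk-load the downloaded file in a single pass
COPY data_tmp FROM '/path/to/logger_dump.csv' WITH (FORMAT csv);

-- then the INSERT ... SELECT ... WHERE NOT EXISTS from above goes here

COMMIT;

Keeping the whole thing in one transaction would also mean that a failed load leaves the main table untouched.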
My question is: what would be the fastest, least disk-intensive, and "neatest" solution to my bulk-insert problem?
Here is my setup:
Linode 512 server with Debian,