I will have a .NET application that will process and load a bunch of data into a series of tables. One table is of particular concern: I estimate it will grow by well over 100 million rows per year. Each batch will process and insert about 150-200K rows into it. The table has many fields (around 30), most of them foreign keys, and there are a number of indexes on it. The other tables grow by a mere few thousand rows per batch.
I've done some tests. I created a file with 70K rows in it and tried to insert it into the table. All times are averaged:
- Insert...Select from temp table: 40s
- Load from file...Insert into: 50s
- Stored proc (insert...values), .NET demo app with DRDA driver: 3m
- Stored proc (insert...values), .NET demo app with CSDK driver: 5m12s (?!?)
IfxBulkCopy is very performant, but its inability to work inside a transaction is just too severe a limitation.
The Informix database is 11.5 on Windows Server 2008. I ran a profiler on my demo app: 98% of the time was spent in IfxCommand.ExecuteNonQuery(), which is expected.
Can anyone explain why insertion through the .NET driver is 4x slower than insertion from a temp table? Can you recommend a strategy to optimize this?
Also, ALTERing your table temporarily to RAW type while inserting would be of great benefit; ALTER it back to STANDARD type after completion. This turns your table non-logging, with great gains in performance.
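A minimal sketch of that ALTER, with a hypothetical table name (note that after converting a RAW table back to STANDARD, a level-0 backup is needed before the table is recoverable):

```sql
-- Switch to non-logging for the bulk load (hypothetical table name)
ALTER TABLE gl_entries TYPE (RAW);

-- ... perform the bulk insert here ...

-- Restore logging; schedule a level-0 backup afterwards
ALTER TABLE gl_entries TYPE (STANDARD);
```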
Also check the complexity of your indexes: the more complex they are, the heavier the bulk insert. Temporarily disabling the non-critical ones would also bring a benefit.
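Disabling can be done with the SET Database Object Mode statement; a sketch, again with a hypothetical table name (re-enabling does rebuild the disabled indexes):

```sql
-- Disable all indexes on the table before the load
SET INDEXES FOR gl_entries DISABLED;

-- ... perform the bulk insert here ...

-- Re-enable; Informix rebuilds the disabled indexes at this point
SET INDEXES FOR gl_entries ENABLED;
```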
Last, if you handle very big data volumes, .NET may not be the best approach for big-volume batches. Why not check out good old compiled Informix 4GL or ESQL/C?
The table in question represents the general ledger in our accounting application, so I cannot turn it non-logging. Won't disabling indexes cause them to be rebuilt when re-enabled? That would slow things down further.
4GL and ESQL/C are not an option right now. We're trying to keep the application as database-agnostic as possible.
Given the numbers I have provided it seems that this strategy would yield somewhat better results:
- Accumulate all data in in-memory tables (big hit on app server RAM)
- Use IfxBulkCopy to transfer data to raw type tables on Informix
- Under transaction fire a bunch of insert...select queries that would transfer data into actual tables.
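The last step could look roughly like this; table names are hypothetical, and the staging tables are the raw-type ones loaded by IfxBulkCopy:

```sql
-- Move staged rows into the production tables atomically
BEGIN WORK;

INSERT INTO gl_entries SELECT * FROM gl_entries_stage;
INSERT INTO gl_details SELECT * FROM gl_details_stage;

COMMIT WORK;
-- On any error, issue ROLLBACK WORK instead
```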
Looks fine! If there is no possibility to load in "exclusive-like" mode, the raw-type temp table is a good solution. But consider the EXTERNAL table scenario: an external table is a flat file for which you declare a schema, telling IDS it is a table in your database. You can then handle it almost like a normal table (SELECTs, but no UPDATEs or DELETEs), which may spare you the need to create and load a temporary table, since you access the data directly in your transaction. Just an idea :-)
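A sketch of the external-table idea, assuming an 11.50 fixpack with external-table support and a hypothetical delimited file path on the server:

```sql
-- Declare the flat file as a read-only table with the same schema
CREATE EXTERNAL TABLE gl_entries_ext
    SAMEAS gl_entries
    USING (
        DATAFILES ("DISK:/data/loads/gl_entries.unl"),
        FORMAT "DELIMITED"
    );

-- Then load it directly inside your transaction
INSERT INTO gl_entries SELECT * FROM gl_entries_ext;
```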
Regarding the .NET driver, IBM support should have an answer for that ...
I'm not sure what you're suggesting with your EXTERNAL table scenario. To implement it I would need to dump the results into files, transfer them to the database server via FTP or something similar, and then map them as tables. I'm not sure this would be faster.
Regarding the .NET driver, IBM didn't respond to the topic I created in their Informix developer forum. This is nothing new; I've been using Informix with .NET applications for quite some time. I have found such glaring bugs in their .NET drivers that sometimes I feel I'm the only one using them.
Effectively, I was assuming your files were already present on the server, as this is required to create external tables. I also assumed your source material was flat files; if not, forget this solution. In any case, I would give copying the files a try (this should take minutes unless your network is very slow) and then create the external table. You'll never know if you never try :-)
Regarding the orthodox solution for the driver, the right way is to create a ticket with IBM support. If no ticket is created, no fix will be created. Obvious, but true. Have you tried the IIUG community forum? And also the Google Informix group; I have seen some .NET-related material there. Check for Fernando Nunes' replies, or contact him.
For now I've finished my research, and the results are interesting.
I've discovered a new feature of the IDS data driver: on the connection object you can call the methods BeginChain() and EndChain(). Wrapping your batch inserts or updates between these two calls lets the driver optimize its use of the network and the database. Just adding those two lines of code gives a 2x speedup for looped inserts.
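The chaining pattern described above could look like this; it is only a sketch against the IBM.Data.Informix provider, and the table, column, and `rows` collection names are hypothetical:

```csharp
using (IfxConnection conn = new IfxConnection(connStr))
{
    conn.Open();
    conn.BeginChain();   // start chaining: statements are batched by the driver
    try
    {
        using (IfxCommand cmd = new IfxCommand(
            "INSERT INTO gl_entries (acct_id, amount) VALUES (?, ?)", conn))
        {
            cmd.Parameters.Add("acct_id", IfxType.Integer);
            cmd.Parameters.Add("amount", IfxType.Decimal);
            foreach (var row in rows)       // hypothetical in-memory batch
            {
                cmd.Parameters[0].Value = row.AcctId;
                cmd.Parameters[1].Value = row.Amount;
                cmd.ExecuteNonQuery();      // queued by the chain, not sent one by one
            }
        }
    }
    finally
    {
        conn.EndChain(); // flush the chained inserts to the server
    }
}
```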
Still the best strategy is:
DataTable -> IfxBulkCopy to temp table -> insert...select into production table (3x speedup, more extensive code changes)
Interestingly, this approach is slower:
DataTable -> IfxBulkCopy to raw table -> insert...select into production table
The copy operation takes the same time, but the insert into the production table was 40% slower?!
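The winning pipeline could be sketched as below, staging with IfxBulkCopy outside any transaction (since it cannot run inside one) and then moving the rows under a transaction; the table names and the `dataTable` variable are hypothetical:

```csharp
using (IfxConnection conn = new IfxConnection(connStr))
{
    conn.Open();

    // 1. Bulk-copy the in-memory DataTable into a staging table
    //    (no transaction: IfxBulkCopy does not support one).
    using (IfxBulkCopy bulk = new IfxBulkCopy(conn))
    {
        bulk.DestinationTableName = "gl_entries_stage";
        bulk.WriteToServer(dataTable);
    }

    // 2. Move the staged rows into production atomically.
    using (IfxTransaction tx = conn.BeginTransaction())
    using (IfxCommand cmd = new IfxCommand(
        "INSERT INTO gl_entries SELECT * FROM gl_entries_stage", conn, tx))
    {
        cmd.ExecuteNonQuery();
        tx.Commit();
    }
}
```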
No, you misunderstood. It's not IfxBulkCopy; it performs equally well for the temp table and the raw table. The entire difference is in the insert into...select query, which I find surprising. I've benchmarked it several times using the SET EXPLAIN ON command.