I am writing an application that receives data at about 1000 rows per second per thread... so (1000 * x) rows per second overall. Currently each thread writes its batch to the SQL Server machine's disk as a text file, and a stored procedure uses the BULK INSERT command to load the file into a staging table.
The procedure then performs some INSERT...SELECTs to rationalise the data into the underlying tables (this is because the data is distinct by Subject, Code, CreatedBy), and finally removes that batch from the staging table by the provided Code. The app then continues to receive the next 1000 rows on that thread.
The stored procedure looks something like this:-
CREATE PROCEDURE usp_BulkInsert @Code int AS -- @Code type illustrative
BEGIN
    BULK INSERT StageTable FROM '<batch file path>' WITH (--blah blah
    INSERT INTO PrimaryTable SELECT * FROM StageTable WHERE Part = 1
    INSERT INTO SecTable SELECT pt.GUID, st.* FROM PrimaryTable pt INNER JOIN StageTable st ON pt.Subject = st.Subject AND pt.Code = st.Code AND pt.CreatedBy = st.CreatedBy
    DELETE FROM StageTable WHERE Code = @Code
END
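Each thread then calls the procedure for its batch once the file is written, something like this (the parameter value is just illustrative):

    EXEC usp_BulkInsert @Code = 42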
I have tried denormalising the database (just using the stage table), but because the data is distinct on text columns it takes forever to retrieve data. The insertion, however, is amazingly fast...
The indexes on the tables are designed with the minimum overhead, based on the execution plan in query a.
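For example, the join columns are covered by a narrow nonclustered index along these lines (the names match the pseudo-code above; the exact definition is illustrative):

    CREATE NONCLUSTERED INDEX IX_PrimaryTable_Subject_Code_CreatedBy
        ON PrimaryTable (Subject, Code, CreatedBy) INCLUDE (GUID)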
The problem is that as the underlying tables grow, the INSERT...SELECT statements take longer and longer to run, and CPU usage gets higher and higher.