  1. #1
    Join Date
    Dec 2005
    Posts
    74

    Question Unanswered: Performance gains when INSERTing multiple rows

    I apologize since this seems to be a fundamental question, but I did try to search and there seems to be something wrong - after clicking on the "search" button, the page just will not update... I even tried "advanced search" but no luck there, either.

    Anyway, my question is about writing multiple data rows to tables in a SQL Server database. Currently I'm using embedded SQL from a VC++ 2005 .Net program to write data to a SQL 2005 Express server (either on the same PC or on another, via TCP). As data becomes available, I issue INSERT statements, one by one. One row is about a dozen columns, typically 8 bytes each, for each of the few tables in the database. In most cases, I have several rows of data available at the same time. Those rows come at the rate of around 200-1000 per second for the different tables. With that setup, my SQL server is not able to keep up - the data ends up getting buffered in the program, waiting for the server, the server process uses just about all the CPU cycles it can get, etc.

    I'm reading the "SQL in a nutshell" book from O'Reilly but it's mostly language oriented, it doesn't say much about performance improvements. It would seem natural that one could improve the performance. For example, when I do a SELECT query, the data comes very quickly, which makes me think that it is the overhead of making those individual calls with small amount of data. I would think that I should be able to request that multiple rows are written at the same time with a single SQL statement, and that this would improve the performance. However, I have no idea how to do that.

    I did try one thing - enlisting the sequence of INSERTs into a single transaction, thinking it would all get buffered and only get written to the server after the Commit command is issued. I did that, but it doesn't seem to help. What should I try to alleviate this problem? One thing to consider: although I use SQL Server for testing, I am trying to keep compatibility with other databases, e.g., ODBC to any available data source, including MS Access over Jet. I would love it if the solution were compatible (i.e., did not involve Transact-SQL or other vendor-specific tricks).
    I thank you in advance for your assistance!

    Kamen

  2. #2
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by KamenG
    With that setup, my SQL server is not able to keep up
    That's an incorrect assumption....it's your code

    You should create a stored procedure, set parameters on it, and call that sproc instead of sending raw INSERT statements.

    If you want, you could "bundle" the inserts perhaps and have the sproc spin through it and do the inserts.

    But I would try a simple insert sproc first
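    Something along these lines, perhaps - just a sketch, and the table, column and parameter names here are only illustrative:
    Code:
    -- single-row insert wrapped in a proc; adjust names and types to your schema
    CREATE PROCEDURE dbo.usp_InsertReading
        @StartTime  DATETIME,
        @SampleTime FLOAT,
        @Ch0        REAL,
        @Ch1        REAL
    AS
    BEGIN
        SET NOCOUNT ON
        INSERT INTO dbo.MyDataTable (StartTime, SampleTime, Ch0, Ch1)
        VALUES (@StartTime, @SampleTime, @Ch0, @Ch1)
    END
    From the client you then call EXEC dbo.usp_InsertReading with the parameters set, rather than building a fresh INSERT string for every row.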
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak Corral | Radio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  3. #3
    Join Date
    Dec 2005
    Posts
    74
    You mean - still issue one SQL command for each row of data, but instead of using INSERT, I should call a stored procedure with the data from that record as parameters? Why would this be much faster? I'm not sure I understand what you mean. I was hoping to be able to stay away from procedures since I'm using ADO.Net and its handling of parameters is a little scary.
    Kamen

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Brett knows way more than me about this, but I admit I don't know how that is likely to improve performance (perhaps merely my ignorance). It is, however, MS recommended good practice (and ADO.NET in conjunction with SQL Server sprocs & parameters is easy peasy - you shouldn't have any problems).

    Also - the transaction thing is really about integrity and keeping your changes atomic rather than about performance.

    SQL Server can insert rows very fast - you should be able to insert rows much faster than 200 per second - so there might be another issue. Out of curiosity - does the SQL do anything other than simply insert data into a single table, or is it more complex? If the former - is it a heap table or does it have a clustered index? If the latter - is the index monotonically increasing (i.e. the next value is always greater than the preceding value, such as an identity column or "date run" column)?
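    (For example, something like this would count as a monotonically increasing clustered key - names purely illustrative:)
    Code:
    -- a heap would have no clustered index at all; this is the clustered, ever-increasing case
    CREATE TABLE dbo.Readings
    (
        ReadingID   INT IDENTITY(1,1) NOT NULL,
        ReadingTime DATETIME NOT NULL,
        Value       REAL NOT NULL,
        CONSTRAINT PK_Readings PRIMARY KEY CLUSTERED (ReadingID)
    )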

    ... and finally - ODBC connections are not as fast as OLE DB. The connection you use wouldn't really make your app specific to one db engine or another (for example, you could write a class to handle making the connections and then make one single change to deal with back-end changes from SQL Server to Oracle to Access).

    HTH
    Testimonial:
    pootle flump
    ur codings are working excelent.

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Are you using NewRow? You can call it repeatedly to add all of the rows from a single group to the DataTable before you send them to SQL, and it is a lot more efficient than doing things one at a time.

    -PatP

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by Pat Phelan
    Are you using NewRow? You can call it repeatedly to add all of the rows from a single group to the DataTable before you send them to SQL, and it is a lot more efficient than doing things one at a time.
    Is that not a behind-the-scenes violation of 1NF, Pat?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Code:
    insert 
      into mytable
         ( col1, col2, col3, col4 )
    values
         ( 23, 57, 'A', 'Tom' )
       , ( 14, 42, 'B', 'Dick' )
       , (  9, 37, 'C', 'Harry' )
    row constructors like this are part of standard sql

    one INSERT statement, multiple sets of values

    has been supported by MySQL for ages, mebbe one o' these days SQL Server will get around to supporting it too

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    No, changing the schema (either physically or logically) would be a violation of 1NF, but using a different transport methodology is not a violation at all.

    The thread you referenced is changing the logical schema (actually abandoning the physical schema) by allowing a typeless list of arbitrary length to be passed to a stored procedure as a parameter. That is a no-no in the world of referential integrity, because it basically blows the schema completely out of the water... It allows the calling code to completely disregard any schema.

    Using a record set on the client, populating it with some arbitrary number of rows (which the recordset will force to conform to all of the constraints it can check on the client side, and all constraints without exception on the server side), then using an alternative transport mechanism (BCP instead of parsed Transact-SQL) isn't a problem from the perspective of referential integrity. True, you can hit some "gotchas" if you have schema bindings that the client can't confirm when the row is added to the recordset, but those will still be stopped cold before the rows are committed to disk.
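    To illustrate the sort of alternative transport I mean, the Transact-SQL flavour of the same idea is a bulk load - just a sketch here, with a made-up file path and table name (SqlBulkCopy or bcp.exe do the same job without the intermediate file):
    Code:
    -- load a whole batch of staged rows in one operation instead of row-by-row INSERTs
    BULK INSERT dbo.MyDataTable
    FROM 'C:\staging\readings.csv'
    WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK )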

    I'm a rock-solid believer in referential integrity. I can't imagine conditions in which I'd suggest something less than 3NF, and I'm usually pickier than that. While this might permit short term violations (while the data is in memory, but not yet sent to the server), I don't know of any way to prevent that... Until the data hits the database, the user can speculate about anything that suits them, whether it meets the constraints or not. The data isn't "real" to me until it is accepted by the server.

    -PatP

  9. #9
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by r937
    Code:
    insert 
      into mytable
         ( col1, col2, col3, col4 )
    values
         ( 23, 57, 'A', 'Tom' )
       , ( 14, 42, 'B', 'Dick' )
       , (  9, 37, 'C', 'Harry' )
    row constructors like this are part of standard sql

    one INSERT statement, multiple sets of values

    has been supported by MySQL for ages, mebbe one o' these days SQL Server will get around to supporting it too

    Even though it has nothing to do with the point I was making, I hardly think row constructors are a part of the SQL standard...

    The ratified standard is currently SQL-92, and I can't find any reference to row constructors in it. The proposed standard (sometimes called SQL-99, but I think that's become something of a misnomer by this point) does include a definition, but until X3H2 or a successor meets and ratifies it, SQL-99 is no more a standard than any vendor-specific extensions are (since the vendor could just as validly claim that the committee will ratify their dialect as the new standard over any other proposed standard).

    I agree that MySQL supports the row constructor, and that several other SQL dialects support it too. I think that Microsoft SQL ought to support that syntax. I haven't seen anything that I recognize as a standard that includes any formal definition of a row constructor yet.

    What I was talking about is using relatively standard .NET constructs to work around several of the "Achilles' heel" points in the process of moving data from client to server. You lose a lot of CPU cycles converting data into SQL syntax to send to the server, and often lose even more cycles on the server side converting the syntax back into binary for storage. You also usually have more bytes to send down the wire, which takes more time too. By sidestepping all of those conversions and overhead, you can speed up the process significantly. That is what using a DataTable object permits your code to do.

    -PatP

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by r937
    Code:
    insert 
      into mytable
         ( col1, col2, col3, col4 )
    values
         ( 23, 57, 'A', 'Tom' )
       , ( 14, 42, 'B', 'Dick' )
       , (  9, 37, 'C', 'Harry' )
    row constructors like this are part of standard sql

    one INSERT statement, multiple sets of values

    has been supported by MySQL for ages, mebbe one o' these days SQL Server will get around to supporting it too

    You can use this though ya?
    Code:
    insert 
      into mytable
         ( col1, col2, col3, col4 )
    SELECT
          23, 57, 'A', 'Tom'
       ,UNION SELECT 14, 42, 'B', 'Dick' 
       ,UNION SELECT 9, 37, 'C', 'Harry'
    Testimonial:
    pootle flump
    ur codings are working excelent.

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    yes, except for the dangling commas

    and i would use UNION ALL instead of UNION
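    i.e. something like this:
    Code:
    insert 
      into mytable
         ( col1, col2, col3, col4 )
    select 23, 57, 'A', 'Tom'
    union all
    select 14, 42, 'B', 'Dick'
    union all
    select  9, 37, 'C', 'Harry'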
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by r937
    yes, except for the dangling commas

    and i would use UNION ALL instead of UNION
    Oops - yup and yup. Can't have inappropriate dangling.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  13. #13
    Join Date
    Dec 2005
    Posts
    74
    Thank you for the input. Let me give you a few more details about what I'm doing.

    I have a variable number of tables to which I write data. In each session, a new set of tables may be defined. With the current test, I have three tables with similar schema: a DATETIME field and a FLOAT field, which together form the primary key of the table. The former is also a foreign key, which is different for each session but remains the same within a session. (I know - not very optimal, but it helps find data afterwards.) The FLOAT field is time, so - yes, monotonically increasing, albeit not an integer, and thus not consecutive. Then, after that, each table has a variable number of REAL columns (in this case, 1 to 8). One of the tables gets 1000 INSERTs per second, the other two get 200.
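    (In rough terms, one of these tables looks something like the following sketch - the number of channel columns varies from table to table:)
    Code:
    -- rough sketch of one session data table as described above
    CREATE TABLE NumData1_5_0
    (
        StartTime DATETIME NOT NULL,   -- session start; same for every row, also a foreign key
        Timestamp FLOAT    NOT NULL,   -- elapsed time, monotonically increasing
        Ch0       REAL     NOT NULL,
        Ch1       REAL     NOT NULL,   -- ... up to Ch7, depending on the table
        PRIMARY KEY (StartTime, Timestamp)
    )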

    The way I use ADO.Net is different from the usual - I do not use DataSets at all; I form SQL statements and use ExecuteNonQuery or the like. The reason is that I found some immaturities in ADO.Net and decided to keep it simple. The reason why I say that the SQL server cannot keep up with the number of INSERTs I send is that Task Manager shows nearly 100% CPU utilization for sqlservr.exe while I do that. That's with SQL Server Express on a couple of older machines (P4 1.8GHz). I tried our company server (dual Xeon @ 2GHz, SQL 2003 SBS) and the utilization was around 30%, but I think it is configured not to use more CPU time; it still was lagging behind, though.

    The reason I thought using transactions would help with performance is a side effect I read about - that all the SQL commands get buffered in memory and the whole thing gets written at once when the transaction commits.

    Anyway, I am willing to try various solutions, I'd just rather avoid trying things that aren't likely to help, which is why I appreciate the wide discussion here.

    Kamen

  14. #14
    Join Date
    Dec 2005
    Posts
    74
    Actually, as of now, I'm mostly intrigued by the solution:

    INSERT INTO NumData1_5_0 ( StartTime, Timestamp, Ch0, Ch1)
    SELECT 8/14/2006 10:03:53 AM, 1.38956571296073, -0.1896683, -0.3793676
    UNION ALL SELECT 8/14/2006 10:03:53 AM, 1.39457221518377, -0.2204517, -0.4409377
    UNION ALL SELECT 8/14/2006 10:03:53 AM, 1.39957340946964, -0.2509933, -0.5020171
    .........
    .........
    That is, as long as it is likely to improve performance. It would actually be quite easy to try. One of the attractive features of this approach is that the number of rows can vary each time.

    Kamen

  15. #15
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    don't forget to quote your datetime values, and what kind of a timestamp value is 1.399???
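    i.e., with the datetime values quoted, the first couple of rows would read something like
    Code:
    INSERT INTO NumData1_5_0 ( StartTime, Timestamp, Ch0, Ch1 )
    SELECT '8/14/2006 10:03:53 AM', 1.38956571296073, -0.1896683, -0.3793676
    UNION ALL SELECT '8/14/2006 10:03:53 AM', 1.39457221518377, -0.2204517, -0.4409377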
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
