  1. #1
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4

    Speed loss between set-based, cursor-based and object-based processing

    Working set-based on a database is fast.
    Working with a cursor is at least an order of magnitude slower.
    Working object-based (reading a whole dossier out of the database into an application (.NET, Java, ...), processing it there, and writing the results back to the database) is at least another order (or two) of magnitude slower, due to the additional network overhead.

    What are the generally accepted numbers that quantify the speed gain (or loss) when these ways of processing are compared?
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  2. #2
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    There is only one right way, except when it's not
    Brett
    8-)

    It's a Great Day for America everybody!

    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  3. #3
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    We have a number of DTS processes that run at night; together they take 3.5 hours to complete. They recalculate the whole academic year: all subscriptions, presences/absences, grades, diplomas, subsidies, ... For our new system we want to upgrade from DTS to SSIS.

    A few people want to ditch both DTS and SSIS and do the calculations outside of the database: read the data of one student, run all of the above processes on it, and write the results back to the database.

    I read somewhere that cursors are typically 60-100 times slower than set-based operations. And with cursors, the data at least never leaves the database.
    When you take the data out of the database over the network, process it externally, and write it back into the database, that must be an extra order of magnitude slower still.

    I used these numbers to estimate the processing time relative to set-based: a) cursor = 10 times slower than set-oriented, b) processing outside the database = 100 times slower. I get these results:
    Code:
    base: 3.5 hours, set based
                factor   hours    days
    set              1     3.5    0.15   (status quo)
    cursor          10    35      1.5
    outside db     100   350     14.6
    These numbers are for recalculating everything. (I could have left out cursors; nobody is actually asking to use them.)

    We could add some logic to recalculate only those students for whom we received new data during the day. Yet, to get back to the processing time of our old system, that would only work if we receive new data for at most 1% of our students per day (350 h / 100 = 3.5 h). Say we could stretch the window during which the nightly batch may run to about 14 hours; that would allow new data for 4% of our students. We are way above that.
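    The batch-window arithmetic above can be sketched as a quick back-of-envelope query. All figures are the estimates from this post, not measurements:
    Code:
    ```sql
    -- Back-of-envelope check of the batch-window arithmetic.
    -- All numbers are estimates from this thread, not measurements.
    DECLARE @full_recalc_hours decimal(9,2)
    DECLARE @old_window_hours  decimal(9,2)
    DECLARE @max_window_hours  decimal(9,2)

    SET @full_recalc_hours = 350.0  -- estimated outside-the-database full recalculation
    SET @old_window_hours  =   3.5  -- current nightly batch duration
    SET @max_window_hours  =  14.0  -- longest acceptable nightly window

    SELECT CAST(100 * @old_window_hours / @full_recalc_hours AS decimal(9,2)) AS pct_students_in_old_window  -- 1.00
    ,      CAST(100 * @max_window_hours / @full_recalc_hours AS decimal(9,2)) AS pct_students_in_max_window  -- 4.00
    ```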

    I find it hard to explain to the business that with our new system, they will have to wait two weeks for a total recalculation of the results. Given that we sometimes must recalculate everything due to a change in the law or a newly found bug, we cannot afford to lose sight of that scenario.

    Before I present these numbers, I have to be reasonably confident that the slowdown factors I use are realistic. (My gut feeling tells me those factors are even higher than 10 and 100, but I have no reference to back that up.)
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Processing with a cursor or a loop is typically 100 times slower than a set-based solution, in my past experience. I shoot for two orders of magnitude improvement.
    All bets are off on processing outside the database. It would definitely be poor (abysmally slow), but just how bad it would be would depend entirely on how bad the coding is.

    As a side note, I'm dealing with an issue today where a vendor's data upload process appears to be updating not one record at a time, but one value in one record at a time.
    For 1.3 million records.
    In a table with 133 fields.
    This, not surprisingly, is playing havoc with our transactional replication....
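    As a rough sketch of the difference (table and column names are made up for illustration; the real vendor schema is unknown), the vendor's pattern versus a set-based upload looks like this:
    Code:
    ```sql
    -- HYPOTHETICAL example tables, for illustration only.
    -- Vendor-style upload: one UPDATE per value, i.e. up to 133 statements
    -- per row, times 1.3 million rows -- each a separately logged,
    -- separately replicated change:
    -- UPDATE dbo.Target SET Field001 = @v001 WHERE Id = @id;
    -- UPDATE dbo.Target SET Field002 = @v002 WHERE Id = @id;
    -- ... and so on through Field133, for every row.

    -- Set-based upload: one statement touches every row and every field at once:
    UPDATE t
       SET t.Field001 = s.Field001
         , t.Field002 = s.Field002
         -- ... through Field133
      FROM dbo.Target  AS t
      JOIN dbo.Staging AS s
        ON s.Id = t.Id
    ```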
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'd take one medium-complexity calculation and the most complex calculation that you have, take five rows of test data, and calculate based on those test rows... This will give you a reasonable sample of your actual computations and shouldn't be too hard to knock out for testing purposes. This way you can give the decision makers measurements instead of estimates.

    This is a clearly contrived example, but it shows what I mean by set-based versus cursor-based processing of a simple SELECT operation... You'd have to amend it to include whatever processing one or two calculations within your process actually do.
    Code:
    DECLARE @d1 DATETIME,  @d2 DATETIME, @d3 DATETIME
    
    SET @d1 = GETDATE()
    
    SELECT
       spid, kpid, blocked
    ,  waittype, waittime, lastwaittype
    ,  waitresource, dbid, uid
    ,  cpu, physical_io, memusage
    ,  login_time, last_batch, ecid
    ,  open_tran, status, sid
    ,  hostname, program_name, hostprocess
    ,  cmd, nt_domain, nt_username
    ,  net_address, net_library, loginame
    ,  context_info, sql_handle, stmt_start
    ,  stmt_end, request_id
       FROM master.dbo.sysprocesses
       
    SELECT @d2 = GETDATE()
    
    DECLARE @i int
    SET @i = 0  -- inline initialization in DECLARE requires SQL Server 2008 or later

    DECLARE
      @spid         smallint, @kpid         smallint
    -- explicit lengths: nchar/binary with no length default to length 1
    -- and would silently truncate the fetched values
    , @blocked      smallint,   @waittype     binary(2)
    , @waittime     bigint,     @lastwaittype nchar(32)
    , @waitresource nchar(256), @dbid         smallint
    , @uid          smallint,   @cpu          int
    , @physical_io  bigint,     @memusage     int
    , @login_time   datetime,   @last_batch   datetime
    , @ecid         smallint,   @open_tran    smallint
    , @status       nchar(30),  @sid          binary(86)
    , @hostname     nchar(128), @program_name nchar(128)
    , @hostprocess  nchar(10),  @cmd          nchar(16)
    , @nt_domain    nchar(128), @nt_username  nchar(128)
    , @net_address  nchar(12),  @net_library  nchar(12)
    , @loginame     nchar(128), @context_info binary(128)
    , @sql_handle   binary(20), @stmt_start   int
    , @stmt_end     int,        @request_id   int
    
    DECLARE zPerfDemo CURSOR FOR SELECT
       spid, kpid, blocked
    ,  waittype, waittime, lastwaittype
    ,  waitresource, dbid, uid
    ,  cpu, physical_io, memusage
    ,  login_time, last_batch, ecid
    ,  open_tran, status, sid
    ,  hostname, program_name, hostprocess
    ,  cmd, nt_domain, nt_username
    ,  net_address, net_library, loginame
    ,  context_info, sql_handle, stmt_start
    ,  stmt_end, request_id
       FROM master.dbo.sysprocesses
    
    OPEN zPerfDemo
    FETCH zPerfDemo INTO
      @spid, @kpid, @blocked
    , @waittype, @waittime, @lastwaittype
    , @waitresource, @dbid, @uid
    , @cpu, @physical_io, @memusage
    , @login_time, @last_batch, @ecid
    , @open_tran, @status, @sid
    , @hostname, @program_name, @hostprocess
    , @cmd, @nt_domain, @nt_username
    , @net_address, @net_library, @loginame
    , @context_info, @sql_handle, @stmt_start
    , @stmt_end, @request_id
    
    WHILE 0 = @@FETCH_STATUS
       BEGIN
          SELECT
            @spid, @kpid, @blocked
    ,       @waittype, @waittime, @lastwaittype
    ,       @waitresource, @dbid, @uid
    ,       @cpu, @physical_io, @memusage
    ,       @login_time, @last_batch, @ecid
    ,       @open_tran, @status, @sid
    ,       @hostname, @program_name, @hostprocess
    ,       @cmd, @nt_domain, @nt_username
    ,       @net_address, @net_library, @loginame
    ,       @context_info, @sql_handle, @stmt_start
    ,       @stmt_end, @request_id
          
          SET @i = 1 + @i
          
          FETCH zPerfDemo INTO
            @spid, @kpid, @blocked
    ,       @waittype, @waittime, @lastwaittype
    ,       @waitresource, @dbid, @uid
    ,       @cpu, @physical_io, @memusage
    ,       @login_time, @last_batch, @ecid
    ,       @open_tran, @status, @sid
    ,       @hostname, @program_name, @hostprocess
    ,       @cmd, @nt_domain, @nt_username
    ,       @net_address, @net_library, @loginame
    ,       @context_info, @sql_handle, @stmt_start
    ,       @stmt_end, @request_id
       END
       
    CLOSE zPerfDemo
    DEALLOCATE zPerfDemo
    
    SET @d3 = GETDATE()
    
    SELECT DATEDIFF(ms, @d1, @d2) AS set_based
    ,  DATEDIFF(ms, @d2, @d3) AS cursor_based
    ,  @i, @d1, @d2, @d3
    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.
