Results 1 to 12 of 12
  1. #1
    Join Date
    Sep 2007
    Posts
    13

    Unanswered: Number of Rows in Result Set

    Hi Everyone,

    DB2 7.1 on one AIX server, 8.2 on another. Server code is AIX. Client code is AIX or Windows XP and can connect to either DB2 7.1 or 8.2.

    I'm asking this question in the hopes that someone smarter than I may come up with a better way of doing what the code is currently doing.

    I support a legacy application with a client GUI that displays a list of "items" based on ad-hoc criteria that the user enters. The result set can range from 0 through some large number. The query resembles:
    Code:
    
    SELECT multiple-columns
      FROM multiple-tables
     WHERE predicate-statement
     ORDER BY order-clause
    
    The code then immediately executes the following SQL to get the number of rows in the result set:
    Code:
    
    SELECT COUNT(*)
      FROM multiple-tables
     WHERE predicate-statement
    
    I don't like this because it is slow, but I do need to know the number of rows in the result set. Unfortunately some of the DB2 tables can have millions of rows in them. Is there some other way to get the number of rows in the result set without having to execute the COUNT(*)?

    Thank you very much!

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    The only accurate method is to fetch all rows and count them. You could obtain the estimate that the DB2 optimizer uses when analyzing the query by doing an EXPLAIN and the looking in the explain tables.

    However, the problem is best solved outside DB2. Users don't normally need (or expect) to be shown a list containing millions of items. One of the approaches would be to limit the result set to some sensible number (say, FETCH FIRST 1000 ROWS ONLY) and offer your users an opportunity to refine their search criteria if that number is exceeded.
    ---
    "It does not work" is not a valid problem statement.

  3. #3
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Counting the rows is ONLY accurate if you run in isolation level RR (aka SERIALIZABLE). And that you don't want to do if you do not really have to.

    Everyone who writes code like the one you have shown should be taken out and shot. Just kidding... but it shows that the developer have no idea how database systems work in general.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  4. #4
    Join Date
    Sep 2007
    Posts
    13
    Quote Originally Posted by stolze
    Counting the rows is ONLY accurate if you run in isolation level RR (aka SERIALIZABLE). And that you don't want to do if you do not really have to.

    Everyone who writes code like the one you have shown should be taken out and shot. Just kidding... but it shows that the developer have no idea how database systems work in general.
    The application is running isolation level "Read Uncommitted". User A can return a list from the query but it is read-only. User B can update a row in the list under the covers. If User A then opens the updated row he will see the new information. This is fine and how the application was designed.

    Yeh man, shoot'em all and let God sort'em out Only kidding all you out there.

  5. #5
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    The part "if user A then opens the updated row" will only work if the row still qualifies the predicates in the query after the update. If not, then the count is wrong already. Same happens with new rows being inserted concurrently. If the isolation level is not RR, the second query will see those new rows as "phantoms". Again, the count from the first query is already wrong.

    If you can live with inaccuracies, then it is much simpler to do a rough estimate yourself or based on the statistics on the table. But don't execute the query twice because that can be expensive and unnecessarily burden the system with additional load.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  6. #6
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    To get to the actual matter

    (which no one seems to want to address)

    How many rows does the predicate return

    If it's a very LARGE number, then I would ask why would you wanT that result in the first place

    Too bad DB2 doesn't have @@ROWCOUNT

    How many rows are returned in the first place?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  7. #7
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    It is not a question about having a @@ROWCOUNT or whatever. It is a question that none of the database systems available today can give you an exact row count without fetching all rows and counting them. And even that number is not reliable if you do not use SERIALIZABLE/RR isolation level. Naturally, if you don't have a toy system, you don't want to run your query twice just to get a row count. So you are back to estimates.

    p.s: DB2 has the capabilities to retrieve the number of processed rows, i.e. you can get that count after the query is completed.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  8. #8
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by stolze
    p.s: DB2 has the capabilities to retrieve the number of processed rows, i.e. you can get that count after the query is completed.
    Even before the query is completed, such a value can be easily obtained.
    (For the accuracy of the returned value, see the excellent explanation by stolze higher up.)
    On DB2 v8 for z/OS, have a look at the SQL statement
    Code:
    GET DIAGNOSTICS :v = DB2_NUMBER_ROWS
    For example, after a fully materialized OPEN CURSOR, the exact row count is known. (But then, DB2 has already done the work, so you could at most save on the transfer between DB2 and your app if you decide to not fetch because of a too high count.) In other cases, the returned value will be an estimate.
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  9. #9
    Join Date
    Sep 2007
    Posts
    13

    Number of Rows in Result Set

    Hi Everyone,

    Thank you so much for the exciting discussion so far. I should probably explain that it is an Engineer who is listing a particular range of items. He "owns" some of the items in the list. The server only returns 50 rows at a time. If the engineer scrolls down to the point where more data are needed then the next 50 rows are retrieved from the server, and so on, until the entire result set has been fetched by the client.

    The fact that some other engineer may add a new item that falls within the first engineer's WHERE predicate is of no consequence, he doesn't own that item. Besides, there is a GUI option to refresh the list if the Engineer so chooses.

    The ONLY concern here is, can I get a count of items in the original query other than by running a COUNT(*) using the same WHERE predicate. I have an idea and I'm going to try it in the coming days. If it helps performance, I'll post what I did.

    Thank you one and all..

  10. #10
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    If you need an EXACT count, then there is no other way than to count.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  11. #11
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    I would hazard to say that

    Code:
    SELECT COUNT(*) AS myCount, null as Col1, null as col2, ect
    FROM myTable
    WHERE predicate
    GROUP BY Cols
    UNION ALL
    null AS myCount, Col1, Col2, ect
    FROM myTable
    Shouldn't be too inefficient, since it should have everything already in buffer
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  12. #12
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by Brett Kaiser
    I would hazard to say that

    Code:
    SELECT COUNT(*) AS myCount, null as Col1, null as col2, ect
    FROM myTable
    WHERE predicate
    GROUP BY Cols
    UNION ALL
    null AS myCount, Col1, Col2, ect
    FROM myTable
    Shouldn't be too inefficient, since it should have everything already in buffer
    As usual, it depends on the query. If the query reads lots of rows, this mechanism will be slow because you effectively evict all pages from the buffer to read them in again. Additionally, you have a bigger hit on other, concurrent applications because you may force pages accessed by them out of the buffer as well.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •