  1. #1
    Join Date
    Jun 2004
    Posts
    6

    Unanswered: Obtaining number of rows continuously

    Hi,

    My problem is as follows: I have 10 processes inserting rows into a table. The main table currently has 50 million records, and I need to report to a customer, every hour, how many records the table contains. I'm trying to use the SELECT COUNT statement, but it is very slow: it takes about 20 minutes to return a result!! I need another way to obtain this figure. How can I do this?

    Thanks and best regards.

  2. #2
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    It should take MUCH less than 20 minutes to scroll thru only 50 million records.
    I suspect that you have other serious performance issues in the DB.
    You'll need to get creative to formulate a workable solution.
    The available choices will depend upon how much freedom you have to modify the underlying application code.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  3. #3
    Join Date
    Sep 2003
    Location
    Virginia, USA
    Posts
    246
    Not too bad: your query is counting 41,667 rows per second, and it is doing so under load. Here are a few options off the top of my head, with some pros and cons for each ...
    1. ANALYZE TABLE tablename COMPUTE STATISTICS. Sure, this will take a few hours, but if your table is analyzed, the result of a "select count(*)" comes back much faster. However, if many processes are continually inserting rows, the statistics may become stale immediately.
    2. SELECT MAX(ID)-MIN(ID) FROM TABLENAME. This would work if you had sequential primary key values with no gaps in the numbers. If the system ever crashed during an insert, there could be gaps in the sequence, since the insert is rolled back but Oracle cannot roll back sequence NEXTVAL calls, so this approach only gives you an estimate.
    3. Trigger Before Insert. Each time an insert is done, you increment a counter in another table (see the sketch after this list). I don't think serialization is really the problem here, just performance. If all the processes are just saying "add/subtract 1 to the counter" then you don't care about the order of operations.
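    A minimal sketch of option 3, assuming a hypothetical main table MAIN_TABLE and a single-row counter table ROW_COUNTER:
    Code:
    -- one-row table holding the running total
    create table row_counter (cnt number not null);
    insert into row_counter values (0);

    -- fires once per inserted row and bumps the shared counter
    create or replace trigger trg_count_inserts
    before insert on main_table
    for each row
    begin
      update row_counter set cnt = cnt + 1;
    end;
    /
    Note that each firing locks the single counter row until the inserting transaction commits.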
    MarkRem
    Author, Oracle Database 10g: From Nuts to Soup
    http://www.remidata.com/book_nuts2soup.htm

  4. #4
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Quote Originally Posted by markrem
    3. Trigger Before Insert. Each time an insert is done, you increment a counter in another table. I don't think serialization is really the problem here, just performance. If all the processes are just saying "add/subtract 1 to the counter" then you don't care about the order of operations.
    Serialization is an issue here, since only 1 session can lock and update the other table at a time. So if many users try to insert records at once, their inserts will be serialized - i.e. they will wait. This may or may not be acceptable, depending on the volume of inserts.

    Question to the OP: why does the customer need an hourly update on the precise number of records in the table? Is this a real requirement, or just some meaningless nice-to-have statistic? Obviously, counting ~50 million rows takes some effort.

    Another possible method, which may or may not be faster:

    1) Create a table to keep track of record count and max(id):
    Code:
    create table record_counts (max_id integer, record_count int);
    2) Initialize the table:
    Code:
    insert into record_counts (max_id, record_count)
    select max(id), count(*) from main_table;
    3) Each hour, update the new table:
    Code:
    update record_counts rc set (max_id, record_count) =
    ( select nvl(max(mt.id), rc.max_id),  -- keep the old max_id if no new rows arrived
             rc.record_count + count(*)
      from main_table mt
      where mt.id > rc.max_id );
    This will only count records added since the last time you checked. Hopefully, the index on mt.id can be used to make this faster than a full scan of the table.

    4) To report to customer:
    Code:
    Select record_count from record_counts;

  5. #5
    Join Date
    Oct 2003
    Posts
    706
    If the records happen to have a serial ID number, or if an autonumber-type field could easily be added, then you could fetch the max (i.e. last) value of that column, basically retrieving the value from the last row of the table.
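    For instance, assuming a hypothetical indexed ID column on the main table, the lookup is just:
    Code:
    -- with an index on id, this is a single min/max index probe, not a scan
    select max(id) from main_table;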

    But again... you're paying a lot to get that number.
    ChimneySweep(R): fast, automatic
    table repair at a click of the
    mouse! http://www.sundialservices.com

  6. #6
    Join Date
    May 2004
    Location
    Dominican Republic
    Posts
    721
    This looks like a good opportunity for materialized views (almost the same scenario andrewst suggested). But another thing I noticed: unless you are going to refresh this MV or table every 1/10 of a second (which I highly doubt you would want to do), wouldn't the count already be stale by the time the customer/client requests it? I mean, if you are under an average of 41,667 inserts/second (as someone else said).
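    A rough sketch of the materialized view route (table and view names are hypothetical, and the exact MV log options required for fast refresh vary by Oracle version):
    Code:
    -- log changes so the view can be refreshed incrementally
    create materialized view log on main_table
      with rowid, sequence including new values;

    create materialized view mv_main_table_count
      refresh fast on demand
      as select count(*) row_count from main_table;

    -- refresh from an hourly job, then report:
    -- exec dbms_mview.refresh('MV_MAIN_TABLE_COUNT');
    select row_count from mv_main_table_count;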

  7. #7
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Wow JMartinez, the figure was 41,667 rows COUNTED per second, not INSERTED... I would dream of inserting 41,667 rows/second!

    If you ever did so, please tell me how! (Unless you work on a server with 20 processors and 30 GB of RAM... then I understand.)

    Regards,

    RBARAER

  8. #8
    Join Date
    May 2004
    Location
    Dominican Republic
    Posts
    721
    RBARAER, my bad; I didn't read the post too well, I guess.

    Anyway, it is still worth considering if you are under a heavy average insert load (10 processes inserting rows).

  9. #9
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    If the application can be modified, then a solution is easily obtained.
    Have every process which is inserting rows keep track of how many it does each hour, either in a table or an external file.
    You establish a baseline count at some point in time.
    To come up with the total, another process sums the baseline and the dozen or so previously accumulated per-process counts (see the sketch below).
    KISS & SMOP (Small Matter Of Programming)!
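    A minimal sketch, assuming a hypothetical per-process tally table plus a one-row baseline table:
    Code:
    -- one row per inserting process: no two writers ever touch the same row
    create table insert_tally (
      process_id    number primary key,
      rows_inserted number default 0 not null
    );

    create table count_baseline (baseline_count number not null);

    -- each process periodically bumps only its own row
    update insert_tally
       set rows_inserted = rows_inserted + :batch_size
     where process_id = :my_process_id;

    -- hourly report: baseline plus everything tallied since
    select b.baseline_count + t.total
      from count_baseline b,
           (select sum(rows_inserted) total from insert_tally) t;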
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  10. #10
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    Here are a couple of ideas which may be of use:

    1) If you have a sequence-based primary key which is indexed, then do a count and remember the max PK value. The next time, just do select count(*) from table where pk_value > (last max pk value). This should use the index; if it doesn't, put a hint in (see the sketches below).

    2) If you have that many rows, consider partitioning. Then you can run parallel processes to do the count in each partition. If the best way to partition turns out to be on a datestamp, this makes things really easy, as you only have to count the most recent partition.
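    Sketches of both ideas, with hypothetical table, index and partition names:
    Code:
    -- idea 1: incremental count; the hint is only a fallback in case
    -- the optimizer refuses the index range scan
    select /*+ index(mt main_table_pk) */ count(*)
      from main_table mt
     where mt.id > :last_max_id;

    -- idea 2: with range partitioning on a datestamp, only the newest
    -- partition ever needs counting
    select /*+ parallel(mt, 4) */ count(*)
      from main_table partition (p_current) mt;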

    Alan

  11. #11
    Join Date
    Jun 2004
    Location
    Liverpool, NY USA
    Posts
    2,509
    However, the real question is why they need a number that will be wrong immediately after it is counted.
    Bill
    You do not need a parachute to skydive. You only need a parachute to skydive twice.

  12. #12
    Join Date
    Jul 2003
    Posts
    2,296
    I agree with Alan and Anacedent.
    First, your insert application should easily be aware of how many rows it is inserting (it gets the data from somewhere, doesn't it?). As stated, have the app keep track of how many rows it inserts.

    Second, from Alan: it is much easier to count only the rows added since the last time you counted. Counting the whole table over and over again is needless.

    Your table has either a date field or a sequence in the PK somewhere (we are guessing here). All you need to do is create a monitoring table with a date (the last time you counted), the max sequence number (or the most recent dated entry), and the count at that time.

    From there, run a select count query based on those figures against the table getting the inserts, and then refresh your tracking table with the new data.

    I think the application side would already have these numbers, though.
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  13. #13
    Join Date
    Sep 2003
    Location
    Virginia, USA
    Posts
    246
    There is something to consider with these solutions: even a count of rows added after an indexed timestamp could take several minutes. His table has 50,000,000 rows and is growing, and he wants to get the count CONTINUALLY, not just once a day. Therefore the process should be optimized for sub-second response time. How will he do that?

    As a test, you can "create table test (id number, stamp timestamp);" and then use a loop to insert 50 million records where id = <cursor index i> and stamp = systimestamp. OK, now create a non-unique index on the stamp column. Finally, "set timing on" and run "select count(*) from test where stamp > somevalue;". How long did the count take? Now run the count where stamp > your last query's timestamp; how long did that one take? (A sketch of this test follows below.)
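    In SQL*Plus, the test looks roughly like this (the index name is made up; cut the row count down unless you want to wait for 50 million single-row inserts):
    Code:
    create table test (id number, stamp timestamp);

    -- populate: each row is stamped as it is inserted
    begin
      for i in 1 .. 50000000 loop
        insert into test values (i, systimestamp);
      end loop;
      commit;
    end;
    /

    create index test_stamp_idx on test (stamp);

    set timing on
    -- first count a wide slice, then re-run against only the
    -- newest slice and compare the timings
    select count(*) from test where stamp > systimestamp - interval '1' hour;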

    I can pretty much guarantee he will not want to run the count continually, even in this extremely simple setup. Then factor in a real-life production setup with user load and however much extra time the query might take. I think the only solution is to store the count in a separate table.

  14. #14
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    Mark, doing a count on an index is actually very fast. On our prod boxes (crappy 2-processor Intel jobs) it can count about 0.7 million rows/second going in on a primary key index, and about 0.4 million rows/second on a non-unique datestamp index, so unless his app is throwing in tens of millions of rows a day, I would say this is the way to go. Obviously he should try it first on his prod box to see how it performs, and it does depend on him having either a sequence-based primary key or an indexed datestamp.

    Alan

  15. #15
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    It was once per hour, not "continuously". So sub-5-minute response should be adequate; in fact, the 20 minutes mviveros currently gets is adequate for that, bar the fact that the data is 20 minutes out of date each time ("so what?", I'm inclined to wonder).

    I don't have the capacity here to create a 50 million row table, but since we are talking about indexed access, it shouldn't really make a significant difference whether the table has 50 million rows or 3 million. The crucial question then becomes how many rows are inserted per hour, i.e. since the last time we counted. Let's suppose it is 10,000 rows. How long does it take to do an indexed count of 10,000 rows out of the table?

    Here is my table, created by cloning ALL_OBJECTS repeatedly:
    Code:
    SQL> desc t9
     Name                            Null?    Type
     ------------------------------- -------- ----
     ID                              NOT NULL NUMBER
     NAME                            NOT NULL VARCHAR2(30)
     DATEVAL                         NOT NULL DATE
    With timing on, I count all the rows:
    Code:
    SQL> select count(*) from t9;
       3316224
    
     real: 145563
    Takes quite a while, as you can see (~2.5 mins). Now, with a non-unique index on DATEVAL, I count the latest ~10,000 rows:
    Code:
    SQL> select count(*) from t9 where dateval > '15-sep-2004';
         10304
    
     real: 94
    There's your sub-second response!

    Of course, none of this detracts from the utter pointlessness of the exercise!
