Results 1 to 4 of 4
  1. #1
    Join Date
    Jul 2009
    Posts
    1

    Red face Unanswered: Seeking SQL advise

    I have the following scenario.

    I have a tracking system. The system will record the status of an object regularly, all the status records are stored in one table. And it will keep a history of maximum 1000 status record for each object it tracks. The maximum objects the system will track is 100,000. Which means I will potentially have a table size of 100 million records.

    I have to generate a report on the latest status of all objects being tracked at a particular point in time, and also I have to allow user to sort and filter on different columns in the status record displayed in the report.

    The following is a brief description in the status record (they are not actual code)

    ObjectRecord(
    objectId bigint PrimaryKey
    desc varchar
    )

    StatusRecord (
    id bigint PrimaryKey
    objectId bigint indexed
    datetime bigint indexed
    capacity double
    reliability double
    efficiency double
    )

    I have tried to do the following, it works very well with around 20,000 objects. (The query return in less than 10s) But when I have 100,000 objects it becomes very very slow. (I don't even have patience to wait for it to return.... I kill it after 30 mins)

    select * from statusrecord s1 INNER JOIN ( SELECT objectId , MAX(datetime) AS msdt FROM statusrecord WHERE startDatetime <= 1233897527657 GROUP BY objectId ) AS s2 ON ( s1.objectId = s2.objectId AND s1.datetime = s2.msdt ) where ( capacity < 10.0 ) order by s1.datetime DESC, s1.objectId DESC;

    I did try to write a store procedure like below, for 100,000 objects and 1000 status records / object, it returns in around 30 mins.

    CREATE OR REPLACE FUNCTION getStatus(pitvalue BIGINT) RETURNS SETOF statusrecord AS $BODY$
    DECLARE
    id VARCHAR;
    status statusrecord%ROWTYPE;
    BEGIN
    FOR object IN SELECT * FROM objectRecord
    LOOP
    EXECUTE 'SELECT * FROM statusrecord WHERE objectId = ' || quote_literal(object.objectId) ||
    ' AND datetime <= ' || quote_literal(pitvalue) || ' ORDER BY datetime DESC'
    INTO status;
    IF FOUND THEN
    RETURN NEXT status;
    END IF;
    END LOOP;
    RETURN;
    END
    $BODY$ LANGUAGE plpgsql;

    Just wanna to know if anyone have a different approach to my scenario. Thanks a lot.

  2. #2
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Which indexes are defined on those tables (apart from the PK)?
    What is the EXPLAIN PLAN output for your slow query?

  3. #3
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Which version of PostgreSQL are you using?
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  4. #4
    Join Date
    Aug 2009
    Location
    Olympia, WA
    Posts
    337
    When you model this type of information, what you really want to do is have both a start and end time for that status. That way the info you need is all in one row. As you've found out doing self joins is cumbersome and slower.

    Additionally, why are you storing datetime as bigint?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •