I have a tracking system. The system will record the status of an object regularly, all the status records are stored in one table. And it will keep a history of maximum 1000 status record for each object it tracks. The maximum objects the system will track is 100,000. Which means I will potentially have a table size of 100 million records.
I have to generate a report on the latest status of all objects being tracked at a particular point in time, and also I have to allow user to sort and filter on different columns in the status record displayed in the report.
The following is a brief description in the status record (they are not actual code)
I have tried to do the following, it works very well with around 20,000 objects. (The query return in less than 10s) But when I have 100,000 objects it becomes very very slow. (I don't even have patience to wait for it to return.... I kill it after 30 mins)
select * from statusrecord s1 INNER JOIN ( SELECT objectId , MAX(datetime) AS msdt FROM statusrecord WHERE startDatetime <= 1233897527657 GROUP BY objectId ) AS s2 ON ( s1.objectId = s2.objectId AND s1.datetime = s2.msdt ) where ( capacity < 10.0 ) order by s1.datetime DESC, s1.objectId DESC;
I did try to write a store procedure like below, for 100,000 objects and 1000 status records / object, it returns in around 30 mins.
CREATE OR REPLACE FUNCTION getStatus(pitvalue BIGINT) RETURNS SETOF statusrecord AS $BODY$
FOR object IN SELECT * FROM objectRecord
EXECUTE 'SELECT * FROM statusrecord WHERE objectId = ' || quote_literal(object.objectId) ||
' AND datetime <= ' || quote_literal(pitvalue) || ' ORDER BY datetime DESC'
IF FOUND THEN
RETURN NEXT status;
$BODY$ LANGUAGE plpgsql;
Just wanna to know if anyone have a different approach to my scenario. Thanks a lot.
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
When you model this type of information, what you really want to do is have both a start and end time for that status. That way the info you need is all in one row. As you've found out doing self joins is cumbersome and slower.
Additionally, why are you storing datetime as bigint?