I have a table called work (id bigserial, userid int4, kind1 enum, kind2 enum, kind3 enim, value bigint, modified timestamp)
Table will have about 2*10^6 rows (at same time - overall it can have higher IDs but old records are eventually deleted (moved to separate archive table) therefore the IDs can grow huge). After insert on table work, every row will be updated (value will be reduced, till value = 0 (work is finished)). For each row there will be from 1 to maybe 10 updates on two cells (value, modified). After work is completed (value = 0) it's record will be moved to archive table.
kind1 is an enum with two values (a and b)
- alter table work set fillfactor 50
- btree index on value, fillfactor 50
- btree index on kind1, fillfactor 50
1. What can i do to perform this selects faster:
SELECT id, value FROM work WHERE value>=$1 AND kind1=$2 AND kind2=$3 AND kind3=$4 FOR UPDATE;
SELECT id, value FROM work WHERE userid=$1 AND kind1=$1 AND kind2=$3 AND kind3=$4 FOR UPDATE;
2. How about inheriting and partitioning? I'm thinking about creating two tables, one for kind1(a) and second for kind1(b), will it help in performance?
3. Is btree best for index on enum?
4. How about creating index on complex keys like (user_id,kind1,kind2,kind3) and (price,kind1,kind2,kind3)?
A general observation - having an index on value is the main reason performance is slowing down after updates. Since the column you are updating is indexed, PG actually deletes the row then reinserts it with the new values which causes bloat. If the column wasn't indexed, PG can use Heap Only Tuple (HOT) updates which dont cause bloat. But, make sure autovacuum is on and the problem shouldn't get too bad.
1. If you have lots of users, this pessimistic locking should be avoided IMHO. But in terms of speeding up the query, I would guess indexing (kind1, kind2, kind3, user1) will help but it depends on your data distribution.
2. It will only help performance if the queries are doing sequential scans on the tables. If they are using indexes, there is no difference. Personally I would avoid them as the PG partition implementation is a pain in the ass.
3. For low selectivity column (ie few distinct values), btree indexes are generally no good. But you can still create them and PG will figure out whether to use them or not. EXPLAIN is your friend here.
4. Sure, nothing wrong with them. (kind1, kind2, kind3, user1) and (kind1, kind2, kind3, value) are best for the queries you've shown. And once 9.2 gets index only scans in place, including the id column will make them even better