I'm currently benchmarking several RDBMSs with respect to analytical query performance on medium-sized multidimensional data sets. The data set contains 30,000,000 fact rows evenly distributed in a multidimensional space of 9 hierarchical dimensions. Each dimension has 8000 members.

The test query selects about one half of the members from each dimension, and calculates fact sums grouped by 5 high-level members from each dimensional hierarchy. (There are actually some additional complications that makes the query end up listing 20 table aliases in the from-clause, 18 of which are aliases for 2 physical tables.)

On Oracle the query runs in less than 3 seconds. All steps have been taken to ensure that Oracle will apply star schema optimization to the query (e.g. having loads of single-column bitmap indexes). The query plan reveals that a bitmap merge takes place before fact lookup.

There's a lot of RAM available, and series of queries have been run in advance to make sure the required data resides in the cache. This is confirmed by a very high CPU utilization and virtually no I/O during the query execution.

I have established similar conditions for the query in PostgreSQL, and it runs in about 30 seconds. Again the CPU utilization is high with no noticable I/O. The query plan is of course very different from that of Oracle, since PostgreSQL lacks the bitmap index merge operation. It narrows down the result one dimension at a time, using the single-column indexes provided. It is not an option for us to provide multi-column indexes tailored to the specific query, since we want full freedom as to which dimensions each query will use.

Are these the results we should expect when comparing PostgreSQL to Oracle for such queries, or are there special optimization options for PostgreSQL that we may have overlooked? (I wouldn't be suprised if there are, since I spent at least 2 full days trying to trigger the star optimization magic in my Oracle installation.)