Hi,
I’m looking for different ideas for reporting techniques/architectures for SQL-based DBMS’s (e.g. Oracle).
We currently are writing SQR programs to access our Oracle 8i product’s data. As this is a commercial program we can’t change tables (add/remove columns/indexes), create views, procedures, etc. and this has caused problems since 8i does not have the concept of a ‘temporary table’ like Sybase ASE or MS SQL Server.
Our SQR programs typically run something like this:
1) Go through each row in the customer table
2) For each row, run a series of procedures to gather data or exclude customers (based on some criteria). These procedures typically live in a global include file but can also be in the report file itself.
3) Output to a temporary file, then use Unix sort to sort the file
4) Turn that file into a PDF or some other format
The push is to move to a more modular approach. Instead of calling functions in include files it will do something like this:
1) Program 1 will generate a list of IDs (the primary keys to the tables) based upon initial criteria
2) Program 1 will pass that ID list to another series of programs that will pare down the list based upon some further criteria
3) These subprograms will then pass a finalized list to the output program that will dump the data
Any pros/cons to this approach? How else have people done reporting?
My opinions:
I dislike both our current approach and the proposed ideas. It has always been my experience that doing most everything in the DBMS as possible results in improved performance and maintainability over handling things like joins/sorts/etc. in an application language.
Also, because you are doing queries on each row (instead of an exists/join/etc.) the query cost rises geometrically with each additional row (e.g. you have 10 rows in the base table, you end up running 10 * 5 + 1 queries to get 5 details plus the initial query). Obviously a join or an exists would be a single query and would scale better (b-trees are log2 n I think?).
The idea that you go through every row and then exclude them from the final tally also seems foreign to me. I’ve always used NOT EXISTS or the like to handle it in the initial query.
The application we are using is old-school and only supports the rule-based optimizer of Oracle. Because no one was aware of this until I arrived, SQL queries often table-scanned and were generally dog-slow, so the prevailing attitude here is that SQR is faster. I’ve begun teaching the optimizer hint to trigger the cost-based optimizer but the DBA doesn’t have any real statistics analysis routines set up so it is anyone’s guess how old the stats are. I’m trying to get him to at least update on a periodic basis.
I’ve done reporting in the past on MS SQL Server and we handled it quite differently. A request would arrive for a report, and if we didn’t have the report already written we’d write a stored procedure to fetch the data needed. Then the report writers would draft a Crystal report to handle the output. It seems a good mix of the two ideas above.
What I’d like to do is something like that. All the SQL would be in the DBMS, and then we’d use some sort of an application language to manipulate the results – right now if someone wants a report sorted by a different column we’d have to write a whole new report. If we could take an existing report and simply change the sort criteria (or if the application could sort on the client-end) that would be great. The long-term goal is to have reports on demand via a web interface, so maybe some .NET forms could accomplish this?
Any ideas? We do not have a large budget so purchasing a new platform that is relatively expensive is not an option. Maybe we could squeeze something small (< $10K) but I don’t know.