Database configuration for heavy insert/update/calculation - in memory vs SSD?
On a monthly basis our databases (Oracle and MSSQL) run heavy calculations and persist gigabytes of data across 100+ schemata (5 of them really important). Roughly half a day building archives (inserts); 2-3 days calculating (inserting/updating); 1 day ETL -> DWH; 1 day creating/persisting reports. A lot of this runs in parallel in different schemata.
The question: How should I invest the budget (40k) and effort to improve overall performance most effectively?
The general requirements:
We have a lot of old scripts, tables and indexes that no one really understands, and we are neither allowed nor willing to delete or change them, due to regulations and the risk of errors.
The process must finish within 8 days in the extreme case and should finish as fast as possible (2-3 days), especially because we will soon have more monthly processing and daily processing as well.
Because of the huge data volumes and the physical CPU core count of all the servers/clusters, we can't afford Enterprise Editions (400k+ p.a.).
There is a budget of 20k this year and 20k next year to increase performance (1 hour of programming/configuration = 150 Euro).
Management has decided that we have to migrate most of the scripts to MSSQL within the next 6 years, so that we only have to keep paying for the Oracle Standard Edition.
The main problems I see (after 1 week of reading, testing and thinking):
We have way too much I/O because everything is calculated/updated in the database and not in memory (we can't use in-memory features due to Standard Edition). There is no real performance difference between tmp tables and normal tables (in our configuration), so the programmers currently use normal tables.
60% of the data persisted is dropped the following month because it is only temporary data for calculations (reports, ETL etc.). I think most of it could be calculated purely in memory and no one would miss the tables. A second point: we would have no problem losing this data in a server crash, because we could restart the process and recreate all tables from the archives (BTW, there has been no server crash so far as I know).
I think the infrastructure and databases are not optimally configured for such a write-intensive process. Oracle: just 1-2 SSDs in the 3Par (all data, indexes, logs and temp DB on them), and a lot of memory (ca. 300GB in total) that is not really used because the process is write-intensive. MSSQL: all data, indexes and tempdb on one RAID 5 and logs on another; the 180GB of RAM is also barely used. The new MSSQL cluster has SSDs for tempdb and logs, RAID 5 for data/indexes and 360GB RAM. I'm not sure the database can even allocate that much RAM given the license, and whether it would use the RAM effectively is another question. I think the databases are not using the RAM properly, because the whole calculation process could be done in memory with only the results persisted - much faster. But table variables are incredibly slow, and in-memory tables are not in the SE...
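For what it's worth: as far as I know, SQL Server Standard Edition caps the buffer pool at 128 GB per instance (64 GB on 2012 and earlier), so a single Standard instance will never use the full 180/360 GB for data caching no matter how it is configured - check the limits for your exact version. The instance-level cap itself is set like this (a sketch; the MB value is an example):

```sql
-- Show and set the instance memory cap (value in MB).
-- Standard Edition will not use more than ~128 GB of buffer pool
-- regardless of this setting (verify against your version's limits).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 131072;  -- 128 GB
RECONFIGURE;
```

If the edition cap is the binding limit, extra RAM beyond it is better spent on a RAM disk for tempdb (as in Solution 1/3) than left to the instance.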
While testing 1 of 1000+ reports/calculations, my tests showed that the UPDATE statements are the main problem in the whole process. Typically the programmers copy (SELECT INTO) some data (10 million rows, 20-30 attributes) into a normal table and afterwards calculate/update/join 20-30 more attributes into that table. Then they do some grouping and more updates into a second table, and the whole result is persisted into a third table. My tests showed that one really big full-table-scan UPDATE with CASE WHENs and some variables (combining all the small UPDATE statements) improved performance by 50%+ (1.2 hours down to 25 minutes), but the readability is bad. One big cursor would be great for readability but is also slow (because there is no in-memory collection!!). There is no easy solution on the programming side. In Java it would be so much easier, but yeah...
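To make that pattern concrete, here is a minimal sketch with invented names (calc_tbl, kind, base1, base2, attr1, attr2 are placeholders, not the real schema):

```sql
-- Before: each small UPDATE scans the 10-million-row table again.
UPDATE calc_tbl SET attr1 = base1 * 1.1 WHERE kind = 'A';
UPDATE calc_tbl SET attr1 = base1 * 1.2 WHERE kind <> 'A';
UPDATE calc_tbl SET attr2 = attr1 + base2;

-- After: one full-scan UPDATE computes all attributes in a single pass.
-- Note: within one UPDATE, column references read the OLD values, so
-- expressions that depend on the new attr1 must be repeated inline -
-- exactly the readability cost described above.
UPDATE calc_tbl
SET attr1 = CASE WHEN kind = 'A' THEN base1 * 1.1 ELSE base1 * 1.2 END,
    attr2 = CASE WHEN kind = 'A' THEN base1 * 1.1 ELSE base1 * 1.2 END
            + base2;
```

The win comes from scanning (and logging) the big table once instead of 20-30 times; the price is duplicated expressions.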
My possible solution so far:
Please tell me your thoughts on my ideas and tell me yours. Thx.
Solution 1: Use two correctly configured databases (one Oracle, one MSSQL) for the temp calculations and migrate all relevant scripts to these databases. These tricks should boost performance:
Use tmp tables wherever possible and move the temp DB onto a RAM disk of 10-50GB (create 1 file per CPU core in the filegroup) -> writing is then mainly done in memory
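On the MSSQL side the one-file-per-core layout would look roughly like this (a sketch; paths, names and sizes are placeholders, and R:\ is assumed to be the RAM disk):

```sql
-- Move/extend tempdb: one equally sized data file per CPU core
-- (equal sizes let proportional fill spread writes evenly).
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'R:\tempdb\tempdb.mdf', SIZE = 4GB);
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'R:\tempdb\tempdb2.ndf', SIZE = 4GB);
-- ...repeat ADD FILE up to the core count...
ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'R:\tempdb\templog.ldf', SIZE = 2GB);
-- The new file locations take effect after an instance restart.
```

Common guidance is to stop at 8 files even on boxes with more cores, adding more only if tempdb allocation contention is still visible.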
Persist all other tables on RAID 5, and heavily used tables, indexes and logs on SSDs in RAID 10 (again with several files per filegroup)
Configure the databases to be not 100% "ACID"/safe, i.e. no/minimal logging, snapshot reads, delayed writes etc. (what exactly should we do here?)
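On MSSQL these are the knobs I have in mind (a sketch; CalcDB is a placeholder for the rebuildable calculation database, and whether this is acceptable depends on your recovery requirements):

```sql
-- Relaxed durability for a database we can rebuild from the archives.
ALTER DATABASE CalcDB SET RECOVERY SIMPLE;             -- no log backups, log truncates on checkpoint
ALTER DATABASE CalcDB SET DELAYED_DURABILITY = FORCED; -- commits don't wait for log flush (SQL Server 2014+)
ALTER DATABASE CalcDB SET READ_COMMITTED_SNAPSHOT ON;  -- readers don't block writers
-- In SIMPLE recovery, SELECT INTO and INSERT ... WITH (TABLOCK)
-- can additionally be minimally logged.
```

With DELAYED_DURABILITY = FORCED a crash can lose the last committed transactions, which matches the "restart from the archives" assumption above.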
The other databases and the new cluster I would not touch, because the online systems and archives must stay on a totally safe system.
Solution 2: Use the old MSSQL server, with less CPU power and the 180GB RAM, for an Enterprise Edition to get table partitioning, in-memory features etc. -> the Enterprise Edition would cost far more than the 40k, but we could use all the good features.
Solution 3: Try a more aggressive configuration for the tmp calculation databases and invest money and time in testing the restart process after a server crash:
Put server logs and the temp DB in memory on an 80-100GB RAM disk -> force programmers to use tmp tables
Put data and indexes on two or more separate SSDs
Kill the system and try to restart the process (write documentation and some scripts to do so); these scripts are then used each month to rebuild the system, which also keeps them tested
Solution 4: Get an in-memory database and migrate all important scripts/calculations to it. Which database would be good given 180GB of RAM and my requirements? Which one can pull the data from MSSQL fast and bulk insert it back?
Solution 5: Do not calculate in the database at all and hire other programmers (also neither in budget nor in scope...)
Thank you for reading this really long post. Any help would be appreciated.
Excellent question - you clearly understand the problem and have already done a lot of work. At a high level you have only two options: scale up/out (add memory and hardware) or rebuild/redesign (clearly not an option, risk-wise or budget-wise). There are also cloud options (Azure, AWS) that let you turn resources off when not in use and save money; it's difficult to know in advance whether these would be cheaper or more expensive, and you would have to migrate there, but they do make it easy to scale resources up and down.