Hopefully I can word this correctly. Using OLAP, and have your standard fact and dimension tables. So dimensions of D1, D2, D3, D4, etc... and measures of M1, M2, M3, etc...
Decent sized database, and I need this to handle 500k+ new records per-day. Right now, data is added in raw form, then intermittently is processed and added to the aggregate (fact & dimension) tables as needed. So far so good, and right now I can rip through about 1.5 million records of data an hour.
Problem is, while this data is meant as aggregate data, it is also quite specific, and I have a few extra dimensions to still add in, making it even more specific. By "specific", I mean for say 500k records of raw data, I'm probably going to end up with 350k different dimension possibilities. Then if I add the additional dimensions in, guaranteed every record of raw data will have its own record in the dimension table, which obviously defeats the entire purpose.
Any idea of how to resolve this? Obviously, less detail and more summary, but that's not an option. I need users to be able to drill this far deep into the data. My one thought was to move the date index from the dimension table into the fact table, which would drastically reduce the number of dimension rows. Right now, it's creating a whole new set of dimension rows for every day of the year. Although, as I understand it, moving a dimension like the data into the fact table would simply be horrible design?
Another thought was, the additional dimensions I need won't be accessed very often. They need to be available, but they won't be hit hard by users. So I was thinking of syphoning those extra dimensions off into extra data cubes, but that just seems horribly messy and inefficient to me. Sure the data that gets drilled into the most stays at a decent size, but that means creating new dimension & fact tables for each cube, plus maintaining & syncing all that data, which seems like a horrible idea.
Thanks for reading, and any advice would be greatly appreciated!