  1. #1
    Join Date
    Mar 2008

    Unanswered: Need help with processing large database

    Here is the situation:

    - Oracle 10g
    - Once per day a transaction table receives about 500 million rows, with about 22 dimensions (columns)

    We need to produce a system that allows the business to query that data and count combinations of dimensions. For example, a user might ask the system: how many web page views (counts) were made on day X for users who were male, aged 25-30, and who like cars?

    In this example, date, age, gender, and the car category are just ids in the table fields.
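
    To make it concrete, a query of that shape might look like this (all table and column names below are made up for illustration):

    Code:
    -- Hypothetical names: page_views is the daily transaction table,
    -- with each dimension stored as a surrogate id.
    SELECT COUNT(*)
    FROM   page_views
    WHERE  view_date   = DATE '2008-03-01'  -- day X
    AND    gender_id   = 1                  -- male
    AND    age_band_id = 3                  -- aged 25-30
    AND    interest_id = 7;                 -- likes cars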

    So far we have tried two approaches. First we tried to build a CUBE, but this is unfeasible for such a large dataset: with that many dimensions it would take weeks or years to process.
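
    The statement we tried was roughly this shape (same illustrative names). A full CUBE over n dimensions generates 2^n grouping sets, so 22 dimensions means 4,194,304 separate aggregations over the whole table:

    Code:
    -- 2^22 = 4,194,304 grouping sets per day of data, which is why
    -- this never finishes in any practical amount of time.
    SELECT gender_id, age_band_id, interest_id, COUNT(*) AS view_count
    FROM   page_views
    GROUP  BY CUBE (gender_id, age_band_id, interest_id);
    -- ...extended across all 22 dimension columns in the real attempt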

    Second, we tried to write a script that reads each row and builds a dictionary (hash) of the combinations it finds, i.e. a CUBE but only of observed data. At 17 dimensions per row, each row expands to 2^17 = 131,072 subsets, and the projected run time was about 50 years.

    My next approach is to create a new table using a GROUP BY over all dimensions, and see how quickly queries against it run.
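
    In other words, something like this (illustrative names again); the aggregate keeps only observed combinations, and each business question becomes a SUM over it:

    Code:
    -- Pre-aggregate once per load; only observed combinations are stored.
    CREATE TABLE page_view_agg
    PARALLEL NOLOGGING
    AS
    SELECT view_date, gender_id, age_band_id, interest_id,  -- ...all 22 dims
           COUNT(*) AS view_count
    FROM   page_views
    GROUP  BY view_date, gender_id, age_band_id, interest_id;  -- ...all 22 dims

    -- The example question then reads from the (much smaller) aggregate:
    SELECT SUM(view_count)
    FROM   page_view_agg
    WHERE  view_date   = DATE '2008-03-01'
    AND    gender_id   = 1
    AND    age_band_id = 3
    AND    interest_id = 7;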

    Somebody out there must have a good idea of how to do this in reasonable time and with reasonable query performance?

  2. #2
    Join Date
    Aug 2003
    Where the Surf Meets the Turf @Del Mar, CA
    Provided Answers: 1
    If you really want to solve this problem, then you can start by Reading This Fine Manual.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  3. #3
    Join Date
    Mar 2008
    Thank you for the reference; I will be sure to read it. However, since you are obviously more educated than I am, and since this manual is huge, could you suggest which topics would be most appropriate to my particular situation?

    I am not a DBA, so much of this information is superfluous to me, and my time constraints are, well, constrained.

    Thanks in advance.

  4. #4
    Join Date
    Dec 2003
    500 million records a day, so every 2 days you have 1 billion records? Geez, that's 182,500,000,000 records a year (182.5 billion)?

    You can look into partitioning tables, or materialized views ...
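
    For example, a summary along these lines (your names will differ; this is only a sketch). With query rewrite enabled, the optimizer can answer matching aggregate queries from the precomputed summary instead of the 500-million-row base table:

    Code:
    CREATE MATERIALIZED VIEW page_view_mv
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE
    AS
    SELECT view_date, gender_id, age_band_id, interest_id,  -- ...all dims
           COUNT(*) AS view_count
    FROM   page_views
    GROUP  BY view_date, gender_id, age_band_id, interest_id;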


  5. #5
    Join Date
    Mar 2008
    Well, we have only 28 days of data, and it is already partitioned by day. So that is 28 partitions of about 450,000,000 rows each (12.6 billion rows total).
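
    If I go the pre-aggregation route, I suppose the aggregate table could be partitioned by day in the same way, so each nightly load only builds one day's partition and a query for day X prunes to a single partition. A sketch (names still illustrative; 10g has no interval partitioning, so the daily partitions are declared explicitly):

    Code:
    CREATE TABLE page_view_agg (
      view_date    DATE,
      gender_id    NUMBER,
      age_band_id  NUMBER,
      interest_id  NUMBER,
      -- ...remaining dimension id columns...
      view_count   NUMBER
    )
    PARTITION BY RANGE (view_date) (
      PARTITION p20080301 VALUES LESS THAN (DATE '2008-03-02'),
      PARTITION p20080302 VALUES LESS THAN (DATE '2008-03-03')
      -- ...one partition per day, 28 in total
    );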
