
Thread: indexed views

  1. #1 · Join Date: Jun 2003 · Location: san francisco · Posts: 14

    Unanswered: indexed views

    Hello,
    I'm currently performance tuning a table with 100 million rows in it
    (about 18 GB of data) and would like to know:

    1. Is the table too large to be performance tuned? Is it better to just
    redesign the schema?
    2. Can techniques such as indexed views really help with tuning such a table?
    3. How long would it take to create a clustered or non-clustered index on
    a varchar column (for instance) on a table with 100 million rows?
    (I know this is a function of hardware as well - let's assume I'm using
    a fairly maxed out DL 360, i.e. dual processor with 4 GB of memory.)

    Thank you very much
    Alan

  2. #2 · Join Date: Feb 2004 · Location: In front of the computer · Posts: 15,579 · Provided Answers: 54

    1. Nope, that's a good size for tuning.
    2. Yes.
    3. It usually takes me about 20 seconds to type a CREATE INDEX statement.
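
    For what it's worth, here is a rough sketch of the statements in question (every table, column, and index name below is made up - substitute your own). For #2, keep in mind that an indexed view in SQL Server has to be created WITH SCHEMABINDING, use two-part names, and include COUNT_BIG(*) if it aggregates; the unique clustered index is what actually materializes it.

    Code:
    -- Question 3: indexes on a varchar column of a hypothetical activity log.
    CREATE CLUSTERED INDEX IX_ActivityLog_UserName
        ON dbo.ActivityLog (UserName);

    CREATE NONCLUSTERED INDEX IX_ActivityLog_PageName
        ON dbo.ActivityLog (PageName);
    GO

    -- Question 2: a simple single-table indexed view (assumes ActivityDate
    -- is stored at day granularity).
    CREATE VIEW dbo.vw_ActivityByDay
    WITH SCHEMABINDING
    AS
    SELECT UserName,
           ActivityDate,
           COUNT_BIG(*) AS ActivityCount
    FROM dbo.ActivityLog
    GROUP BY UserName, ActivityDate;
    GO

    -- The unique clustered index materializes the view on disk.
    CREATE UNIQUE CLUSTERED INDEX IX_vw_ActivityByDay
        ON dbo.vw_ActivityByDay (UserName, ActivityDate);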

    -PatP

  3. #3 · Join Date: Jun 2003 · Location: san francisco · Posts: 14

    My question wasn't about your typing skills.
    It's about how long it takes to create a clustered / non-clustered
    index on a varchar column of a table with 100 million rows.
    In this case (regarding the indexed view question) I'm not joining other tables;
    I'm only selecting from this one table with 100 million rows.

  4. #4 · Join Date: Feb 2004 · Posts: 492

    For a clustered index it really depends on how fragmented the data is and what other indexes are on the same table. In my experience, 150M records take less than half an hour for a regular index.
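
    If you want to see how fragmented it actually is before committing to a rebuild, a quick check (the table name below is made up; this is the SQL 2000 command) will report scan density and fragmentation:

    Code:
    -- Rough fragmentation check on a hypothetical table name.
    DBCC SHOWCONTIG ('ActivityLog');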

  5. #5 · Join Date: Jun 2003 · Location: san francisco · Posts: 14

    Thank you Kaiowas. That's good to know.
    How long does it typically take to perform a query on your indexed,
    150 million row table?

    In this case, the fragmentation is low. The table serves as a user activity log.
    This is currently a 20 GB table with 100 million rows, and I'd like to
    decide between re-modelling the database design (normalizing the table into
    a few more tables) or focusing on tuning the table.

  6. #6 · Join Date: Feb 2004 · Location: In front of the computer · Posts: 15,579 · Provided Answers: 54

    Sorry, I answered your question as asked instead of what you meant. It has been one of those weeks.

    Building indexes isn't really CPU bound, and is rarely RAM bound (although for your table the RAM could be a constraint). Disk performance is usually the key component for building indexes. The DL 360 can be configured a lot of different ways, and disk performance is wildly variable depending on how the system is configured.

    Normalizing the table is usually a good idea. When dealing with 20 GB of data, normalization is really important. If you normalize the data, you might be able to turn one table with 20 GB of data into six tables with 8 GB (total) of data. That will almost always improve your performance all by itself, just because the machine needs to schlepp a lot less data from point A to point B in order to resolve any query.
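
    As a rough sketch of that kind of split (all names below are made up): the repeated varchar values move out into small lookup tables, and the big log keeps nothing but integer keys and a timestamp.

    Code:
    -- Hypothetical normalized layout for a user activity log.
    CREATE TABLE dbo.Users (
        UserID   int IDENTITY(1,1) PRIMARY KEY,
        UserName varchar(100) NOT NULL UNIQUE
    );

    CREATE TABLE dbo.Pages (
        PageID  int IDENTITY(1,1) PRIMARY KEY,
        PageUrl varchar(400) NOT NULL UNIQUE
    );

    -- The 100 million row table shrinks to narrow rows of keys and a date.
    CREATE TABLE dbo.ActivityLog (
        ActivityID   bigint IDENTITY(1,1) PRIMARY KEY,
        UserID       int NOT NULL REFERENCES dbo.Users (UserID),
        PageID       int NOT NULL REFERENCES dbo.Pages (PageID),
        ActivityDate datetime NOT NULL
    );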

    -PatP

  7. #7 · Join Date: Jan 2003 · Location: Massachusetts · Posts: 5,800 · Provided Answers: 11

    Just out of curiosity, how wide is this 20 GB table? For me, that would be one of the places to look. Since creating an index on this table could take up to half an hour, this is not something you will be able to do by trial and error. Are you certain there is table scanning going on? And more importantly, that the queries you are tuning can use indexes?
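
    One cheap way to check rather than guess: have SQL Server show the plan without running the query, and look for Table Scan or Clustered Index Scan versus Index Seek operators. The table and column names here are placeholders:

    Code:
    SET SHOWPLAN_TEXT ON;
    GO
    -- A representative query against the log table; with SHOWPLAN_TEXT on,
    -- only the textual plan is returned, nothing is executed.
    SELECT COUNT(*)
    FROM dbo.ActivityLog
    WHERE UserName = 'alan';
    GO
    SET SHOWPLAN_TEXT OFF;
    GO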

  8. #8 · Join Date: Jun 2003 · Location: san francisco · Posts: 14

    Hi MCrowley,
    What do you mean by wide? I'm sorry, I didn't quite understand.
    By wide, do you mean the size of the largest row in the table?

    Thank you for your response

  9. #9 · Join Date: Jun 2003 · Location: san francisco · Posts: 14

    Also MCrowley,
    The queries on this log table are pretty straightforward and simple.
    The kind of vanilla queries you'd typically run on a log table that
    tracks online utilization.

  10. #10 · Join Date: Jan 2003 · Location: Massachusetts · Posts: 5,800 · Provided Answers: 11

    Sorry. I meant how many columns.

    As for the queries, if they are of the form:

    Code:
    select *
    from [really hugenormous table]
    where logmessage like '%error%'
    then no amount of indexing is going to help. Kinda like looking for all people in the phone book who are named '%son'.
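
    For contrast, the same sort of search written so that an index on the column can actually be used for a seek (it is only the leading wildcard that is the killer):

    Code:
    -- Still a wildcard, but anchored at the start of the string, so an
    -- index on logmessage can seek to the matching range.
    select *
    from [really hugenormous table]
    where logmessage like 'error%'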

  11. #11 · Join Date: Jun 2003 · Location: san francisco · Posts: 14

    MCrowley,
    The queries are not wildcarded. There are 15 columns in the table.
    Queries are vanilla, of the form -

    select ....
    from ...
    where col1 = val1
    and col2 = val2
    ...

    no like clauses etc.

  12. #12 · Join Date: Jan 2005 · Location: TempDb · Posts: 228

    As PatP points out, normalization will give you significant performance benefits. I'd say it would trump disk speed, and it will improve performance for caching and for index seeks and searches. Then, index those tables to benefit your application.
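
    For the query shape Alan posted above, that usually means a composite non-clustered index on the equality columns, typically with the most selective column first (the names below are just the placeholders from his example):

    Code:
    -- Composite index matching "where col1 = val1 and col2 = val2".
    CREATE NONCLUSTERED INDEX IX_LogTable_Col1_Col2
        ON dbo.LogTable (col1, col2);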

  13. #13 · Join Date: Jun 2003 · Location: Ohio · Posts: 12,592 · Provided Answers: 1

    "Normalizing the table is usually a good idea. When dealing with 20 Gb of data, normalization is really important. If you normalize the data, you might be able to turn one table with 20 Gb of data into six tables with 8 Gb (total) of data. That will almost always improve your performance all by itself, just because the machine needs to schlepp a lot less data from point A to point B in order to resolve any query."

    Ok, I'm gonna stick my neck out on this one and disagree with Pat.

    Normalizing data does NOT necessarily result in an increase in performance, because many of your queries will now require joins which would not be required in a denormalized schema. The primary purpose of denormalizing data is to increase the performance of output queries. Witness the common Star Schema used (all too frequently) in Data Warehouses.

    What you lose by denormalizing data is control over relational integrity. When you have hierarchical relationships between datasets more than two levels deep, and then try to represent this in a flat table, enforcing business rules for input becomes cumbersome, and you will find yourself doing a lot of coding to make up for it. That is why Star Schemas are best used for applications that focus on reporting, and which receive their data from external sources that can be trusted to verify the integrity of the data they submit.

    That said, I never use Star Schemas (the idiot's "One Size Fits All" solution to data warehousing), and the databases I design are almost always fully normalized.

    Pat?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  14. #14 · Join Date: Jan 2005 · Location: TempDb · Posts: 228

    If ROLAP is an option, then MOLAP is probably the right answer. The extra cost/effort is well worth the additional benefit.

    You also need to understand how current the data needs to be.

  15. #15 · Join Date: Feb 2004 · Location: In front of the computer · Posts: 15,579 · Provided Answers: 54

    As a general rule, I find that normalized databases will outperform non-normalized databases almost every time.

    If you think about what work goes on at a primitive level, schlepping about 8 GB of data which you join at the last minute to produce 20 GB of results is going to be faster than hauling the entire 20 GB from start to finish. The overhead needed to support joins when indices are available is trivial, well less than 1% of total query cost for each joined table. When you compare the cost of the join to the cost of the raw I/O, the normalized design will run roughshod over the non-normalized design.

    The reason that data warehouses store non-normalized data doesn't really have anything to do with machine performance... The intent of a data warehouse is to make data available to the end user. End users don't do joins naturally, so a star schema suits them well. When you look at the actual performance issues, a normalized design usually outperforms the warehouse design from the machine's perspective.

    In a warehouse situation, people time is a lot more expensive than machine time... If the subject matter experts using a data warehouse have to spend an extra minute composing a query, that delay can cost more than the query can save. If the delay impacts a decision (for example in a board meeting), it will definitely cost the business money.

    Disk is usually the weakest link in a database chain between a web/application server and the data it needs. Reducing the total amount of disk I/O is the highest payback optimization that I've found. If you can normalize data, it almost always reduces the amount of disk I/O needed, and that speeds up the whole system.
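
    To put a concrete shape on that, using the made-up normalized layout sketched earlier: the query still returns the wide rows, but the bulk of the I/O is against the narrow log table, and the lookups join back on indexed integer keys.

    Code:
    -- Join back to the lookup tables; the PRIMARY KEYs on Users and Pages
    -- already give the optimizer an index to seek on for each join.
    SELECT u.UserName, p.PageUrl, a.ActivityDate
    FROM dbo.ActivityLog AS a
    JOIN dbo.Users AS u ON u.UserID = a.UserID
    JOIN dbo.Pages AS p ON p.PageID = a.PageID
    WHERE a.ActivityDate >= '20050101';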

    -PatP
