Results 1 to 5 of 5
  1. #1
    Join Date
    Oct 2010
    Posts
    5

    Business Card Analogy for Understanding Indexes

    Is this a good analogy to understand indexes in a database. (I use SQL Server and not familiar with other DB systems. I don't know if this matters.)

    (1) Heap: A stack of business cards is a heap. Anytime you want to search for anything, you need to go thru the cards one by one. This is a table scan.

    (2) Clustered Index: You order your cards in last name first name order and organized them in a rolodex. The rolodex is the clustered index.

    (3) Clustered Index Seek: It's easy to search the rolodex and find the card(s) for Mr Smith. The search would be just as easy for Jim Smith. That is a clustered index seek.

    (4) Clustered Index Scan: But the search won't be easy for all Jim's. In this case, the order of the rolodex (last name, followed by first name) doesn't help as much. So, you need to go through all the sections of your roloedex and look for Jim. This is clustered index scan. The clustered index does help to some degree because all Jim's within each section of all last names would be together.

    (5) Non Clustered Index: Let's says sometimes you need to search your business cards by state followed type of business. For this, you would build a list on a seaprate piece of paper. On this list, you would have an entry for each business card in your rolodex. For each entry, you would have state, business type, last name, first name. This list would be ordered by state then business type. This list is a non clustered index. (Question: Unless I have the analogy wrong, why is this called a NONCLUSTERED index? The list is ordered or clustered together by state and business type.)

    (6) Non Clustered Index Seek: The list makes it very easy to find all businesses in NY. It's just as easy to find all businesses in NY doing dry cleaning. This is a non clustered index seek.

    (7) Non Clustered Index Scan: But, the list does not help as much if you are looking for all dry cleaning business. You would need to look at all state sections on the list and locate all dry cleaning entries. This is a non clustered index scan. The non clustered index helps to some degree because all dry cleaning entires would be together within each state section.

    (8) Key Lookup: In (6) and (7), if all you were looking for was just state, business type, last name, first name, then you have all this info on this list. You don't need to look in the rolodex. However, if you wanted to know the phone number, then you now need to use the keys to the rolodex found on this list (last name first name) and go to the rolodex. This is a key lookup.
    Last edited by DoolinDalton; 11-25-10 at 01:19.

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    7 is a non clustered index *scan*. The last sentence of 4 and 7 are not true either in so far as it doesn't make things easier.
    This demomstrates you understand indexes the basic principles but the analogy is a touch crude and does not account for some of the more complex aspects, such as pages, leafs, b-trees etc.

  3. #3
    Join Date
    Oct 2010
    Posts
    5
    Quote Originally Posted by pootle flump View Post
    7 is a non clustered index *scan*. The last sentence of 4 and 7 are not true either in so far as it doesn't make things easier.
    This demomstrates you understand indexes the basic principles but the analogy is a touch crude and does not account for some of the more complex aspects, such as pages, leafs, b-trees etc.
    Thank you.

    I corrected (7). I was copy pasting some of the stuff and forgot to change it. (7) is indeed non clustered index scan.

    On (4) and (7), doesn't it help somewhat? In the example of looking for 'Jim', you would be looking for Jim for every last name. But within each last name, 'Jim' would exist together.

    I do agree the example does not model a B-Tree search.

    What about my question on (5). Why is a non clustered index called NON CLUSTERED? Isn't the index ordered by the keys in the index? So the keys would be grouped together or clustered together? I know you can have only one clustered index, so by default all other indexes are non clustered. But to me, that makes it sound like the keys aren't ordered.

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    You are correctly describing a feature of the index, but it does not help the optimiser. It will still scan all the pages. In theory it could read fewer pages if the cardinality of the leading column were low enough by instead performing a series of seeks, however it does not do this in practice.
    Code:
    USE test;
    
    IF EXISTS (SELECT NULL FROM sys.tables WHERE object_id = OBJECT_ID('dbo.low_cardinatilty')) 
    BEGIN
        DROP TABLE dbo.low_cardinatilty;
    END;
    
    CREATE TABLE dbo.low_cardinatilty
        (
            leading_column      BIT                                                 NOT NULL 
          , second_column       INT                                                 NOT NULL
          , datas               CHAR(100)           COLLATE Latin1_General_CI_AS    NOT NULL 
          , CONSTRAINT pk_low_cardinatilty PRIMARY KEY CLUSTERED (leading_column, second_column) WITH (FILLFACTOR = 100)
        ) ON [PRIMARY];
        
    INSERT INTO dbo.low_cardinatilty
        (
            leading_column
          , second_column
          , datas
        )
    SELECT  leading_column      = number % 2
          , second_column       = FLOOR(number/2.0)
          , datas               = REPLICATE('YAYA', 25)
    FROM    dbo.numbers
    
    SET STATISTICS TIME, IO, PROFILE ON 
    
    SELECT  *
    FROM    dbo.low_cardinatilty
    WHERE   second_column       = 0
    
    SET STATISTICS TIME, IO, PROFILE OFF
    In the above, the optimiser scans rather than seeks. The scan reads 929 pages. The index depth is 3 (including leaf) so two seeks would only require 3 reads each. We can prove this empirically:
    Code:
    SET STATISTICS TIME, IO, PROFILE ON 
    
    SELECT  *
    FROM    dbo.low_cardinatilty
    WHERE   second_column       = 0
        AND leading_column      = 0
    UNION ALL
    SELECT  *
    FROM    dbo.low_cardinatilty
    WHERE   second_column       = 0
        AND leading_column      = 1   
        
    SET STATISTICS TIME, IO, PROFILE OFF
    The point is, this is not an optimisation the team bothered putting in to the cost based optimiser, presumably because contrived examples like this are going to be very rare.



    A clustered index clusters the table data together. The leaf level of the index is the table. As such, it is a B-Tree index and also the table. Arguably since SQL Server 2005 and the introduction of includes columns, non-clustered indexes have become a little like clustered-indexes-lite but we are getting in to some of the complications I referred to earlier.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Whoops. I thought this was in the SQL Server forum.
    Other RDMBSs might take advantage of the what you describe, but I would not bet on it.
    Testimonial:
    pootle flump
    ur codings are working excelent.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •