Page 1 of 2 12 LastLast
Results 1 to 15 of 19
  1. #1
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132

    Unanswered: INDEX size threshold

    Hi guys,

    I have tables in my database that, I noticed recently,
    don't use some evident indexes.
    So I tried by copying smaller tables which uses the indexes well.
    I have the impression that when the table surpasses a certain size,
    it doesn't use certain indexes anymore.
    Is there a way to set a threshold anywhere ?

  2. #2
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132

    Post

    Do you know what I mean ?
    A simple
    Code:
    SELECT * FROM table WHERE field = "x"
    Does not use the index if table is larger than a certain size.
    How to set the threshold ?

  3. #3
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Often, when tables are quite small, the optimiser decides it's more costly to go to the index pages and from there to the data pages so it just jumps straight into the data pages without using the indexes. Other times it might decide that because the values you are indexing aren't very varied ie if most the fields = "x" then it's not worth using that index. If you start setting thresholds yourself then you will almost certainly do a worse job than the existing optimiser.

    Performance tuning is quite an in-depth subject and can't be covered in a simple post (or thread) so the above is just a small taster. Are you actually having performance issues in which case it might be worth giving us your SQL and the table definition and we can work on that.

  4. #4
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132

    Post

    Actually I have performance issues.
    Here is attached one of my tables (38 fields) that doesn't use certain indexes.

    Example:
    Code:
    EXPLAIN SELECT * FROM convention WHERE invoicing = 1
    gives
    Code:
    +----+-------------+------------+-----+---------------+-----+---------+-----+------+-------------+
    | id | select_type | table      |type | possible_keys | key | key_len | ref | rows | Extra       |
    +----+-------------+------------+-----+---------------+-----+---------+-----+------+-------------+
    | 1  | SIMPLE      | convention | ALL | Invoicing     |     |         |     | 798  | Using where |
    +----+-------------+------------+-----+---------------+-----+---------+-----+------+-------------+


    Code:
    EXPLAIN SELECT * FROM convention WHERE invoicing = 0
    gives
    Code:
    +----+-------------+------------+-----+---------------+-----------+---------+-------+------+-------+
    | id | select_type | table      |type | possible_keys | key       | key_len | ref   | rows | Extra |
    +----+-------------+------------+-----+---------------+-----------+---------+-------+------+-------+
    | 1  | SIMPLE      | convention | ref | Invoicing     | Invoicing | 1       | const | 76   |       |
    +----+-------------+------------+-----+---------------+-----------+---------+-------+------+-------+
    Attached Files Attached Files

  5. #5
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    No it won't use the index because the field invoicing only takes two values (0 and 1). This means that if the optimiser loads these index pages then at best it will cut down it's search by 50% - it probably figures that it's not worth the expense (in time) of loading these index pages and iterating between the index pages and the data pages each time.

    I can't see why you're having performance issues pulling data from a table that only contains tables 800 rows. This should be almost instant even if it was doing a table scan. Is there more SQL involved?

  6. #6
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132
    I use this table to filter a query with multiple
    tables JOINed searching in an average of 50k lines each.

    If you look at my EXPLAINs you will see that the index
    is used with 'invoicing = 0', not for 'invoicing = 1'.

    This is affecting the execution time by about 50x.

  7. #7
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    How many records have invoicing=0 and how many =1?
    If only a small percentage are =0 then the index will be used.
    The cut off point is usually around 10%

    Can we see all the SQL?

  8. #8
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132
    As in the EXPLAINs:
    798 rows with invoice = 1 (index not used)
    76 rows with invoice = 0 (index used)

  9. #9
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    total number of rows = 798 + 76 = 874
    Num of rows where invoice=0 =76 =~ 9% which is under 10% so index used
    Num of rows where invoice=1 =798 =~ 91% which is over 10% so index not used

    The 10% cut off is my estimate at what MySQL uses - systems vary.

    Mike

  10. #10
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132
    Isn't it when we have to get a lot of rows that MySQL should use the indexes ?
    You said before
    Quote Originally Posted by mike_bike_kite
    [...]when tables are quite small, the optimiser decides it's more costly to go to the index pages and from there to the data pages so it just jumps straight into the data pages without using the indexes[...]
    I have the impression that it does the opposite, doesn't it ?

  11. #11
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I don't think I can explain things any more clearly - sorry. Perhaps someone else can try? Alternatively why don't you just give us the SQL that's running slowly, tell us how long it's currently taking and we'll have a go at rewriting it for you.

  12. #12
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132
    Rudy, what do you think ?

    Here is one of my query
    Code:
    SELECT
      tup.session_id AS "session_id"
    , sest.session_student_id AS "sest_id"
    , usm.user_id AS "user_id"
    , tut.user_id AS "tutor_id"
    , UCASE(usm.user_family_name) AS "nom"
    , usm.user_emp_no AS "emp_no"
    , usm.matricule AS "matricule"
    , tup.date AS "date"
    , cem.center_name AS "centre"
    , IF(tup.date <= cem.freezed_date, "validé", "") AS "validation"
    , cem.center_id AS "centre_id"
    , prm.product_name AS "produit"
    , lam.language AS "langue"
    , prc.name AS "cat"
    , /* pri.cost */ 0 AS "pu"
    , (CASE sest.attendance WHEN "P" THEN "Present" WHEN "A" THEN "Absent" ELSE "" END) AS "statut"
    , (TIME_TO_SEC(tup.end_time) - TIME_TO_SEC(tup.start_time)) / 3600 AS "duree"
    , CONCAT(tup.start_time, " - ", tup.end_time) AS "heure"
    , sest.signature_done AS "signed"
    , usm.user_login AS "login"
    , spm.spm_name AS "sp_name"
    , spp.spp_explanation AS "gp_name"
    , spp.ded AS "ded"
    , spp.cost_center AS "cc"
    , spp.level AS "level"
    , cli.alias AS "client"
    , spm.spm_start_date AS "sp_start"
    , spm.spm_end_date AS "sp_end"
    , IF(spm.spm_status = 0, "Open", "Closed") AS "sp_status"
    , com.company_name AS "company_name"
    FROM company_master AS com
      JOIN client AS cli ON (cli.company_id = com.company_id)
      JOIN convention_master AS cov ON (cov.client_id = cli.client_id)
      JOIN study_path_product AS spp ON (spp.conv_id = cov.id)
      JOIN product_master AS prm ON (spp.product_id = prm.product_id)
      JOIN product_category AS prc ON (prm.category_id = prc.id)
      JOIN language_master AS lam ON (prm.language_id = lam.language_id)
      JOIN study_path_main AS spm ON (spm.study_path_main_id = spp.study_path_main_id)
      JOIN user_master AS usm ON (usm.user_id = spm.user_id)
      JOIN session_student AS sest ON (sest.user_id = spm.user_id)
      JOIN tutor_planner AS tup ON (
        tup.date BETWEEN spm.spm_start_date AND spm.spm_end_date
        AND tup.product_id = spp.product_id
        AND tup.session_id = sest.session_id
      )
      JOIN user_master AS tut ON (tup.user_id = tut.user_id)
      JOIN center_master AS cem ON (tup.center_id = cem.center_id)
    WHERE TRUE
      AND com.company_id = 175
      AND tup.date BETWEEN "2009-09-01" AND "2009-09-30"
      AND prm.category_id <> 6
    GROUP BY
      usm.user_id
    , sest.session_id
    ORDER BY
      `nom`
    , `prenom`
    , tup.date
    ;
    The EXPLAIN

    Code:
    +----+-------------+-------+--------+---------------------------------------------------+---------+---------+---------------------------------------------+------+---------------------------------+
    | id | select_type | table | type   | possible_keys                                     | key     | key_len | ref                                         | rows | Extra                           |
    +----+-------------+-------+--------+---------------------------------------------------+---------+---------+---------------------------------------------+------+---------------------------------+
    |  1 | SIMPLE      | com   | const  | PRIMARY,CompanyID                                 | PRIMARY | 4       | const                                       |    1 | Using temporary; Using filesort |
    |  1 | SIMPLE      | cem   | ALL    | PRIMARY                                           | NULL    | NULL    | NULL                                        |  295 |                                 |
    |  1 | SIMPLE      | tup   | ref    | PRIMARY,Tutor,Center,Date,Product,TutorDateCenter | Center  | 5       | v8.cem.center_id                            |   12 | Using where                     |
    |  1 | SIMPLE      | prm   | eq_ref | PRIMARY,Cat,Lang                                  | PRIMARY | 4       | v8.tup.product_id                           |    1 | Using where                     |
    |  1 | SIMPLE      | lam   | eq_ref | PRIMARY,language_id                               | PRIMARY | 4       | v8.prm.language_id                          |    1 |                                 |
    |  1 | SIMPLE      | prc   | eq_ref | PRIMARY                                           | PRIMARY | 4       | v8.prm.category_id                          |    1 |                                 |
    |  1 | SIMPLE      | tut   | eq_ref | PRIMARY                                           | PRIMARY | 98      | v8.tup.user_id                              |    1 |                                 |
    |  1 | SIMPLE      | sest  | ref    | Protec1,Session,Learner                           | Protec1 | 98      | v8.tup.session_id                           |    1 |                                 |
    |  1 | SIMPLE      | usm   | eq_ref | PRIMARY                                           | PRIMARY | 98      | v8.sest.user_id                             |    1 |                                 |
    |  1 | SIMPLE      | spm   | ref    | PRIMARY,user_id,DateRange                         | user_id | 98      | v8.usm.user_id                              |    2 | Using where                     |
    |  1 | SIMPLE      | spp   | eq_ref | SpProd                                            | SpProd  | 198     | v8.spm.study_path_main_id,v8.prm.product_id |    1 | Using where                     |
    |  1 | SIMPLE      | cov   | eq_ref | PRIMARY,Client                                    | PRIMARY | 4       | v8.spp.conv_id                              |    1 |                                 |
    |  1 | SIMPLE      | cli   | eq_ref | PRIMARY                                           | PRIMARY | 4       | v8.cov.client_id                            |    1 | Using where                     |
    +----+-------------+-------+--------+---------------------------------------------------+---------+---------+---------------------------------------------+------+---------------------------------+
    - 20sec execution time

    Here MySQL doesn't seem to take the good way to find the data,
    I think the cem table should be at the end

    Now change
    Code:
    <  AND com.company_id = 175
    >  AND tup.center_id = 245
    You will get

    Code:
    +----+-------------+-------+--------+---------------------------------------------------+---------+---------+---------------------------------------------+------+---------------------------------+
    | id | select_type | table | type   | possible_keys                                     | key     | key_len | ref                                         | rows | Extra                           |
    +----+-------------+-------+--------+---------------------------------------------------+---------+---------+---------------------------------------------+------+---------------------------------+
    |  1 | SIMPLE      | cem   | const  | PRIMARY                                           | PRIMARY | 4       | const                                       |    1 | Using temporary; Using filesort |
    |  1 | SIMPLE      | tup   | ref    | PRIMARY,Tutor,Center,Date,Product,TutorDateCenter | Center  | 5       | const                                       | 4682 | Using where                     |
    |  1 | SIMPLE      | prm   | eq_ref | PRIMARY,Cat,Lang                                  | PRIMARY | 4       | v8.tup.product_id                           |    1 | Using where                     |
    |  1 | SIMPLE      | prc   | eq_ref | PRIMARY                                           | PRIMARY | 4       | v8.prm.category_id                          |    1 |                                 |
    |  1 | SIMPLE      | lam   | eq_ref | PRIMARY,language_id                               | PRIMARY | 4       | v8.prm.language_id                          |    1 |                                 |
    |  1 | SIMPLE      | tut   | eq_ref | PRIMARY                                           | PRIMARY | 98      | v8.tup.user_id                              |    1 |                                 |
    |  1 | SIMPLE      | sest  | ref    | Protec1,Session,Learner                           | Protec1 | 98      | v8.tup.session_id                           |    1 |                                 |
    |  1 | SIMPLE      | usm   | eq_ref | PRIMARY                                           | PRIMARY | 98      | v8.sest.user_id                             |    1 |                                 |
    |  1 | SIMPLE      | spm   | ref    | PRIMARY,user_id,DateRange                         | user_id | 98      | v8.sest.user_id                             |    2 | Using where                     |
    |  1 | SIMPLE      | spp   | eq_ref | SpProd                                            | SpProd  | 198     | v8.spm.study_path_main_id,v8.tup.product_id |    1 |                                 |
    |  1 | SIMPLE      | cov   | eq_ref | PRIMARY,Client                                    | PRIMARY | 4       | v8.spp.conv_id                              |    1 |                                 |
    |  1 | SIMPLE      | cli   | eq_ref | PRIMARY                                           | PRIMARY | 4       | v8.cov.client_id                            |    1 |                                 |
    |  1 | SIMPLE      | com   | eq_ref | PRIMARY,CompanyID                                 | PRIMARY | 4       | v8.cli.company_id                           |    1 |                                 |
    +----+-------------+-------+--------+---------------------------------------------------+---------+---------+---------------------------------------------+------+---------------------------------+
    - With 0.3sec execution time

    In both cases, the same volume is fetched (~100 lines returned)

  13. #13
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by gtk
    Rudy, what do you think ?
    i think mike is doing a splendid job for you

    your query is way too complicated for me

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  14. #14
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132
    Rudy, this is the morning that is complicated, isn't it ?

  15. #15
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by r937
    i think mike is doing a splendid job for you
    I've already given up I'm afraid. I will quickly say that gtk's previous query just involved the field invoicing - now it appears that that this field doesn't appear at all within your SQL and there's now a join with 12 other tables but at least that explains why the index isn't used on that field. I suspect that the optimiser just takes one look at that query and simply throws up it's hands - that's certainly what I'm doing right now
    Last edited by mike_bike_kite; 10-22-09 at 06:33.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •