Results 1 to 2 of 2
  1. #1
    Join Date
    Jan 2009

    Unanswered: Index for single table query with group by/order by


    I am having some difficulty creating a suitable index for a particularly heavy query (it basically caused my system to grind to a halt until mysql was restarted).

    The table in question contains a considerable amount of data: 4.2GB and ~ 12 million records.

    The structure is as follows:
    CREATE `offers` (
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
    `dep_date` date NOT NULL,
    `board_basis_code` varchar(2) NOT NULL,
    `rating` int(1) DEFAULT NULL,
    `resort_id` int(11) DEFAULT NULL,
    `region_id` int(11) DEFAULT NULL,
    `country_id` int(11) DEFAULT NULL,
    `price_pp` int(11) DEFAULT NULL,
    `creation_date` datetime NOT NULL,
    `update_date` datetime DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `resort_id` (`resort_id`),
    KEY `region_id` (`region_id`),
    KEY `country_id` (`country_id`),
    KEY `board_basis_code` (`board_basis_code`),
    KEY `dep_date` (`dep_date`),
    KEY `offers_search` (`country_id`,`rating`,`dep_date`,`region_id`,`pri ce_pp`),

    The query in question is:

    SELECT *
    FROM offers
    WHERE country_id = ?
    AND rating >= ?
    AND dep_date > ?
    GROUP BY region_id
    ORDER BY price_pp
    LIMIT 3

    When I "explain" this query, the output is as follows:

    id: 1
    select_type: SIMPLE
    table : offers
    type: range
    possible_keys: country_id,dep_date,offers_search
    key: offers_search
    key_len: 1689576
    ref: NULL
    rows: 1689576
    Extra: Using where; Using temporary; Using filesort

    The details of the index cardinality selected (as I expected) are:

    country_id 31 A YES
    rating 124 A YES
    dep_date 15721 A
    region_id 49992 A YES
    price_pp 3982727 A YES

    I'd really appreciate an advice which can be offered here. Why is the query using temporary and filesort? I'd have hoped it wouldn't need to as all group by/order by columns are in the index.

    Many thanks,

  2. #2
    Join Date
    Jun 2007

    Here's a bunch of comments:
    • The optimiser will try to use an index that has country_id at the start as this value has been supplied. It then depends on what value is used for rating and how much the result set is whittled down. If rating doesn't shrink the result set by much then the rest of the index is unlikely to be used.
    • Would dep_date be more restrictive? and if so could it be placed before rating in the index order?
    • If only selecting 3 rows for each country then would it be a good idea to copy just these rows to a separate table once a day and then pull the data from this new table?
    • Why do you have "group by region" in the query but then no aggregate functions in the select statement? This seems like an error to me.
    • You have a load of indexes but are they actually used by anything?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts