Results 1 to 9 of 9
  1. #1
    Join Date
    May 2007
    Posts
    80

    Unanswered: optimising a query, how to ?

    I'd like to know if there's a way to speed up this query:

    Code:
    select
    
    RACEDATE,races.name,country, RACETIME,RACEORDER,RACESTATUS, races.handicap,races. runnersNo,races.places, races.title,wintime, position,draw,horse.name,horse.registration_country, jockey.lastname,trainer.lastname
    from
    (select
    race.race_id, RACEDATE,RACETIME,RACEORDER,RACESTATUS, course.name,if(sbs_regions_and_countries.Englishname = "Ireland","(IRE)","(GB)") as country, race.title,wintime, handicap,runnersNo,places
    FROM race
    left join course on course.course_id = race.course_id left join course_country on course_country.course_id = course.course_id left join sbs_regions_and_countries on sbs_regions_and_countries.id = course_country.country_id left join race_derived on race_derived.race_id = race.race_id
    where
    /* racedate = UTC_DATE() */ racedate = "2006-06-10" /* ( racedate >= "2006-06-01" and racedate <= "2006-06-31") */ /* ( racedate >= "2007-01-01" and racedate <= "2007-12-31") */ and ( sbs_regions_and_countries.englishname = "England" or sbs_regions_and_countries.Englishname = "Ireland" )
    ) as races
    join racerunner on racerunner.race_id = races.race_id left join horse on racerunner.horse_id = horse.horse_id left join trainer on racerunner.trainer_id = trainer.trainer_id left join jockey on racerunner.jockey_id = jockey.jockey_id
    order by racedate,racetime,raceorder,position,draw;

    We're talking horse racing here.
    I want a list of races with runners for a timeperiod and limited to countries "england" and "ireland". This is what i do in the inner select where i join the country data to the course of the races for the timeperiod. A "race" is always run on a "course" (or track) which is in a "country".

    This way i have a sub-selection of races BEFORE i join in the horses.

    Each "race" has any number of horses, called "runners" and each one has a "jockey" and "trainer" associated with it, that's the outer select.


    When i run this query on a full year then i get about 9000 races with 100000 runners and the query runs in 30 to 40 seconds. Fair enough for retireving and sorting 100000 records. But just wondering if i can get it faster.

    The inner select which selects the sub-set of races in itself runs in 0.009s so not much to gain there i'd say.

    If i remove the "order by" part then the whole query runs in 3 seconds, so obviously that is where there's time to gain. I just don't know how ?

  2. #2
    Join Date
    May 2007
    Posts
    80
    Actually it's more of a general question as a learning opportunity rather than about this query specifically.

    How do i post the result of the EXPLAIN command ?
    Can't post HTML on this forum so a copy&paste gets all screwed up ?

  3. #3
    Join Date
    May 2007
    Posts
    80
    Ran the explain command and attached the result as an image here.
    Attached Thumbnails Attached Thumbnails explain.JPG  

  4. #4
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    code
    select RACEDATE
    ,course.name
    ,if(sbs_regions_and_countries.Englishname = "Ireland","(IRE)","(GB)") as country
    ,RACETIME
    ,RACEORDER
    ,RACESTATUS
    ,handicap
    ,runnersNo
    ,places
    ,title
    ,wintime
    ,position
    ,draw
    ,horse.name
    ,horse.registration_country
    ,jockey.lastname
    ,trainer.lastname
    FROM race
    left join course on course.course_id = race.course_id
    left join course_country on course_country.course_id = course.course_id
    inner join sbs_regions_and_countries on sbs_regions_and_countries.id = course_country.country_id
    and
    sbs_regions_and_countries.englishname in ( "England","Ireland")
    left join race_derived on race_derived.race_id = race.race_id
    inner join racerunner on racerunner.race_id = race.race_id
    left join horse on racerunner.horse_id = horse.horse_id
    left join trainer on racerunner.trainer_id = trainer.trainer_id
    left join jockey on racerunner.jockey_id = jockey.jockey_id
    where/* racedate = UTC_DATE() */
    racedate = "2006-06-10"
    /* ( racedate >= "2006-06-01" and racedate <= "2006-06-31") */
    /* ( racedate >= "2007-01-01" and racedate <= "2007-12-31") */

    order by racedate
    ,racetime
    ,raceorder
    ,position
    ,draw;
    /code

  5. #5
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    i rearranged things a bit, as the nested expression wasn't needed, no summation/grouping/min/max there. Also, changed join to sbs_countries to an inner join as it was a waste to join to all the tables, then throw out conditions in the where clause. Probably still some issue with the sorting, just check sort heap sizes to ensure enough room for amount of data being sorted.

    Dave

  6. #6
    Join Date
    May 2007
    Posts
    80
    thanks for the reply,
    tried that but still around 30 seconds

    You've actually got close to what i started with.
    Here's the problem, the database contains 140000 races with an average of 10 runners per race. The way you've done it is that you join all of those first, before the where clause, giving 1.4 million records, then make a selection from those millions. That takes time.

    That's why i got the inner select so from the 140000 races i first select the 10000 i need, using existing indexes so pretty fast, and then join them with the racerunners which is a lot less records to join.

    The dis-advantage here is that by using the inner select i save time by joining less records but i lose the ability to use an index for the ordering.

  7. #7
    Join Date
    May 2007
    Posts
    80
    Code:
    inner join sbs_regions_and_countries on sbs_regions_and_countries.id = course_country.country_id
    and
    sbs_regions_and_countries.englishname in ( "England","Ireland")
    this would appear to be a bit faster than having it in the WHERE so taken that on board. (note to self: re-read documentation on inner join )

  8. #8
    Join Date
    May 2007
    Posts
    80
    I've gone a slightly different direction. I need the selection of races for a number of queries so what i've done is created a procedure and i've put the inner selection of races into a temporary table in memory, with an index.

    Then i use that to join the other tables with. Testing it with a myisam table rather than in memory has shaved 5 seconds off so down to 25 seconds.

    Without order by it's now 0.7 seconds and with 25 seconds so still thinking there should be something i can do to speed that up.

  9. #9
    Join Date
    May 2007
    Posts
    80
    WAHEEEEY down to 1.5 seconds

    Step 1, using a temporary table for a selection of races,
    Step 2, using a second temporary table for a selection of those races with their runners,
    including all fields required for ordering in an index,
    Step 3, join all other tables and select with order by

    put the whole thing in a procedure and it runs in 1.5 seconds.

    Code:
    DELIMITER $$
    
    DROP PROCEDURE IF EXISTS `sbs_racingdb`.`testit` $$
    CREATE PROCEDURE `testit`()
    BEGIN
    
    /* temporary table to hold sub-selection of races */
    CREATE TEMPORARY TABLE IF NOT EXISTS `temporary_races` (
      `RACE_ID` mediumint(1) unsigned NOT NULL,
      PRIMARY KEY (`RACE_ID`)
    ) ENGINE=MEMORY;
    
    /* temporary table to hold races and runners,
       including all fields needed for ordering with an index */
    CREATE TEMPORARY TABLE IF NOT EXISTS `temporary_runners` (
      `RACE_ID` mediumint(1) unsigned NOT NULL,
      `RACEDATE` date default NULL,
      `RACETIME` time default NULL,
      `RACEORDER` tinyint(2) default NULL,
      `HORSE_ID` mediumint(1) unsigned NOT NULL,
      `position` mediumint(1) unsigned NOT NULL,
      `draw` mediumint(1) unsigned NOT NULL,
      `jockey_ID` mediumint(1) unsigned NOT NULL,
      `trainer_ID` mediumint(1) unsigned NOT NULL,
      PRIMARY KEY (`RACE_ID`,`HORSE_ID`),
      key (`RACEDATE`,`RACETIME`,`RACEORDER`,`position`,`draw`)
    ) ENGINE=MEMORY;
    
    
    /* select needed races */
    insert into temporary_races
    select
        race.race_id
      FROM race
        left join course on course.course_id = race.course_id
        left join course_country on course_country.course_id = course.course_id
        inner join sbs_regions_and_countries
          on  sbs_regions_and_countries.id = course_country.country_id
          and sbs_regions_and_countries.englishname in ( "England","Ireland")
      where
        /* racedate = UTC_DATE() */
        /* ( racedate >= "2006-06-10" and racedate <= "2006-06-10") */
        /* ( racedate >= "2006-06-01" and racedate <= "2006-06-31") */
        ( racedate >= "2007-01-01" and racedate <= "2007-12-31")
    ;
    
    /* for those races select all runners and fields needed for ordering */
    insert into temporary_runners
      select
        race.race_id,RACEDATE,RACETIME,RACEORDER,horse_id,position,draw,jockey_id,trainer_id
      from
        temporary_races
        left join race on race.race_id = temporary_races.race_id
        inner join racerunner on racerunner.race_id = temporary_races.race_id
    ;
    
    /* bring it all together */
    select
        temporary_runners.RACEDATE,course.name as coursename,
        temporary_runners.RACETIME,temporary_runners.RACEORDER,RACESTATUS,
        handicap,runnersNo,places,
        race.title,wintime,
        position,draw,horse.name as horsename,horse.registration_country as horsecountry,
        jockey.lastname as jockeyname,trainer.lastname as trainername
      from
        temporary_runners
        left join race on race.race_id = temporary_runners.race_id
        left join course on course.course_id = race.course_id
        left join race_derived on race_derived.race_id = temporary_runners.race_id
        left join horse   on temporary_runners.horse_id = horse.horse_id
        left join trainer on temporary_runners.trainer_id = trainer.trainer_id
        left join jockey  on temporary_runners.jockey_id = jockey.jockey_id
      order by racedate,racetime,raceorder,position,draw
    ;
    /* return resultset */
    
    END $$
    
    DELIMITER ;
    But now i'm thinking i may be doing this the long way around and there's something simpler possible

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •