| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

08-31-09, 16:05
|
|
Registered User
|
|
Join Date: May 2007
Posts: 78
|
|
|
optimising a query, how to ?
|
|
I'd like to know if there's a way to speed up this query:
Code:
selectRACEDATE,races.name,country,
RACETIME,RACEORDER,RACESTATUS,
races.handicap,races. runnersNo,races.places,
races.title,wintime,
position,draw,horse.name,horse.registration_country,
jockey.lastname,trainer.lastname from(selectrace.race_id,
RACEDATE,RACETIME,RACEORDER,RACESTATUS,
course.name,if(sbs_regions_and_countries.Englishname = "Ireland","(IRE)","(GB)") as country,
race.title,wintime,
handicap,runnersNo,places FROM raceleft join course on course.course_id = race.course_id
left join course_country on course_country.course_id = course.course_id
left join sbs_regions_and_countries on sbs_regions_and_countries.id = course_country.country_id
left join race_derived on race_derived.race_id = race.race_id where/* racedate = UTC_DATE() */
racedate = "2006-06-10"
/* ( racedate >= "2006-06-01" and racedate <= "2006-06-31") */
/* ( racedate >= "2007-01-01" and racedate <= "2007-12-31") */
and
( sbs_regions_and_countries.englishname = "England" or sbs_regions_and_countries.Englishname = "Ireland" ) ) as races join racerunner on racerunner.race_id = races.race_id
left join horse on racerunner.horse_id = horse.horse_id
left join trainer on racerunner.trainer_id = trainer.trainer_id
left join jockey on racerunner.jockey_id = jockey.jockey_id order by racedate,racetime,raceorder,position,draw;
We're talking horse racing here.
I want a list of races with runners for a timeperiod and limited to countries "england" and "ireland". This is what i do in the inner select where i join the country data to the course of the races for the timeperiod. A "race" is always run on a "course" (or track) which is in a "country".
This way i have a sub-selection of races BEFORE i join in the horses.
Each "race" has any number of horses, called "runners" and each one has a "jockey" and "trainer" associated with it, that's the outer select.
When i run this query on a full year then i get about 9000 races with 100000 runners and the query runs in 30 to 40 seconds. Fair enough for retireving and sorting 100000 records. But just wondering if i can get it faster.
The inner select which selects the sub-set of races in itself runs in 0.009s so not much to gain there i'd say.
If i remove the "order by" part then the whole query runs in 3 seconds, so obviously that is where there's time to gain. I just don't know how ?
|
|

08-31-09, 16:14
|
|
Registered User
|
|
Join Date: May 2007
Posts: 78
|
|
Actually it's more of a general question as a learning opportunity rather than about this query specifically.
How do i post the result of the EXPLAIN command ?
Can't post HTML on this forum so a copy&paste gets all screwed up ?
|
|

08-31-09, 16:31
|
|
Registered User
|
|
Join Date: May 2007
Posts: 78
|
|
|
|
Ran the explain command and attached the result as an image here.
|
|

09-02-09, 16:43
|
|
Registered User
|
|
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
|
|
code
select RACEDATE
,course.name
,if(sbs_regions_and_countries.Englishname = "Ireland","(IRE)","(GB)") as country
,RACETIME
,RACEORDER
,RACESTATUS
,handicap
,runnersNo
,places
,title
,wintime
,position
,draw
,horse.name
,horse.registration_country
,jockey.lastname
,trainer.lastname
FROM race
left join course on course.course_id = race.course_id
left join course_country on course_country.course_id = course.course_id
inner join sbs_regions_and_countries on sbs_regions_and_countries.id = course_country.country_id
and
sbs_regions_and_countries.englishname in ( "England","Ireland")
left join race_derived on race_derived.race_id = race.race_id
inner join racerunner on racerunner.race_id = race.race_id
left join horse on racerunner.horse_id = horse.horse_id
left join trainer on racerunner.trainer_id = trainer.trainer_id
left join jockey on racerunner.jockey_id = jockey.jockey_id
where/* racedate = UTC_DATE() */
racedate = "2006-06-10"
/* ( racedate >= "2006-06-01" and racedate <= "2006-06-31") */
/* ( racedate >= "2007-01-01" and racedate <= "2007-12-31") */
order by racedate
,racetime
,raceorder
,position
,draw;
/code
|
|

09-02-09, 16:45
|
|
Registered User
|
|
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
|
|
i rearranged things a bit, as the nested expression wasn't needed, no summation/grouping/min/max there. Also, changed join to sbs_countries to an inner join as it was a waste to join to all the tables, then throw out conditions in the where clause. Probably still some issue with the sorting, just check sort heap sizes to ensure enough room for amount of data being sorted.
Dave
|
|

09-02-09, 18:58
|
|
Registered User
|
|
Join Date: May 2007
Posts: 78
|
|
thanks for the reply,
tried that but still around 30 seconds
You've actually got close to what i started with.
Here's the problem, the database contains 140000 races with an average of 10 runners per race. The way you've done it is that you join all of those first, before the where clause, giving 1.4 million records, then make a selection from those millions. That takes time.
That's why i got the inner select so from the 140000 races i first select the 10000 i need, using existing indexes so pretty fast, and then join them with the racerunners which is a lot less records to join.
The dis-advantage here is that by using the inner select i save time by joining less records but i lose the ability to use an index for the ordering.
|
|

09-02-09, 19:07
|
|
Registered User
|
|
Join Date: May 2007
Posts: 78
|
|
Code:
inner join sbs_regions_and_countries on sbs_regions_and_countries.id = course_country.country_id
and
sbs_regions_and_countries.englishname in ( "England","Ireland")
this would appear to be a bit faster than having it in the WHERE so taken that on board. (note to self: re-read documentation on inner join  )
|
|

09-02-09, 19:42
|
|
Registered User
|
|
Join Date: May 2007
Posts: 78
|
|
I've gone a slightly different direction. I need the selection of races for a number of queries so what i've done is created a procedure and i've put the inner selection of races into a temporary table in memory, with an index.
Then i use that to join the other tables with. Testing it with a myisam table rather than in memory has shaved 5 seconds off so down to 25 seconds.
Without order by it's now 0.7 seconds and with 25 seconds so still thinking there should be something i can do to speed that up.
|
|

09-02-09, 20:07
|
|
Registered User
|
|
Join Date: May 2007
Posts: 78
|
|
WAHEEEEY down to 1.5 seconds
Step 1, using a temporary table for a selection of races,
Step 2, using a second temporary table for a selection of those races with their runners,
including all fields required for ordering in an index,
Step 3, join all other tables and select with order by
put the whole thing in a procedure and it runs in 1.5 seconds.
Code:
DELIMITER $$
DROP PROCEDURE IF EXISTS `sbs_racingdb`.`testit` $$
CREATE PROCEDURE `testit`()
BEGIN
/* temporary table to hold sub-selection of races */
CREATE TEMPORARY TABLE IF NOT EXISTS `temporary_races` (
`RACE_ID` mediumint(1) unsigned NOT NULL,
PRIMARY KEY (`RACE_ID`)
) ENGINE=MEMORY;
/* temporary table to hold races and runners,
including all fields needed for ordering with an index */
CREATE TEMPORARY TABLE IF NOT EXISTS `temporary_runners` (
`RACE_ID` mediumint(1) unsigned NOT NULL,
`RACEDATE` date default NULL,
`RACETIME` time default NULL,
`RACEORDER` tinyint(2) default NULL,
`HORSE_ID` mediumint(1) unsigned NOT NULL,
`position` mediumint(1) unsigned NOT NULL,
`draw` mediumint(1) unsigned NOT NULL,
`jockey_ID` mediumint(1) unsigned NOT NULL,
`trainer_ID` mediumint(1) unsigned NOT NULL,
PRIMARY KEY (`RACE_ID`,`HORSE_ID`),
key (`RACEDATE`,`RACETIME`,`RACEORDER`,`position`,`draw`)
) ENGINE=MEMORY;
/* select needed races */
insert into temporary_races
select
race.race_id
FROM race
left join course on course.course_id = race.course_id
left join course_country on course_country.course_id = course.course_id
inner join sbs_regions_and_countries
on sbs_regions_and_countries.id = course_country.country_id
and sbs_regions_and_countries.englishname in ( "England","Ireland")
where
/* racedate = UTC_DATE() */
/* ( racedate >= "2006-06-10" and racedate <= "2006-06-10") */
/* ( racedate >= "2006-06-01" and racedate <= "2006-06-31") */
( racedate >= "2007-01-01" and racedate <= "2007-12-31")
;
/* for those races select all runners and fields needed for ordering */
insert into temporary_runners
select
race.race_id,RACEDATE,RACETIME,RACEORDER,horse_id,position,draw,jockey_id,trainer_id
from
temporary_races
left join race on race.race_id = temporary_races.race_id
inner join racerunner on racerunner.race_id = temporary_races.race_id
;
/* bring it all together */
select
temporary_runners.RACEDATE,course.name as coursename,
temporary_runners.RACETIME,temporary_runners.RACEORDER,RACESTATUS,
handicap,runnersNo,places,
race.title,wintime,
position,draw,horse.name as horsename,horse.registration_country as horsecountry,
jockey.lastname as jockeyname,trainer.lastname as trainername
from
temporary_runners
left join race on race.race_id = temporary_runners.race_id
left join course on course.course_id = race.course_id
left join race_derived on race_derived.race_id = temporary_runners.race_id
left join horse on temporary_runners.horse_id = horse.horse_id
left join trainer on temporary_runners.trainer_id = trainer.trainer_id
left join jockey on temporary_runners.jockey_id = jockey.jockey_id
order by racedate,racetime,raceorder,position,draw
;
/* return resultset */
END $$
DELIMITER ;
But now i'm thinking i may be doing this the long way around and there's something simpler possible 
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|