If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Data Access, Manipulation & Batch Languages > ANSI SQL > list all employees that didn't take courses in 2005 (was "SQL statment")

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-10-06, 14:37
thien0809 thien0809 is offline
Registered User
 
Join Date: Apr 2004
Posts: 33
list all employees that didn't take courses in 2005 (was "SQL statment")

I need help on the SQL statment.

This SQL statment is extreamly slow and i don't know how to make it run faster. I have a tblTrainingStudents table where it keeps all the employees trainings and tblSHPEmployee is the employee roster and tblBasicSchool is the table where we track all the cadets who graduate from the basic school (from cadets to troopers) in each year. What i want the result to be is list all employees that didn't take courses in say year 2005. New troopers don't have to take any extra courses for year 2005 since they took these course when they were in basic school. I really need someone can help me speed up my SQL statment ASAP.

Thanks.

************************************************** **

SELECT *
FROM
(SELECT *
FROM tblTrainingCourses C, tblSHPEmployee
WHERE year = '#currentYear#'
AND (reg_no LIKE '1%'
OR reg_no LIKE '2%'
OR reg_no LIKE '3%')
AND troop = '#troop#'
AND NOT EXISTS
(SELECT *
FROM tblTrainingStudents S
WHERE C.trainingType = S.trainingType
AND C.courseID = S.courseID
AND year = '#currentYear#'
AND reg_no = regNo)) as employee

WHERE employee.reg_no NOT IN
(SELECT regNo
FROM tblBasicschool
WHERE employee.reg_no = regNo
AND left(enddate,4) = '#currentYear#'
AND (rank='TRP' OR left(regNo, 1) = '3'))
************************************************** *********

ORDER BY troop, lname, fname, minit, trainingType, courseID
Reply With Quote
  #2 (permalink)  
Old 08-10-06, 15:43
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
I don(t see any need for the "SELECT FROM SELECT", so what about:
Code:
SELECT *
FROM   tblTrainingCourses as C, tblSHPEmployee
WHERE  year = '#currentYear#'
  AND  (reg_no LIKE '1%' OR reg_no LIKE '2%' OR reg_no LIKE '3%')
  AND  troop = '#troop#'
  AND  NOT EXISTS
       (SELECT 1
        FROM   tblTrainingStudents
        WHERE  C.trainingType = trainingType
          AND  C.courseID = courseID
          AND  year = '#currentYear#'
          AND  reg_no = regNo)
  AND  reg_no NOT IN
        (SELECT regNo
         FROM   tblBasicschool
         WHERE  left(enddate,4) = '#currentYear#'
           AND  (rank='TRP' OR regNo LIKE '3%')
        )
ORDER BY ...
Clearly the "NOT EXISTS" and "NOT IN" will potentially slow down the query (since no index can be used for resolving these conditions).
Similarly the use of left(enddate,4) may slow down things; you should (if possible) replace it with someting like "enddate LIKE '2006%'".
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/

Last edited by Peter.Vanroose; 08-10-06 at 15:48.
Reply With Quote
  #3 (permalink)  
Old 08-17-06, 08:00
thien0809 thien0809 is offline
Registered User
 
Join Date: Apr 2004
Posts: 33
Peter, thanks for your post. It is still so slow. It takes about 3 mins to run the query. Is there any other way to speed this up a lot more? The tblTrainingStudents is kind of big. I really appreciate your help.
Reply With Quote
  #4 (permalink)  
Old 08-17-06, 10:24
Tyveleyn Tyveleyn is offline
Registered User
 
Join Date: Aug 2006
Location: The Netherlands
Posts: 248
It looks like your main query produces a cartesian product for I don't see a join-clause between 'tblTrainingCourses' and 'tblSHPEmployee'... or did I miss something???

Grts
Reply With Quote
  #5 (permalink)  
Old 08-17-06, 16:29
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
What about the following query?
I've removed tblTrainingCourses, and also removed the redundant condition reg_no = regNo in the second subquery.
You will maybe have to re-insert your conditions on year and troop -- I've no idea in which tables they belong.
Also, for the time being, I replaced the "in current year" condition by ">= '2006-01-01'"; adapt this if necessary.
The conditions on reg_no (with "BETWEEN" and ">= '3'") are more performant than a LIKE; but they assume reg_no to be all digits, and moreover assume an ASCII ordering.
Again, adapt if necessary.
Code:
SELECT *
FROM   tblSHPEmployee AS e
WHERE  reg_no BETWEEN '1' AND '39999999'
  AND  NOT EXISTS
       ( SELECT regNo
         FROM   tblTrainingStudents AS s INNER JOIN tblTrainingCourses AS c
                      ON s.CourseID = c.CourseID
         WHERE regNo = e.reg_No
             AND  year = '2006'
       )
  AND  reg_no NOT IN
       ( SELECT regNo
         FROM   tblBasicschool
         WHERE  endDate >= '2006-01-01'
           AND  (rank = 'TRP' OR regNo >= '3')
       )
You may replace the "NOT EXISTS" by a "NOT IN" and vice versa; that could give performance differences (depending on the size of the tables), so try out all four possibilities. Don't forget to remove the condition "regNo = e.reg_No" in the "NOT IN" and to add it in the "NOT EXISTS".
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/

Last edited by Peter.Vanroose; 08-18-06 at 02:05.
Reply With Quote
  #6 (permalink)  
Old 08-18-06, 07:57
disruptivehair disruptivehair is offline
Registered User
 
Join Date: Dec 2005
Location: Texas
Posts: 100
'LIKE' keywords also slow down queries since they force a table scan. Also you're selecting * when you potentially could not be.
Reply With Quote
  #7 (permalink)  
Old 08-20-06, 10:52
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
Quote:
Originally Posted by disruptivehair
'LIKE' keywords also slow down queries since they force a table scan.
Not necessarily: most RDBMS' will use a matching index scan for "LIKE '3%'". But you're right in the case of a "%" at the beginning of the string.
Quote:
Originally Posted by disruptivehair
Also you're selecting * when you potentially could not be.
This is less of a performance problem, it's at most a data transmission bottleneck (over a slow communication line).
Except in the case where you would only SELECT a column which happens to be an indexed column (or column combination), since in that case the query *could* be performed with an index-only scan (but that also depends on the WHERE condition of course).
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/
Reply With Quote
  #8 (permalink)  
Old 08-21-06, 06:08
disruptivehair disruptivehair is offline
Registered User
 
Join Date: Dec 2005
Location: Texas
Posts: 100
Quote:
Originally Posted by Peter.Vanroose
Not necessarily: most RDBMS' will use a matching index scan for "LIKE '3%'". But you're right in the case of a "%" at the beginning of the string.
This is less of a performance problem, it's at most a data transmission bottleneck (over a slow communication line).
Except in the case where you would only SELECT a column which happens to be an indexed column (or column combination), since in that case the query *could* be performed with an index-only scan (but that also depends on the WHERE condition of course).

I defer to your superior knowledge Peter!

Our DBAs here refuse to allow us to use LIKE in code because most of the time it does a table scan. 'SELECT *' is also banned since not only is it wasteful, it has a tendency to stop working if the table schema changes.
Reply With Quote
  #9 (permalink)  
Old 08-21-06, 10:04
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
Quote:
Originally Posted by disruptivehair
'SELECT *' is also banned since not only is it wasteful, it has a tendency to stop working if the table schema changes.
I fully agree with that: it's *always* safer to list all required columns explicitly, even it turns out to be all table columns.
So indeed "SELECT *" should be banned!
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/
Reply With Quote
  #10 (permalink)  
Old 08-22-06, 10:02
disruptivehair disruptivehair is offline
Registered User
 
Join Date: Dec 2005
Location: Texas
Posts: 100
Quote:
Originally Posted by Peter.Vanroose
I fully agree with that: it's *always* safer to list all required columns explicitly, even it turns out to be all table columns.
So indeed "SELECT *" should be banned!

ITA Peter...we've still got some 'select *' stuff hanging around but it is slowly being re-written to explicitly list all the columns.
Reply With Quote
  #11 (permalink)  
Old 08-24-06, 10:46
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
Or if you want to list all courses together with all employees who did not take that course:
Code:
SELECT c.CourseName, e.EmployeeName
FROM
          tblSHPEmployee AS e
        INNER JOIN
          tblTrainingCourses AS c
        ON NOT EXISTS
            ( SELECT 1
              FROM   tblTrainingStudents AS s
              WHERE  s.CourseID = c.CourseID
                AND  e.reg_No = s.regNo
            )
WHERE
          reg_no BETWEEN '1' AND '39999999'
ORDER BY  1, 2
Mind you that this may again run a lot slower!
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/
Reply With Quote
  #12 (permalink)  
Old 08-24-06, 11:20
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
[Non-ANSI Comment]Assuming this is SQL Server then SELECT * within an exists statement is the one time SELECT * is recommended - the optimiser then selects optimal index.[/Non-ANSI Comment]
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #13 (permalink)  
Old 08-24-06, 11:41
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
Quote:
Originally Posted by pootle flump
SELECT * within an exists statement is recommended - the optimiser then selects optimal index.
Are you sure that SQLServer would not do that with "SELECT 1" ?
I know DB2 will typically do an index scan without data lookup when using "SELECT 1".
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/
Reply With Quote
  #14 (permalink)  
Old 08-25-06, 09:54
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
You know - I have lost my original MS link but coincidently today I've read something elsewhere (by the bloomin head of SQL Server Optimisation no less) that contradicts what I posted. So disregard
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #15 (permalink)  
Old 08-25-06, 11:42
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
Anyway, in an EXISTS, "SELECT 1" is better than "SELECT *" since the former does not require data access (when an index is available on the column(s) specified in the WHERE condition) while the latter may need data access (unless the optimizer is clever enough to see that the two are equivalent :-)
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On