Results 1 to 15 of 15
  1. #1
    Join Date
    Apr 2004
    Posts
    33

    Unanswered: list all employees that didn't take courses in 2005 (was "SQL statment")

    I need help on the SQL statment.

    This SQL statment is extreamly slow and i don't know how to make it run faster. I have a tblTrainingStudents table where it keeps all the employees trainings and tblSHPEmployee is the employee roster and tblBasicSchool is the table where we track all the cadets who graduate from the basic school (from cadets to troopers) in each year. What i want the result to be is list all employees that didn't take courses in say year 2005. New troopers don't have to take any extra courses for year 2005 since they took these course when they were in basic school. I really need someone can help me speed up my SQL statment ASAP.

    Thanks.

    ************************************************** **

    SELECT *
    FROM
    (SELECT *
    FROM tblTrainingCourses C, tblSHPEmployee
    WHERE year = '#currentYear#'
    AND (reg_no LIKE '1%'
    OR reg_no LIKE '2%'
    OR reg_no LIKE '3%')
    AND troop = '#troop#'
    AND NOT EXISTS
    (SELECT *
    FROM tblTrainingStudents S
    WHERE C.trainingType = S.trainingType
    AND C.courseID = S.courseID
    AND year = '#currentYear#'
    AND reg_no = regNo)) as employee

    WHERE employee.reg_no NOT IN
    (SELECT regNo
    FROM tblBasicschool
    WHERE employee.reg_no = regNo
    AND left(enddate,4) = '#currentYear#'
    AND (rank='TRP' OR left(regNo, 1) = '3'))
    ************************************************** *********

    ORDER BY troop, lname, fname, minit, trainingType, courseID

  2. #2
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    I don(t see any need for the "SELECT FROM SELECT", so what about:
    Code:
    SELECT *
    FROM   tblTrainingCourses as C, tblSHPEmployee
    WHERE  year = '#currentYear#'
      AND  (reg_no LIKE '1%' OR reg_no LIKE '2%' OR reg_no LIKE '3%')
      AND  troop = '#troop#'
      AND  NOT EXISTS
           (SELECT 1
            FROM   tblTrainingStudents
            WHERE  C.trainingType = trainingType
              AND  C.courseID = courseID
              AND  year = '#currentYear#'
              AND  reg_no = regNo)
      AND  reg_no NOT IN
            (SELECT regNo
             FROM   tblBasicschool
             WHERE  left(enddate,4) = '#currentYear#'
               AND  (rank='TRP' OR regNo LIKE '3%')
            )
    ORDER BY ...
    Clearly the "NOT EXISTS" and "NOT IN" will potentially slow down the query (since no index can be used for resolving these conditions).
    Similarly the use of left(enddate,4) may slow down things; you should (if possible) replace it with someting like "enddate LIKE '2006%'".
    Last edited by Peter.Vanroose; 08-10-06 at 16:48.
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  3. #3
    Join Date
    Apr 2004
    Posts
    33
    Peter, thanks for your post. It is still so slow. It takes about 3 mins to run the query. Is there any other way to speed this up a lot more? The tblTrainingStudents is kind of big. I really appreciate your help.

  4. #4
    Join Date
    Aug 2006
    Location
    The Netherlands
    Posts
    248
    It looks like your main query produces a cartesian product for I don't see a join-clause between 'tblTrainingCourses' and 'tblSHPEmployee'... or did I miss something???

    Grts

  5. #5
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    What about the following query?
    I've removed tblTrainingCourses, and also removed the redundant condition reg_no = regNo in the second subquery.
    You will maybe have to re-insert your conditions on year and troop -- I've no idea in which tables they belong.
    Also, for the time being, I replaced the "in current year" condition by ">= '2006-01-01'"; adapt this if necessary.
    The conditions on reg_no (with "BETWEEN" and ">= '3'") are more performant than a LIKE; but they assume reg_no to be all digits, and moreover assume an ASCII ordering.
    Again, adapt if necessary.
    Code:
    SELECT *
    FROM   tblSHPEmployee AS e
    WHERE  reg_no BETWEEN '1' AND '39999999'
      AND  NOT EXISTS
           ( SELECT regNo
             FROM   tblTrainingStudents AS s INNER JOIN tblTrainingCourses AS c
                          ON s.CourseID = c.CourseID
             WHERE regNo = e.reg_No
                 AND  year = '2006'
           )
      AND  reg_no NOT IN
           ( SELECT regNo
             FROM   tblBasicschool
             WHERE  endDate >= '2006-01-01'
               AND  (rank = 'TRP' OR regNo >= '3')
           )
    You may replace the "NOT EXISTS" by a "NOT IN" and vice versa; that could give performance differences (depending on the size of the tables), so try out all four possibilities. Don't forget to remove the condition "regNo = e.reg_No" in the "NOT IN" and to add it in the "NOT EXISTS".
    Last edited by Peter.Vanroose; 08-18-06 at 03:05.
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  6. #6
    Join Date
    Dec 2005
    Location
    Texas
    Posts
    100
    'LIKE' keywords also slow down queries since they force a table scan. Also you're selecting * when you potentially could not be.

  7. #7
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by disruptivehair
    'LIKE' keywords also slow down queries since they force a table scan.
    Not necessarily: most RDBMS' will use a matching index scan for "LIKE '3%'". But you're right in the case of a "%" at the beginning of the string.
    Quote Originally Posted by disruptivehair
    Also you're selecting * when you potentially could not be.
    This is less of a performance problem, it's at most a data transmission bottleneck (over a slow communication line).
    Except in the case where you would only SELECT a column which happens to be an indexed column (or column combination), since in that case the query *could* be performed with an index-only scan (but that also depends on the WHERE condition of course).
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  8. #8
    Join Date
    Dec 2005
    Location
    Texas
    Posts
    100
    Quote Originally Posted by Peter.Vanroose
    Not necessarily: most RDBMS' will use a matching index scan for "LIKE '3%'". But you're right in the case of a "%" at the beginning of the string.
    This is less of a performance problem, it's at most a data transmission bottleneck (over a slow communication line).
    Except in the case where you would only SELECT a column which happens to be an indexed column (or column combination), since in that case the query *could* be performed with an index-only scan (but that also depends on the WHERE condition of course).

    I defer to your superior knowledge Peter!

    Our DBAs here refuse to allow us to use LIKE in code because most of the time it does a table scan. 'SELECT *' is also banned since not only is it wasteful, it has a tendency to stop working if the table schema changes.

  9. #9
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by disruptivehair
    'SELECT *' is also banned since not only is it wasteful, it has a tendency to stop working if the table schema changes.
    I fully agree with that: it's *always* safer to list all required columns explicitly, even it turns out to be all table columns.
    So indeed "SELECT *" should be banned!
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  10. #10
    Join Date
    Dec 2005
    Location
    Texas
    Posts
    100
    Quote Originally Posted by Peter.Vanroose
    I fully agree with that: it's *always* safer to list all required columns explicitly, even it turns out to be all table columns.
    So indeed "SELECT *" should be banned!

    ITA Peter...we've still got some 'select *' stuff hanging around but it is slowly being re-written to explicitly list all the columns.

  11. #11
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Or if you want to list all courses together with all employees who did not take that course:
    Code:
    SELECT c.CourseName, e.EmployeeName
    FROM
              tblSHPEmployee AS e
            INNER JOIN
              tblTrainingCourses AS c
            ON NOT EXISTS
                ( SELECT 1
                  FROM   tblTrainingStudents AS s
                  WHERE  s.CourseID = c.CourseID
                    AND  e.reg_No = s.regNo
                )
    WHERE
              reg_no BETWEEN '1' AND '39999999'
    ORDER BY  1, 2
    Mind you that this may again run a lot slower!
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  12. #12
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    [Non-ANSI Comment]Assuming this is SQL Server then SELECT * within an exists statement is the one time SELECT * is recommended - the optimiser then selects optimal index.[/Non-ANSI Comment]
    Testimonial:
    pootle flump
    ur codings are working excelent.

  13. #13
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by pootle flump
    SELECT * within an exists statement is recommended - the optimiser then selects optimal index.
    Are you sure that SQLServer would not do that with "SELECT 1" ?
    I know DB2 will typically do an index scan without data lookup when using "SELECT 1".
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  14. #14
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    You know - I have lost my original MS link but coincidently today I've read something elsewhere (by the bloomin head of SQL Server Optimisation no less) that contradicts what I posted. So disregard
    Testimonial:
    pootle flump
    ur codings are working excelent.

  15. #15
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Anyway, in an EXISTS, "SELECT 1" is better than "SELECT *" since the former does not require data access (when an index is available on the column(s) specified in the WHERE condition) while the latter may need data access (unless the optimizer is clever enough to see that the two are equivalent :-)
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •