Page 1 of 2 12 LastLast
Results 1 to 15 of 16
  1. #1
    Join Date
    Oct 2003
    Posts
    84

    Unanswered: Best practices: GROUP BY clause

    I was wondering what the best way to write a GROUP BY clause when there are many (and time consuming) operations in the fields by grouped.

    Fictious example:

    SELECT DeptNo, AVG(Salary) FROM Department GROUP BY DeptNo;

    This will give me the average salary per department. Let's say, however that
    I had 10-15 fields being returned (along with the AVG(Salary)) and some fields even had operations being performed on them. Is it better to create a temporary table to calculate the sum per department (or a VIEW) and then
    perform a JOIN with the rest of the data?

    Fictious example:

    SELECT DATENAME(y, StartDate), DATENAME(m, StartDate), DATEPART(d, StartDate), SUBSTR(DeptName, 1, 10), SomeFunction(SomeField), SomeFunction(SomeField), AVG(Salary)
    GROUP BY DATENAME(y, StartDate), DATENAME(m, StartDate), DATEPART(d, StartDate), SUBSTR(DeptName, 1, 10), SomeFunction(SomeField), SomeFunction(SomeField);

    Am I better off writing my query this way or using a JOIN on some temporary table or view?

    Thanks

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Write your query this way. One way to maximize the efficiency of a process is to reduce the number of times the server has to scan through the data. By putting all your aggregate functions in a single statement, the server only needs to run through the dataset one time.

    But are all those datename and datepart functions necessary? That seems kind of wastefull. You could accomplish the same thing just by sorting by date.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Oct 2003
    Posts
    84
    Bindman...That was just an example I made up. I am not asking this for a particular case right now, but I have in the past had queries that had many fields, and many of those fields had math/string/etc functions performed on them. Most of the time This was to provide formatting for a query that would be dumped into a report or out put to the user. For example, I might format an ID by Left padding with zeroes:
    RIGHT(REPLICATE(MyPadChar, MyFieldWidth) + CAST(MyID AS VARCHAR), MyFieldWidth) AS [MyFormattedID].......

    so all those fields would appear in my group by clause.....I was wondering if this was a good practice.

    Thanks

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    It's acceptable in my opinion.

    Anybody else want to comment on this?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Code:
    SELECT * FROM (
      SELECT  DATENAME(y, StartDate) AS Col1
    	, DATENAME(m, StartDate) AS Col2
    	, DATEPART(d, StartDate) AS Col3
    	, SUBSTRING(DeptName, 1, 10) AS Col4
    	, SomeFunction(SomeField) AS Col5
    	, SomeFunction(SomeField) AS Col6
    	, AVG(Salary) AS Col7
        FROM myTable99) AS XXX
    GROUP BY Col1, Col2, Col3, Col4, Col5, Col6
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Yeah, I thought about suggesting that. I've used it for clarity of coding before, but can you think of any reason it might or might not be more efficient? I guess the question is, when you include a formula in the output and also specify it in the GROUP BY clause, does the server calculate the formula twice, or is it smart enough to just calculate it once?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Oct 2003
    Posts
    84
    Kaiser. Thanks for the hint....that'll clean things up a whole lot....
    As for blindman, yeah, I'd like to know if evaluation takes place twice when the quesy is run.

  8. #8
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by blindman
    I guess the question is, when you include a formula in the output and also specify it in the GROUP BY clause, does the server calculate the formula twice, or is it smart enough to just calculate it once?

    It had better be once, since it's a derived table...I never checked, but a SHOWPLAN should tell you what's up.

    But again, this is M$, so you never know....

    But since I'm a betting kinda guy...

    $US1000.00 on Once to Win.....
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  9. #9
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by afx2029
    Kaiser. Thanks for the hint.....
    Your welcome.....Kaiser? Try Brett, x002548, or Ski (get it....)

    Where are you in the world?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  10. #10
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Then would you agree that your subquery example would not be more efficient than coding formulas in the WHERE clause, though it scores points for clarity?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  11. #11
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    No...would you agree that using formulas would cause non sargable predicates there by invalidating the use of any index?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  12. #12
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Yes, but the use of indexes is lost anyway when you filter on the results of forumulas in the subquery. I don't see how either of these methods would make efficient use of indexes.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  13. #13
    Join Date
    Oct 2003
    Posts
    84
    In the subquery, Query Analyser still complains about the subquery you mentioned. If it sees an aggregate function with other fields, you have to GROUP BY the other fields.
    Sorry, I read it too fast the first time. I thought I saw:

    SELECT ClientID AS Col1, OrderID AS Col2, SUM(Price) AS Col3
    GROUP BY Col1, Col2;


    I did not see the subquery and thought that what you gave me was similiar to
    the ORDER BY 1, 2, 3... clause (instead of using actual column names)

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    scores points for clarity? are you talking about post #5?

    yes, assuming all the errors were fixed up!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Oct 2003
    Posts
    84
    Yeah, I meant post #5. (the thread is getting a little long).

    Um, the subquery won't run because of the aggregate missing a GROUP BY clause.

    I also couldn't get the group clause to work on a column alias....

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •