Results 1 to 10 of 10
  1. #1
    Join Date
    Apr 2008
    Posts
    4

    Unanswered: Need help writing a query...

    Hi Friends,

    I need help on writing an optimized query for the following problem..

    Consider a table with columns "Date Emp1 Emp2"

    Date Emp1 Emp2
    1-feb 1 . . 2
    2-feb 3 . . 4
    3-feb 6 . . 1
    4-feb 1 . . 2
    5-feb 1 . . 5
    6-Feb 5 . . 7


    Now, I want search for records with employee id 1 in a way that if column emp1 contains 1 then I want the data in column emp2, and if column emp2 contains 1, then I want data in column emp1.

    The output has to be a single column with no duplicate values. In the above example, for employee id 1, the output would be, 2,5 and 6 only.

    This table has lakhs of records. I have to scan both columns for a given employee id. What will be the most optimized way to retrieve the data faster. Also, do I need to restructure this table for faster data retrieval?
    I have indexes on emp1 and emp2 columns. Do we need union here and if yes, what is the best optimized query for the same?

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Code:
    SELECT    DISTINCT res    = COALESCE(NULLIF(Emp1, 1), Emp2)
    FROM    t
    WHERE    1 IN(Emp1, Emp2)
    Regarding structure - I am concerned about first normal form and third normal form.

    EDIT - reread the requirements - added distinct.
    Last edited by pootle flump; 04-15-08 at 06:59.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  3. #3
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Code:
    select emp2
      from daTable
     where emp1 = 1
    union
    select emp1
      from daTable
     where emp2 = 1
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  4. #4
    Join Date
    Apr 2008
    Posts
    4
    I will have to test COALESCE and union for perfomance...MY table has really really huge number of records. If you guys can think of some other solution, do let know. Thanks for the help.

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    It partly depends on the selectivity of Emp1 and Emp2. It would also help if in combination they are unique (as in your sample data). Another though - is this SQL Server 2005?

    EDIT - related to the last question - what is the clustered index?
    Last edited by pootle flump; 04-15-08 at 09:55.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    BTW - lakhs - that is thousands right? If you mean you have "thousands of records" then no - your table is nothing like huge. How fast does this need to be returned?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  7. #7
    Join Date
    Apr 2008
    Posts
    4
    yes..lakhs of records. A stored procedure puts lots of date daily in the table.
    I have another concernabout how to manage this data. All the data from the table must be available all the time. Nad yes,it MS SQL 2005.

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Were the queries fast enough or do you need further optimisation? Which performed best? I suspect it will be Rudy's (with the correct indexes). Also remember that adding indexes will slow down insert and (probably) update speed....

    How many is "lots daily"?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  9. #9
    Join Date
    Apr 2008
    Posts
    4
    LOTS is basically I can say around 5000 records per day. And yes, Rudy's query is a little faster. Also, indexes is not a problem as the stored procedure runs midnight and does inserts only and the time it takes to execute is not a constraint...

    I am trying to use partitions. I cannot use date to filter my rows in the where clause so cannot have partition on date column....is there anyway I can partition based on emp1 or emp2 for better data retrieval and storage.

    btw, pootle flump- thanks for all the help so far..

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Don't use partitions. 5k per day is a very small number of records.

    For this query (and considering this query only) the fastest index choice would be to create a clustered index on emp1 and a nonclustered index on emp2. emp2 would be in the leaf level of the clustered index, and in 2005 the optimiser can access the clustered index value in the non clustered index without visiting the clustered index so, in effect, emp1 would be in the leaf level of the nonclustered index. Obviously, you could switch this the other way round - it is immaterial which is clustered or not.

    Basically this would mean Rudy's query would do two index seeks. The distinct requirement will hold things up a bit but no alternative with your data structure.

    Please also note this is not necessarily optimal for this table - it is only optimal for this one query. Index tuning can only be done properly considering all the factors of all the operations on the table. Normally, for example, you would want the clustered index to be unique and you may still do so in this instance.
    Testimonial:
    pootle flump
    ur codings are working excelent.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •