Results 1 to 8 of 8
  1. #1
    Join Date
    Oct 2010
    Posts
    7

    Unanswered: Random selection of records BUT no more than one per worker

    I have to produce a random sample of five subjects for work items (called episodes) of a particular type which were completed within a chosen time period by workers who work for a particular team.

    The difficult part to the requirement is that all five of the subjects in the sample must come from episodes for which the done_by_worker_id is different, i.e. five unique done_by_worker_id's. How can I achieve this?

    I can easily produce a sample of five subjects within the period :


    select
    subject,
    'TEAM A' as assigned_workers_team
    from
    (select distinct
    episodes.subject
    from
    episodes,
    episode_types
    where
    --joins
    episodes.type = episode_types.type
    --other conditions
    and end_date between '&P1' and '&P2'
    and upper(th_ad.asc_name_org(th_ad.asc_primary_team(ep isodes.done_by_worker_id, end_date)))='TEAM A'
    and upper(description) in ('CLDS SINGLE ASSESSMENT','OVERVIEW ASSESSMENT')
    order by
    dbms_random.value()
    )
    where
    rownum<=5


    I did try to create a correlated subquery in which I would have randomly selected an "end date" for a valid episode of that type for that worker, but this would have required making a join to an alias for the episodes.done_by_worker_id and I was getting "invalid identifier", ORA-00904. Possibly I was trying to use an inline view where not allowed?

    Any ideas?

    Dave

  2. #2
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  3. #3
    Join Date
    Oct 2010
    Posts
    7
    Anacedent

    Thanks for the link and inspiring tagline.

    Do you have any further explanation for how the sample_clause section of the Oracle docs (your link) will help me. Yes, I've read through a few screens down from there but nothing there jumps out at me.

    Thanks.

    Dave

  4. #4
    Join Date
    Oct 2010
    Posts
    7
    Quote Originally Posted by anacedent View Post
    Did you actually read my query? Can you actually solve it?

    On current evidence I'd say it can't be done in SQL so I'll be going to PL/SQL. The link to RTFM was a timewaster.

    Dave

  5. #5
    Join Date
    Jan 2004
    Location
    Croatia, Europe
    Posts
    4,094
    Provided Answers: 4
    Would it be possible to provide a simple test case (some sample data, as well as desired output based on that data)? I don't know about the others, but I can't quite imagine what you have and what you want to get.

  6. #6
    Join Date
    Mar 2007
    Posts
    623
    Quote Originally Posted by wobbly View Post
    The difficult part to the requirement is that all five of the subjects in the sample must come from episodes for which the done_by_worker_id is different, i.e. five unique done_by_worker_id's. How can I achieve this?
    Using analytic ROW_NUMBER function in the innermost query, you can assign each row with different DONE_BY_WORKER_ID a sequence. Something like
    Code:
    ROW_NUMBER() OVER (PARTITION BY episodes.done_by_worker_id ORDER BY dbms_random.value())
    Then, pick rows which have this expression the least (equal to one). The exact query depends on the fact, what shall be returned when there are more/less distinct DONE_BY_WORKER_IDs than 5? Just choose the random ones or return rows from each DONE_BY_WORKER_ID (although they would be more than 5)? Pick up less rows than 5 or pick up second(/third/...) rows for some DONE_BY_WORKER_IDs?

  7. #7
    Join Date
    Oct 2010
    Posts
    7
    Quote Originally Posted by Littlefoot View Post
    Would it be possible to provide a simple test case (some sample data, as well as desired output based on that data)? I don't know about the others, but I can't quite imagine what you have and what you want to get.
    Every subject has episodes.

    The episodes have the following columns: subject, done_by_worker_id, type.

    Strictly speaking I could have written my initial attempt for a query to look for a selection of random episodes but this would have had the downside that two episodes may relate to the same subject (though this is actually very unlikely as it happens). So in fact it was better to look for a selection of random subjects using distinct.

    What I need to do now is to show a chosen number of random episodes where that episode ended within the chosen period, but so that no two episodes relate to the same subject, or were done by the same worker.

    This is not actual data but illustrates the point:
    EPISODES:
    Code:
    ID    SUBJECT     DONE_BY_WORKER_ID
    1     21          6
    2     45          7
    3     67          8
    4     67          5
    5     33          3
    6     99          23
    7     6           3
    8     91          3
    9     55          5
    10    22          41

    I want a valid selection of 3 random episodes fulfilling my criteria

    The following would be invalid as subjects are repeated
    Code:
    ID    SUBJECT     DONE_BY_WORKER_ID
    3     67          8
    4     67          5
    7     6           3
    The following would be invalid as done_by_worker_id gets repeated
    Code:
    ID    SUBJECT     DONE_BY_WORKER_ID
    5     33          3
    7     6           3
    9     55          5
    The following is valid as neither repeats
    Code:
    ID    SUBJECT     DONE_BY_WORKER_ID
    10    22          41
    3     67          8
    1     21          6

  8. #8
    Join Date
    Oct 2010
    Posts
    7
    Quote Originally Posted by flyboy View Post
    Using analytic ROW_NUMBER function in the innermost query, you can assign each row with different DONE_BY_WORKER_ID a sequence. Something like
    Code:
    ROW_NUMBER() OVER (PARTITION BY episodes.done_by_worker_id ORDER BY dbms_random.value())
    Then, pick rows which have this expression the least (equal to one). The exact query depends on the fact, what shall be returned when there are more/less distinct DONE_BY_WORKER_IDs than 5? Just choose the random ones or return rows from each DONE_BY_WORKER_ID (although they would be more than 5)? Pick up less rows than 5 or pick up second(/third/...) rows for some DONE_BY_WORKER_IDs?
    Thanks to flyboy for the excellent tip! I believe I have the solution now:

    Code:
    select 
       * 
    from
      (select 
          * 
       from
         (select 
    	     * 
          from
            (select 
                subject, 
                episodes.id, 
                row_number() over (partition by subject order by dbms_random.value()) as epi_num_for_subject, 
                done_by_worker_id, 
                row_number() over (partition by done_by_worker_id order by dbms_random.value()) as epi_num_for_worker
             from 
                episodes, 
                episode_types
             where 
                end_date between '&P1' and '&P2'
                and upper(th_ad.asc_name_org(th_ad.asc_primary_team(done_by_worker_id, end_date)))='TEAM A'
                and episodes.type = episode_types.type
                and upper(description) in ('CLDS SINGLE ASSESSMENT','OT ASSESSMENT')
    		)
          where 
             epi_num_for_subject = 1
             and epi_num_for_worker=1
          )
       order by
          dbms_random.value()
       )
    where
       rownum<=5

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •