Page 1 of 2 12 LastLast
Results 1 to 15 of 21

Thread: Random select

  1. #1
    Join Date
    Feb 2004
    Location
    Copenhagen
    Posts
    220

    Unanswered: Random select

    AIX 5.2 DB2 8.1.4

    I have to select 10000 rows out of a 100.000.000 row
    table - at RANDOM! (there's some stupid SAS consultant
    demanding it)

    Does anyone have a clue to how this can be accomplished?

    Thanks
    Kristian K. Hansen
    Project Supervisor
    National Board of Health

  2. #2
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,

    Try this:
    SELECT COL1, COL2, COL3
    FROM TAB
    FETCH FIRST 10000 ROWS ONLY

    NOTE: Please don't insult consultant. This is not the forum to insult people, it is DB2 forum!!!

    Hope this helps,
    Grofaty

  3. #3
    Join Date
    Feb 2004
    Location
    Copenhagen
    Posts
    220

    Re: Random select

    Ok sorry about the snide remark

    But the fetch first clause does not
    select at random - it selects in the order
    that the table was created!
    Kristian K. Hansen
    Project Supervisor
    National Board of Health

  4. #4
    Join Date
    Mar 2004
    Location
    Toronto, ON, Canada
    Posts
    513

    Re: Random select

    Originally posted by Tank
    Ok sorry about the snide remark

    But the fetch first clause does not
    select at random - it selects in the order
    that the table was created!
    Not always... depends on how db2 decides to retreive it, how things are sitting in the bufferpool, etc... but if it's all REORG'd nicely then it will likely be ordered.

    You need a reverse REORG... :-)

    Can you add a column and insert some random values into it, then select a range out of that? Or create a side table using the key from the table and a random value?
    --
    Jonathan Petruk
    DB2 Database Consultant

  5. #5
    Join Date
    Feb 2004
    Location
    Copenhagen
    Posts
    220
    Yes I have come to the
    same conclusion its
    just such a gigantic operation,
    but thank for the input. ;-)
    Kristian K. Hansen
    Project Supervisor
    National Board of Health

  6. #6
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    I'm toying with the idea of using MOD in conjunction with GENERATE_UNIQUE and an integer column ( or row_number()over() ). This could easily be used to pick every Nth row at random out of the table but each row would always be 10000 rows apart in this instance, so it woudln't truly be random.

    You could maybe do something clever using the concept above in some kind of recursive select that would mutate the interval. You'd only have to know the total number of rows up front and increment a counter by a factor of GENERATE_UNIQUE() until you had the number of rows required.

    If nobody else comes up with a solution, I'll have a pop when I get time.

    Damian

  7. #7
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1

    Re: Random select

    Originally posted by Tank
    AIX 5.2 DB2 8.1.4

    I have to select 10000 rows out of a 100.000.000 row
    table - at RANDOM! (there's some stupid SAS consultant
    demanding it)

    Does anyone have a clue to how this can be accomplished?

    Thanks
    I think you can achieve the intended result by adding a column to the source table that would contain random values, ordering the select by that column, and returning first 10K records.

  8. #8
    Join Date
    Feb 2004
    Location
    Copenhagen
    Posts
    220

    Re: Random select

    Yes

    However I have just discovered
    the tablesample clause in the
    select statement and this works great,
    even though you cannot specify a
    number of rows - only a percentage.

    Thanks for all the help

    syntax example:

    create table ssr.t_test as
    (select * from ssr.t_ssik2003 tablesample bernoulli (1))
    definition only;
    Kristian K. Hansen
    Project Supervisor
    National Board of Health

  9. #9
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1

    Re: Random select

    Originally posted by Tank
    Yes

    However I have just discovered
    the tablesample clause in the
    select statement and this works great,
    even though you cannot specify a
    number of rows - only a percentage.

    Thanks for all the help

    syntax example:

    create table ssr.t_test as
    (select * from ssr.t_ssik2003 tablesample bernoulli (1))
    definition only;
    That's a cool feature of v8! Thanks for the info.

  10. #10
    Join Date
    Mar 2004
    Location
    Toronto, ON, Canada
    Posts
    513

    Re: Random select

    Originally posted by Tank
    Yes

    However I have just discovered
    the tablesample clause in the
    select statement and this works great,
    even though you cannot specify a
    number of rows - only a percentage.

    Thanks for all the help

    syntax example:

    create table ssr.t_test as
    (select * from ssr.t_ssik2003 tablesample bernoulli (1))
    definition only;
    Does DB2 support that option? I tried it on a v8.1 system and got an SQL0270N rc=43 ("One or more of the options you have requested is currently not supported.")... and nothing in the SQL Reference about it.

    This is on Linux, though, are you z/0S?

    Very neat solution, though!
    --
    Jonathan Petruk
    DB2 Database Consultant

  11. #11
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    This feature was added in 8.1.2. check the online doc for details.
    http://publib.boulder.ibm.com/infoce...help/index.jsp
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  12. #12
    Join Date
    Mar 2004
    Location
    Toronto, ON, Canada
    Posts
    513
    Originally posted by Marcus_A
    This feature was added in 8.1.2. check the online doc for details.
    http://publib.boulder.ibm.com/infoce...help/index.jsp
    Thanks. Turns out I'm at fixpack 4 so I have it, just that I was trying it against a view which apparantly doesn't work. Works when I go against a "real" table.

    Jon
    --
    Jonathan Petruk
    DB2 Database Consultant

  13. #13
    Join Date
    Mar 2004
    Location
    Pittsburgh
    Posts
    1

    Wink

    For those who donot have the cool tablesample...
    Here is a round about way to get rows in random.
    #1 Load the table (full or partial) on to a dataset
    #2 Append random numbers with the utility IEBDG - Test data generator
    A sample job would look like this
    Code:
    //STEP10   EXEC PGM=IEBDG                                               
    //SYSPRINT DD  SYSOUT=A                                                 
    //SEQIN    DD  DSNAME=<INPUT.FILE>,DISP=OLD                      
    //SEQOUT   DD  DSNAME=<OUTPUT.FILE>,DISP=OLD                     
    //SYSIN    DD  *                                                        
         DSD     OUTPUT=(SEQOUT),INPUT=(SEQIN)                              
         FD      NAME=F1,LENGTH=04,FORMAT=RA,STARTLOC=1                     
         FD      NAME=F2,LENGTH=30,FORMAT=AL,ACTION=RP,STARTLOC=10          
         FD      NAME=F3,LENGTH=45,FORMAT=AN,ACTION=TR,STARTLOC=40          
         FD      NAME=F4,LENGTH=03,FORMAT=ZD,INDEX=1,STARTLOC=85            
         FD      NAME=F5,LENGTH=03,FORMAT=ZD,INDEX=2,STARTLOC=89            
         FD      NAME=F6,LENGTH=04,FORMAT=ZD,INDEX=3,STARTLOC=93            
         CREATE QUANTITY=04,INPUT=SEQIN,NAME=(F1,F2,F3,F4,F5,F6)            
         END                                                                
    /*
    This would put random numbers (binary) onto first 4 bytes (FORMAT=RA).
    #3 Now you can sort the dataset on random number (ascending, descending or any order of your choice) and can get 1000 'random' sample records.
    For more details on IEBDG, you can refer to
    http://publibz.boulder.ibm.com/cgi-b...22161259#3.2.4

  14. #14
    Join Date
    Mar 2004
    Location
    Toronto, ON, Canada
    Posts
    513
    TABLESAMPLE - an interesting use for this could be the "bulk delete" problem many dbas run into... ie. trying to delete many rows from a table that is actively being used in a 24x7 system.

    Instead of monkying around trying to find a range for one of your columns that would give you 1/10th of the rows, just use something like:

    DELETE FROM C WHERE B IN
    (SELECT B FROM C TABLESAMPLE BERNOULLI (10));

    Maybe not the most terribly efficient way... but neither are half the solutions I've seen people try!

    Just tried it on my sample db, seems to work. One caveat would be that you can't use 10% 10 times and make 100%, since each new run would be 10% of the post-delete table...
    --
    Jonathan Petruk
    DB2 Database Consultant

  15. #15
    Join Date
    Mar 2004
    Location
    Toronto, ON, Canada
    Posts
    513
    Originally posted by J Petruk
    TABLESAMPLE - an interesting use for this could be the "bulk delete" problem many dbas run into... ie. trying to delete many rows from a table that is actively being used in a 24x7 system.

    Instead of monkying around trying to find a range for one of your columns that would give you 1/10th of the rows, just use something like:

    DELETE FROM C WHERE B IN
    (SELECT B FROM C TABLESAMPLE BERNOULLI (10));

    Maybe not the most terribly efficient way... but neither are half the solutions I've seen people try!

    Just tried it on my sample db, seems to work. One caveat would be that you can't use 10% 10 times and make 100%, since each new run would be 10% of the post-delete table...
    Or use FETCH FIRST X ROWS ONLY...
    --
    Jonathan Petruk
    DB2 Database Consultant

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •