Results 1 to 11 of 11
  1. #1
    Join Date
    Jun 2006
    Posts
    11

    Unanswered: efficiently creating random numbers in very large table

    Hello,

    I need to sample data in a very large table in SQL Server 2000 (a gazillion rows of Performance Monitor statitics).

    I'd like to take the top 5%, for instance, based upon a column containing random numbers.

    Can anyone suggest a highly efficient method of populating a column with random numbers.

    Thanks in advance.

    Rod

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    select TOP 5 PERCENT * from [YourTable] order by newid()
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Jun 2006
    Posts
    11
    Quote Originally Posted by blindman
    select TOP 5 PERCENT * from [YourTable] order by newid()

    Thank you, I'll give that a go.


    Regards,

    Rod

  4. #4
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    that won't populate your table with any random numbers obviously.

    it will give you a random 5% slice of the table. a different slice each time you run it.

  5. #5
    Join Date
    Jun 2006
    Posts
    11
    Thanks, Good point; maybe I can have another column to set a bit , so that I can reproduce. I'll have to test performance, perhaps someone has some experience with this or have a different technique to propose. Thank you.

    Rod

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    If you really want a column of random values, then just create a GUID column with a default of NEWID(). But this won't give you a random sample every time, of course.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Jun 2006
    Posts
    11
    Quote Originally Posted by blindman
    If you really want a column of random values, then just create a GUID column with a default of NEWID(). But this won't give you a random sample every time, of course.

    That's ok blindman, I just neede something that's efficient in terms populating random values. Regards, Rod

  8. #8
    Join Date
    Mar 2005
    Location
    Netherlands
    Posts
    280
    Quote Originally Posted by blindman
    just create a GUID column with a default of NEWID()
    Ofcourse this works but if your table is really that big beware of the time it takes to alter the table! SQL Server has to expand each record so numerous page splits will occur, indexes will have to be rebuild, etc, etc. This could take a couple of hours.

  9. #9
    Join Date
    Jun 2006
    Posts
    11
    Quote Originally Posted by Lexiflex
    Ofcourse this works but if your table is really that big beware of the time it takes to alter the table! SQL Server has to expand each record so numerous page splits will occur, indexes will have to be rebuild, etc, etc. This could take a couple of hours.

    ...ugh.. Thanks. There does not seem to be a really efficient way of doing this...

    Thanks for you input. Rod

  10. #10
    Join Date
    May 2004
    Location
    Seattle
    Posts
    1,313
    how many rows is the table?

    also, you can generate random numbers in sql using rand() if you don't like guids. if a random number from 0-255 is sufficient you could store it in a tinyint and less page splits would result.

    this code ran in 31 sec on my dev box. not great, but it is what it is:

    Code:
    set nocount on
    declare @t table (RandomColumn tinyint)
    declare @i int
    set @i=0
    
    while @i < 1000000
    begin
        insert into @t select round(rand() * 255, 0)
        set @i = @i + 1 
    end

  11. #11
    Join Date
    Jun 2006
    Posts
    11
    Quote Originally Posted by jezemine
    how many rows is the table?

    also, you can generate random numbers in sql using rand() if you don't like guids. if a random number from 0-255 is sufficient you could store it in a tinyint and less page splits would result.

    this code ran in 31 sec on my dev box. not great, but it is what it is:

    Code:
    set nocount on
    declare @t table (RandomColumn tinyint)
    declare @i int
    set @i=0
    
    while @i < 1000000
    begin
        insert into @t select round(rand() * 255, 0)
        set @i = @i + 1 
    end


    That maybe ok, you're right, not great but maybe we can live that. Thanks for your code.

    Regards,

    Rod

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •