Page 1 of 2 12 LastLast
Results 1 to 15 of 21
  1. #1
    Join Date
    Jun 2004
    Posts
    15

    Unanswered: String to Number

    Quick question I hope:

    Is there a way to convert a string into a unique number. I have tried using checksum(str) but this appears not to be unique.

    For example:
    select checksum('KEY AND KEY')
    select checksum('COPENBARGER AND COPENBARGER')
    both yield -2027749374.

    Can someone suggest an alternative function?
    Many thanks !

  2. #2
    Join Date
    Jan 2005
    Posts
    6
    No, the CHECKSUM() function is not guaranteed to return unique values.

    With strings of 20 or so characters, there are far more possible string values than there are 32-bit (or 64-bit) integer values. So a unique mapping is not possible.

  3. #3
    Join Date
    Jun 2004
    Posts
    15
    anyone aware of an alternative?

    Tis a long story involving multiple systems and converting data. The destination system requires a numeric key to be used across several tables - the main table in the source contains a unique name and it would be ideal if I could quickly (across millions of rows) generate a unique key seeded on this.

    Looks like I may need to come up with another idea!
    Thanks.

  4. #4
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Are you looking for GUID?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  5. #5
    Join Date
    Jun 2004
    Posts
    15
    hmm, im not sure that that is quite what I need, thanks for the suggestion though. Isnt a GUID generated based on time and network address? I need something that is just a function of my string so that the key generated is meaningful accross the tables and systems I am merging.

    I can see some long way round solutions involving transferring data round tables and identity columns and such. But I had this cunning idea and it seems I cant pull it off.


    Is there not a function which converts a string to a binary number or something? Im sure Ive seen it done. Might be my imagination!

    I suppose I could write a function that converts each letter to a number but I fear that will be slow over millions of rows.

    Cheers for the response .

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    I know of no practical algorithm that will do what you want. Honestly, I'm surprised you were actually able to find two string values that generate the same checksum. It's supposed to very improbably, and I have not encountered it before.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Well why do you want to convert everything to an integer? That seems like a waste...aren't the char values unique enough?

    And if they're not, how would you joing across databases with a number anyway?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  8. #8
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Close eyes...pull trigger...

    Code:
    USE Northwind
    GO
    
    CREATE FUNCTION udfMyAlgorithym(@string varchar(8000))
    	RETURNS decimal(32,16)
    as
      BEGIN
    	DECLARE @Len int, @CheckDigit int, @Conversion decimal(32,16)
    	SELECT @Len = LEN(@string), @CheckDigit = ASCII(SUBSTRING(@string,1,1))
    	SELECT @Conversion = (CHECKSUM(@string)/(@Len*1.000))+@CheckDigit
      	RETURN @Conversion
      END
    GO
    
    SELECT dbo.udfMyAlgorithym(CustomerId) FROM Orders
    GO
    
    DROP FUNCTION udfMyAlgorithym
    GO
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Oh, I like it. Very cool idea. But I don't think it solves the problem. It's still possible for two strings to yield the same result, though maybe less likely than before. Plus, couldn't rounding truncate two results to the same value?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  10. #10
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Ok, granted I tend to say "what box" when people suggest that I think outside the box, but if you need "numbers", how about:
    Code:
    SELECT Checksum('KEY AND KEY'), Checksum('COPENBARGER AND COPENBARGER') -- Same
    SELECT Cast('KEY AND KEY' AS VARBINARY)
    ,  Cast('COPENBARGER AND COPENBARGER' AS VARBINARY) -- Different
    -PatP

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    How large can the Uniqe Value be?

    select checksum('KEY AND KEY')*10000000000 + checksum(left('KEY AND KEY', len('KEY AND KEY') -1))
    select checksum('COPENBARGER AND COPENBARGER')*10000000000 + checksum(left('COPENBARGER AND COPENBARGER', len('COPENBARGER AND COPENBARGER') -1))
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by blindman
    How large can the Uniqe Value be?
    A better question is "How small does the unique value have to be?" As Zev proved that to uniquely encode N random bits without loss requires N bits. If there are patterns (which there usually are), then you can uniquely encode the N bits into a pattern that is the difference between N and its redundancy factor R, which is typically about 55% of N for English text.

    Anywho, this math gets really complex if you want to persue it, but the short answer is that getting really unique values that are significantly smaller than the original has been a holy grail of computing for years. Good luck!

    -PatP

  13. #13
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Damn!

    Fails when last characters are the same:

    select checksum('KEY AND KEYX')*10000000000 + checksum(left('KEY AND KEYX', len('KEY AND KEYX') -1))
    select checksum('COPENBARGER AND COPENBARGERX')*10000000000 + checksum(left('COPENBARGER AND COPENBARGERX', len('COPENBARGER AND COPENBARGERX') -1))

    I guess a function would be required for the second CHECKSUM that would peel off characters from both strings until it finds two characters that do not match.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  14. #14
    Join Date
    Jun 2004
    Posts
    15
    Quote Originally Posted by Brett Kaiser
    Well why do you want to convert everything to an integer? That seems like a waste...aren't the char values unique enough?

    And if they're not, how would you joing across databases with a number anyway?
    I must admit I tried to keep the question simple, and didnt feel it necessary to go into the details of a convoluted project. You'll just have to trust me ! As Pat suggests, there isnt always a box . Which brings me on to...


    Quote Originally Posted by Pat Phelan
    Anywho, this math gets really complex if you want to persue it, but the short answer is that getting really unique values that are significantly smaller than the original has been a holy grail of computing for years. Good luck!
    An interesting topic! Not something Id thought about before and would certainly give me a headache if I though for too long about that. The cast idea seems simple and ideal.



    Thanks everyone for their posts

  15. #15
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    The CAST solution will not give you a unique NUMBER. It gives you a unique VARBINARY value.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •