Results 1 to 15 of 15
  1. #1
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4

    Unanswered: MSSQL analogue for DB2's TRANSLATE() function

    Hi,

    I have to exchange data, including names and surnames, about persons with the government. The problem at hand is that if I pass a name like 'Franois', their system compares it with 'FRANCOIS' in their database, notice the difference and throw an error back. (They seem to uppercase the names first and then compare them.) They can't deal with special characters like , , , , , , ...

    Given our customer base with also people originating from France, Spain, Scandinavian countries, Turkey, East European countries, ... the names and surnames contain many of those special characters.

    In DB2 I could write
    Code:
    SELECT TRANSLATE(ColumnName,
    'aaaaaeieieiianoooooaouuuuyAAAAAEIEIEIIANNOOOOAOUUUUY',
    '')
    FROM DaTable
    to get a cleaned up string. 'Franois de Dona lafson' becomes 'Francois de Donea Olafson'.

    I have not been able to find an analogue function in SQL Server so far. I'd prefer not to have to write a long long list of enclosing REPLACE(REPLACE(..., '', 'a'), '', 'e') statements.

    (SQL Server 2005)
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    There is no equivalent - you have to roll your own.

    I have written quite a few functions like this - what you write depends on your requirement.

    If you want quick a dirty code and don't worry about performance then write some code to generate your REPLACE(REPLACE(REPLACE(... statements. This solution performs worse the fewer replacements it needs to make. Note also it doesn't handle consecutive characters.

    If performance is a concern than you need to be more clever. The best method I have found is a CLR function using string.IndexOf (or a similar method - I forget the exact syntax) to find the next character you want to swap out. Run the replace on that character and repeat in a loop until no characters remain.

  3. #3
    Join Date
    Sep 2001
    Location
    Chicago, Illinois, USA
    Posts
    601
    Whenever I have done this, I use a SELECT statement with a LIKE criteria that can specify the list of characters I am looking to transform. For any records resulting, I post the primary key of that record to a temorary table. I then use a cursor against the original table joined to the temporary table, and I iterate through the "defective records," for each character I am looking to replace and, if there is a hit, I "correct" that character.

    By doing it this way, you reduce the number of records you have to cursor-through.

    Just my two cents.
    Ken

    Maverick Software Design

    (847) 864-3600 x2

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    EDIT....not worth posting.....

  5. #5
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Pootle, I used your idea. This is what I came up with.
    Code:
    IF OBJECT_ID (N'dbo.Translate', N'FN') IS NOT NULL
    	DROP FUNCTION dbo.Translate;
    GO
    
    CREATE FUNCTION dbo.Translate (
    	@String NVARCHAR(MAX), 
    	@SearchFor NVARCHAR(MAX), 
    	@ReplaceBy NVARCHAR(MAX)
    )
    RETURNS NVARCHAR(MAX)
    AS
    BEGIN
    	if @String IS NULL
    	BEGIN
    --		RAISERROR ('@String can not be NULL.', 10, 1)
    		RETURN(NULL)
    	END
    
    	if @SearchFor IS NULL
    	BEGIN
    --		RAISERROR ('@SearchFor can not be NULL.', 10, 1)
    		RETURN(NULL)
    	END
    
    	if @ReplaceBy IS NULL
    	BEGIN
    --		RAISERROR ('@ReplaceBy can not be NULL.', 10, 1)
    		RETURN(NULL)
    	END
    
    -- Suppose a character 'X' is present (one time or multiple times) in @String AND in @SearchFor. 
    -- And the index of the character 'X' in @SearchFor is I. 
    -- The function will replace all occurances of the character 'X' in @String with the corresponding 
    -- character Y at the index I in @ReplaceBy.
    -- When no characters of @SearchFor are present in @String,  the function returns @String unaltered.
    -- When there are more characters in @SearchFor than in @ReplaceBy (I > len(@ReplaceBy)), those extra 
    -- characters in @SearchFor are removed from @String (replaced by an empty string). 
    -- There may not be more characters in @ReplaceBy than in @SearchFor. 
    	if len(@ReplaceBy) > len(@SearchFor)
    	BEGIN
    --		RAISERROR ('The number of characters in @ReplaceBy can not be higher than those in @SearchFor.', 10, 1)
    		RETURN(NULL)
    	END
    
    	DECLARE @SearchForIndex INT
    	DECLARE @SearchChar NVARCHAR(1)
    	DECLARE @ReplaceChar NVARCHAR(1)
    	DECLARE @ResultString NVARCHAR(MAX)
    
    	SET @SearchForIndex = 1
    	SET @ResultString = @String
    
    	-- for all characters in @SearchFor
    	WHILE @SearchForIndex <= len(@SearchFor)
    	BEGIN
    		-- get the current character from @SearchFor
    		SET @SearchChar = Substring(@SearchFor, @SearchForIndex, 1)
    		-- if the current @SearchFor character is also present in @String
    		if Charindex(@SearchChar, @ResultString) > 0
    		begin
    			-- Replace alle oocurences of the current @SearchFor character in @String
    			-- by the corresponding character (same index) in @ReplaceBy.
    			-- BUT when there is no corresponding character in @ReplaceBy 
    			-- (@SearchForIndex > len(@ReplaceBy)), replace the characters in @String 
    			-- by the empty string '' (remove them)
    			if @SearchForIndex <= len(@ReplaceBy) 
    				SET @ReplaceChar = Substring(@ReplaceBy, @SearchForIndex, 1)
    			else
    				SET @ReplaceChar = ''
    
    			SET @ResultString = REPLACE(@ResultString, @SearchChar, @ReplaceChar)
    		end
    
    		SET @SearchForIndex = @SearchForIndex + 1
    	END;
    
    	RETURN(@ResultString);
    END;
    GO
    
    
    SELECT dbo.Translate('123abcABC1', '1', 'E') AS Translated; -- 'E23abcABCE' OK
    SELECT dbo.Translate('123abcABC1', '123', 'E') AS Translated; -- 'EabcABCE' OK
    SELECT dbo.Translate('123abcABC1', 'mno', 'XYZ') AS Translated; -- '123abcABC1' OK
    SELECT dbo.Translate('123abcABC1', 'ABC', 'XYZ') AS Translated; -- '123abcXYZ1' NOK, got '123XYZXYZ1', Charindex() is case insensitive
    It doesn't work properly.

    Translate('123abcABC1', 'ABC', 'XYZ') should return '123abcXYZ1', but it returns '123XYZXYZ1', Charindex() seems to be case insensitive. How can I verify if a character is present, in a case sensitive way? string.IndexOf is not a TransactSQL command. How can I use it?

    I have mostly written SP's so far. RAISERROR does not seem to work in an UDF. Any idea how I could let it raise an error when the input is wrong?
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    I'm afraid I'm on my phone so can't answer thoroughly until back in work.
    1) that wasn't my idea I recommend CLR for any string manipulation that needs to perform well - t-sql scalar functions on strings are poop if the dataset is significant. Having said that, first glance is you are writing a fairly decent t-sql function
    2) charindex will be case insensitive if the input is using a case insensitive colation. You can force a colation using the colation statement.
    3) you realise you are raising informational errors there right? Have you tried a level of 16?

    If you haven't cracked it by a.m. UK time I'll have another, sober, nose at it

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Wim:

    First and foremost let me repeat with emphasis that you ought to use the CLR for scalable character manipulation. Doing this in pure Transact SQL is likely to perform badly.

    Next, your code works with two very minor tweaks:
    Code:
    IF OBJECT_ID (N'dbo.Translate', N'FN') IS NOT NULL
    	DROP FUNCTION dbo.Translate;
    GO
    
    CREATE FUNCTION dbo.Translate (
    	@String NVARCHAR(MAX), 
    	@SearchFor NVARCHAR(MAX), 
    	@ReplaceBy NVARCHAR(MAX)
    )
    RETURNS NVARCHAR(MAX)
    AS
    BEGIN
    	if @String IS NULL
    	BEGIN
    --		RAISERROR ('@String can not be NULL.', 10, 1)
    		RETURN(NULL)
    	END
    
    	if @SearchFor IS NULL
    	BEGIN
    --		RAISERROR ('@SearchFor can not be NULL.', 10, 1)
    		RETURN(NULL)
    	END
    
    	if @ReplaceBy IS NULL
    	BEGIN
    --		RAISERROR ('@ReplaceBy can not be NULL.', 10, 1)
    		RETURN(NULL)
    	END
    
    -- Suppose a character 'X' is present (one time or multiple times) in @String AND in @SearchFor. 
    -- And the index of the character 'X' in @SearchFor is I. 
    -- The function will replace all occurances of the character 'X' in @String with the corresponding 
    -- character Y at the index I in @ReplaceBy.
    -- When no characters of @SearchFor are present in @String,  the function returns @String unaltered.
    -- When there are more characters in @SearchFor than in @ReplaceBy (I > len(@ReplaceBy)), those extra 
    -- characters in @SearchFor are removed from @String (replaced by an empty string). 
    -- There may not be more characters in @ReplaceBy than in @SearchFor. 
    	if len(@ReplaceBy) > len(@SearchFor)
    	BEGIN
    --		RAISERROR ('The number of characters in @ReplaceBy can not be higher than those in @SearchFor.', 10, 1)
    		RETURN(NULL)
    	END
    
    	DECLARE @SearchForIndex INT
    	DECLARE @SearchChar NVARCHAR(1)
    	DECLARE @ReplaceChar NVARCHAR(1)
    	DECLARE @ResultString NVARCHAR(MAX)
    
    	SET @SearchForIndex = 1
    	SET @ResultString = @String
    
    	-- for all characters in @SearchFor
    	WHILE @SearchForIndex <= len(@SearchFor)
    	BEGIN
    		-- get the current character from @SearchFor
    		SET @SearchChar = Substring(@SearchFor, @SearchForIndex, 1)
    		-- if the current @SearchFor character is also present in @String
    		if Charindex(@SearchChar, @ResultString COLLATE Latin1_General_BIN) > 0
    		begin
    			-- Replace alle oocurences of the current @SearchFor character in @String
    			-- by the corresponding character (same index) in @ReplaceBy.
    			-- BUT when there is no corresponding character in @ReplaceBy 
    			-- (@SearchForIndex > len(@ReplaceBy)), replace the characters in @String 
    			-- by the empty string '' (remove them)
    			if @SearchForIndex <= len(@ReplaceBy) 
    				SET @ReplaceChar = Substring(@ReplaceBy, @SearchForIndex, 1)
    			else
    				SET @ReplaceChar = ''
    
    			SET @ResultString = REPLACE(@ResultString COLLATE Latin1_General_BIN, @SearchChar, @ReplaceChar)
    		end
    
    		SET @SearchForIndex = @SearchForIndex + 1
    	END;
    
    	RETURN(@ResultString);
    END;
    GO
    
    
    SELECT dbo.Translate('123abcABC1', '1', 'E') AS Translated; -- 'E23abcABCE' OK
    SELECT dbo.Translate('123abcABC1', '123', 'E') AS Translated; -- 'EabcABCE' OK
    SELECT dbo.Translate('123abcABC1', 'mno', 'XYZ') AS Translated; -- '123abcABC1' OK
    SELECT dbo.Translate('123abcABC1', 'ABC', 'XYZ') AS Translated; -- '123abcXYZ1' OK
    Third, if you are bound by some constraint to do this using pure Transact SQL, if you can live without translating the ^, -, and ] characters then you can do this in Transact SQL far more efficiently using PatIndex and converting your search string into a pattern. I would strongly prefer you to use the CLR, but I can show you how to use PatIndex if you must do this and can live without those characters.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  8. #8
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Pootle, Pat,

    Thank you.
    1) that wasn't my idea ...
    I was referring to the algorithm.
    2) charindex will be case insensitive if the input is using a case insensitive colation. You can force a colation using the colation statement.
    I understand it now, after seeing how Pat solved it.
    3) you realise you are raising informational errors there right? Have you tried a level of 16?
    I didn't realise it would matter. This is the info I based it upon (BOL). No mention about 16 invoking any other behaviour compared to using 5 or 10 or ..
    severity
    Is the user-defined severity level associated with this message. When using msg_id to raise a user-defined message created using sp_addmessage, the severity specified on RAISERROR overrides the severity specified in sp_addmessage.

    Severity levels from 0 through 18 can be specified by any user. Severity levels from 19 through 25 can only be specified by members of the sysadmin fixed server role or users with ALTER TRACE permissions. For severity levels from 19 through 25, the WITH LOG option is required.
    First and foremost let me repeat with emphasis that you ought to use the CLR for scalable character manipulation.
    I have no experience in using CLR and I didn't realise T-SQL would perform poorly in this case. Guess I have some more study ahead. Is it possible to write ironPython code for CLR, or are only VB and C# options?
    ...but I can show you how to use PatIndex if you must do this and can live without those characters.
    T-SQL is not a requirement, just the tool I know the best. Yes, I am interested in your solution using Patindex. I'm making a guess here, does it involve replacing Charindex(@SearchChar, @ResultString COLLATE Latin1_General_BIN) by PatIndex('%[@SearchChar]%', @ResultString COLLATE Latin1_General_BIN) (can't check it here, I can't access the database)?
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  9. #9
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    In simple terms T-SQL is pretty sucky for lots of string manipulations. Distance algorithms, for example, run 1000s of times slower than in CLR (based on my tests).

    You can use any .NET language to write a CLR function though I confess I only know C# and VB.NET. Have a google for all the available languages.

    Also, although Pat and I are recommending CLR you should only venture in to that if you find your T-SQL is not performing as well as you hoped.

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    A minor optimisation: since you don't use the return of CHARINDEX, it is more efficient to use LIKE instead here:
    Code:
    ......
    if Charindex(@SearchChar, @ResultString COLLATE Latin1_General_BIN) > 0
    Also, may I ask why you use all the semi colons? I don't have an agenda with the question - just curious that I might be missing out on something.

  11. #11
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by Wim View Post
    No mention about 16 invoking any other behaviour compared to using 5 or 10 or ..
    You might find this useful:
    Error Message Severity Levels

    I use 10 or 16 and nothing else when raising errors. 10 for information, 16 for genuine errors. Admittedly, it looks like 11 - 16 are treated the same so you don't have to use 16 as such.
    Last edited by pootle flump; 04-21-10 at 05:29.

  12. #12
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Quote Originally Posted by pootle flump View Post
    A minor optimisation: since you don't use the return of CHARINDEX, it is more efficient to use LIKE instead here:
    Code:
    ......
    if Charindex(@SearchChar, @ResultString COLLATE Latin1_General_BIN) > 0
    Also, may I ask why you use all the semi colons? I don't have an agenda with the question - just curious that I might be missing out on something.
    I replaced Charindex(@SearchChar, @ResultString COLLATE Latin1_General_BIN) by
    Code:
    if @ResultString COLLATE Latin1_General_BIN LIKE '%@SearchChar%'
    But now the function doesn't work correctly any more, it always returns the original string.

    The semicolons originate from the CREATE FUNCTION example I found somewhere on the Internet. I don't have any bad feelings about ending SQL statements by a semicolon. In DB2 it is mandatory. It divides your script in clear chunks of SQL statements. It took me a while to get accustomed towards their optional use in SQL Server.

    I did some coarse efficiency tests on selecting 5000 records with two columns: name and first name.
    Without use of Translate function: 0 sec
    With use of Translate function, Charindex: 2 sec
    With use of Translate function, PatIndex: 5 sec (if PatIndex('%[@SearchChar]%', @ResultString COLLATE Latin1_General_BIN) > 0)
    With use of Translate function, LIKE: 6 sec (returned erroneous string)

    It seems the Charindex() function is after all the fastest T-SQL solution in this case.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  13. #13
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Code:
    LIKE '%@SearchChar%'
    should be:
    Code:
    LIKE '%' + @SearchChar + '%'

  14. #14
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    The LIKE '%' + @SearchChar + '%' trick worked. I did the 5000 records test again.

    Without use of Translate function: 0 sec
    With use of Translate function, Charindex: 2 sec
    With use of Translate function, PatIndex: 5 sec
    With use of Translate function, LIKE: 1.5 sec

    We have a new winner.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  15. #15
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by Wim View Post
    Without use of Translate function: 0 sec
    LOL!

    You can be pretty sure any SQL containing a scalar function (even a system one) will be slower than any alternative T-SQL syntax that does not contain one.

    Of course some of the more obtuse members of the community will now be hammering their keyboards in an attempt to be the first to prove me wrong ...

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •