Results 1 to 9 of 9
  1. #1
    Join Date
    May 2014
    Posts
    3

    Unanswered: Spliting a column into Multiple columns

    Hi Guys,

    I have to split a column using comma a delimiter into multiple columns.

    I am able to do it if i know how many column will be present in the final output.

    But in daily run, the columns may vary randomly.

    Thats where i need help how to split columns without hardcoding how many columns it ll come.

    This is the code am using

    Code:
    WITH Split_Names (Fil_id,Name, xmlname)
    AS
    (
        SELECT Fil_ID,
        Allergen,
        CONVERT(XML,'<Names><name>'  
        + REPLACE(Allergen,',', '</name><name>') + '</name></Names>') AS xmlname
          FROM stg.T_STG_Allergen
    )
    
     SELECT  Fil_id,
      xmlname.value('/Names[1]/name[1]','varchar(100)') AS aller1,    
     xmlname.value('/Names[1]/name[2]','varchar(100)') AS aller2,
     xmlname.value('/Names[1]/name[3]','varchar(100)') AS aller3,
     xmlname.value('/Names[1]/name[4]','varchar(100)') AS aller4,
     xmlname.value('/Names[1]/name[5]','varchar(100)') AS aller5,
     xmlname.value('/Names[1]/name[6]','varchar(100)') AS aller6,
     xmlname.value('/Names[1]/name[7]','varchar(100)') AS aller7,
     xmlname.value('/Names[1]/name[8]','varchar(100)') AS aller8
     FROM Split_Names
    This query will work fine if we have max 8 allergens..

    But it will fail to capture if we have 15 allergens.

    so can somebody help to dynamically handle it.

    Thanks,
    Magesh

  2. #2
    Join Date
    Apr 2014
    Location
    Kentucky
    Posts
    496
    Provided Answers: 24
    Cant you loop until all are found?

    Code:
    iMax = 0
    j = 1
    i = instr(sXML,"name["& j & "]")
    while i >0
        iMax = j
        j = j +1
        i = instr(sXML,"name["& j & "]")
    wend 
    'build sql with iMax values

  3. #3
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    Dave

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Code:
    CREATE function [dbo].[ParseString](@String varchar(max), @Delimiter char(1))
    returns table
    as
    
    --function ParseString
    --blindman, 5/20/2008
    --Parses a string based upon a given single-character delimiter,
    --without using loops or a tally table.
    
    --Revision History
    --blindman, 5/29/2013:	Changed paramater type to varchar(max)
    
    --test parameters
    --declare	@String varchar(500)
    --declare	@Delimiter char(1)
    --set		@String = 'rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R'
    --set		@Delimiter = '/'
    --set		@String = 'Abracadabra!, Shazam!, Presto!'
    --set		@Delimiter = ','
    --;
    
    return
    (
    with Results as
    		(select	1 as Ordinal,
    				ltrim(left(@String, charindex(@Delimiter, @String + @Delimiter)-1)) as StringValue,
    				convert(varchar(max), right(@String + @Delimiter, len(@String) - charindex(@Delimiter, @String+@Delimiter) + 1)) as Remaining
    		UNION ALL
    		select	Ordinal+1,
    				convert(varchar(max), ltrim(left(Remaining, charindex(@Delimiter, Remaining)-1))),
    				right(Remaining, len(Remaining) - charindex(@Delimiter, Remaining))
    		from	Results
    		where	len(Remaining) > 0)
    select	Ordinal,
    		StringValue
    from	Results
    )
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Here's a different way of looking at your problem.
    Code:
    IF Object_Id('dbo.fnSplit') IS NOT NULL DROP FUNCTION dbo.fnSplit
    GO
    --  ptp  20140610  UDF to split a denormalized string into 1NF on a separator
    
    CREATE FUNCTION dbo.fnSplit(
       @source      VARCHAR(7999)
    ,  @separator   VARCHAR(9)
    ) RETURNS @Split TABLE (
       item         VARCHAR(99)
       )
       BEGIN
          DECLARE @i INT
    
          SET @i = CharIndex(@separator, @source)
    
          WHILE 0 < @i
             BEGIN
                INSERT INTO @Split SELECT Left(@source, @i - 1)
                SET @source = SubString(@source, @i + Len(@separator), 7999)
                SET @i = CharIndex(@separator, @source)
             END
    
          INSERT INTO @Split VALUES (@Source)
    
          RETURN
       END
    GO
    
    --  Set up the demo data
    
    DECLARE @T_STG_Allergen TABLE (
       Fil_ID       INT
    ,  Allergen     VARCHAR(4999)
       )
    
    INSERT INTO @T_STG_Allergen(Fil_ID, Allergen) VALUES
       (15, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve, Thirteen, Fourteen, Fifteen')
    ,  (14, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve, Thirteen, Fourteen')
    ,  (13, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve, Thirteen')
    ,  (12, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve')
    ,  (11, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven')
    ,  (10, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten')
    ,  ( 9, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine')
    ,  ( 8, 'One, Two, Three, Four, Five, Six, Seven, Eight')
    ,  ( 7, 'One, Two, Three, Four, Five, Six, Seven')
    ,  ( 6, 'One, Two, Three, Four, Five, Six')
    ,  ( 5, 'One, Two, Three, Four, Five')
    ,  ( 4, 'One, Two, Three, Four')
    ,  ( 3, 'One, Two, Three')
    ,  ( 2, 'One, Two')
    ,  ( 1, 'One')
    --,  ( 0, '')
    
    --  Demo using UDF
    
    SELECT Fil_ID, LTrim(item) AS Allergen
       FROM @T_STG_Allergen
       CROSS APPLY dbo.fnSplit(Allergen, ',')
    
    --  Demo using XML
    
    ; WITH AllergenList (Fil_id, Allergen, xmlname) AS (
        SELECT Fil_ID,
          Allergen,
          CONVERT(XML, '<Names><name>'  
    +     REPLACE(Allergen,', ', '</name><name>') + '</name></Names>') AS xmlname
          FROM @T_STG_Allergen
    ), SpecificAllergens (Fil_ID, item) AS
    (  SELECT Fil_ID, parsed.node.value('.', 'VARCHAR(99)') AS Allergen
          FROM AllergenList
    	  CROSS APPLY xmlname.nodes('/Names[1]/name') AS parsed(node)
    )
     SELECT *
        FROM SpecificAllergens
    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I did a bit more research into this problem, since it is one that I encounter pretty often for clients working with enormous data sets (billions of rows). I always knew that string manipulation was SQL Server's "Achilles heel" for performance, but this was an eye opener for me!

    On a laptop with 8 Gb of RAM, the performance difference was astounding. Please post your min/max/avg values from the very last result set, I'm quite interested in knowing how much difference there is "in the wild" for this code.
    Code:
    IF Object_Id('dbo.fnSplit') IS NOT NULL DROP FUNCTION dbo.fnSplit
    GO
    --  ptp  20140610  UDF to split a denormalized string into 1NF based on a separator
    
    CREATE FUNCTION dbo.fnSplit(
       @source      VARCHAR(7999)
    ,  @separator   VARCHAR(9)
    ) RETURNS @Split TABLE (
       item         VARCHAR(99)
       )
       BEGIN
          DECLARE @i INT
    
          SET @i = CharIndex(@separator, @source)
    
          WHILE 0 < @i
             BEGIN
                INSERT INTO @Split SELECT Left(@source, @i - 1)
                SET @source = SubString(@source, @i + DataLength(@separator), 7999)
                SET @i = CharIndex(@separator, @source)
             END
    
          INSERT INTO @Split VALUES (@Source)
    
          RETURN
       END
    GO
    IF Object_Id('dbo.ParseString') IS NOT NULL DROP FUNCTION dbo.ParseString
    GO
    CREATE function [dbo].[ParseString](@String varchar(max), @Delimiter char(1))
    returns table
    as
    
    --function ParseString
    --blindman, 5/20/2008
    --Parses a string based upon a given single-character delimiter,
    --without using loops or a tally table.
    
    --Revision History
    --blindman, 5/29/2013:	Changed paramater type to varchar(max)
    
    --test parameters
    --declare	@String varchar(500)
    --declare	@Delimiter char(1)
    --set		@String = 'rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R'
    --set		@Delimiter = '/'
    --set		@String = 'Abracadabra!, Shazam!, Presto!'
    --set		@Delimiter = ','
    --;
    
    return
    (
    with Results as (
    select	1 as Ordinal,
       ltrim(left(@String, charindex(@Delimiter
    ,  @String + @Delimiter)-1)) as StringValue
    ,  convert(varchar(max), right(@String + @Delimiter, len(@String) 
    -  charindex(@Delimiter, @String+@Delimiter) + 1)) as Remaining
    UNION ALL
    select	Ordinal+1,
       convert(varchar(max), ltrim(left(Remaining, charindex(@Delimiter, Remaining)-1)))
    ,  right(Remaining, len(Remaining) - charindex(@Delimiter, Remaining))
       from	Results
       where	len(Remaining) > 0
    )
    select  Ordinal,
            StringValue
    from    Results
    )
    GO
    --  Declare the timing variables
    
    DECLARE @d1 DATETIME2(7)
    ,  @d2      DATETIME2(7)
    ,  @d3      DATETIME2(7)
    ,  @d4      DATETIME2(7)
    ,  @d5      DATETIME2(7)
    ,  @z       INT = 999
    
    DECLARE  @heats   TABLE (
       i            INT     NOT NULL
    ,  LoopingUDF   INT     NOT NULL
    ,  CTEUDF       INT     NOT NULL
    ,  CTEXML       INT     NOT NULL
    ,  PureCTE      INT     NOT NULL
       )
    
    --  Set up the demo data
    
    DECLARE @dummy TABLE (
       seqnum       INT
    ,  testnum      INT
    ,  Fil_ID       INT
    ,  Allergen     VARCHAR(99)
       PRIMARY KEY (seqnum, testnum, Fil_ID, Allergen)
       )
    
    DECLARE @T_STG_Allergen TABLE (
       Fil_ID       INT
    ,  Allergen     VARCHAR(4999)
       )
    
    INSERT INTO @T_STG_Allergen(Fil_ID, Allergen) VALUES
       (15, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve, Thirteen, Fourteen, Fifteen')
    ,  (14, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve, Thirteen, Fourteen')
    ,  (13, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve, Thirteen')
    ,  (12, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven, Twelve')
    ,  (11, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Eleven')
    ,  (10, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten')
    ,  ( 9, 'One, Two, Three, Four, Five, Six, Seven, Eight, Nine')
    ,  ( 8, 'One, Two, Three, Four, Five, Six, Seven, Eight')
    ,  ( 7, 'One, Two, Three, Four, Five, Six, Seven')
    ,  ( 6, 'One, Two, Three, Four, Five, Six')
    ,  ( 5, 'One, Two, Three, Four, Five')
    ,  ( 4, 'One, Two, Three, Four')
    ,  ( 3, 'One, Two, Three')
    ,  ( 2, 'One, Two')
    ,  ( 1, 'One')
    ,  ( 0, '')
    
    WHILE 0 < @z BEGIN
    --  Demo using looping UDF
    
    SET @d1 = SysDateTime()
    
    INSERT INTO @dummy
    SELECT @z, 1, Fil_ID, LTrim(item) AS Allergen
       FROM @T_STG_Allergen
       CROSS APPLY dbo.fnSplit(Allergen, ',')
    
    --  Demo using CTE UDF
    
    SET @d2 = SysDateTime()
    
    INSERT INTO @dummy
    SELECT @z, 2, Fil_ID, StringValue AS Allergen
       FROM @T_STG_Allergen
       CROSS APPLY dbo.ParseString(Allergen, ',')
    
    --  Demo using XML
    
    SET @d3 = SysDateTime()
    
    ; WITH AllergenList (Fil_id, Allergen, xmlname) AS (
        SELECT Fil_ID,
          Allergen,
          CONVERT(XML, '<Names><name>'  
    +        REPLACE(Allergen,', ', '</name><name>')
    +           '</name></Names>') AS xmlname
          FROM @T_STG_Allergen
    ), SpecificAllergens (Fil_ID, Allergen) AS
    (  SELECT Fil_ID, parsed.node.value('.', 'VARCHAR(99)') AS Allergen
          FROM AllergenList
    	  CROSS APPLY xmlname.nodes('/Names[1]/name') AS parsed(node)
    )
    INSERT INTO @dummy
       SELECT @z, 3, Fil_ID, Allergen
          FROM SpecificAllergens
    
    --  Demo using CTE
    
    SET @d4 = SysDateTime()
    
    ; WITH pre_process(source, separator, sl, Fil_ID) AS
    (  SELECT Allergen + ', ', ', ', Cast(DataLength(', ') AS BIGINT), Fil_ID
          FROM @T_STG_Allergen
    ), boundaries (Fil_ID, i, b, e) AS
    (  SELECT Fil_Id, 1, Cast(1 AS BIGINT), CharIndex(separator, source, 1)
          FROM pre_process UNION ALL
       SELECT pre_process.Fil_Id, 1 + i, e + sl       , CharIndex(separator, source, e + sl)
          FROM boundaries
    	  JOIN pre_process
    	     ON (pre_process.Fil_ID = boundaries.Fil_ID)
    	  WHERE e + sl < DataLength(source)
    ), results (i, parsed, Fil_ID) AS
    (  SELECT i, Substring(source, b, e - b), pre_process.Fil_ID
          FROM pre_process
    	  JOIN boundaries
    	     ON (pre_process.Fil_ID = boundaries.Fil_ID)
    )
    -- SELECT QuoteName(source), Quotename(separator), sl FROM pre_process
    -- SELECT i, b, e FROM boundaries
       INSERT INTO @dummy
          SELECT @z, 4, Fil_ID, parsed AS Allergen
          FROM results
    
    SET @d5 = SysDateTime()
    
    INSERT INTO @heats (i, LoopingUDF, CTEUDF, CTEXML, PureCTE)
    SELECT @z
    ,  DateDiff(ms, @d1, @d2)
    ,  DateDiff(ms, @d2, @d3)
    ,  DateDiff(ms, @d3, @d4)
    ,  DateDiff(ms, @d4, @d5)
    
    SET @z = @z - 1
    END
    
    SELECT Str(i, 3) AS i, LoopingUDF,      CTEUDF
    ,  CTEXML,      PureCTE  FROM @heats UNION ALL
    SELECT 'Sum',      Sum(LoopingUDF), Sum(CTEUDF)
    ,  Sum(CTEXML), Sum(PureCTE) FROM @heats UNION ALL
    SELECT 'Min',      Min(LoopingUDF), Min(CTEUDF)
    ,  Min(CTEXML), Min(PureCTE) FROM @heats UNION ALL
    SELECT 'Avg',      Avg(LoopingUDF), Avg(CTEUDF)
    ,  Avg(CTEXML), Avg(PureCTE) FROM @heats UNION ALL
    SELECT 'Max',      Max(LoopingUDF), Max(CTEUDF)
    ,  Max(CTEXML), Max(PureCTE) FROM @heats
    -PatP
    Last edited by Pat Phelan; 06-11-14 at 00:42. Reason: Modified to allow higher run count
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  7. #7
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Code:
    Sum	3228	3747	10562	2915
    Min	0	0	0	0
    Avg	3	3	10	2
    Max	16	16	16	16
    Not the best environment as I am semi-remote working.
    Code:
    Sum	3256	3632	12275	2880
    Min	0	0	0	0
    Avg	3	3	12	2
    Max	16	16	31	16
    Ran directly on the server... hmmm
    George
    Home | Blog

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Horses for courses.
    The code I posted handles varchar(max).

    When fsSplit is changed to handle varchar(max) input and output, and the @T_STG_Allergen.Allergen column is defined as varchar(max), the playing field is leveled and the pure CTE does not do so well:

    Code:
    Test		LoopingUDF	CTEUDF	CTEXML	PureCTE
    AsWritten	3216		4518	12137	3367
    Varchar(max)	4944		4610	14643	4811
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  9. #9
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    watch it you guys keep comparing how fast you're doing this, is gonna bring Tonkuma and Lenny in to try and outdo you
    Dave

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •