Results 1 to 15 of 15
  1. #1
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941

    Unanswered: Brute-Force Substring Extraction

    In my product description field, I need to extract the data that looks like
    19-3-19 or 0-0-7. Sometimes that pattern is in the data, sometimes not.
    I only want those that contain the pattern, of course, but I have no other method of determining whether or not the pattern will be in the description.
    I thought I'd found a method that worked, until I encountered descriptions
    that had a '-' in it prior to the pattern.

    Here's what I have so far, it works for all but the last one.
    Code:
    CREATE TABLE #TMPPROD (
    	PROD_DESC VARCHAR(60)
    )
    INSERT INTO #TMPPROD
    SELECT '22-4-12 15%SCU'
    UNION ALL
    SELECT '14-4-8 W/F'
    UNION ALL
    SELECT 'UFLEXX 17-3-17 2%FE'
    UNION ALL
    SELECT 'FERT-PEST 19-19-19'
    
    SELECT  PROD_DESC,  SUBSTRING(PROD_DESC,CHARINDEX('-', PROD_DESC)-CASE WHEN   PATINDEX('[0-9]', SUBSTRING(PROD_DESC, CHARINDEX('-', PROD_DESC) - 2,1))<>0 THEN 2 
    	WHEN   PATINDEX('[0-9]', SUBSTRING(PROD_DESC, CHARINDEX('-', PROD_DESC) - 1,1))<>0 THEN 1 
    	ELSE 0 END,  CHARINDEX('-', SUBSTRING(PROD_DESC,
    	CHARINDEX('-',PROD_DESC)+1,2))+CASE WHEN   PATINDEX('[0-9]', SUBSTRING(PROD_DESC, CHARINDEX('-', PROD_DESC) - 2,1))<>0 THEN 2 
    	WHEN   PATINDEX('[0-9]', SUBSTRING(PROD_DESC, CHARINDEX('-', PROD_DESC) - 1,1))<>0 THEN 1 
    	ELSE 0 END+2+CASE WHEN   PATINDEX('[0-9]', SUBSTRING(PROD_DESC, CHARINDEX('-', PROD_DESC) +1,2))<>0 THEN 2 
    	WHEN   PATINDEX('[0-9]', SUBSTRING(PROD_DESC, CHARINDEX('-', PROD_DESC) - 1,1))<>0 THEN 1 
    	ELSE 0 END) AS 'ANALYSIS'
    FROM #TMPPROD
    WHERE PROD_DESC LIKE '%-[0-9]%-[0-9]%'
    
    DROP TABLE #TMPPROD
    Any pointers on what to try next, or have I severely over-complicated the process - as I am wont to do.

    Thanks
    Mark
    Inspiration Through Fermentation

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Sorry fella - I could not be bothered figuring out the DML.
    When you say:
    19-3-19 or 0-0-7
    do you mean literals (i.e. these are the ONLY two values the field might contain) or these are examples? I suspect the latter since the former is trivial but to be sure...
    Testimonial:
    pootle flump
    ur codings are working excelent.

  3. #3
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941
    Yes, it's the latter. The numbers themselves can be anywhere between 0 and 62
    Inspiration Through Fermentation

  4. #4
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941
    Just in case it helps, the statement above returns:

    Prod_Desc...................Analysis
    22-4-12 15%SCU..........22-4-12
    14-4-8 W/F.................14-4-8
    UFLEXX 17-3-17 2%FE...17-3-17
    FERT-PEST 19-19-19......-P

    That last one aint what I'm looking for
    Inspiration Through Fermentation

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Okey doo - ta

    So - I don't think this is a problem:
    Quote Originally Posted by RedNeckGeek
    I thought I'd found a method that worked, until I encountered descriptions
    that had a '-' in it prior to the pattern.
    Unless the pattern can also be followed by digits and/ or "-".

    Am I being a thicky pants?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  6. #6
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941
    Quote Originally Posted by pootle flump
    Unless the pattern can also be followed by digits and/ or "-".
    Not sure I follow, but.... Anything that follows the pattern is immaterial.
    I tried some test data with an extra '-' farther down the line, and it still worked as expected.
    Inspiration Through Fermentation

  7. #7
    Join Date
    Apr 2007
    Posts
    183
    Code:
    DECLARE	@TMPPROD TABLE (PROD_DESC VARCHAR(60))
    
    INSERT	@TMPPROD
    SELECT	'22-4-12 15%SCU' UNION ALL
    SELECT	'14-4-8 W/F' UNION ALL
    SELECT	'UFLEXX 17-3-17 2%FE' UNION ALL
    SELECT	'No-valid-data 17-17 x' UNION ALL
    SELECT	'FERT-PEST 19-19-19'
    
    SELECT	LEFT(d.q, CHARINDEX(' ', d.q + ' '))
    FROM	(
    		SELECT	SUBSTRING(PROD_DESC, PATINDEX('%[0-9]%-%[0-9]-%[0-9]%', PROD_DESC), 60) AS q
    		FROM	@TMPPROD
    	) AS d
    WHERE	d.q LIKE '[0-9]%-%[0-9]-%[0-9]%'

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Ok, I'm going to preface this with the assertion that the code is uglier than a mud fence because it is doing a "brute force" dance around what I see as pattern matching limitations in the product. I also take a few liberties in accepting any one or two digits, not just 1 through 62, although you could also work around that easily enough if it was important to you.
    Code:
    --  20070828  ptp  Find/return a CNO formatted substring inside an argument string
    
    CREATE FUNCTION dbo.fCNO(@pArg VARCHAR(8000))
       RETURNS VARCHAR(9) AS
       BEGIN
          DECLARE
             @cResult		VARCHAR(9)
    ,        @i			INT
    
          SET @cResult = NULL
    
          SET @i = PatIndex('%[0-9]%-[0-9]%-[0-9]%', @pArg)
          WHILE @cResult IS NULL AND 0 < @i
             BEGIN
                SET @pArg = SubString(@pArg, @i, 8000)
                SET @cResult =
                   CASE
                      WHEN @pArg LIKE '[0-9][0-9]-[0-9][0-9]-[0-9][0-9]%' THEN SubString(@pArg, 1, 8)
                      WHEN @pArg LIKE '[0-9][0-9]-[0-9][0-9]-[0-9]%'      THEN SubString(@pArg, 1, 7)
                      WHEN @pArg LIKE '[0-9][0-9]-[0-9]-[0-9][0-9]%'      THEN SubString(@pArg, 1, 7)
                      WHEN @pArg LIKE '[0-9]-[0-9][0-9]-[0-9][0-9]%'      THEN SubString(@pArg, 1, 7)
                      WHEN @pArg LIKE '[0-9][0-9]-[0-9]-[0-9]%'           THEN SubString(@pArg, 1, 6)
                      WHEN @pArg LIKE '[0-9]-[0-9][0-9]-[0-9]%'           THEN SubString(@pArg, 1, 6)
                      WHEN @pArg LIKE '[0-9]-[0-9]-[0-9]%'                THEN SubString(@pArg, 1, 5)
                      ELSE NULL
                   END
    
                SET @i = PatIndex('%[0-9]%-[0-9]%-[0-9]%', @pArg)
             END
    
          RETURN @cResult
       END
    GO
    
    CREATE TABLE #TMPPROD (
    	PROD_DESC VARCHAR(60)
    )
    INSERT INTO #TMPPROD
    SELECT '22-4-12 15%SCU'
    UNION ALL
    SELECT '14-4-8 W/F'
    UNION ALL
    SELECT 'UFLEXX 17-3-17 2%FE'
    UNION ALL
    SELECT 'FERT-PEST 19-19-19'
    
    SELECT dbo.fCNO(PROD_DESC), PROD_DESC
       FROM #TMPPROD
       WHERE dbo.fCNO(PROD_DESC) IS NOT NULL
    
    DROP TABLE #TMPPROD
    -PatP

  9. #9
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941
    WOOHOO!
    Two working solutions.
    Thanks guys!
    Inspiration Through Fermentation

  10. #10
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Just be careful because Peso's solution runs quicker than mine does, but it also returns SSNs like 123-45-6789 and United States phone numbers like 312-555-1212. I read about a guy having that problem in this book once!

    -PatP

  11. #11
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941
    Yes, I also found it returns a few oddball items like "19-19-19F" or
    if there is a series of 5 or more numbers prior to the pattern. That is
    helpful in finding descriptions that need some maintenance, but ultimately
    I don't want that showing up in my reports.
    Inspiration Through Fermentation

  12. #12
    Join Date
    Apr 2007
    Posts
    183

    Another approach. Fool proof?

    Code:
    -- Create and stage a permanent valid patterns table
    CREATE TABLE	#Patterns
    		(
    			Pattern VARCHAR(12) PRIMARY KEY CLUSTERED
    		)
    
    INSERT		#Patterns
    		(
    			Pattern
    		)
    SELECT		'% ' + CONVERT(VARCHAR(2), v1.Number) + '-' + CONVERT(VARCHAR(2), v2.Number) + '-' + CONVERT(VARCHAR(2), v3.Number) + ' %' AS x
    FROM		master..spt_values AS v1
    CROSS JOIN	master..spt_values AS v2
    CROSS JOIN	master..spt_values AS v3
    WHERE		v1.Type = 'p'
    		AND v1.Number BETWEEN 0 AND 62
    		AND v2.Type = 'p'
    		AND v2.Number BETWEEN 0 AND 62
    		AND v3.Type = 'p'
    		AND v3.Number BETWEEN 0 AND 62
    
    -- Prepare sample data
    DECLARE	@TMPPROD TABLE (PROD_DESC VARCHAR(60))
    
    INSERT	@TMPPROD
    SELECT	'22-4-12 15%SCU' UNION ALL
    SELECT	'14-4-8 W/F' UNION ALL
    SELECT	'UFLEXX 17-3-17 2%FE' UNION ALL
    SELECT	'No-valid-data 17-17 x' UNION ALL
    SELECT	'FERT-PEST 19-19-19'
    
    -- Show the expected output
    SELECT		p.PROD_DESC,
    		PATINDEX(pat.Pattern, ' ' + p.PROD_DESC + ' ') AS Position
    FROM		#Patterns AS pat
    INNER JOIN	@TMPPROD AS p ON ' ' + p.PROD_DESC + ' ' LIKE pat.Pattern

  13. #13
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    That approach is better than the one that I posted with two caveats... First of all, it misses leading zeros which tend to appear relatively often in live CNO data although there aren't any in RedNeckGeeks sample data. As a side issue, I think that slows the selection process down quite a bit, so I think that will run a good bit slower.

    -PatP

  14. #14
    Join Date
    Apr 2007
    Posts
    183
    Add this you your approach and you are more set!
    2 combinations on three places gives 8 variations in total.

    Here is the eight

    WHEN @pArg LIKE '[0-9]-[0-9][0-9]-[0-9]%' THEN SubString(@pArg, 1, 6)

  15. #15
    Join Date
    Apr 2007
    Posts
    183
    Quote Originally Posted by Pat Phelan
    As a side issue, I think that slows the selection process down quite a bit, so I think that will run a good bit slower.
    Yes, it is slower.

    BUT.. It handles the 0-62 only condition...

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •