Results 1 to 7 of 7
  1. #1
    Join Date
    Feb 2004
    Location
    Hamburg, Germany
    Posts
    22

    Unanswered: validating sequence rules

    I am preparing data for processing on an MVS host.
    The host programs need the data in a specific order and sequence depending on record type. So I would like to check data validity before sending to host.

    I have several record types per entity,
    e.g.

    accnt_no, type_code
    100,10
    100,10
    100,30
    100,60
    200,30
    200,50
    200,50


    There is a set of rules for data validity:

    a) all accounts / record types must be returned in ascending order (easy using order by)
    b) sequence rules:
    -each account needs a 10-record
    -a 10-record can be followed by 10, 20, 30, 50, 80 record
    -a 20-record can be followed by 20, 30, 40 record
    -a 30 record can be followed by 50, 80 record
    and so on.

    I need an algorithm that detects invalid sequences (and maybe even outputs what the problem is).
    With the samples above,

    account 100 is invalid, because record type 30 is followed by 60
    account 200 is invalid, because record type 10 is missing

    Does anyone have a nifty solution to such a problem?

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Hi

    Is this 2005+? I think dense rank * 10 partitioned by accnt_no and type_code could do it.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  3. #3
    Join Date
    Feb 2004
    Location
    Hamburg, Germany
    Posts
    22
    Quote Originally Posted by pootle flump View Post
    Hi

    Is this 2005+? I think dense rank * 10 partitioned by accnt_no and type_code could do it.
    yes, it is 2008. My problem is not creating the type_code but verifying its validity based on the rules.

    Using:

    Code:
    CREATE TABLE [dbo].[mytest](
    	[accnt_no] decimal(10,0),
    	[type_code] varchar(2)
    	)
    GO
    
    INSERT INTO [dbo].[mytest] (accnt_no,type_code) VALUES (100,'10')
    INSERT INTO [dbo].[mytest] (accnt_no,type_code) VALUES (100,'10')
    INSERT INTO [dbo].[mytest] (accnt_no,type_code) VALUES (100,'30')
    INSERT INTO [dbo].[mytest] (accnt_no,type_code) VALUES (100,'60')
    INSERT INTO [dbo].[mytest] (accnt_no,type_code) VALUES (200,'30')
    INSERT INTO [dbo].[mytest] (accnt_no,type_code) VALUES (200,'50')
    INSERT INTO [dbo].[mytest] (accnt_no,type_code) VALUES (200,'50')
    GO
    
    SELECT accnt_no, type_code, 
      DENSE_RANK() OVER (PARTITION BY accnt_no ORDER BY type_code) * 10 as Seq
    FROM [dbo].[mytest]
    GO
    
    DROP TABLE [dbo].[mytest]
    GO
    returns:

    Code:
    accnt_no	type_code	Seq
    100	10	10
    100	10	10
    100	30	20
    100	60	30
    200	30	10
    200	50	20
    200	50	20
    I cannot see the benefit of creating the sequence for validation purposes.

    kbk

  4. #4
    Join Date
    May 2009
    Posts
    508
    Provided Answers: 1
    kbk, Is there a type in the list of valid '20-records'?
    b) sequence rules:
    -each account needs a 10-record
    -a 10-record can be followed by 10, 20, 30, 50, 80 record
    -a 20-record can be followed by 20, 30, 40 record
    -a 30 record can be followed by 50, 80 record
    and so on.
    Should it be "a 20-record can be followed by 20, 30, 50 record"?

    What does "and so on" represent?

    You have 5 (or 6 if 40 is valid) listed (10, 20, 30, 50, 80). Is that all the valid TYPE_CODE values? If not have many can you have

    Can there be any number lower than 10?

    It looks like you can have multiple '10' and '20' records. Are these the only ones or can you have multiple '30', '50', and '80' records?

    Just to be sure, you want a list of invalid ACCT_NO values for output (and maybe why it is invalid)?

  5. #5
    Join Date
    Feb 2004
    Location
    Hamburg, Germany
    Posts
    22
    Hi,
    I only posted a subset of all rules. In total there exist 8 types of records:

    10,20,30,40,50,60,70,80.

    I am gathering these records per account from different tables and bringing them into ascending order. Next I have to check if the sequence is valid with respect to the sequence rules.
    This is the full set of rules:

    Code:
     sequence rules:
    -each account needs a 10-record
    -a 10-record can be followed by 10 or 20 or 30 or 50 or 80 record
    -a 20-record can be followed by 20 or 30 or 40 record
    -a 30-record can be followed by 50 or 80 record
    -a 40-record can be followed by 80 record
    -a 50-record can be followed by 50 or 60 or 70 or 80 record
    -a 60-record can be followed by 60 or 80 record
    -a 70-record can be followed by 80 record
    -a 80-record can be followed by nothing
    So actually for each account, I have to test if sequences are correct,
    e.g. (10,10,20,20,20,30,80) is o.k. based on rules listed above
    (10,10,20,30,30,50) is not o.k. because I have two 30-records which is not allowed.

    From above: duplicates are only allowed for record types 10, 20, 50 and 60.

    Yes, I need to know for which accounts the sequence is invalid and if possible which rule it is breaking.

    Hope the problem is more clear now.
    kbk

  6. #6
    Join Date
    May 2009
    Posts
    508
    Provided Answers: 1
    kbk, here is something that may get you in the direction of what you want. I don't have all the rules defined but you should be able to add to them. There may be better ways to do this but this is the first that came to mind:
    Code:
    WITH CTE (ACNT, CODE) AS
      ( SELECT 100, 10 FROM SYSIBM.SYSDUMMY1 UNION ALL
        SELECT 100, 20 FROM SYSIBM.SYSDUMMY1 UNION ALL
        SELECT 100, 40 FROM SYSIBM.SYSDUMMY1 UNION ALL
        SELECT 100, 80 FROM SYSIBM.SYSDUMMY1 UNION ALL
      
        SELECT 200, 20 FROM SYSIBM.SYSDUMMY1 UNION ALL
        SELECT 200, 60 FROM SYSIBM.SYSDUMMY1 UNION ALL
    
        SELECT 300, 10 FROM SYSIBM.SYSDUMMY1 UNION ALL
        SELECT 300, 80 FROM SYSIBM.SYSDUMMY1 UNION ALL
        SELECT 300, 80 FROM SYSIBM.SYSDUMMY1 UNION ALL
    
        SELECT 400, 10 FROM SYSIBM.SYSDUMMY1 UNION ALL
        SELECT 400, 90 FROM SYSIBM.SYSDUMMY1 UNION ALL
    
        SELECT 500, 10 FROM SYSIBM.SYSDUMMY1 UNION ALL
        SELECT 500, 30 FROM SYSIBM.SYSDUMMY1 UNION ALL
        SELECT 500, 40 FROM SYSIBM.SYSDUMMY1
      )
      , SMRY AS 
      ( SELECT ACNT
            , SUM(CASE WHEN CODE = 10 THEN 1 ELSE 0 END) AS CODE10
            , SUM(CASE WHEN CODE = 20 THEN 1 ELSE 0 END) AS CODE20
            , SUM(CASE WHEN CODE = 30 THEN 1 ELSE 0 END) AS CODE30
            , SUM(CASE WHEN CODE = 40 THEN 1 ELSE 0 END) AS CODE40
            , SUM(CASE WHEN CODE = 50 THEN 1 ELSE 0 END) AS CODE50
            , SUM(CASE WHEN CODE = 60 THEN 1 ELSE 0 END) AS CODE60
            , SUM(CASE WHEN CODE = 70 THEN 1 ELSE 0 END) AS CODE70
            , SUM(CASE WHEN CODE = 80 THEN 1 ELSE 0 END) AS CODE80
            , SUM(CASE WHEN CODE NOT IN(10,20,30,40
                                       ,50,60,70,80)
                                      THEN 1 ELSE 0 END) AS BAD_CODE
        FROM CTE
         GROUP BY ACNT
      )
      , VALD AS
      ( SELECT ACNT
             , CASE WHEN CODE10 = 0 THEN 'Y' ELSE ' ' END AS CD10
             , CASE WHEN CODE30 > 1 THEN 'Y' ELSE ' ' END AS CD30
             , CASE WHEN CODE40 > 1 THEN 'Y' ELSE ' ' END AS CD40
             , CASE WHEN CODE70 > 1 THEN 'Y' ELSE ' ' END AS CD70
             , CASE WHEN CODE80 > 1 THEN 'Y' ELSE ' ' END AS CD80
             , CASE WHEN CODE40 > 0
                     AND CODE20 = 0 THEN 'Y' ELSE ' ' END AS CD20_CD40
             , CASE WHEN CODE60 > 0
                     AND CODE50 = 0 THEN 'Y' ELSE ' ' END AS CD50_CD60
             , CASE WHEN BAD_CODE > 0 THEN 'Y' ELSE ' ' END AS UNKN
        FROM SMRY
      )
    SELECT *
    FROM VALD
    WHERE CD10 = 'Y'
       OR CD30 = 'Y'
       OR CD40 = 'Y'
       OR CD70 = 'Y'
       OR CD80 = 'Y'
       OR CD20_CD40 = 'Y'
       OR CD50_CD60 = 'Y'
       OR UNKN = 'Y'
    The Common Table Expression takes the data in your table and pivots it to get one row per account and to count the various codes it contains. I call this SMRY (a summary of the various accounts / codes):
    Code:
    ACNT        CODE10      CODE20      CODE30      CODE40      CODE50      CODE60      CODE70      CODE80      BAD_CODE   
    ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
            100           1           1           0           1           0           0           0           1           0
            200           0           1           0           0           0           1           0           0           0
            300           1           0           0           0           0           0           0           2           0
            400           1           0           0           0           0           0           0           0           1
            500           1           0           1           1           0           0           0           0           0
    The next set of SQL I call VALD. This is the query that does the actual validation of the various Accounts and Codes. I do NOT have all the various combinations you list or that you have have.

    CASE 1 checks to see if any Code 10 values are missing (CODE10 = 0.
    Case 2-4 check Codes 30, 40, 70, and 80 to see if there are more of each code (CODExx > 0).
    CASE 6 checks for and CODE40 that does not have a CODE20 (CODE40 > 0 AND CODE20 = 0)
    CASE 7 is the same as 6 except it checks CODE60 and CODE50.
    CASE 8 checks to see if there are any codes that are not valid codes.

    Once the Common Table Expression does all this work, the final query just filters out any row that does NOT contain a Y in any of the 'check' columns.
    Code:
    ACNT        CD10 CD30 CD40 CD70 CD80 CD20_CD40 CD50_CD60 UNKN
    ----------- ---- ---- ---- ---- ---- --------- --------- ----
            200 Y                                  Y             
            300                     Y                            
            400                                              Y   
            500                          Y
    From this report, you can tell that:

    Account 200 is invalid because it does not have a CODE10 ) and it has a Code 60 without a Code 50.
    Account 300 is invalid because CODE80 has a problem (more than one code 80).
    Account 400 is invalid because it has an Unknown code (code 90).
    Account 500 is invalid because there is a CODE40 without a CODE20

    Only Account 100 passed all the checks (I had).

    You can add as may CASE checks as you can think of (and make more meaningful column names) as you like.

  7. #7
    Join Date
    Feb 2004
    Location
    Hamburg, Germany
    Posts
    22

    Thumbs up

    Stealth_DBA, thanks very much. I've adapted your solution and it works like a charm.

    kbk

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •