Results 1 to 7 of 7
  1. #1
    Join Date
    Jun 2008
    Posts
    6

    Question Unanswered: Script to find records which are not matching

    Here is the sample data

    alex_data.txt

    Bank ID|Acct_number
    ID123|2245,3456,002245,1234
    ID123|77778900000000
    ID123|112244
    ID123|4455,6666,0004455

    CFJANDATA.txt

    Bank ID|Acct_number
    ID123|2245-- This Should not be in the missing number file
    ID123|777789 -- This Should not be in the missing number file
    ID123|112244-- This Should not be in the missing number file
    ID123|6666--This should be in the missing file
    ID123|888889--This should be in the missing file
    ID123|667788--This should be in the missing file


    I am trying to find missing account numbers in table B that are not in table A using the script below, but it's not giving me the missing results.

    If you look at the data there are multiple account numbers, and also leading zeros

    set -x
    cp alex_data.txt tmp_file.txt
    rm not_found.txt new_file.txt
    cat CFJanData.txt| sed 's/\|/\ /g' | while read dino accnum ; do
    grep -vi $dino\|$accnum\| tmp_file.txt >> new_file.txt
    echo $dino'|'$accnum'|'`grep -c $dino\|$accnum\| alex_data.txt`'|' | grep \|0\| >> not_found.txt
    mv new_file.txt tmp_file.txt
    done
    mv tmp_file.txt new_file.txt


    Any suggestions will be helpful or a awlk command which can find me the missing account numbers

    Thanks a lot in advance.

  2. #2
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Few quick questions:
    • Which is Table A and Table B?
    • Best to just list contents as you've done with alex_data.txt and then refer to it as alex_data.txt.
    • Is CFJANDATA.txt the input, the output or a mixture?
    • How many rows are in either table/file - are we talking 100s, 1000s or millions?
    • If it's in a database table then wouldn't it be easier to just do this in the database (assuming it's normalised properly).

  3. #3
    Join Date
    Jun 2008
    Posts
    6
    Hey Mike,

    Thanks for your help!


    I am trying to find missing account numbers in CFJANDATA.txt that are not in alex_data.txt.

    If you look at the records in alex_data.txt, the file has some multiple records in a row. Also few of the records has msiing leading zeros

    Ex: alex_data.txt
    00022345
    CFJANDATA.txt has the same record without missing zeros
    22345

    I am using Like condition in my sql but the above script does not returns exact match.


    unload to missing_acct_numbers1.txt
    select tempb.di_id, tempb.acct_number from tempb ,tempa where
    tempb.di_id = tempa.di_id and
    tempb.acct_number = tempa.acct_number
    and tempb.acct_number NOTLIKE '%'||tempa.acct_number||'%'};

    tempb = CFJANDATA.txt
    tempa = alex_data.txt
    Total records in each file should be betwen 5k -10k.

  4. #4
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Code should be:
    Code:
    #!/bin/sh
    
    cat alex_data.txt | \
            sed 's/^.*|/,/' | \
            sed 's/$/,/' | \
            sed 's/,00*/,/' \
            > tmp.dat
    
    for LINE in `cat CFJANDATA.txt`
    do
            ACC=`echo $LINE | sed 's/^.*|0*//' | sed 's/ *$//'`
            ACC=",$ACC,"
            CNT=`grep -c $ACC tmp.dat`
    
            if test $CNT -eq 0
            then
                    echo $LINE not found
            fi
    done
    
    exit 0
    And it outputs
    Code:
    Bank not found
    ID123|777789 not found
    ID123|888889 not found
    ID123|667788 not found
    My results differ to yours because 6666 is in the file. Also 777789 is not in the file

    Mike

  5. #5
    Join Date
    Jun 2008
    Posts
    6
    Hey Mike,

    Thanks for the code. I tried running the code (Command jc.sh).
    The script executed but the output was all the records from alex_data.txt

    When I did a vi tmp.dat, all I found is below

    ,,
    ,,
    ,,
    ,,
    ,,

    IS there something else I have to do before running the script.

    Total records in CFJANDATA.txt =614
    Total records in alex_data.txt = 12245

    I am expecting only 80-100 records should not be found in tmp.data

    Please advice

    Appreciate your help.

  6. #6
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I copied and pasted the code off the site just to make sure there wasn't an issue in me typing in the code. There wasn't. I suspect the issue is somewhere on your end - almost certainly in the format of the alex_data file. Here are all the files I used etc and an example of the run.

    This was the first data file
    > cat alex_data.txt
    Code:
    Bank ID|Acct_number
    ID123|2245,3456,002245,1234
    ID123|77778900000000
    ID123|112244
    ID123|4455,6666,0004455
    This was the 2nd data file
    > cat CFJANDATA.txt
    Code:
    Bank ID|Acct_number
    ID123|2245
    ID123|777789
    ID123|112244
    ID123|6666
    ID123|888889
    ID123|667788
    This was the script
    > cat tmp.sh
    Code:
    #!/bin/sh
    
    cat alex_data.txt | \
            sed 's/^.*|/,/' | \
            sed 's/$/,/' | \
            sed 's/,00*/,/' \
            > tmp.dat
    
    for LINE in `cat CFJANDATA.txt`
    do
            ACC=`echo $LINE | sed 's/^.*|0*//' | sed 's/ *$//'`
            ACC=",$ACC,"
            CNT=`grep -c $ACC tmp.dat`
    
            if test $CNT -eq 0
            then
                    echo $LINE not found
            fi
    done
    
    exit
    and this is what it produces
    > ./tmp.sh
    Code:
    Bank not found
    ID123|777789 not found
    ID123|888889 not found
    ID123|667788 not found
    I guess your data isn't in the format you specified but to debug it is simple. Run the cat through the first sed command and view the output. Then run the cat through the first 2 sed commands and view the output. When it suddenly goes blank - you've found the issue! I assume you're familiar with the sed command.

    Code:
    cat alex_data.txt | sed 's/^.*|/,/'
    then
    Code:
    cat alex_data.txt |  sed 's/^.*|/,/'  | sed 's/$/,/'
    etc
    Mike
    Last edited by mike_bike_kite; 06-10-08 at 13:00.

  7. #7
    Join Date
    Jun 2008
    Posts
    6
    Thanks Mike! Your code was a great help!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •