If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Data Access, Manipulation & Batch Languages > Unix Shell Scripts > Script to find records which are not matching

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 06-06-08, 17:28
jc2638 jc2638 is offline
Registered User
 
Join Date: Jun 2008
Posts: 6
Question Script to find records which are not matching

Here is the sample data

alex_data.txt

Bank ID|Acct_number
ID123|2245,3456,002245,1234
ID123|77778900000000
ID123|112244
ID123|4455,6666,0004455

CFJANDATA.txt

Bank ID|Acct_number
ID123|2245-- This Should not be in the missing number file
ID123|777789 -- This Should not be in the missing number file
ID123|112244-- This Should not be in the missing number file
ID123|6666--This should be in the missing file
ID123|888889--This should be in the missing file
ID123|667788--This should be in the missing file


I am trying to find missing account numbers in table B that are not in table A using the script below, but it's not giving me the missing results.

If you look at the data there are multiple account numbers, and also leading zeros

set -x
cp alex_data.txt tmp_file.txt
rm not_found.txt new_file.txt
cat CFJanData.txt| sed 's/\|/\ /g' | while read dino accnum ; do
grep -vi $dino\|$accnum\| tmp_file.txt >> new_file.txt
echo $dino'|'$accnum'|'`grep -c $dino\|$accnum\| alex_data.txt`'|' | grep \|0\| >> not_found.txt
mv new_file.txt tmp_file.txt
done
mv tmp_file.txt new_file.txt


Any suggestions will be helpful or a awlk command which can find me the missing account numbers

Thanks a lot in advance.
Reply With Quote
  #2 (permalink)  
Old 06-07-08, 06:38
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Few quick questions:
  • Which is Table A and Table B?
  • Best to just list contents as you've done with alex_data.txt and then refer to it as alex_data.txt.
  • Is CFJANDATA.txt the input, the output or a mixture?
  • How many rows are in either table/file - are we talking 100s, 1000s or millions?
  • If it's in a database table then wouldn't it be easier to just do this in the database (assuming it's normalised properly).
Reply With Quote
  #3 (permalink)  
Old 06-07-08, 12:27
jc2638 jc2638 is offline
Registered User
 
Join Date: Jun 2008
Posts: 6
Hey Mike,

Thanks for your help!


I am trying to find missing account numbers in CFJANDATA.txt that are not in alex_data.txt.

If you look at the records in alex_data.txt, the file has some multiple records in a row. Also few of the records has msiing leading zeros

Ex: alex_data.txt
00022345
CFJANDATA.txt has the same record without missing zeros
22345

I am using Like condition in my sql but the above script does not returns exact match.


unload to missing_acct_numbers1.txt
select tempb.di_id, tempb.acct_number from tempb ,tempa where
tempb.di_id = tempa.di_id and
tempb.acct_number = tempa.acct_number
and tempb.acct_number NOTLIKE '%'||tempa.acct_number||'%'};

tempb = CFJANDATA.txt
tempa = alex_data.txt
Total records in each file should be betwen 5k -10k.
Reply With Quote
  #4 (permalink)  
Old 06-07-08, 13:47
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Code should be:
Code:
#!/bin/sh

cat alex_data.txt | \
        sed 's/^.*|/,/' | \
        sed 's/$/,/' | \
        sed 's/,00*/,/' \
        > tmp.dat

for LINE in `cat CFJANDATA.txt`
do
        ACC=`echo $LINE | sed 's/^.*|0*//' | sed 's/ *$//'`
        ACC=",$ACC,"
        CNT=`grep -c $ACC tmp.dat`

        if test $CNT -eq 0
        then
                echo $LINE not found
        fi
done

exit 0
And it outputs
Code:
Bank not found
ID123|777789 not found
ID123|888889 not found
ID123|667788 not found
My results differ to yours because 6666 is in the file. Also 777789 is not in the file

Mike
Reply With Quote
  #5 (permalink)  
Old 06-09-08, 13:09
jc2638 jc2638 is offline
Registered User
 
Join Date: Jun 2008
Posts: 6
Hey Mike,

Thanks for the code. I tried running the code (Command jc.sh).
The script executed but the output was all the records from alex_data.txt

When I did a vi tmp.dat, all I found is below

,,
,,
,,
,,
,,

IS there something else I have to do before running the script.

Total records in CFJANDATA.txt =614
Total records in alex_data.txt = 12245

I am expecting only 80-100 records should not be found in tmp.data

Please advice

Appreciate your help.
Reply With Quote
  #6 (permalink)  
Old 06-09-08, 16:04
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
I copied and pasted the code off the site just to make sure there wasn't an issue in me typing in the code. There wasn't. I suspect the issue is somewhere on your end - almost certainly in the format of the alex_data file. Here are all the files I used etc and an example of the run.

This was the first data file
> cat alex_data.txt
Code:
Bank ID|Acct_number
ID123|2245,3456,002245,1234
ID123|77778900000000
ID123|112244
ID123|4455,6666,0004455
This was the 2nd data file
> cat CFJANDATA.txt
Code:
Bank ID|Acct_number
ID123|2245
ID123|777789
ID123|112244
ID123|6666
ID123|888889
ID123|667788
This was the script
> cat tmp.sh
Code:
#!/bin/sh

cat alex_data.txt | \
        sed 's/^.*|/,/' | \
        sed 's/$/,/' | \
        sed 's/,00*/,/' \
        > tmp.dat

for LINE in `cat CFJANDATA.txt`
do
        ACC=`echo $LINE | sed 's/^.*|0*//' | sed 's/ *$//'`
        ACC=",$ACC,"
        CNT=`grep -c $ACC tmp.dat`

        if test $CNT -eq 0
        then
                echo $LINE not found
        fi
done

exit
and this is what it produces
> ./tmp.sh
Code:
Bank not found
ID123|777789 not found
ID123|888889 not found
ID123|667788 not found
I guess your data isn't in the format you specified but to debug it is simple. Run the cat through the first sed command and view the output. Then run the cat through the first 2 sed commands and view the output. When it suddenly goes blank - you've found the issue! I assume you're familiar with the sed command.

Code:
cat alex_data.txt | sed 's/^.*|/,/'
then
Code:
cat alex_data.txt |  sed 's/^.*|/,/'  | sed 's/$/,/'
etc
Mike

Last edited by mike_bike_kite; 06-10-08 at 12:00.
Reply With Quote
  #7 (permalink)  
Old 06-11-08, 22:08
jc2638 jc2638 is offline
Registered User
 
Join Date: Jun 2008
Posts: 6
Thanks Mike! Your code was a great help!
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On