Results 1 to 4 of 4
  1. #1
    Join Date
    Jun 2006
    Posts
    4

    Unanswered: Join two files using AWK

    Two files need to be joined on a second column. File 1 is standard and it has only two columns. However, File 2 will be changing on the number of columns after 2nd column. One file can have 30 fields and other file has 80 fields. File 1 always will have two columns.

    File 1

    PHP Code:
    FRO     FRONTIER 
    MWC    MIDWEST 
    ORB     ORBITZ 
    ATR     AIRTRAN 
    BWI     BESTWESTERN 
    File 2

    PHP Code:
    M       AIRTRAN 8       TRIPS_YTD       A 
    M       BESTWESTERN 8       TRIPS_YTD       B       20      TRIPS_TOTAL     A       17 
    M       FRONTIER 8 
    M       CAESAR 8       TRIPS_YTD       D       10 
    M       ORBITZ 8       TRIPS 
    Desired Join Output:

    PHP Code:
    ATR       AIRTRAN 8       TRIPS_YTD       A 
    BWI       BESTWESTERN 8       TRIPS_YTD       B       20      TRIPS_TOTAL     A       17 
    FRO       FRONTIER 8 
    ORB       ORBITZ 8       TRIPS 
    Please let me know if there is a way to do this using NAWK...

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    This smells like homework to me...

    Yes, it can certainly be done using NAWK. Read the first file into a hash (a NAWK array) in the BEGIN block, then do a hash lookup to replace the first field of each line in the second file with the hashed lookup from the original file.

    -PatP

  3. #3
    Join Date
    Jun 2006
    Posts
    4
    Thank you Pat...

    This is definitely not a homework problem....I work for a bank and we have to do something like this....Since, I am not an expert in awk, I was trying to achieve it through something like this...

    Code:
    #!/usr/bin/ksh
    
    DIR=$1
    CODES_FILE=$2
    METADATA_FILE=$3
    
    cd ${DIR}
    
    sort -k 2,2 ${CODES_FILE} > f1.dat
    sort -k 2,2 ${METADATA_FILE} > f2.dat
    
    awk '
    BEGIN { OFS = "\t" } { print $2 , $1, $2 }
    ' f1.dat > temp_f1.dat
    
    awk '
    BEGIN { OFS = "\t" } { print $2 , $2, $3 ,$4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27  }
    ' f2.dat > temp_f2.dat
    
    join  -j1 1 -j2 1 -o 1.2 1.3 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.13 2.14 2.15 2.16 2.17 2.18 2.19 2.20 2.21 2.22 2.23 2.24 2.25 2.26 2.27 temp_f1.da
    t temp_f2.dat | tr -s ' ' '\t' > f3.dat
    The reason why I didn't like this is because File2 has dynamically changing columns. The current code takes only 27 fields and if I have 80 fields, then it would be a problem.

    File 1 is always constant and has only 2 columns.

    Please advice

  4. #4
    Join Date
    Jun 2006
    Posts
    4
    This is what I did....

    Code:
    nawk '
      NR==FNR{
        a[$2]=$1;
        next
     }a[$2]&&sub($1,a[$2])' ${CODES_FILE} ${METADATA_FILE} > f3.dat

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •