Hi,
Your suggestion of doing it in Perl is a good one, since I doubt there is a single UNIX command (or series of commands) that does what you want. But if you want to use a native UNIX tool and write as little code as possible, consider AWK as the preferred alternative. AWK was developed (before Perl) for exactly this kind of task.
Consider the following files:
namefile
Quote:
Donald Duck
Mickey Mouse
Bugs Bunny
Peter Pan
Popeye the Sailorman
George Bush
insfile
Quote:
Name,Col1,Col2,Col3,Col4
Donald,aaa,bbb,ccc,ddd
Harry,aaa,bbb,ccc,ddd
George,aaa,bbb,ccc,ddd
Mickey,aaa,bbb,ccc,ddd
Popeye,aaa,bbb,ccc,ddd
Peter,aaa,bbb,ccc,ddd
These can be merged in the specified way by the program:
Code:
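awk 'BEGIN {
    # ARGV[1] is "f=namefile"; the variable f itself is not set yet in BEGIN
    split(ARGV[1], file, "=")
    # Read namefile: index by first name, store everything from field 2 on
    while ((getline < file[2]) > 0)
        a[$1] = substr($0, index($0, $2))
    close(file[2])
    FS = ","    # insfile is comma delimited
}
{
    if (NR == 1)    # header line
        printf("%s,Surname,%s,%s,%s,%s\n",
            $1, $2, $3, $4, $5)
    else
        printf("%s,%s,%s,%s,%s,%s\n",
            $1, surname(), $2, $3, $4, $5)
}
# Return the surname stored for the current first field, or "" if none
function surname(    i)
{
    for (i in a)
        if (i == $1)
            return a[i]
    return ""
}' f=namefile insfile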
Comments:
The AWK program is invoked with the arguments:
f=namefile -> Because of the "=", AWK does not treat this argument as an input file to be processed; it treats "f" as a variable with the value "namefile".
insfile -> is the input file that is processed in the action section (between the second set of braces).
BEGIN section:
First, the first argument (f=namefile) is split on the "=" character into an array named "file". This is the way to get at the value of "f" inside BEGIN, because AWK only carries out the assignment when it reaches that argument while working through its input files, which happens after the BEGIN section has run. The variable is available in the action section (and in END), but not yet in BEGIN. On some systems ARGV[1] reportedly contains only the value of "f"; in that case there is no need for the split function.
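To see the timing for yourself, here is a small sketch. It is not part of the program above; the temporary directory and the file name demo.txt are made up for the demonstration, and mktemp is assumed to be available. It also shows the standard -v option, which assigns the variable before BEGIN runs and so makes the split on ARGV[1] unnecessary.

```shell
# Hypothetical scratch file, used only for this demonstration
tmp=$(mktemp -d)
printf 'one line\n' > "$tmp/demo.txt"

# Operand form: f is still empty in BEGIN, set once records are being read
out_operand=$(awk 'BEGIN { printf "BEGIN:[%s] ", f } { printf "main:[%s]\n", f }' \
    f=namefile "$tmp/demo.txt")

# -v form: f is assigned before BEGIN runs
out_v=$(awk -v f=namefile 'BEGIN { printf "BEGIN:[%s] ", f } { printf "main:[%s]\n", f }' \
    "$tmp/demo.txt")

echo "$out_operand"   # BEGIN:[] main:[namefile]
echo "$out_v"         # BEGIN:[namefile] main:[namefile]
rm -rf "$tmp"
```

With -v you could write getline < f directly in BEGIN, at the cost of putting the assignment before the program text on the command line.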
The getline function reads records from the file whose name is stored in the second element of the "file" array until there is no input left. It returns 0 when EOF is reached and -1 if the file cannot be read, which is why the loop tests for a result greater than 0.
Each getline call sets the built-in variables for the current record ($0) and its fields ($1, $2, etc.). So for each record the associative array "a" gets an entry indexed by the content of the first field, holding the part of the record from the second field onward (the substr() call).
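The loading step on its own can be sketched like this. The two-line names file and the temporary directory are invented for the example (mktemp is assumed to be available), and the file name is passed with -v only to keep the sketch short.

```shell
# Hypothetical two-line names file, used only for this demonstration
tmp=$(mktemp -d)
printf 'Donald Duck\nPopeye the Sailorman\n' > "$tmp/names"

lookup=$(awk -v f="$tmp/names" 'BEGIN {
    # Parenthesize the getline so that > 0 applies to its return value
    while ((getline < f) > 0)
        a[$1] = substr($0, index($0, $2))   # everything from field 2 on
    close(f)
    printf "%s|%s\n", a["Donald"], a["Popeye"]
}')

echo "$lookup"   # Duck|the Sailorman
rm -rf "$tmp"
```

Note that index($0, $2) finds the first place the text of the second field occurs in the record, so a first name that happens to contain the second word would throw it off. For names like the ones in namefile this is not a problem.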
Until now the default field separator (runs of spaces or tabs) has been used, but insfile has comma-delimited fields, so the field separator is redefined with FS = ",".
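The effect of the separator is easy to check on a single line. This is just an illustration with a made-up sample record, not part of the program above:

```shell
# Made-up sample record in the style of insfile
line='Donald,aaa,bbb'

# Default separator: no blanks in the line, so it is one single field
default_split=$(printf '%s\n' "$line" | awk '{ print NF }')

# Comma separator set in BEGIN: three fields
comma_split=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "," } { print NF ":" $2 }')

echo "$default_split"   # 1
echo "$comma_split"     # 3:aaa
```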
I think the rest is quite clear, except maybe for the function surname, where the first field of the current insfile record is compared to the indexes of the associative array a. When one matches, the function returns that array element, which is the surname from namefile; otherwise it returns an empty string.
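As a side note, the same lookup can be written without a loop by using AWK's "in" operator, which tests whether an index exists in an array. A minimal sketch, with a one-entry table standing in for the array that is built from namefile:

```shell
result=$(printf 'Donald,aaa\nHarry,aaa\n' | awk -F, '
BEGIN { a["Donald"] = "Duck" }      # stand-in for the table read from namefile
{
    s = ($1 in a) ? a[$1] : ""      # membership test instead of a for loop
    printf "%s,%s,%s\n", $1, s, $2
}')

echo "$result"
# Donald,Duck,aaa
# Harry,,aaa
```

Testing with "in" does not create an empty element for names that are missing, which a plain reference to a[$1] would do.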
In Perl it's done in a similar way, I guess, except that $0, $1, etc. don't hold the current record and its fields there.
Regards.