Results 1 to 6 of 6
  1. #1
    Join Date
    Jul 2005

    Unanswered: file / insert manipulation


    I need some help on inserting. I've got a file containing X amount of columns, each column has a description/title at the top, followed by info below. I need to insert an additional column (including description) in between columns "a" and "b". The column i'm inserting has to align with column "a" . And, this has to be done with out changing the order of any other columns. For instance, column "a" has a title of first names, below that title is a list of names. The column I'm inserting would be last names. So, I would have to search column a, grab all first names, check each first name with another file to see what last name matches the first. And, insert the last names next to each first name in a new column. I also need to include the description/title at the top of this column. Can anyone help me out with this?


  2. #2
    Join Date
    Jun 2003
    West Palm Beach, FL


    And you intend to do this using what language?

    Why don't you just load the data into a spreadsheet?

    The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

  3. #3
    Join Date
    Jul 2005
    The data will be loaded into a spreadsheet after the changes have been made on the unix side. As far as language, I'm electing to do this with a shell script. If not, then I'll probably go with perl. I'm open to suggestions.

  4. #4
    Join Date
    May 2005
    South Africa
    Provided Answers: 1
    maybe the join command might help

  5. #5
    Join Date
    Aug 2006
    The Netherlands
    Your suggestion of doing it in PERL is a good one since I wonder if there's a (series of) UNIX command(s) to perform what you want. But if you want to do it with a UNIX native tool and write as less code as possible consider AWK as the preferred alternative. AWK is developed (before PERL) to perform exactly tasks like this.

    Consider the following files:
    Donald Duck
    Mickey Mouse
    Bugs Bunny
    Peter Pan
    Popeye the Sailorman
    George Bush
    These can be merged in the specified way by the program:
    awk 'BEGIN  {
                    split(ARGV[1], file, "=")
                    while(getline < file[2] > 0)
                        a[$1] = substr($0, index($0,$2))
                    FS = ","
                    if(NR == 1)
    				$1, $2, $3, $4, $5)
    				$1, surname(), $2, $3, $4, $5)
                function surname()
                    for (i in a)
                        if(i == $1)
                            return a[i]
                    return ""
                }' f=namefile insfile
    The AWK program is invoked with the arguments:
    f=namefile -> By this AWK doesn't concern "namefile" as an inputfile that has to be processed but "f" as a variable with the value "namefile".
    insfile -> is the inputfile that has to be processed in the action section (between the second set of braces).

    BEGIN section:
    First the first argument (f=namefile) is split by the "=" character into an array named "file". On most systems this is the way to get the value of the variable "f" because scope of the variable is excluded to the action section only; it's not available in the BEGIN or END section. On some systems ARGV[1] contains only the value of "f", then there's no need for the split function.

    The getline function reads the file with the filename that's stored in the second member of the "file" array until there's no input left (when EOF is reached getline returns 0).
    With getline the builtin variables for the current record ($0) and subsequent fields in that record ($1, $2, etc.) are set. So with each record the associative array "a" can be subsequently filled with the part of the current record from the second field on (substr() function) and indexed with the content of the first field.
    Untill now the default fieldsplitter (space or tab) is used, but the insfile has comma delimited fields so the fieldseparator is redefined with FS = ",".

    I think the rest is quite clear. Except maybe for the function surname, where the first field of the currently processed insfile record is compared to the idexes of the associative array a. When corresponding the function returns the array member, which is the surname from the namefile.

    In PERL it's done in a similar way I guess except for the builtin variables $0, $1, etc. which are not available.


  6. #6
    Join Date
    Jul 2005
    Thanks Tyveleyn! I just sent you a PM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts