Thread: File Conversion

  1. #1
    Join Date
    Jun 2004
    Posts
    8

    Unanswered: File Conversion

    I have a data file with n rows and m columns, all of fixed width.
    I also have a data specification file that gives the length of each column.
    The data file has a header and a trailer.
    I have to remove the header and trailer, and I need the output to be comma-delimited.

    Any help would be greatly appreciated.

    Datafile
    EX:

    a b c
    --- ----- ----
    111 TX Texas
    222 CA California

    2 rows selected

    File Specification:

    col1(1,3)
    col2(4,6)
    col3(8,25)

    The specification file can also be in this form, with widths only:
    col1(3)
    col2(2)
    col3(20)

    I need to write a generalised script for each of the two specification formats above.

    Whatever the input file, the output should be:

    111,tx,texas
    222,ca,california

    Thanks in advance.

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Perl or AWK could do this nicely. Did the teacher give you any insight into which they'd prefer for this assignment?

    -PatP

  3. #3
    Join Date
    Jun 2004
    Posts
    8

    File conversion

    Hi,

    I am supposed to do it in awk.
    I am also not that good at awk programming; I only know the basic Unix commands.

  4. #4
    Join Date
    Oct 2003
    Location
    Germany
    Posts
    138
    Hi,
    I think I can help you with awk, but I have some questions.
    1. Is the number of columns always the same (col1 to col3)?
    2. Is there always a blank between col1, col2 and col3?
    3. Can a blank occur inside a column value?
    Greetings from Germany
    Peter F.

  5. #5
    Join Date
    Jun 2004
    Posts
    8
    Thank you very much.

    No, the number of columns will not always be the same; there could be more columns.
    Yes, there is always a blank between the columns, but the columns are fixed width (each is padded to its column length).
    EX:
    Col1 col2 col3
    col1 can be 1 character, col2 can be 10 characters, col3 can be 20 characters,
    but there will be blanks between the columns.
    Yes, there can be spaces in the column values. If a value is entirely blank, we need to keep the blanks (i.e. the full length of the column).
    EX: if the length of col10 is 20 and it is blank, the output should look like
    col9,20 spaces,col11

    Once again, thanks in advance.

  6. #6
    Join Date
    Oct 2003
    Location
    Germany
    Posts
    138
    Sorry, I have no idea for this specification :-)
    Bye
    Greetings from Germany
    Peter F.

  7. #7
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325

    I thought I had already posted on a similar thread. What is this, more than one person taking the same class?

    # columns with widths of 3, 2 and 20 chars
    # change the widths of your columns on the invocation line

    nawk -v FIELDWIDTHS='3 2 20' -f sachin19.awk file2convert

    Here's the content of 'sachin19.awk':

    Code:
    function setFieldsByWidth(   i,n,FWS,start,copyd0) {
    # Licensed under GPL Peter S Tillier, 2003
    # NB corrupts $0
      copyd0 = $0                             # make copy of $0 to work on
      if (length(FIELDWIDTHS) == 0) {
        print "You need to set the width of the fields that you require" > "/dev/stderr"
        print "in the variable FIELDWIDTHS (NB: Upper case!)" > "/dev/stderr"
        exit(1)
      }
    
      if (!match(FIELDWIDTHS,/^[0-9 ]+$/)) {
        print "The variable FIELDWIDTHS must contain digits, separated" > "/dev/stderr"
        print "by spaces." > "/dev/stderr"
        exit(1)
      }
    
      n = split(FIELDWIDTHS,FWS)
    
      if (n == 1) {
        print "Warning: FIELDWIDTHS contains only one field width." > "/dev/stderr"
        print "Attempting to continue." > "/dev/stderr"
      }
    
      start = 1
      for (i=1; i <= n; i++) {
        $i = trim(substr(copyd0,start,FWS[i]))
        start = start + FWS[i]+1
      }
      return n;
    }
    
    # trim(): helper used by setFieldsByWidth() to strip leading and
    # trailing spaces from a field.
    
    function trim(str)
    {
      sub("^[ ]*", "", str);
      sub("[ ]*$", "", str);
      return str;
    }
    
    BEGIN {
      OFS=","
    }
    # Main rule: skip blank lines, the two header lines and the trailer line.
    !/^[ \t]*$/ && FNR > 2 && !/(rows?|record\(s\)) selected/ {
      saveDollarZero = $0 # if you want it later
      numFields = setFieldsByWidth()
      # now we can manipulate $0, NF and $1 .. $NF as we wish
      for(i=1; i <= numFields; i++)
         printf("%s%s", $i, (i != numFields) ? OFS : ORS);
    }
    The code for reading the widths from the specification file is left as an exercise for the person taking the class.
    vlad
    +-----------------------+
    | #include <disclaimer.h> |
    +-----------------------+

  8. #8
    Join Date
    Jun 2004
    Posts
    8
    Hi,

    What does /dev/stderr mean?

    I appreciate your help.

  9. #9
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    "/dev/stderr" - standard error

    You can substitute: > "/dev/stderr"

    with

    | "cat 1>&2"
    vlad
    +-----------------------+
    | #include <disclaimer.h> |
    +-----------------------+

  10. #10
    Join Date
    Jun 2004
    Posts
    8
    awk -v FIELDWIDTHS='2 20' -f vg.awk state
    syntax error The source line is 9. The function is setFieldsByWidth.
    The error context is
    print "You need to set the width of the fields that you require" > >>> | <<< "cat 1>&2"
    awk: The statement cannot be correctly parsed.
    The source line is 9. The function is setFieldsByWidth.
    syntax error The source line is 10. The function is setFieldsByWidth.

    This is the error I get when I try to run your program.

    Thanks in advance

  11. #11
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    I've modified the code incorrectly.
    Here's the modified version:

    Code:
    function setFieldsByWidth(   i,n,FWS,start,copyd0) {
    # Licensed under GPL Peter S Tillier, 2003
    # NB corrupts $0
      copyd0 = $0                             # make copy of $0 to work on
      if (length(FIELDWIDTHS) == 0) {
        print "You need to set the width of the fields that you require" | stderr
        print "in the variable FIELDWIDTHS (NB: Upper case!)" | stderr
        exit(1)
      }
    
      if (!match(FIELDWIDTHS,/^[0-9 ]+$/)) {
        print "The variable FIELDWIDTHS must contain digits, separated" | stderr
        print "by spaces." | stderr
        exit(1)
      }
    
      n = split(FIELDWIDTHS,FWS)
    
      if (n == 1) {
        print "Warning: FIELDWIDTHS contains only one field width." | stderr
        print "Attempting to continue." | stderr
      }
    
      start = 1
      for (i=1; i <= n; i++) {
        $i = trim(substr(copyd0,start,FWS[i]))
        start = start + FWS[i]+1
      }
      return n;
    }
    
    # trim(): helper used by setFieldsByWidth() to strip leading and
    # trailing spaces from a field.
    
    function trim(str)
    {
      sub("^[ ]*", "", str);
      sub("[ ]*$", "", str);
      return str;
    }
    
    BEGIN {
      OFS=","
      stderr="cat 1>&2"
    }
    # Main rule: skip blank lines, the two header lines and the trailer line.
    !/^[ \t]*$/ && FNR > 2 && !/(rows?|record\(s\)) selected/ {
      saveDollarZero = $0 # if you want it later
      numFields = setFieldsByWidth()
      # now we can manipulate $0, NF and $1 .. $NF as we wish
      for(i=1; i <= numFields; i++)
         printf("%s%s", $i, (i != numFields) ? OFS : ORS);
    }
    vlad
    +-----------------------+
    | #include <disclaimer.h> |
    +-----------------------+

  12. #12
    Join Date
    Jun 2004
    Posts
    8
    Cheers vgersh,
    the code is perfect.
    Thanks a million for your help.

  13. #13
    Join Date
    Jun 2004
    Posts
    8
    Hi vgersh,
    Does your code work for only 3 columns, or is it a general one?

  14. #14
    Join Date
    Apr 2004
    Location
    Boston, MA
    Posts
    325
    What do you think?
    Try it.
    vlad
    +-----------------------+
    | #include <disclaimer.h> |
    +-----------------------+
