  #1 (permalink)  
Old 06-24-04, 11:09
Sachin9 Sachin9 is offline
Registered User
 
Join Date: Jun 2004
Posts: 8
File Conversion

I have a data file with n rows and m columns, all of fixed width.
I also have a data specification file that gives the length of each column.
The file has a header and a trailer.
I have to strip the header and the trailer, and I need the output to be comma-delimited.

Any help would be greatly appreciated.

Datafile
EX:

a b c
--- ----- ----
111 TX Texas
222 CA California

2 rows selected

File Specification:

col1(1,3)
col2(4,6)
col3(8,25)

The specification file can also take this form:
col1(3)
col2(2)
col3(20)

I need to write a generalised script for each of the two formats above.

Whichever specification format is used, the output should be:

111,tx,texas
222,ca,california

Thanks in advance.
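
A minimal sketch of one way to approach this, not a definitive solution. It assumes the specification lines look exactly like name(start,end) or name(width), that a single blank separates columns when only widths are given, that the header is always two lines, and that the trailer contains "rows selected" or "record(s) selected". The function name fixed_to_csv is made up for illustration. Values keep their original case; if the lowercase output shown above is really required, the field could be wrapped in tolower().

```shell
fixed_to_csv() {
    # $1 = specification file, $2 = fixed-width data file
    awk '
    BEGIN { next_start = 1 }

    # First file (the specification): record where each column starts
    # and how long it is.
    NR == FNR {
        if (split($0, p, /[(),]/) < 3) next   # not a "name(...)" line
        ncol++
        if (p[3] != "") {                     # name(start,end)
            start[ncol] = p[2]
            len[ncol]   = p[3] - p[2] + 1
        } else {                              # name(width)
            start[ncol] = next_start
            len[ncol]   = p[2]
        }
        # Assumption: exactly one blank between columns.
        next_start = start[ncol] + len[ncol] + 1
        next
    }

    # Second file (the data): skip the two header lines, blank lines
    # and the trailer line.
    FNR <= 2 || /^[ \t]*$/ || /(rows|record\(s\)) selected/ { next }

    {
        line = ""
        for (i = 1; i <= ncol; i++) {
            field = substr($0, start[i], len[i])
            sub(/^ +/, "", field)             # trim leading blanks
            sub(/ +$/, "", field)             # trim trailing blanks
            line = (i > 1 ? line "," : "") field
        }
        print line
    }
    ' "$1" "$2"
}
```

Called as fixed_to_csv specfile datafile, it writes the comma-delimited rows to standard output. Both specification formats go through the same start/length table, so the extraction loop does not need to know which format was used.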
  #2 (permalink)  
Old 06-24-04, 12:10
Pat Phelan Pat Phelan is online now
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,613
Perl or AWK could do this nicely. Did the teacher give you any insight into which they'd prefer for this assignment?

-PatP
  #3 (permalink)  
Old 06-24-04, 12:14
Sachin9 Sachin9 is offline
Registered User
 
Join Date: Jun 2004
Posts: 8
File conversion

Hi,

I am supposed to do it in awk.
I am not that good at awk programming, though; I only know the basic Unix commands.
  #4 (permalink)  
Old 06-24-04, 14:49
fla5do fla5do is offline
Registered User
 
Join Date: Oct 2003
Location: Germany
Posts: 138
Hi,
I think I can help you with awk, but I have some questions.
1. Is the number of columns always the same (col1 - col3)?
2. Is there always a blank between col1, col2 and col3?
3. Can a blank occur inside a column value?
__________________
Greetings from germany
Peter F.
  #5 (permalink)  
Old 06-24-04, 15:18
Sachin9 Sachin9 is offline
Registered User
 
Join Date: Jun 2004
Posts: 8
Thank you very much.

No, the number of columns will not always be the same; there could be more columns.
Yes, there is always a blank between the columns, but the columns themselves are fixed width (the length given for each column).
EX:
col1 col2 col3
col1 can be 1 character, col2 can be 10 characters, col3 can be 20 characters,
but there will be blanks in between the columns.
Yes, there can be spaces in the column values. If a column is empty, we need to leave blanks for it (i.e. as many as the length of the column).
EX: if the length of col10 is 20 and it is blank, the file should look like
col9,20 spaces,col11

Once again, thanks in advance.
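
One possible reading of the requirement above is that an empty column must still appear in the output as a run of blanks as wide as the column. A rough sketch under that reading follows. The helper name cut_untrimmed and the widths argument are invented for illustration, and a single blank between columns is assumed. Fields are cut with substr() and deliberately not trimmed; short lines are padded out to the column width.

```shell
# Hypothetical helper: cut fixed-width fields without trimming, so an
# empty column stays as blanks of the column width.
cut_untrimmed() {
    # $1 = space-separated column widths, e.g. "3 2 5"; data on stdin
    awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }
    {
        start = 1
        line = ""
        for (i = 1; i <= n; i++) {
            # Build a format such as "%-3s" by hand to left-justify and pad.
            field = sprintf("%-" w[i] "s", substr($0, start, w[i]))
            line = (i > 1 ? line "," : "") field
            start += w[i] + 1          # skip the single blank between columns
        }
        print line
    }'
}
```

Because the widths are split into an array and looped over, the same function handles any number of columns.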
  #6 (permalink)  
Old 06-24-04, 17:17
fla5do fla5do is offline
Registered User
 
Join Date: Oct 2003
Location: Germany
Posts: 138
No idea how to handle this specification :-) ???
Bye
  #7 (permalink)  
Old 06-24-04, 18:00
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 325

I thought I'd already posted on a similar thread. What is this, more than one person taking the same class?

# columns with widths of 3, 2 and 20 chars
# change the widths of your columns on the invocation line

nawk -v FIELDWIDTHS='3 2 20' -f sachin19.awk file2convert

Here's the content of 'sachin19.awk':

Code:
function setFieldsByWidth(   i,n,FWS,start,copyd0) {
# Licensed under GPL Peter S Tillier, 2003
# NB corrupts $0
  copyd0 = $0                             # make copy of $0 to work on
  if (length(FIELDWIDTHS) == 0) {
    print "You need to set the width of the fields that you require" > "/dev/stderr"
    print "in the variable FIELDWIDTHS (NB: Upper case!)" > "/dev/stderr"
    exit(1)
  }

  if (!match(FIELDWIDTHS,/^[0-9 ]+$/)) {
    print "The variable FIELDWIDTHS must contain digits, separated" > "/dev/stderr"
    print "by spaces." > "/dev/stderr"
    exit(1)
  }

  n = split(FIELDWIDTHS,FWS)

  if (n == 1) {
    print "Warning: FIELDWIDTHS contains only one field width." > "/dev/stderr"
    print "Attempting to continue." > "/dev/stderr"
  }

  start = 1
  for (i=1; i <= n; i++) {
    $i = trim(substr(copyd0,start,FWS[i]))
    start = start + FWS[i]+1
  }
  return n;
}

# trim() strips leading and trailing blanks from a field value.
# setFieldsByWidth() is called from the main rule further down.

function trim(str)
{
  sub("^[ ]*", "", str);
  sub("[ ]*$", "", str);
  return str;
}

BEGIN {
  OFS=","
}
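# Skip blank lines, the two header lines and the "... selected" trailer
# (the sample file ends with "2 rows selected").
!/^[ \t]*$/ && FNR > 2 && !/(rows|record\(s\)) selected/ {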
  saveDollarZero = $0 # if you want it later
  numFields = setFieldsByWidth()
  # now we can manipulate $0, NF and $1 .. $NF as we wish
  for(i=1; i <= numFields; i++)
     printf("%s%s", $i, (i != numFields) ? OFS : ORS);
}
The code for specifying widths in the configuration file is left as an exercise for the person taking the class.
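
As a starting point for that exercise, here is a small sketch that builds the FIELDWIDTHS string from a width-only specification file. The helper name widths_from_spec and the file name spec.txt are hypothetical, and lines in the name(start,end) form are simply skipped here (they would need end - start + 1 instead).

```shell
# Hypothetical helper: turn a width-only specification file such as
#   col1(3)
#   col2(2)
#   col3(20)
# into the space-separated list expected in FIELDWIDTHS.
widths_from_spec() {
    # $1 = specification file
    awk -F'[()]' '
    NF >= 2 && $2 ~ /^[0-9]+$/ {       # keep only name(width) lines
        out = (out == "" ? "" : out " ") $2
    }
    END { print out }
    ' "$1"
}

# Possible use with the script above (spec.txt is an assumed file name):
#   nawk -v FIELDWIDTHS="$(widths_from_spec spec.txt)" -f sachin19.awk file2convert
```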
__________________
vlad
+-----------------------+
| #include <disclaimer.h> |
+-----------------------+
  #8 (permalink)  
Old 06-25-04, 14:36
Sachin9 Sachin9 is offline
Registered User
 
Join Date: Jun 2004
Posts: 8
Hi,

What does /dev/stderr mean?

I appreciate your help.
  #9 (permalink)  
Old 06-25-04, 14:49
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 325
"/dev/stderr" - standard error

You can substitute the whole redirection, including the > sign:

> "/dev/stderr"

with

| "cat 1>&2"
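
To make the difference concrete, here is a small self-contained sketch (the function name warn_demo is made up). It prints one line to standard output and one to standard error, using the pipe form so it does not depend on the awk in use understanding the "/dev/stderr" file name.

```shell
# Sketch: write a diagnostic to standard error from awk without
# relying on the "/dev/stderr" file name.
warn_demo() {
    awk 'BEGIN {
        stderr = "cat 1>&2"            # command that copies its input to stderr
        print "this goes to standard output"
        print "this goes to standard error" | stderr
        close(stderr)                  # flush the pipe before awk exits
    }'
}
```

Running warn_demo 2>/dev/null shows only the first line, and warn_demo >/dev/null shows only the second.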
  #10 (permalink)  
Old 06-25-04, 15:32
Sachin9 Sachin9 is offline
Registered User
 
Join Date: Jun 2004
Posts: 8
awk -v FIELDWIDTHS='2 20' -f vg.awk state
syntax error The source line is 9. The function is setFieldsByWidth.
The error context is
print "You need to set the width of the fields that you require" > >>> | <<< "cat 1>&2"
awk: The statement cannot be correctly parsed.
The source line is 9. The function is setFieldsByWidth.
syntax error The source line is 10. The function is setFieldsByWidth.

This is the error I get when I try to run your program.

Thanks in advance
  #11 (permalink)  
Old 06-25-04, 15:41
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 325
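The substitution was applied incorrectly: the > has to be replaced together with "/dev/stderr", otherwise awk sees > | "cat 1>&2" and cannot parse it.
Here's the modified version: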

Code:
function setFieldsByWidth(   i,n,FWS,start,copyd0) {
# Licensed under GPL Peter S Tillier, 2003
# NB corrupts $0
  copyd0 = $0                             # make copy of $0 to work on
  if (length(FIELDWIDTHS) == 0) {
    print "You need to set the width of the fields that you require" | stderr
    print "in the variable FIELDWIDTHS (NB: Upper case!)" | stderr
    exit(1)
  }

  if (!match(FIELDWIDTHS,/^[0-9 ]+$/)) {
    print "The variable FIELDWIDTHS must contain digits, separated" | stderr
    print "by spaces." | stderr
    exit(1)
  }

  n = split(FIELDWIDTHS,FWS)

  if (n == 1) {
    print "Warning: FIELDWIDTHS contains only one field width." | stderr
    print "Attempting to continue." | stderr
  }

  start = 1
  for (i=1; i <= n; i++) {
    $i = trim(substr(copyd0,start,FWS[i]))
    start = start + FWS[i]+1
  }
  return n;
}

# trim() strips leading and trailing blanks from a field value.
# setFieldsByWidth() is called from the main rule further down.

function trim(str)
{
  sub("^[ ]*", "", str);
  sub("[ ]*$", "", str);
  return str;
}

BEGIN {
  OFS=","
  stderr="cat 1>&2"
}
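# Skip blank lines, the two header lines and the "... selected" trailer
# (the sample file ends with "2 rows selected").
!/^[ \t]*$/ && FNR > 2 && !/(rows|record\(s\)) selected/ {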
  saveDollarZero = $0 # if you want it later
  numFields = setFieldsByWidth()
  # now we can manipulate $0, NF and $1 .. $NF as we wish
  for(i=1; i <= numFields; i++)
     printf("%s%s", $i, (i != numFields) ? OFS : ORS);
}
  #12 (permalink)  
Old 06-25-04, 15:56
Sachin9 Sachin9 is offline
Registered User
 
Join Date: Jun 2004
Posts: 8
Cheers vgersh,
the code is perfect.
Thanks a million for your help.
  #13 (permalink)  
Old 06-28-04, 09:52
Sachin9 Sachin9 is offline
Registered User
 
Join Date: Jun 2004
Posts: 8
Hi vgersh,
Does your code work only for 3 columns, or is it a general one?
  #14 (permalink)  
Old 06-28-04, 10:51
vgersh99 vgersh99 is offline
Registered User
 
Join Date: Apr 2004
Location: Boston, MA
Posts: 325
What do you think?
Try it.