#1
08-19-09, 19:01
biobee07
Registered User
 
Join Date: Aug 2009
Posts: 3
URGENT: awk, extract multiple columns from multiple files

Hello,

I urgently need help with a shell scripting problem. I would really appreciate it if someone could help me correct the code I have written below.

PROBLEM:
I am trying to write a bash shell script that does the following:

1. Finds all *.txt files within my directory of interest (the files are in sub-directories).
2. Reads each of the files (25 files) one by one; all are tab-delimited and share the same data format.
3. Skips the first 10 rows of each file.
4. Extracts columns 2, 14, and 15 and prints them into one output file.
5. Adds a new column to the final output file with the name of the .txt file from which the data was extracted.

I have written a shell script, but it is not working properly and does not yet include the code to skip the first 10 rows.

Below I have pasted a sample input file, the desired output, and my code.


Input file format: The actual data starts from the line: DATA 1 1 1 0

Code:
FEATURES	FeatureNum	Row	Col	chr_coord	SubTypeMask	SubTypeName	ProbeUID	ControlType	ProbeName	GeneName	SystematicName	Description	PositionX	PositionY
DATA	1	1	1		0		0	1	miRNABrightCorner30	miRNABrightCorner30	miRNABrightCorner30		6774.29	228.723
DATA	2	1	2		66	Structural	2	1	DarkCorner	DarkCorner	DarkCorner		6800.2	229.421
DATA	3	1	3	chr14:100595916-100595897	0		3	0	A_25_P00010115	hsa-miR-154*	hsa-miR-154*	NA	6826.51	228.385
DATA	4	1	4	chr8:135881995-135882010	0		5	0	A_25_P00010390	hsa-miR-30b	hsa-miR-30b	NA	6850.48	228.853
DATA	5	1	5	chr14:100558179-100558161	0		7	0	A_25_P00010956	hsa-miR-379	hsa-miR-379	NA	6875.37	228.408
DATA	6	1	6	chr19:058916206-058916186	0		8	0	A_25_P00011941	hsa-miR-517b	hsa-miR-517b	NA	6900.98	229.321
DATA	7	1	7	chr17:062213733-062213718	0		10	0	A_25_P00010912	hsa-miR-634	hsa-miR-634	NA	6926.91	228.768
DATA	8	1	8	chr14:100583440-100583424	0		12	0	A_25_P00010147	hsa-miR-539	hsa-miR-539	NA	6952.65	229.587
DATA	10	1	10	chr14:100601751-100601731	0		14	0	A_25_P00010023	hsa-miR-369-3p	hsa-miR-369-3p	NA	7003.36	228.794
Output format: tab delimited file. The last column shows the filename from which the data was extracted.
Code:
col2 col14 col15 filename
1 6774.29 228.723 ABC.txt 
2 6800.2 229.421 ABC.txt 
3 6826.51 228.385 DEF.txt 
4 6850.48 228.853 DEF.txt 
5 6875.37 228.408 XYZ.txt 
6 6900.98 229.321 XYZ.txt
My incomplete code: It is missing the step that skips the first 10 rows. It also throws an error:

'test1.sh: line 3: syntax error near unexpected token `do
'test1.sh: line 3: `do

Code:
for filename in $(find -iname '*.txt') 
do
 awk -F"\t" ' 
    BEGIN {OFS="|"} {print $2,$14,$15,FILENAME}
    ' $filename > output.txt
done
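A note on the error message: a stray quote at the very start of the line (`'test1.sh: line 3: ...`) is a common symptom of a script saved with Windows (CRLF) line endings, because the carriage return after `do` sends the closing quote back to column one. That cannot be confirmed from the post alone, so treat the following as a sketch of how one might check for and remove carriage returns. The file names `crlf.sh` and `lf.sh` are hypothetical and are created in a scratch directory.

```shell
#!/bin/sh
# Hypothetical reproduction: write a small loop with CRLF line endings.
work=$(mktemp -d)
printf 'for f in a b\r\ndo\r\n echo "$f"\r\ndone\r\n' > "$work/crlf.sh"

# Count the lines that end in a carriage return (non-zero means CRLF is present).
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr\$" "$work/crlf.sh")
echo "lines ending in CR: $crlf_lines"

# Strip every carriage return and run the cleaned copy.
tr -d '\r' < "$work/crlf.sh" > "$work/lf.sh"
fixed_output=$(sh "$work/lf.sh")
echo "$fixed_output"
```

If the count is zero for the real script, the line endings are not the problem and the cause lies elsewhere.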
#2
08-20-09, 10:12
LKBrwn_DBA
Registered User
 
Join Date: Jun 2003
Location: West Palm Beach, FL
Posts: 2,413
Ofs?

Try this:
Code:
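# Start with an empty output file.
>output.txt
for filename in *.txt
do
 # Skip the output file itself, which also matches *.txt.
 [ "$filename" = output.txt ] && continue
 # NR > 10 skips the first 10 lines (awk runs once per file here).
 awk -F"\t" '
    BEGIN {OFS="|"} NR > 10 {print $2,$14,$15,FILENAME}
    ' "$filename" >> output.txt
done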
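The loop above only looks at *.txt files in the current directory, while the question asks for files in sub-directories and a tab-delimited result. A minimal sketch of one way to cover all five requirements follows. The function name `extract_columns` and its two arguments are hypothetical, and the output file is assumed to live outside the searched directory (or to use a non-.txt extension) so that it is never read back as input.

```shell
#!/bin/sh
# extract_columns SRC_DIR OUT_FILE   (hypothetical helper, not from the thread)
# Collects columns 2, 14 and 15 from every *.txt file below SRC_DIR,
# skipping the first 10 lines of each file, and appends the base file name.
extract_columns() {
    src_dir=$1
    out_file=$2
    # Header row matching the output layout requested in the question.
    printf 'col2\tcol14\tcol15\tfilename\n' > "$out_file"
    # find descends into sub-directories; "-exec ... {} +" hands many files
    # to a single awk call, which is why FNR (not NR) is used here.
    find "$src_dir" -type f -name '*.txt' -exec awk '
        BEGIN { FS = OFS = "\t" }
        FNR > 10 {
            name = FILENAME
            sub(/.*\//, "", name)    # strip the directory part
            print $2, $14, $15, name
        }
    ' {} + >> "$out_file"
}
```

It could be called as `extract_columns /path/to/data /path/to/output.tsv`, where both paths are placeholders. Setting FS and OFS to a tab keeps empty fields in place, which matters for the sample data because some columns are blank.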
__________________
The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb
#3
08-30-09, 09:08
malcomex999
Registered User
 
Join Date: Aug 2009
Posts: 2
Quote:
Originally Posted by LKBrwn_DBA
Try this:
Code:
>output.txt
for filename in *.txt
do
 awk -F"\t" '
    BEGIN {OFS="|"} NR > 10 {print $2,$14,$15,FILENAME}
    ' $filename >> output.txt
done
I think it is better to use FNR instead of NR, since he is processing many files and wants to skip the first 10 rows of each file, if I understand correctly. FNR resets its value each time awk starts reading a new file, while NR doesn't.

Code:
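for filename in *.txt
do
 # Skip the output file itself, which also matches *.txt.
 [ "$filename" = output.txt ] && continue
 # FNR restarts at 1 for every input file.
 awk -F"\t" '
    BEGIN {OFS="|"} FNR > 10 {print $2,$14,$15,FILENAME}
    ' "$filename" >> output.txt
done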
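To make the NR versus FNR difference concrete, here is a small sketch. The files `a.txt` and `b.txt` are hypothetical three-line files created in a scratch directory, and the threshold is 2 instead of 10 to keep the demo short. Note that the difference only shows up when one awk call receives several files; inside a loop that starts awk once per file, NR and FNR hold the same value.

```shell
#!/bin/sh
# Hypothetical demo files in a scratch directory.
demo_dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$demo_dir/a.txt"
printf 'b1\nb2\nb3\n' > "$demo_dir/b.txt"

# One awk call over both files: NR keeps counting across files,
# so only the first file loses its first two lines.
nr_result=$(awk 'NR > 2' "$demo_dir/a.txt" "$demo_dir/b.txt")

# FNR restarts at 1 for each file, so every file loses its first two lines.
fnr_result=$(awk 'FNR > 2' "$demo_dir/a.txt" "$demo_dir/b.txt")

echo "NR > 2 keeps:"
echo "$nr_result"     # a3, b1, b2, b3 (one per line)
echo "FNR > 2 keeps:"
echo "$fnr_result"    # a3, b3 (one per line)
```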
#4
08-30-09, 13:20
mike_bike_kite
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,517
Quote:
Originally Posted by malcomex999
I think it is better to use FNR instead of NR, since he is processing many files and wants to skip the first 10 rows of each file, if I understand correctly. FNR resets its value each time awk starts reading a new file, while NR doesn't.
Good point, but to be fair, if the original poster hasn't responded to say thanks for the help given (it's been 10 days), then it's hardly worth helping them any more.
#5
08-30-09, 13:27
malcomex999
Registered User
 
Join Date: Aug 2009
Posts: 2
Quote:
Originally Posted by mike_bike_kite
Good point, but to be fair, if the original poster hasn't responded to say thanks for the help given (it's been 10 days), then it's hardly worth helping them any more.
Yes, you are right, but you shouldn't expect anything in return when you help, not even thanks. Still, it would be useful for the poster to confirm whether or not it is working for him.
#6
08-31-09, 16:10
LKBrwn_DBA
Registered User
 
Join Date: Jun 2003
Location: West Palm Beach, FL
Posts: 2,413
URGENT...get me an answer fast! Help me, help me.

Quote:
Originally Posted by malcomex999
Yes, you are right, but you shouldn't expect anything in return when you help, not even thanks. Still, it would be useful for the poster to confirm whether or not it is working for him.
Agreed.

It is most unprofessional of the "original poster" not to respond after making an "URGENT" request...

BTW: Good observation about FNR.