  1. #1
    Join Date
    Aug 2009
    Posts
    3

    Unanswered: URGENT: awk, extract multiple columns from multiple files

    Hello,

    I urgently need help with this shell scripting problem. I would really appreciate it if someone could help me correct the code that I have written below.

    PROBLEM:
    I am trying to write a bash shell script that does the following:

    1. Finds all *.txt files within my directory of interest (the files are in sub-directories)
    2. Reads each of the 25 files one by one (all are tab-delimited and share the same data format)
    3. Skips the first 10 rows of each file
    4. Extracts columns 2, 14 and 15 and prints them into one output file
    5. Adds a new column to the final output file with the name of the txt file from which the data was extracted.

    I have written a shell script, but it is not working properly and does not yet include the code to skip the first 10 rows.

    Below I have pasted a sample input file, output file and my code


    Input file format: The actual data starts from the line: DATA 1 1 1 0

    Code:
    FEATURES	FeatureNum	Row	Col	chr_coord	SubTypeMask	SubTypeName	ProbeUID	ControlType	ProbeName	GeneName	SystematicName	Description	PositionX	PositionY
    DATA	1	1	1		0		0	1	miRNABrightCorner30	miRNABrightCorner30	miRNABrightCorner30		6774.29	228.723
    DATA	2	1	2		66	Structural	2	1	DarkCorner	DarkCorner	DarkCorner		6800.2	229.421
    DATA	3	1	3	chr14:100595916-100595897	0		3	0	A_25_P00010115	hsa-miR-154*	hsa-miR-154*	NA	6826.51	228.385
    DATA	4	1	4	chr8:135881995-135882010	0		5	0	A_25_P00010390	hsa-miR-30b	hsa-miR-30b	NA	6850.48	228.853
    DATA	5	1	5	chr14:100558179-100558161	0		7	0	A_25_P00010956	hsa-miR-379	hsa-miR-379	NA	6875.37	228.408
    DATA	6	1	6	chr19:058916206-058916186	0		8	0	A_25_P00011941	hsa-miR-517b	hsa-miR-517b	NA	6900.98	229.321
    DATA	7	1	7	chr17:062213733-062213718	0		10	0	A_25_P00010912	hsa-miR-634	hsa-miR-634	NA	6926.91	228.768
    DATA	8	1	8	chr14:100583440-100583424	0		12	0	A_25_P00010147	hsa-miR-539	hsa-miR-539	NA	6952.65	229.587
    DATA	10	1	10	chr14:100601751-100601731	0		14	0	A_25_P00010023	hsa-miR-369-3p	hsa-miR-369-3p	NA	7003.36	228.794
    Output format: tab delimited file. The last column shows the filename from which the data was extracted.
    Code:
    col2 col14 col15 filename
    1 6774.29 228.723 ABC.txt 
    2 6800.2 229.421 ABC.txt 
    3 6826.51 228.385 DEF.txt 
    4 6850.48 228.853 DEF.txt 
    5 6875.37 228.408 XYZ.txt 
    6 6900.98 229.321 XYZ.txt
    My incomplete code: it is missing the step that skips the rows. It also throws an error:

    'test1.sh: line 3: syntax error near unexpected token `do
    'test1.sh: line 3: `do

    Code:
    for filename in $(find -iname '*.txt') 
    do
     awk -F"\t" ' 
        BEGIN {OFS="|"} {print $2,$14,$15,FILENAME}
        ' $filename > output.txt
    done

  2. #2
    Join Date
    Jun 2003
    Location
    West Palm Beach, FL
    Posts
    2,713

    OFS?

    Try this:
    Code:
    >output.txt
    for filename in *.txt
    do
     awk -F"\t" '
        BEGIN {OFS="|"} NR > 10 {print $2,$14,$15,FILENAME}
        ' "$filename" >> output.txt
    done
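About the syntax error quoted in the first post: a message that begins with a stray ' and breaks off right after `do is often a sign that the script was saved with Windows (CRLF) line endings, so bash reads "do" followed by a carriage return instead of the keyword. That is only a guess from the shape of the message. A minimal sketch for checking a script and writing a cleaned copy follows; has_crlf and strip_cr are hypothetical helper names, not standard commands, and the sketch assumes bash (for the $'\r' quoting).

```shell
#!/usr/bin/env bash

# has_crlf FILE: succeed if FILE contains at least one carriage return.
# (hypothetical helper name, used only for this illustration)
has_crlf() {
    grep -q $'\r' "$1"
}

# strip_cr IN OUT: write a copy of IN to OUT with every carriage return removed.
# (hypothetical helper name, used only for this illustration)
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

Possible usage, assuming the script is called test1.sh as in the error message: has_crlf test1.sh && strip_cr test1.sh test1_unix.sh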
    The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

  3. #3
    Join Date
    Aug 2009
    Posts
    2
    Quote Originally Posted by LKBrwn_DBA
    Try this:
    Code:
    >output.txt
    for filename in *.txt
    do
     awk -F"\t" '
        BEGIN {OFS="|"} NR > 10 {print $2,$14,$15,FILENAME}
        ' "$filename" >> output.txt
    done
    I think it is better to use FNR instead of NR, since he is trying to process many files and wants to skip the first 10 rows of each file, if I understand that correctly. FNR resets its value each time awk starts reading a new file, while NR doesn't.

    Code:
    # Start with an empty output file so repeated runs do not append duplicates.
    >output.txt
    for filename in *.txt
    do
     awk -F"\t" '
        BEGIN {OFS="|"} FNR > 10 {print $2,$14,$15,FILENAME}
        ' "$filename" >> output.txt
    done
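Inside a loop like the one above, each awk call only ever sees one file, so NR and FNR are the same there. The difference shows up when a single awk call is handed several files, which is also a convenient way to cover the sub-directories mentioned in the first post. A rough sketch under some assumptions: extract_cols is a hypothetical helper name, the output is tab-separated with a header row because the first post asks for a tab-delimited file, only the base name of each file is printed, and the output file should live outside the searched tree (or not end in .txt) so that find does not pick it up. Note that -iname is a find extension (GNU and BSD), taken from the original script, and that the order of files is whatever find returns.

```shell
#!/usr/bin/env bash

# extract_cols DIR OUT (hypothetical helper name):
# find every *.txt file under DIR, skip the first 10 rows of each file,
# and write columns 2, 14, 15 plus the file's base name to OUT.
extract_cols() {
    local dir=$1 out=$2

    # Header row, matching the sample output in the first post.
    printf 'col2\tcol14\tcol15\tfilename\n' > "$out"

    # "{} +" passes many files to one awk call; FNR restarts at 1 for each
    # file, so "FNR > 10" skips the first 10 rows of every file.
    find "$dir" -type f -iname '*.txt' -exec awk -F'\t' '
        BEGIN { OFS = "\t" }
        FNR > 10 {
            name = FILENAME
            sub(/.*\//, "", name)      # strip the directory part
            print $2, $14, $15, name
        }
    ' {} + >> "$out"
}
```

Possible usage, with made-up paths: extract_cols /path/to/data /tmp/output.tsv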

  4. #4
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by malcomex999
    I think it is better to use FNR instead of NR, since he is trying to process many files and wants to skip the first 10 rows of each file, if I understand that correctly. FNR resets its value each time awk starts reading a new file, while NR doesn't.
    Good point, but to be fair, if the original poster hasn't responded to say thanks for any help given (it's been 10 days), then it's hardly worth helping them any more.

  5. #5
    Join Date
    Aug 2009
    Posts
    2
    Quote Originally Posted by mike_bike_kite
    Good point, but to be fair, if the original poster hasn't responded to say thanks for any help given (it's been 10 days), then it's hardly worth helping them any more.
    Yes, you are right, but you don't have to expect anything in return when you help, not even thanks. Still, the poster should probably confirm whether it is working for him or not.

  6. #6
    Join Date
    Jun 2003
    Location
    West Palm Beach, FL
    Posts
    2,713

    URGENT...get me an answer fast! Help me, help me.

    Quote Originally Posted by malcomex999
    Yes, you are right, but you don't have to expect anything in return when you help, not even thanks. Still, the poster should probably confirm whether it is working for him or not.
    Agree,

    Most unprofessional of the "original poster" not to respond to an "URGENT" request...

    BTW: Good observation about FNR.
