Hello,
I urgently need help with a shell scripting problem. I would really appreciate it if someone could help me correct the code I have written below.
PROBLEM:
I am trying to write a bash shell script that does the following:
1. finds all *.txt files within my directory of interest (the files are in sub-directories)
2. reads each of the 25 files one by one (all are tab-delimited and share the same data format)
3. skips the first 10 rows of each file
4. extracts columns 2, 14, and 15 and prints them into one output file
5. adds a new column to the final output file with the name of the txt file from which the data was extracted.
I have written a shell script, but it is not working properly and does not yet include the code to skip the first 10 rows.
Below I have pasted a sample input file, the desired output file, and my code.
Input file format (the actual data starts at the line beginning with: DATA 1 1 1 0):
Code:
FEATURES FeatureNum Row Col chr_coord SubTypeMask SubTypeName ProbeUID ControlType ProbeName GeneName SystematicName Description PositionX PositionY
DATA 1 1 1 0 0 1 miRNABrightCorner30 miRNABrightCorner30 miRNABrightCorner30 6774.29 228.723
DATA 2 1 2 66 Structural 2 1 DarkCorner DarkCorner DarkCorner 6800.2 229.421
DATA 3 1 3 chr14:100595916-100595897 0 3 0 A_25_P00010115 hsa-miR-154* hsa-miR-154* NA 6826.51 228.385
DATA 4 1 4 chr8:135881995-135882010 0 5 0 A_25_P00010390 hsa-miR-30b hsa-miR-30b NA 6850.48 228.853
DATA 5 1 5 chr14:100558179-100558161 0 7 0 A_25_P00010956 hsa-miR-379 hsa-miR-379 NA 6875.37 228.408
DATA 6 1 6 chr19:058916206-058916186 0 8 0 A_25_P00011941 hsa-miR-517b hsa-miR-517b NA 6900.98 229.321
DATA 7 1 7 chr17:062213733-062213718 0 10 0 A_25_P00010912 hsa-miR-634 hsa-miR-634 NA 6926.91 228.768
DATA 8 1 8 chr14:100583440-100583424 0 12 0 A_25_P00010147 hsa-miR-539 hsa-miR-539 NA 6952.65 229.587
DATA 10 1 10 chr14:100601751-100601731 0 14 0 A_25_P00010023 hsa-miR-369-3p hsa-miR-369-3p NA 7003.36 228.794
Output format: a tab-delimited file. The last column shows the name of the file from which the data was extracted.
Code:
col2 col14 col15 filename
1 6774.29 228.723 ABC.txt
2 6800.2 229.421 ABC.txt
3 6826.51 228.385 DEF.txt
4 6850.48 228.853 DEF.txt
5 6875.37 228.408 XYZ.txt
6 6900.98 229.321 XYZ.txt
My incomplete code: it is missing the step that skips the first 10 rows, and it also throws an error:
'test1.sh: line 3: syntax error near unexpected token `do
'test1.sh: line 3: `do
Code:
for filename in $(find -iname '*.txt')
do
awk -F"\t" '
BEGIN {OFS="|"} {print $2,$14,$15,FILENAME}
' $filename > output.txt
done
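For reference, the five steps listed at the top could be sketched roughly as follows. This is only a sketch under some assumptions: the function name combine_columns and the output name combined.tsv are made up for illustration, every input file is assumed to have exactly 10 rows to skip, and find is assumed to support -iname (GNU and BSD find do). The output file should not be a *.txt file inside the searched directory, otherwise find would pick it up as an input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): combine_columns DIR OUTFILE
# Collects columns 2, 14 and 15 from every *.txt file under DIR into OUTFILE.
combine_columns() {
    local dir=$1 out=$2

    # Write the header row once; this also truncates any previous output.
    printf 'col2\tcol14\tcol15\tfilename\n' > "$out"

    # Hand all matching files to a single awk process.
    # FNR restarts at 1 for each file, so FNR > 10 skips the first
    # 10 rows of every file rather than only of the first one.
    find "$dir" -type f -iname '*.txt' -exec awk -F'\t' -v OFS='\t' '
        FNR > 10 {
            sub(/\r$/, "")        # drop a trailing carriage return, if any
            name = FILENAME
            sub(/.*\//, "", name) # keep only the file name, not the path
            print $2, $14, $15, name
        }
    ' {} + >> "$out"
}
```

It could then be called as: combine_columns /path/to/data combined.tsv. Appending with >> after the loop-free find call avoids the problem in the original loop, where > output.txt overwrites the file on every iteration. Separately, the stray leading ' in the error messages above is often a sign that the script was saved with Windows (CRLF) line endings; if that is the case here, stripping the carriage returns (for example with tr -d '\r' < test1.sh > test1_unix.sh) may clear the syntax error.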