
 
#1
08-01-10, 17:52
urs_77
Member
 
Join Date: Jan 2003
Location: Schaumburg, IL
Posts: 79
Parse a set of files...

Hello,

I am trying to do the following:


Step 1: grep the timestamp from the filename

Example:
tbs_orms_07292010_03_18_02.out
tbs_orms_07292010_08_18_41.out
tbs_orms_07292010_13_19_39.out
tbs_orms_07292010_18_20_41.out
tbs_orms_07292010_23_22_26.out

I want to capture the portion 03_18_02.
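A minimal sketch of step 1 using only POSIX shell parameter expansion. It assumes every name follows the tbs_orms_MMDDYYYY_HH_MM_SS.out pattern shown above; the function name extract_time is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the HH_MM_SS part of a name such as
# tbs_orms_07292010_03_18_02.out (assumes exactly this naming pattern).
extract_time() {
  name=${1##*/}               # drop any leading directory part
  name=${name%.out}           # tbs_orms_07292010_03_18_02
  name=${name#tbs_orms_}      # 07292010_03_18_02
  printf '%s\n' "${name#*_}"  # 03_18_02
}

for f in tbs_orms_07292010_03_18_02.out tbs_orms_07292010_23_22_26.out; do
  extract_time "$f"
done
```

No external commands are needed, so this stays fast even when it runs once per file over thousands of files.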

Step 2: Open that file
Step 3: Read the file from the 4th line until it finds a blank line

Example:

TBSP_NAME TSORMS
--------------- --------------------
TAB32K 7
TBS4K_INVENTORY 6
TBS16K_ADDRESS 6
TBS4K_PAYMENT 5
IDX4K_TABINVENT 5
IDX4K_ORDERITEM 5
IDX4K_XORDER_1 5
IDX4K_XORDER 5
TBS8K_ACCOUNT 4
TBS4K_GARBAGE 4
TBS4K_MEMBER 4
TBS4K_XORDER 4
IDX4K_ORDPAYINF 4
CD_TAB32K 4
SYSCATSPACE 3
TBS4K_CALCULATI 3
TBS4K_INTERNAL_ 3
TBS4K_MARKETING 3
TBS4K_MISC 3
TBS4K_TRADING 3
TBS4K_ORDERITEM 3
TBS4K_ORDPAYINF 3
IDX4K_USERS 3
CD_TAB16K 3
TBS4K_XORDER_1 3
TBS8K_XCATENTRY 3
IDX4K_MGHMDLVRY 3
TEMP_TS_8K 2
TAB16K 2
TBS4K_CATALOG 2
TBS4K_STORE 2
TBS4K_WORKSPACE 2
IDX16K_ADDRESS 2
TBS4K_TABORDERS 2
IDX8K_XCATENTRY 2

35 record(s) selected.

So I want to capture everything from the line after --------------- up to and including IDX8K_XCATENTRY 2.

Each of these files has a different number of records.
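Step 3 on its own can be sketched with sed. This assumes, as the "from the 4th line" rule implies, that each file starts with a blank line, the column header and the dashes, so the data rows begin on line 4. The helper name body_lines and the sample file are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print from line 4 of a file up to, but not
# including, the first blank (or whitespace-only) line.
body_lines() {
  # 4,$ limits the block to line 4 onwards; q stops at the first blank
  # line without printing it (because of -n); p prints every other line.
  sed -n '4,${/^[[:space:]]*$/q;p;}' "$1"
}

# Hypothetical sample shaped like the output above.
sample=$(mktemp) || exit 1
printf '\n%s\n%s\n%s\n%s\n\n%s\n\n' \
  'TBSP_NAME TSORMS' \
  '--------------- --------------------' \
  'TAB32K 7' \
  'TBS4K_INVENTORY 6' \
  '2 record(s) selected.' > "$sample"
body_lines "$sample"    # prints the TAB32K and TBS4K_INVENTORY rows
rm -f "$sample"
```

Because it stops at the first blank line, the varying number of records per file does not matter.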

Step 4: For each of these lines, insert the timestamp captured in Step 1 as the first column and save it in the specified file.
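Putting steps 1 to 4 together, one possible sketch loops over the files and appends every prefixed row to a single output file. It assumes all input files sit in one directory and that one combined output file is wanted; prefix_rows and the output path are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: prefix_rows DIR OUTFILE
# Steps 1-4 for every tbs_orms_*.out file in DIR, all rows appended to OUTFILE.
prefix_rows() {
  dir=$1
  out=$2
  : > "$out" || return 1                 # start with an empty output file
  for f in "$dir"/tbs_orms_*.out; do
    [ -f "$f" ] || continue              # the glob matched nothing
    t=${f##*/}; t=${t%.out}              # step 1: tbs_orms_07292010_03_18_02
    t=${t#tbs_orms_}; t=${t#*_}          #         -> 03_18_02
    # steps 2-4: from line 4 up to the first blank line, prefix "$t "
    sed -n "4,\${/^[[:space:]]*\$/q;s/^/$t /;p;}" "$f" >> "$out"
  done
}
```

For example, prefix_rows . orms_by_time.txt would process the current directory. The \$ escapes keep the shell from expanding the sed addresses, while $t is expanded on purpose.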

I would appreciate any help.

Thanks!
Naveen.
__________________
Naveen Urs
DBA Manager
IBM Certified Solutions Expert - DB2 LUW V7, V9
#2
08-02-10, 04:27
stolze
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
You can use the "cut" command and work on character positions in the file name; that is fine as long as all file names have exactly the same length. Alternatively, you can use "awk" with '_' and '.' as field separators, or use "sed" to strip away everything from the file name except the timestamp. For example, with sed it could be something like this:
Code:
sed -e 's/.*_\([0-9]\{2\}_[0-9]\{2\}_[0-9]\{2\}\).*/\1/'
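For comparison, the three approaches for step 1 side by side. This assumes the names always look like tbs_orms_07292010_03_18_02.out; the cut positions 19-26 only hold for that exact length, and the function names are hypothetical:

```shell
#!/bin/sh
# Three ways to pull HH_MM_SS out of a name such as
# tbs_orms_07292010_03_18_02.out (function names are hypothetical).

# Fixed character positions 19-26: only valid while every name has this length.
time_by_cut() { printf '%s\n' "$1" | cut -c19-26; }

# '_' and '.' as separators give the fields: tbs orms 07292010 03 18 02 out
time_by_awk() { printf '%s\n' "$1" | awk -F'[_.]' '{ print $4 "_" $5 "_" $6 }'; }

# The sed expression from the post above.
time_by_sed() {
  printf '%s\n' "$1" | sed -e 's/.*_\([0-9]\{2\}_[0-9]\{2\}_[0-9]\{2\}\).*/\1/'
}

for fn in time_by_cut time_by_awk time_by_sed; do
  "$fn" tbs_orms_07292010_03_18_02.out    # each prints 03_18_02
done
```

The awk and sed variants keep working if the prefix before the date changes length; the cut variant does not.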
Opening the file and reading from it can be done with "awk": count the lines (NR), start processing only when the line counter is larger than 3, and terminate when $0 is empty. Or you use "tail -n +4" to start at the 4th line, combined with "head -n -3" (negative counts are a GNU head extension) to remove the last 3 lines. For step 4, you print the value from step 1 (strictly speaking a time of day, not a full timestamp), followed by the line being processed.
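The awk and tail/head variants for steps 2 to 4 might look like this for a single file. The tail/head one assumes the DB2 output ends with a blank line, the "N record(s) selected." line and one more blank line; rows_awk and rows_tail_head are hypothetical names:

```shell
#!/bin/sh
# Two ways to do steps 2-4 for one file, as described above.
# $1 = file, $2 = the HH_MM_SS value from step 1 (hypothetical helpers).

# awk: skip lines 1-3, stop at the first empty line, print "time row".
rows_awk() {
  awk -v t="$2" 'NR > 3 { if ($0 == "") exit; print t, $0 }' "$1"
}

# tail/head: start at line 4 and drop the last 3 lines (blank line,
# "N record(s) selected.", trailing blank line). Negative -n is a GNU
# head extension, and this assumes the file ends exactly that way.
rows_tail_head() {
  tail -n +4 "$1" | head -n -3 | sed "s/^/$2 /"
}
```

Either call can be redirected with >> into the output file. The awk version is the more robust of the two, since it does not depend on how the file ends and does not need GNU head.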

Or you implement this in a scripting language like Perl.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
#3
08-02-10, 07:34
pdreyer
Registered User
 
Join Date: May 2005
Location: South Africa
Posts: 1,268
Code:
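for f in tbs_orms_*2010_*.out
do
  # strip everything up to "2010_": tbs_orms_07292010_03_18_02.out -> 03_18_02.out
  g="${f#*2010_}"
  # delete each blank line plus the line after it; from line 4 on, prefix ${g%.out} and print
  sed -n "/^$/,/$/d;4,$ s/^/${g%.out} /p" "$f" > "$f.new"
done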

Last edited by pdreyer; 08-02-10 at 08:02.
#4
08-02-10, 15:08
urs_77
Member
 
Join Date: Jan 2003
Location: Schaumburg, IL
Posts: 79
The script worked like a charm. You are a lifesaver: I had to parse through 4000 files, and doing that manually would have been a nightmare.

Thanks!