  1. #1

    Parse a set of files...

    Hello,

    I am trying to do the following:


    Step 1: Extract the timestamp from the file name

    Example:
    tbs_orms_07292010_03_18_02.out
    tbs_orms_07292010_08_18_41.out
    tbs_orms_07292010_13_19_39.out
    tbs_orms_07292010_18_20_41.out
    tbs_orms_07292010_23_22_26.out

    For the first file, I want to capture the portion 03_18_02.

    Step 2: Open that file
    Step 3: Read the file from the 4th line until it finds a blank line

    Example:

    TBSP_NAME TSORMS
    --------------- --------------------
    TAB32K 7
    TBS4K_INVENTORY 6
    TBS16K_ADDRESS 6
    TBS4K_PAYMENT 5
    IDX4K_TABINVENT 5
    IDX4K_ORDERITEM 5
    IDX4K_XORDER_1 5
    IDX4K_XORDER 5
    TBS8K_ACCOUNT 4
    TBS4K_GARBAGE 4
    TBS4K_MEMBER 4
    TBS4K_XORDER 4
    IDX4K_ORDPAYINF 4
    CD_TAB32K 4
    SYSCATSPACE 3
    TBS4K_CALCULATI 3
    TBS4K_INTERNAL_ 3
    TBS4K_MARKETING 3
    TBS4K_MISC 3
    TBS4K_TRADING 3
    TBS4K_ORDERITEM 3
    TBS4K_ORDPAYINF 3
    IDX4K_USERS 3
    CD_TAB16K 3
    TBS4K_XORDER_1 3
    TBS8K_XCATENTRY 3
    IDX4K_MGHMDLVRY 3
    TEMP_TS_8K 2
    TAB16K 2
    TBS4K_CATALOG 2
    TBS4K_STORE 2
    TBS4K_WORKSPACE 2
    IDX16K_ADDRESS 2
    TBS4K_TABORDERS 2
    IDX8K_XCATENTRY 2

    35 record(s) selected.

    So I want to capture every line after the --------------- line, up to and including IDX8K_XCATENTRY 2.

    Each of these files has a different number of records.

    Step 4: For each of these lines, insert the timestamp captured in Step 1 as the first column and save the result to a specified output file.

    I would appreciate any help.

    Thanks!
    Naveen.
    Naveen Urs
    DBA Manager
    IBM Certified Solutions Expert - DB2 LUW V7, V9

  2. #2
    You can use the "cut" command and work on character positions in the file name. That works only if all file names always have exactly the same length. Alternatively, you can use "awk" with '_' and '.' as field separators. Or you could use "sed" to strip away everything from the file name except the timestamp. For example, with sed it could be something like this:
    Code:
    sed -e 's/.*_\([0-9]\{2\}_[0-9]\{2\}_[0-9]\{2\}\).*/\1/'
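The cut and awk variants mentioned above can be sketched as two small shell functions. This is a minimal sketch, assuming every file name follows the pattern shown in the question (tbs_orms_07292010_03_18_02.out); the function names time_by_cut and time_by_awk are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: print the HH_MM_SS part of a name such as
# tbs_orms_07292010_03_18_02.out (pattern assumed from the question).

# cut: fixed character positions, so every name must have exactly this layout
# ("tbs_orms_07292010_" is 18 characters, the time is characters 19-26).
time_by_cut() {
    printf '%s\n' "$1" | cut -c19-26
}

# awk: split on '_' and '.', then join the three fields before the extension.
# Counting from the end (NF) still works if the prefix gains more underscores.
time_by_awk() {
    printf '%s\n' "$1" | awk -F'[_.]' '{ print $(NF-3) "_" $(NF-2) "_" $(NF-1) }'
}

time_by_cut tbs_orms_07292010_03_18_02.out   # prints 03_18_02
time_by_awk tbs_orms_07292010_23_22_26.out   # prints 23_22_26
```

The cut version is the simplest but silently returns the wrong characters if the prefix length ever changes; the awk version only assumes that the time is the last three underscore-separated fields before the extension.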
    Opening the file and reading from it can be done with "awk": count the lines, start processing only once the line counter is larger than 3, and terminate when $0 is empty. Or you use "tail -n +4" to start with the 4th line, combined with "head -n -3" (GNU head) to remove the last 3 lines. For step 4, you print the time from step 1 (it is not a timestamp!), followed by the actual line being processed.
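Both approaches for steps 2 to 4 fit in a few lines each. This is a minimal sketch, assuming the file layout shown in the question: data rows start on line 4 and are followed by a blank line, the "record(s) selected." line and (for the tail/head variant) exactly one trailing blank line. The function names rows_by_awk and rows_by_tail_head are hypothetical, and the negative count for head -n requires GNU head.

```shell
#!/bin/sh
# Hypothetical helpers. $1 = time string from step 1, $2 = input file.
# Assumed layout: header on lines 1-3, data from line 4 to the first blank line.

# awk: skip the first 3 lines, stop at the first empty line after them,
# and print the time followed by the current line.
rows_by_awk() {
    awk -v t="$1" 'NR > 3 { if ($0 == "") exit; print t, $0 }' "$2"
}

# tail/head: start at line 4, drop the last 3 lines (blank line,
# "record(s) selected." and a trailing blank line), then prefix each line.
# A negative count for head -n is a GNU extension.
rows_by_tail_head() {
    tail -n +4 "$2" | head -n -3 | sed "s/^/$1 /"
}

# Example (not run here):
#   rows_by_awk 03_18_02 tbs_orms_07292010_03_18_02.out > out.txt
```

The awk version is the more forgiving of the two, because it stops at the first blank line no matter how many lines follow the data; the tail/head version breaks if the trailer is not exactly three lines.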

    Or you could implement this in a scripting language like Perl.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  3. #3
    Code:
    for f in tbs_orms_*2010_*.out
    do
    g="${f#*2010_}"
    sed -n "/^$/,/$/d;4,$ s/^/${g%.out} /p" "$f" > "$f.new"
    done
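The sed program in this loop relies on the layout of the files: lines 1 to 3 are never printed because of the 4,$ address, and /^$/,/$/d deletes every blank line together with the line right after it (the end address /$/ matches any line), which removes the blank line after the data and the "record(s) selected." line. A quick self-check on one made-up file in a scratch directory, assuming the layout shown in the question (the sample content is hypothetical):

```shell
#!/bin/sh
# Self-check of the loop above on one made-up file in a scratch directory.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1

# Hypothetical sample mimicking the layout in the question:
# blank line, header, dashes, two data rows, blank line, count, blank line.
printf '\nTBSP_NAME TSORMS\n---------------\nTAB32K 7\nIDX8K_XCATENTRY 2\n\n2 record(s) selected.\n\n' \
    > tbs_orms_07292010_03_18_02.out

for f in tbs_orms_*2010_*.out
do
    g="${f#*2010_}"    # strip through "2010_": 03_18_02.out
    # -n: print only what p prints; /^$/,/$/d drops each blank line plus the
    # line after it; 4,$ s/^/.../p prefixes and prints lines 4 onwards.
    sed -n "/^$/,/$/d;4,$ s/^/${g%.out} /p" "$f" > "$f.new"
done

cat tbs_orms_07292010_03_18_02.out.new
# prints:
# 03_18_02 TAB32K 7
# 03_18_02 IDX8K_XCATENTRY 2
```

If a file ever contains a blank line inside the data rows, the range would also delete the row after it, so the trick is only safe for output with this exact shape.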
    Last edited by pdreyer; 08-02-10 at 09:02.

  4. #4
    The script worked like a charm. You are a lifesaver: I had to parse 4000 files, and it would have been a nightmare to do that manually.

    Thanks!
    Naveen Urs
