Results 1 to 13 of 13
  1. #1
    Join Date
    Mar 2006
    Location
    Tuticorin, India
    Posts
    103

    Unanswered: ESE - Backup Logic

    We are running DB2 v8.2 on AIX 5.3 with DPF (10 logical partitions). My question is specifically to backup.

    We have 10 filesystems to store the DB2 backup image namely
    /db2backup/p0 (To store co-ordinator node's image)
    /db2backup/p1 (To store node#1's image)
    /db2backup/p2 (To store node#2's image)
    .
    .
    .
    /db2backup/p9 (To store node#9's image)

    I am planning to setup the DB2 backup as below
    [1] Backup the co-ordinator node with target path as /db2backup/p0
    db2_all "<<+0<db2 backup db mydb /db2backup/p0 compress include logs ;"


    [2] Backup rest of the nodes (1-9) in parallel with the backup image written to its corresponding file system.
    Problem: If I use db2_all "||<<-0<db2 backup db ..." it would invoke the backup in parallel on all the nodes except catalog node but I am not sure how to specify the backup path.

    Hence it would be of help if dbforum user can help me on this.

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    I think you can use () and ## on the command line to substitute with the machine number and node number respectively.
    ---
    "It does not work" is not a valid problem statement.

  3. #3
    Join Date
    Mar 2006
    Location
    Tuticorin, India
    Posts
    103
    I am not sure what you mean by command substitution. Hence can you pls either provide me an example or any link which explains this method. Thanks

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Code:
    db2_all "<<+0<db2 backup db mydb /db2backup/p## compress include logs ;"
    Details can be found in the administration manual, as is quite often the case.
    ---
    "It does not work" is not a valid problem statement.

  5. #5
    Join Date
    May 2007
    Location
    somewhere in dbforums
    Posts
    221
    From AIX point of view I can give you a couple of options -

    [1]You can write a shell script which will execute the commands in a loop
    count=1
    while [ $count < 9 ]
    do
    db2_all "<<+$count<db2 backup db mydb /db2backup/p$count compress include logs ;"
    done

    [2] write a cronjob to do the same.
    IBM Certified Database Associate, DB2 9 for LUW

  6. #6
    Join Date
    Mar 2006
    Location
    Tuticorin, India
    Posts
    103

    ESE - Backup failing when used with db2_all

    For some strange reason db2 backup is failing when used in db2_all command on DB2 V8.2/AIX 5.3

    Error message : No start db manager command is issued though I manually started the db2 before issuing db2_all backup command.

    Please advice.
    Script
    db2start >> $BKUP_LOG
    echo "started again explicitly" >> $BKUP_LOG
    print "\n`date '+%m/%d/%Y-%H:%M'` DB2 Backup Started !! \n" >> $BKUP_LOG

    db2_all "<<+30<db2 backup db ${DATABASE} ${BKUP_TYPE} ${CTLG_BKUP_PATH} ${BKUP_COMPRESS} ${INCL_LOGS
    } ;" >> $BKUP_LOG

    while [ $MY_NODE_NUM -lt 30 ]
    do
    if [ $MY_NODE_NUM -lt 10 ]
    then
    BKUP_PATH=" to /db2bkup/p0$MY_NODE_NUM/c01/bkup_full, /db2bkup/p0$MY_NODE_NUM/c02/bkup_full"
    else
    BKUP_PATH=" to /db2bkup/p$MY_NODE_NUM/c01/bkup_full, /db2bkup/p$MY_NODE_NUM/c02/bkup_full"
    fi
    echo "MY_NODE_NUM=$MY_NODE_NUM going to call db2_all "
    echo MY_NODE_NUM=$MY_NODE_NUM >> $BKUP_LOG
    echo BKUP_PATH=$BKUP_PATH >> $BKUP_LOG

    db2_all "||<<+$MY_NODE_NUM<db2 backup db ${DATABASE} ${BKUP_TYPE} ${BKUP_PATH} ${BKUP_COMPRESS} ${INCL_LOGS} ;" >> $BKUP_LOG
    echo "backup done for this node" >> $BKUP_LOG
    MY_NODE_NUM=`expr $MY_NODE_NUM + 1`
    time >> $BKUP_LOG
    done
    Log file:
    SQL1026N The database manager is already active.
    started again explicitly

    12/11/2008-22:25 DB2 Backup Started !!



    SQL1032N No start database manager command was issued.
    dbsu099: db2 backup db SSPOFRD1 ... completed rc=4

    12/11/2008-22:25 Catalog Backup was successful!
    MY_NODE_NUM=1
    BKUP_PATH= to /db2bkup/p01/c01/bkup_full, /db2bkup/p01/c02/bkup_full


    dbsu099: SQL1032N No start database manager command was issued.
    dbsu099: db2 backup db SSPOFRD1 ... completed rc=4 backup done for this node
    MY_NODE_NUM=2
    BKUP_PATH= to /db2bkup/p02/c01/bkup_full, /db2bkup/p02/c02/bkup_full


    dbsu099: SQL1032N No start database manager command was issued.
    dbsu099: db2 backup db SSPOFRD1 ... completed rc=4 backup done for this node

    12/11/2008-22:25Backup Partitions were successful!

    12/11/2008-22:25 All DB2 Backup Ended !!

  7. #7
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    db2_all "<<+30<db2 backup db???
    ---
    "It does not work" is not a valid problem statement.

  8. #8
    Join Date
    Mar 2006
    Location
    Tuticorin, India
    Posts
    103
    My mistake though. Just for illustration & easy understating initially I stated that we have just 10 logical partitions but in reality we have 30 logical partitions with 30 being the co-ordinator node. My apologies for this confusion.

    Though the instance was started manually, it throws SQL1032N error for the backup command issued via db2_all.

  9. #9
    Join Date
    May 2007
    Location
    somewhere in dbforums
    Posts
    221
    couple of things though....

    [1]does db2start and db2stop list all the 30 nodes as started or stopped??
    [2]are you able to backup the db using the same command from the prompt??
    [3]does the user who is executing the script pointing to the correct DB2INSTANCE variable in the user env???
    IBM Certified Database Associate, DB2 9 for LUW

  10. #10
    Join Date
    Mar 2006
    Location
    Tuticorin, India
    Posts
    103
    Sorry for the late reply as I was on an unplanned vacation. I am not sure what was the problem with the old script (I tried to fix an existing script initially) but I wrote a new script from scratch and it worked without any problem. However I added nohup to the nodes 1-29 so that backup would happen in parallel. Is this acceptable ?

    db2 "force application all"
    db2_all "<<+30<db2 backup db SSPOFRD1 to /db2bkup/p01/c01 compress "
    BKUP_LOG=parallel_backup.out
    MY_NODE_NUM=1
    while [ $MY_NODE_NUM -lt 30 ]
    do
    if [ $MY_NODE_NUM -lt 10 ]
    then
    BKUP_PATH=" to /db2bkup/p0$MY_NODE_NUM/c01/bkup_full, /db2bkup/p0$MY_NODE_NUM/c02/bkup_full"
    else
    BKUP_PATH=" to /db2bkup/p$MY_NODE_NUM/c01/bkup_full, /db2bkup/p$MY_NODE_NUM/c02/bkup_full"
    fi

    nohup db2_all "||<<+$MY_NODE_NUM<db2 backup db SSPOFRD1 ${BKUP_PATH} compress ;" >> $BKUP_LOG &
    echo "backup done for this node" >> $BKUP_LOG
    MY_NODE_NUM=`expr $MY_NODE_NUM + 1`
    date >> $BKUP_LOG
    done
    Script runs without any problem but following are my observation. CPU utilization is around 88% when it runs and the top 30 processes are "wait" as per ps command. Any idea what this "wait" process mean ? Why there is no db2 related process as the top 30 process as I'm taking the db2 backup

    VMSTAT Output
    kthr memory page faults cpu
    ----- ----------- ------------------------ ------------ -----------
    r b avm fre re pi po fr sr cy in sy cs us sy id wa
    663 1 21386126 32529 0 0 0 41270 130836 0 3600 2677194 16199 88 12 0 0
    656 5 21386131 31196 0 0 0 42648 133154 0 4654 2542985 16064 86 14 0 0
    660 1 21386401 31665 0 0 0 42086 138963 0 4289 2500877 15897 86 14 0 0
    656 2 21386128 32698 0 0 0 42374 149101 0 4567 2410039 16327 87 13 0 0
    656 4 21386138 32530 0 0 0 41184 139874 0 4381 2264641 15789 88 12 0 0
    643 17 21386093 32116 0 0 0 43556 168367 0 4320 2630629 16552 86 14 0 0
    646 15 21386120 32243 0 0 0 41042 128239 0 4412 2437481 16334 87 13 0 0
    649 2 21386121 32460 0 0 0 41126 430383 1 4347 2501896 15713 86 14 0 0
    649 2 21386161 31501 0 0 0 39531 169566 0 4041 2463584 15478 87 13 0 0
    645 2 21386175 31048 0 0 0 43033 126226 0 4380 2534696 15663 86 14 0 0
    640 2 21386184 31909 0 0 0 42158 126715 0 4294 2347123 16355 88 12 0 0
    638 1 21386221 30601 0 0 0 40536 112443 0 3787 2481669 16415 88 12 0 0
    649 2 21386219 32314 0 0 0 45622 126512 0 4166 2661286 16911 87 13 0 0
    641 1 21386239 31571 0 0 0 39738 111826 0 3795 2518187 16162 88 12 0 0
    639 1 21386257 31208 0 0 0 42907 118479 0 3716 2520047 15920 87 13 0 0
    651 2 21386418 31986 0 0 0 41926 114243 0 4583 2394619 15614 85 15 0 0
    651 1 21386253 31505 0 0 0 42774 120269 0 4513 2490062 17557 83 17 0 0
    635 1 21386250 31764 0 0 0 45228 121619 0 3899 2699436 17769 87 13 0 0
    636 2 21387269 32508 0 0 0 42796 116595 0 3554 2618091 16170 88 12 0 0
    642 1 21387338 32707 0 0 0 43256 115321 0 4252 2461782 15826 87 13 0 0
    Top 30 processes
    $ ps aux |head -1 ;ps aux |sort -rn +2 |head -40
    USER PID %CPU %MEM SZ RSS TTY STAT STIME TIME COMMAND
    db2fenc1 3244908 2.8 0.0 4468 4328 - A Oct 04 95352:44 db2fmp (C) 25
    root 65568 2.7 0.0 40 40 - A Aug 21 147369:08 wait
    root 61470 2.7 0.0 40 40 - A Aug 21 144621:15 wait
    root 57372 2.7 0.0 40 40 - A Aug 21 147700:50 wait
    root 53274 2.7 0.0 40 40 - A Aug 21 144055:33 wait
    root 49176 2.7 0.0 40 40 - A Aug 21 148437:13 wait
    root 40980 2.7 0.0 40 40 - A Aug 21 145921:06 wait
    root 37724 2.7 0.0 40 36 - A Aug 21 147767:28 wait
    root 37448 2.7 0.0 40 40 - A Aug 21 147695:57 wait
    root 37172 2.7 0.0 40 40 - A Aug 21 147977:15 wait
    root 33626 2.7 0.0 40 40 - A Aug 21 144634:06 wait
    root 33350 2.7 0.0 40 40 - A Aug 21 144274:39 wait
    root 29528 2.7 0.0 40 40 - A Aug 21 147847:51 wait
    root 29252 2.7 0.0 40 40 - A Aug 21 146982:16 wait
    root 28976 2.7 0.0 40 40 - A Aug 21 147696:12 wait
    root 25430 2.7 0.0 40 40 - A Aug 21 146338:06 wait
    root 25154 2.7 0.0 40 40 - A Aug 21 143775:44 wait
    root 24878 2.7 0.0 40 36 - A Aug 21 143594:27 wait
    root 21332 2.7 0.0 40 40 - A Aug 21 148069:20 wait
    root 21056 2.7 0.0 40 40 - A Aug 21 147761:07 wait
    root 20780 2.7 0.0 40 40 - A Aug 21 147810:51 wait
    root 17234 2.7 0.0 40 40 - A Aug 21 145808:15 wait
    root 13136 2.7 0.0 40 40 - A Aug 21 147589:30 wait
    root 12860 2.7 0.0 40 40 - A Aug 21 146962:09 wait
    root 12584 2.7 0.0 40 40 - A Aug 21 147627:35 wait
    root 45078 2.6 0.0 40 40 - A Aug 21 140879:00 wait
    root 33074 2.6 0.0 40 36 - A Aug 21 141810:18 wait
    root 16958 2.6 0.0 40 40 - A Aug 21 140022:50 wait
    root 16682 2.6 0.0 40 40 - A Aug 21 143198:31 wait
    root 842 2.6 0.0 40 40 - A Aug 21 141211:26 wait
    root 566 2.6 0.0 40 40 - A Aug 21 142539:38 wait
    root 290 2.6 0.0 40 40 - A Aug 21 141270:20 wait
    root 8196 2.4 0.0 40 40 - A Aug 21 127486:56 wait
    root 221868 0.2 0.0 15756 14172 - A Aug 21 9811:01 /opt/netcool/s
    root 5697698 0.1 0.0 14388 13980 - A Aug 24 2812:24 /opt/CA/SC/ccs
    root 16392 0.1 0.0 88 68 - A Aug 21 2787:29 lrud
    iefops 26566870 0.1 0.0 104636 104676 - A Dec 09 182:34 java -Xmx4096m
    db2iops1 28713068 0.1 0.0 672 708 - A 00:50:35 0:14 db2med.6869602
    db2iops1 28655766 0.1 0.0 1804 1816 - A 00:50:33 0:27 db2bm.20832874
    db2iops1 28532748 0.1 0.0 1780 1816 - A 00:50:30 0:31 db2bm.23609764
    Any thoughts on this ?

  11. #11
    Join Date
    May 2007
    Location
    somewhere in dbforums
    Posts
    221
    Quote Originally Posted by db2udbgirl
    Is this acceptable ?
    well it is acceptable.... but what i am more interested in is whether all the information is logged properly into the $BKUP_LOG file...as it is important that the failures get logged too.



    Quote Originally Posted by db2udbgirl
    Any thoughts on this ?
    UPDATE - check my explanation below
    Last edited by nick.ncs; 12-18-08 at 01:30.
    IBM Certified Database Associate, DB2 9 for LUW

  12. #12
    Join Date
    Mar 2006
    Location
    Tuticorin, India
    Posts
    103
    Thanks for the detailed explanation, now I understand the reason for it.

    Yes all the messages are logged. But unfortunately backup completed successfully in 24 partitions but it failed in 6 partitions with the following error message.
    ssh_exchange_identification: Connection closed by remote host
    I suspect that this is something related to the number of concurrent SSH requests as I'm spawning 29 simultaneous requests using nohup. Pls let me know if you have any other tips for resolving this SSH error.

  13. #13
    Join Date
    May 2007
    Location
    somewhere in dbforums
    Posts
    221
    couple of things....about your processes which were waiting for the wait thread I got it a bit wrong.... I had carried out a testing from my end which was a bit different from what you did... but on same lines....so there was some mistake in the explanation.....


    If you check carefully the those wait threads were spawned on Aug 21 so obviously that were not the thread which were caused by your script they are some other threads....so that clears out that point.... your threads will be much lower in the list as it'll be consuming less CPU %

    As for the tips... i can recommend you another option -

    suppose each of your backup process takes 10 mins [you need to know this for the process to work] so what you can do is after your nohup command you can introduce a sleep period of 1 min (sleep 60) in your script so it will take 1 min to start your next parellel backup and your backups' parellel processing will basically be throttled

    So for eg -

    Min-- Backup Carried out
    1---- 1
    2---- 1,2
    3---- 1,2,3
    4---- 1,2,3,4
    5---- 1,2,3,4,5
    6---- 1,2,3,4,5,6
    7---- 1,2,3,4,5,6,7
    8---- 1,2,3,4,5,6,7,8
    9---- 1,2,3,4,5,6,7,8,9
    10--- 2,3,4,5,6,7,8,9,10 -> backup of 1 will finish here
    11--- 3,4,5,6,7,8,9,10,11 -> backup of 2 will finish here
    and so on and so forth.....

    so basically you are limiting the no. of backups at any point of time to 9..... this can however be further fine tuned for optimum performance and you can acheive your goal
    IBM Certified Database Associate, DB2 9 for LUW

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •