If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > ESE - Backup Logic

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 12-08-08, 15:27
db2udbgirl db2udbgirl is offline
Registered User
 
Join Date: Mar 2006
Location: Tuticorin, India
Posts: 100
ESE - Backup Logic

We are running DB2 v8.2 on AIX 5.3 with DPF (10 logical partitions). My question is specifically to backup.

We have 10 filesystems to store the DB2 backup image namely
/db2backup/p0 (To store co-ordinator node's image)
/db2backup/p1 (To store node#1's image)
/db2backup/p2 (To store node#2's image)
.
.
.
/db2backup/p9 (To store node#9's image)

I am planning to setup the DB2 backup as below
[1] Backup the co-ordinator node with target path as /db2backup/p0
db2_all "<<+0<db2 backup db mydb /db2backup/p0 compress include logs ;"


[2] Backup rest of the nodes (1-9) in parallel with the backup image written to its corresponding file system.
Problem: If I use db2_all "||<<-0<db2 backup db ..." it would invoke the backup in parallel on all the nodes except catalog node but I am not sure how to specify the backup path.

Hence it would be of help if dbforum user can help me on this.
Reply With Quote
  #2 (permalink)  
Old 12-08-08, 16:15
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
I think you can use () and ## on the command line to substitute with the machine number and node number respectively.
Reply With Quote
  #3 (permalink)  
Old 12-08-08, 16:54
db2udbgirl db2udbgirl is offline
Registered User
 
Join Date: Mar 2006
Location: Tuticorin, India
Posts: 100
I am not sure what you mean by command substitution. Hence can you pls either provide me an example or any link which explains this method. Thanks
Reply With Quote
  #4 (permalink)  
Old 12-08-08, 17:16
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Code:
db2_all "<<+0<db2 backup db mydb /db2backup/p## compress include logs ;"
Details can be found in the administration manual, as is quite often the case.
Reply With Quote
  #5 (permalink)  
Old 12-09-08, 09:36
nick.ncs nick.ncs is offline
Registered User
 
Join Date: May 2007
Location: somewhere in dbforums
Posts: 221
From AIX point of view I can give you a couple of options -

[1]You can write a shell script which will execute the commands in a loop
count=1
while [ $count < 9 ]
do
db2_all "<<+$count<db2 backup db mydb /db2backup/p$count compress include logs ;"
done

[2] write a cronjob to do the same.
__________________
IBM Certified Database Associate, DB2 9 for LUW
Reply With Quote
  #6 (permalink)  
Old 12-11-08, 23:42
db2udbgirl db2udbgirl is offline
Registered User
 
Join Date: Mar 2006
Location: Tuticorin, India
Posts: 100
ESE - Backup failing when used with db2_all

For some strange reason db2 backup is failing when used in db2_all command on DB2 V8.2/AIX 5.3

Error message : No start db manager command is issued though I manually started the db2 before issuing db2_all backup command.

Please advice.
Script
Quote:
db2start >> $BKUP_LOG
echo "started again explicitly" >> $BKUP_LOG
print "\n`date '+%m/%d/%Y-%H:%M'` DB2 Backup Started !! \n" >> $BKUP_LOG

db2_all "<<+30<db2 backup db ${DATABASE} ${BKUP_TYPE} ${CTLG_BKUP_PATH} ${BKUP_COMPRESS} ${INCL_LOGS
} ;" >> $BKUP_LOG

while [ $MY_NODE_NUM -lt 30 ]
do
if [ $MY_NODE_NUM -lt 10 ]
then
BKUP_PATH=" to /db2bkup/p0$MY_NODE_NUM/c01/bkup_full, /db2bkup/p0$MY_NODE_NUM/c02/bkup_full"
else
BKUP_PATH=" to /db2bkup/p$MY_NODE_NUM/c01/bkup_full, /db2bkup/p$MY_NODE_NUM/c02/bkup_full"
fi
echo "MY_NODE_NUM=$MY_NODE_NUM going to call db2_all "
echo MY_NODE_NUM=$MY_NODE_NUM >> $BKUP_LOG
echo BKUP_PATH=$BKUP_PATH >> $BKUP_LOG

db2_all "||<<+$MY_NODE_NUM<db2 backup db ${DATABASE} ${BKUP_TYPE} ${BKUP_PATH} ${BKUP_COMPRESS} ${INCL_LOGS} ;" >> $BKUP_LOG
echo "backup done for this node" >> $BKUP_LOG
MY_NODE_NUM=`expr $MY_NODE_NUM + 1`
time >> $BKUP_LOG
done
Log file:
Quote:
SQL1026N The database manager is already active.
started again explicitly

12/11/2008-22:25 DB2 Backup Started !!



SQL1032N No start database manager command was issued.
dbsu099: db2 backup db SSPOFRD1 ... completed rc=4

12/11/2008-22:25 Catalog Backup was successful!
MY_NODE_NUM=1
BKUP_PATH= to /db2bkup/p01/c01/bkup_full, /db2bkup/p01/c02/bkup_full


dbsu099: SQL1032N No start database manager command was issued.
dbsu099: db2 backup db SSPOFRD1 ... completed rc=4 backup done for this node
MY_NODE_NUM=2
BKUP_PATH= to /db2bkup/p02/c01/bkup_full, /db2bkup/p02/c02/bkup_full


dbsu099: SQL1032N No start database manager command was issued.
dbsu099: db2 backup db SSPOFRD1 ... completed rc=4 backup done for this node

12/11/2008-22:25Backup Partitions were successful!

12/11/2008-22:25 All DB2 Backup Ended !!
Reply With Quote
  #7 (permalink)  
Old 12-12-08, 07:09
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
db2_all "<<+30<db2 backup db???
Reply With Quote
  #8 (permalink)  
Old 12-12-08, 12:39
db2udbgirl db2udbgirl is offline
Registered User
 
Join Date: Mar 2006
Location: Tuticorin, India
Posts: 100
My mistake though. Just for illustration & easy understating initially I stated that we have just 10 logical partitions but in reality we have 30 logical partitions with 30 being the co-ordinator node. My apologies for this confusion.

Though the instance was started manually, it throws SQL1032N error for the backup command issued via db2_all.
Reply With Quote
  #9 (permalink)  
Old 12-13-08, 23:15
nick.ncs nick.ncs is offline
Registered User
 
Join Date: May 2007
Location: somewhere in dbforums
Posts: 221
couple of things though....

[1]does db2start and db2stop list all the 30 nodes as started or stopped??
[2]are you able to backup the db using the same command from the prompt??
[3]does the user who is executing the script pointing to the correct DB2INSTANCE variable in the user env???
__________________
IBM Certified Database Associate, DB2 9 for LUW
Reply With Quote
  #10 (permalink)  
Old 12-17-08, 02:16
db2udbgirl db2udbgirl is offline
Registered User
 
Join Date: Mar 2006
Location: Tuticorin, India
Posts: 100
Sorry for the late reply as I was on an unplanned vacation. I am not sure what was the problem with the old script (I tried to fix an existing script initially) but I wrote a new script from scratch and it worked without any problem. However I added nohup to the nodes 1-29 so that backup would happen in parallel. Is this acceptable ?

Quote:
db2 "force application all"
db2_all "<<+30<db2 backup db SSPOFRD1 to /db2bkup/p01/c01 compress "
BKUP_LOG=parallel_backup.out
MY_NODE_NUM=1
while [ $MY_NODE_NUM -lt 30 ]
do
if [ $MY_NODE_NUM -lt 10 ]
then
BKUP_PATH=" to /db2bkup/p0$MY_NODE_NUM/c01/bkup_full, /db2bkup/p0$MY_NODE_NUM/c02/bkup_full"
else
BKUP_PATH=" to /db2bkup/p$MY_NODE_NUM/c01/bkup_full, /db2bkup/p$MY_NODE_NUM/c02/bkup_full"
fi

nohup db2_all "||<<+$MY_NODE_NUM<db2 backup db SSPOFRD1 ${BKUP_PATH} compress ;" >> $BKUP_LOG &
echo "backup done for this node" >> $BKUP_LOG
MY_NODE_NUM=`expr $MY_NODE_NUM + 1`
date >> $BKUP_LOG
done
Script runs without any problem but following are my observation. CPU utilization is around 88% when it runs and the top 30 processes are "wait" as per ps command. Any idea what this "wait" process mean ? Why there is no db2 related process as the top 30 process as I'm taking the db2 backup

VMSTAT Output
Quote:
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
663 1 21386126 32529 0 0 0 41270 130836 0 3600 2677194 16199 88 12 0 0
656 5 21386131 31196 0 0 0 42648 133154 0 4654 2542985 16064 86 14 0 0
660 1 21386401 31665 0 0 0 42086 138963 0 4289 2500877 15897 86 14 0 0
656 2 21386128 32698 0 0 0 42374 149101 0 4567 2410039 16327 87 13 0 0
656 4 21386138 32530 0 0 0 41184 139874 0 4381 2264641 15789 88 12 0 0
643 17 21386093 32116 0 0 0 43556 168367 0 4320 2630629 16552 86 14 0 0
646 15 21386120 32243 0 0 0 41042 128239 0 4412 2437481 16334 87 13 0 0
649 2 21386121 32460 0 0 0 41126 430383 1 4347 2501896 15713 86 14 0 0
649 2 21386161 31501 0 0 0 39531 169566 0 4041 2463584 15478 87 13 0 0
645 2 21386175 31048 0 0 0 43033 126226 0 4380 2534696 15663 86 14 0 0
640 2 21386184 31909 0 0 0 42158 126715 0 4294 2347123 16355 88 12 0 0
638 1 21386221 30601 0 0 0 40536 112443 0 3787 2481669 16415 88 12 0 0
649 2 21386219 32314 0 0 0 45622 126512 0 4166 2661286 16911 87 13 0 0
641 1 21386239 31571 0 0 0 39738 111826 0 3795 2518187 16162 88 12 0 0
639 1 21386257 31208 0 0 0 42907 118479 0 3716 2520047 15920 87 13 0 0
651 2 21386418 31986 0 0 0 41926 114243 0 4583 2394619 15614 85 15 0 0
651 1 21386253 31505 0 0 0 42774 120269 0 4513 2490062 17557 83 17 0 0
635 1 21386250 31764 0 0 0 45228 121619 0 3899 2699436 17769 87 13 0 0
636 2 21387269 32508 0 0 0 42796 116595 0 3554 2618091 16170 88 12 0 0
642 1 21387338 32707 0 0 0 43256 115321 0 4252 2461782 15826 87 13 0 0
Top 30 processes
Quote:
$ ps aux |head -1 ;ps aux |sort -rn +2 |head -40
USER PID %CPU %MEM SZ RSS TTY STAT STIME TIME COMMAND
db2fenc1 3244908 2.8 0.0 4468 4328 - A Oct 04 95352:44 db2fmp (C) 25
root 65568 2.7 0.0 40 40 - A Aug 21 147369:08 wait
root 61470 2.7 0.0 40 40 - A Aug 21 144621:15 wait
root 57372 2.7 0.0 40 40 - A Aug 21 147700:50 wait
root 53274 2.7 0.0 40 40 - A Aug 21 144055:33 wait
root 49176 2.7 0.0 40 40 - A Aug 21 148437:13 wait
root 40980 2.7 0.0 40 40 - A Aug 21 145921:06 wait
root 37724 2.7 0.0 40 36 - A Aug 21 147767:28 wait
root 37448 2.7 0.0 40 40 - A Aug 21 147695:57 wait
root 37172 2.7 0.0 40 40 - A Aug 21 147977:15 wait
root 33626 2.7 0.0 40 40 - A Aug 21 144634:06 wait
root 33350 2.7 0.0 40 40 - A Aug 21 144274:39 wait
root 29528 2.7 0.0 40 40 - A Aug 21 147847:51 wait
root 29252 2.7 0.0 40 40 - A Aug 21 146982:16 wait
root 28976 2.7 0.0 40 40 - A Aug 21 147696:12 wait
root 25430 2.7 0.0 40 40 - A Aug 21 146338:06 wait
root 25154 2.7 0.0 40 40 - A Aug 21 143775:44 wait
root 24878 2.7 0.0 40 36 - A Aug 21 143594:27 wait
root 21332 2.7 0.0 40 40 - A Aug 21 148069:20 wait
root 21056 2.7 0.0 40 40 - A Aug 21 147761:07 wait
root 20780 2.7 0.0 40 40 - A Aug 21 147810:51 wait
root 17234 2.7 0.0 40 40 - A Aug 21 145808:15 wait
root 13136 2.7 0.0 40 40 - A Aug 21 147589:30 wait
root 12860 2.7 0.0 40 40 - A Aug 21 146962:09 wait
root 12584 2.7 0.0 40 40 - A Aug 21 147627:35 wait
root 45078 2.6 0.0 40 40 - A Aug 21 140879:00 wait
root 33074 2.6 0.0 40 36 - A Aug 21 141810:18 wait
root 16958 2.6 0.0 40 40 - A Aug 21 140022:50 wait
root 16682 2.6 0.0 40 40 - A Aug 21 143198:31 wait
root 842 2.6 0.0 40 40 - A Aug 21 141211:26 wait
root 566 2.6 0.0 40 40 - A Aug 21 142539:38 wait
root 290 2.6 0.0 40 40 - A Aug 21 141270:20 wait
root 8196 2.4 0.0 40 40 - A Aug 21 127486:56 wait
root 221868 0.2 0.0 15756 14172 - A Aug 21 9811:01 /opt/netcool/s
root 5697698 0.1 0.0 14388 13980 - A Aug 24 2812:24 /opt/CA/SC/ccs
root 16392 0.1 0.0 88 68 - A Aug 21 2787:29 lrud
iefops 26566870 0.1 0.0 104636 104676 - A Dec 09 182:34 java -Xmx4096m
db2iops1 28713068 0.1 0.0 672 708 - A 00:50:35 0:14 db2med.6869602
db2iops1 28655766 0.1 0.0 1804 1816 - A 00:50:33 0:27 db2bm.20832874
db2iops1 28532748 0.1 0.0 1780 1816 - A 00:50:30 0:31 db2bm.23609764
Any thoughts on this ?
Reply With Quote
  #11 (permalink)  
Old 12-17-08, 04:18
nick.ncs nick.ncs is offline
Registered User
 
Join Date: May 2007
Location: somewhere in dbforums
Posts: 221
Quote:
Originally Posted by db2udbgirl
Is this acceptable ?
well it is acceptable.... but what i am more interested in is whether all the information is logged properly into the $BKUP_LOG file...as it is important that the failures get logged too.



Quote:
Originally Posted by db2udbgirl
Any thoughts on this ?
UPDATE - check my explanation below
__________________
IBM Certified Database Associate, DB2 9 for LUW

Last edited by nick.ncs; 12-18-08 at 00:30.
Reply With Quote
  #12 (permalink)  
Old 12-17-08, 16:55
db2udbgirl db2udbgirl is offline
Registered User
 
Join Date: Mar 2006
Location: Tuticorin, India
Posts: 100
Thanks for the detailed explanation, now I understand the reason for it.

Yes all the messages are logged. But unfortunately backup completed successfully in 24 partitions but it failed in 6 partitions with the following error message.
Quote:
ssh_exchange_identification: Connection closed by remote host
I suspect that this is something related to the number of concurrent SSH requests as I'm spawning 29 simultaneous requests using nohup. Pls let me know if you have any other tips for resolving this SSH error.
Reply With Quote
  #13 (permalink)  
Old 12-18-08, 00:27
nick.ncs nick.ncs is offline
Registered User
 
Join Date: May 2007
Location: somewhere in dbforums
Posts: 221
couple of things....about your processes which were waiting for the wait thread I got it a bit wrong.... I had carried out a testing from my end which was a bit different from what you did... but on same lines....so there was some mistake in the explanation.....


If you check carefully the those wait threads were spawned on Aug 21 so obviously that were not the thread which were caused by your script they are some other threads....so that clears out that point.... your threads will be much lower in the list as it'll be consuming less CPU %

As for the tips... i can recommend you another option -

suppose each of your backup process takes 10 mins [you need to know this for the process to work] so what you can do is after your nohup command you can introduce a sleep period of 1 min (sleep 60) in your script so it will take 1 min to start your next parellel backup and your backups' parellel processing will basically be throttled

So for eg -

Min-- Backup Carried out
1---- 1
2---- 1,2
3---- 1,2,3
4---- 1,2,3,4
5---- 1,2,3,4,5
6---- 1,2,3,4,5,6
7---- 1,2,3,4,5,6,7
8---- 1,2,3,4,5,6,7,8
9---- 1,2,3,4,5,6,7,8,9
10--- 2,3,4,5,6,7,8,9,10 -> backup of 1 will finish here
11--- 3,4,5,6,7,8,9,10,11 -> backup of 2 will finish here
and so on and so forth.....

so basically you are limiting the no. of backups at any point of time to 9..... this can however be further fine tuned for optimum performance and you can acheive your goal
__________________
IBM Certified Database Associate, DB2 9 for LUW
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On