Hi Forum,
Despite trying many things, I keep having a problem with an Emulex 10000 HBA. When backing up our database with NetBackup, the backup is much slower through one of our two HBAs than through the other.
The symptom is that I/O proceeds normally up to a certain point and then stalls. During a stall, iostat shows the affected device at 100% busy while the I/O rates and service times all stay at 0 (actv stays at 1.0, so one request appears to remain outstanding). This lasts for 30-50 seconds, after which I/O picks up again. See the listing below. When I force the I/O traffic through the other HBA, no such busy periods occur.
Could anyone give me a clue as to what is happening?
Listing of iostat -zxn 5, where the c6 controller is the one having the problem:
                    extended device statistics
    r/s   w/s     kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   6.2      0.0   43.8  0.0  0.1    0.0   23.0   0   2 c0t0d0
    0.0   6.2      0.0   43.8  0.0  0.2    0.0   31.7   0   3 c1t0d0
 1056.0   0.0  32649.3    0.0  0.0  1.6    0.0    1.5   1  92 c6t1d6
                    extended device statistics
    r/s   w/s     kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   0.2      0.0    1.6  0.0  0.0    0.0    9.5   0   0 c0t0d0
    0.0   0.2      0.0    1.6  0.0  0.0    0.0   19.4   0   0 c1t0d0
 1074.4   0.0  32673.3    0.0  0.0  1.6    0.0    1.5   1  93 c6t1d6
                    extended device statistics
    r/s   w/s     kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   0.2      0.0    1.6  0.0  0.0    0.0    9.0   0   0 c0t0d0
    0.0   0.2      0.0    1.6  0.0  0.0    0.0   17.7   0   0 c1t0d0
  124.6   0.0   3781.7    0.0  0.0  1.1    0.0    8.6   0  99 c6t1d6
                    extended device statistics
    r/s   w/s     kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   0.0      0.0    0.0  0.0  1.0    0.0    0.0   0 100 c6t1d6
                    extended device statistics
    r/s   w/s     kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   1.4      0.0   14.6  0.0  0.0    0.0   10.3   0   1 c0t0d0
    0.0   1.4      0.0   14.6  0.0  0.0    0.0   13.2   0   1 c1t0d0
    0.0   0.0      0.0    0.0  0.0  1.0    0.0    0.0   0 100 c6t1d6
                    extended device statistics
    r/s   w/s     kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   1.8      0.0   12.6  0.0  0.0    0.0    8.5   0   1 c0t0d0
    0.0   1.8      0.0   12.6  0.0  0.0    0.0   20.7   0   1 c1t0d0
    0.0   0.0      0.0    0.0  0.0  1.0    0.0    0.0   0 100 c6t1d6
                    extended device statistics
    r/s   w/s     kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   3.6      0.0   29.0  0.0  0.0    0.0   10.7   0   1 c0t0d0
    0.0   3.6      0.0   29.0  0.0  0.1    0.0   18.4   0   2 c1t0d0
    0.0   0.2      0.0    0.4  0.0  0.0    0.1   21.7   0   0 c6t1d0
   60.8   0.0   1884.8    0.0  0.0  1.0    0.0   17.0   0 100 c6t1d6
The 100% busy state persists for 30-60 seconds, followed by a burst of normal I/O at about 30 MB/s for roughly 20 seconds, then 100% busy again, and so on.
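In case it helps anyone reproduce the measurement, here is a minimal Python sketch for flagging the stalled intervals in saved iostat -xn output. It is not part of our setup. The function names (stalled_samples, stall_runs) and the 99% busy threshold are my own assumptions for illustration, and it assumes every sample begins with an "extended device statistics" line, as in the listing above.

```python
def stalled_samples(lines, device, busy_threshold=99.0):
    """Return one bool per iostat sample: True when `device` is (nearly)
    100% busy while completing no reads and no writes."""
    flags = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "extended":
            # Each "extended device statistics" banner starts a new sample.
            flags.append(False)
            continue
        # Data rows have 11 columns and end with the device name;
        # the column-header row ends with the literal word "device".
        if len(fields) != 11 or fields[-1] != device or not flags:
            continue
        try:
            reads, writes, busy = float(fields[0]), float(fields[1]), float(fields[9])
        except ValueError:
            continue
        flags[-1] = busy >= busy_threshold and reads == 0.0 and writes == 0.0
    return flags


def stall_runs(flags, interval_seconds):
    """Collapse consecutive stalled samples into (first_sample_index, duration) pairs."""
    runs = []
    start = None
    for index, stalled in enumerate(flags):
        if stalled and start is None:
            start = index
        elif not stalled and start is not None:
            runs.append((start, (index - start) * interval_seconds))
            start = None
    if start is not None:
        runs.append((start, (len(flags) - start) * interval_seconds))
    return runs


# Four samples for c6t1d6 taken from the listing above (5-second interval).
HEADER = "r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device"
SAMPLE = [
    "extended device statistics", HEADER,
    "124.6 0.0 3781.7 0.0 0.0 1.1 0.0 8.6 0 99 c6t1d6",
    "extended device statistics", HEADER,
    "0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c6t1d6",
    "extended device statistics", HEADER,
    "0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 c6t1d6",
    "extended device statistics", HEADER,
    "60.8 0.0 1884.8 0.0 0.0 1.0 0.0 17.0 0 100 c6t1d6",
]

print(stall_runs(stalled_samples(SAMPLE, "c6t1d6"), 5))  # prints [(1, 10)]
```

The check requires both zero throughput and high %b, so the 99% busy sample that still moves 124.6 reads/s is not counted as a stall. Feeding it a longer capture should give the start and length of each stall, which can then be lined up against timestamps in the HBA driver or system logs.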
Our configuration is:
- Solaris 9, latest patches applied, on Fujitsu 1500 hardware.
- 2 Emulex 10000 LightPulse cards with the latest driver software, 6.02h
- Veritas Volume Manager 4.1 with latest service pack MP1
- Veritas NetBackup 5.1 with latest maintenance pack MP4S01
- Database: Informix 9.40 FC5XG. Backup goes through NetBackup/onbar scripts. The database is held on raw devices in the SAN, which are accessed through Veritas Volume Manager.
If anyone can give me a clue, it will be greatly appreciated.
Thanks Listman.