From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041001
Description of problem:
After seeing kswapd go to 100% load on our development IMAP servers
with the U3 kernel, we have been following the beta kernels to see if
they fixed the problem. While the kswapd problem is gone, the kernel
now randomly hangs during file I/O. We have run CPU-intensive jobs on
the box and it is fine; however, the box may hang after as little as an
hour of certain disk activity. It consistently hangs before it
reaches 24 hours of activity. We are using EMC PowerPath with QLA2310
cards to talk to storage. To re-create the problem, all we
have to do is run a restore to the disk. We can also re-create it by
simulating Cyrus IMAPD activity. No message about a panic is printed to
the console and the SysRq keys don't work, so I'm not sure how to
get more debugging information out of the situation.
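One simple case worth ruling out is that magic SysRq is switched off, since the kernel.sysrq sysctl is often shipped disabled. The sketch below only reads the setting; the helper names are hypothetical, and it assumes the value is exposed at /proc/sys/kernel/sysrq. If the setting is already non-zero and the keys still do nothing during the hang, this check does not explain it.

```python
def sysrq_enabled(raw_value):
    """Return True if the text of the kernel.sysrq sysctl is non-zero."""
    return int(raw_value.strip()) != 0


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the sysctl file (assumed location) and report whether SysRq is on."""
    with open(path) as handle:
        return sysrq_enabled(handle.read())


if __name__ == "__main__":
    try:
        print("SysRq enabled:", read_sysrq())
    except OSError as err:
        # The file may be absent on systems without a Linux-style /proc.
        print("Could not read the sysrq setting:", err)
```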
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run anything that requires sustained disk I/O on Fibre Channel
Actual Results: Kernel hangs; no pings, no Ctrl+Alt+Delete, no SysRq
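As a rough illustration of "sustained disk I/O", here is a minimal sketch of a write load generator. It is not the restore or the Cyrus IMAPD simulation used above; the function name, block size, and target file are assumptions chosen for the example.

```python
import os
import tempfile


def sustained_write(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of zero-filled data to path, syncing after each write.

    Returns the number of bytes written.
    """
    block = b"\0" * block_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            # os.write may write fewer bytes than asked; count what it reports.
            written += os.write(fd, chunk)
            os.fsync(fd)  # push the data out to the device
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    # Hypothetical target: a temporary file. Point this at a file system on
    # the Fibre Channel storage to exercise that path instead.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "io_load.bin")
        print(sustained_write(target, 8 * 1024 * 1024))
```

Running it in a loop with a much larger total_bytes would approximate a long restore; it only writes, so it does not cover the read side of IMAP activity.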
emcphr 9592 4
emcpmpap 104320 4
emcpmpaa 71680 4
emcpmpc 91552 4
emcpmp 55360 4
emcp 542488 5 [emcphr emcpmpap emcpmpaa emcpmpc
emcpsf 6788 0 [emcpmpap emcp]
openafs 562640 2
audit 90520 2 (autoclean)
autofs 13620 1 (autoclean)
e1000 75808 1
iptable_filter 2412 1 (autoclean)
ip_tables 16544 1 [iptable_filter]
floppy 57520 0 (autoclean)
sg 37228 2 (autoclean)
microcode 6848 0 (autoclean)
loop 12696 0 (autoclean)
lvm-mod 64864 0
keybdev 2976 0 (unused)
mousedev 5624 0 (unused)
hid 22276 0 (unused)
input 6144 0 [keybdev mousedev hid]
usb-uhci 26956 0 (unused)
usbcore 80928 1 [hid usb-uhci]
ext3 89960 7
jbd 55060 7 [ext3]
qla2300 311580 4
aic79xx 187420 7
sd_mod 13360 20
scsi_mod 112680 5 [emcpmpap emcpmpaa emcpmpc emcpmp emcp emcpsf sg qla2300 aic79xx sd_mod]
             total       used       free     shared    buffers     cached
Mem:       4124056     460168    3663888          0      63128     153260
-/+ buffers/cache:     243780    3880276
Swap:      2040212          0    2040212
Linux myserver 2.4.21-15.0.4.ELsmp #1 SMP Sat Jul 31 01:25:25 EDT 2004 i686 i686 i386 GNU/Linux
Typical loads during imap test crashes:
CPU states:    cpu    user    nice  system     irq softirq  iowait     idle
             total   60.0%    0.0%    6.8%    0.4%    6.4%   58.4%   266.4%
             cpu00   10.9%    0.0%    1.3%    0.5%    4.9%    8.3%    73.8%
             cpu01   13.9%    0.0%    1.7%    0.0%    0.9%   22.0%    61.2%
             cpu02   15.6%    0.0%    1.3%    0.0%    0.0%    7.5%    75.3%
             cpu03   19.8%    0.0%    2.3%    0.0%    0.7%   20.8%    56.1%
The machine is a dual Xeon running with Hyper-Threading enabled.
Ryan, please get the system into the state you describe, then get me
AltSysrq-M, AltSysrq-T and AltSysrq-W outputs so I can see where the
memory is located and what each process and CPU is doing.
Also, please reproduce this without PowerPath, or any other modules
that taint the kernel.
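To make it easier to see which modules to unload first, the sketch below pulls module names out of lsmod-style text and filters them by name prefix. The prefixes "emc" and "openafs" are only an assumption about which modules are third-party; the kernel's own verdict, where it is exposed, is the value in /proc/sys/kernel/tainted. The function names are hypothetical.

```python
def module_names(lsmod_text):
    """Return module names (first column) from lsmod-style text."""
    names = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Module Size Used by" header, and wrapped
        # continuation lines, whose second field is not a numeric size.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        names.append(fields[0])
    return names


def with_prefix(names, prefixes):
    """Keep only the names that start with one of the given prefixes."""
    return [name for name in names if name.startswith(tuple(prefixes))]


# A few lines copied from the module list in this report.
SAMPLE = """\
emcp 542488 5 [emcphr emcpmpap emcpmpaa emcpmpc
emcpsf 6788 0 [emcpmpap emcp]
openafs 562640 2
e1000 75808 1
scsi_mod 112680 5 [emcpmpap emcpmpaa emcpmpc emcpmp
emcp emcpsf sg qla2300 aic79xx sd_mod]
"""

if __name__ == "__main__":
    print(with_prefix(module_names(SAMPLE), ["emc", "openafs"]))
```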
Ryan, any luck reproducing this problem without PowerPath?
Sorry, but I missed the fact that AltSysrq doesn't work. If those keys
don't even work, we really need to reproduce this without PowerPath so
we can see what is causing the system to hang.
Sorry for the delay; I wanted to make sure that this ran for at least
48 hours before I said that it didn't happen again. I thought that in
the past I'd re-created this without PowerPath running, but that may
have been with an earlier kernel. When I tried a fresh install with no
PowerPath installed, the problem did not come back. Sorry about that.
Any idea how long it will be before the U4 kernel is finished, so that
we can encourage EMC to look at the problem?
The RHEL3-U4 kernel is due to be released in about one month.
However, we certainly can get EMC a kernel right away so they can
start looking into this issue.