Bug 140624 - kernel-smp-2.4.21-25.EL hangs on disk io
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
Reported: 2004-11-23 21:44 UTC by Ryan Linn
Modified: 2007-11-30 22:07 UTC (History)

Doc Type: Bug Fix
Last Closed: 2004-11-30 18:56:14 UTC

Description Ryan Linn 2004-11-23 21:44:53 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20041001

Description of problem:
After seeing kswapd go to 100% load on our development IMAP servers
with the U3 kernel, we have been following the beta kernels to see if
they fixed the problem.  While the kswapd problem is gone, the kernel
now randomly hangs during file I/O.  CPU-intensive work on the box
runs fine, but the box may hang after as little as an hour of certain
disk activity, and it consistently hangs before it reaches 24 hours of
activity.  We are using EMC PowerPath with QLA2310 cards to talk to
storage.  To re-create the problem, all we have to do is run a restore
to the disk.  We can also re-create it by simulating Cyrus IMAPD
activity.  No message about a panic is printed to the console and the
sysrq keys don't work, so I'm not sure how to get more debugging
information out of the situation.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. run anything that requires sustained disk I/O on Fibre Channel
2. wait
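
For anyone trying to reproduce step 1 without a restore job or an IMAP
simulation, the following is a minimal sketch of a sustained write load.
It is not the workload used in this report; the function name, the
scratch file path and the sizes are hypothetical, and it assumes a
modern Python 3 interpreter rather than the one shipped with RHEL 3.

```python
import os
import time


def sustained_write_load(path, duration_s=60.0, block_size=1 << 20,
                         file_size=64 << 20):
    """Rewrite and fsync a scratch file for roughly duration_s seconds.

    Returns the total number of bytes handed to write().
    """
    block = b"\0" * block_size
    total = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # Each pass truncates the scratch file and fills it again.
        with open(path, "wb") as f:
            written = 0
            while written < file_size and time.monotonic() < deadline:
                f.write(block)
                written += block_size
                total += block_size
            f.flush()
            os.fsync(f.fileno())  # push the data out to the device
    return total
```

Pointing `path` at a file on the Fibre Channel storage (for example a
hypothetical `/mnt/san/scratch.bin`) and raising `duration_s` toward 24
hours would approximate the "sustained disk I/O, then wait" steps above.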


Actual Results:  kernel hangs, no pings, no ctrl+alt+delete, no sysrq

Additional info:

Modules loaded:
emcphr                  9592   4
emcpmpap              104320   4
emcpmpaa               71680   4
emcpmpc                91552   4
emcpmp                 55360   4
emcp                  542488   5  [emcphr emcpmpap emcpmpaa emcpmpc
emcpsf                  6788   0  [emcpmpap emcp]
openafs               562640   2
audit                  90520   2  (autoclean)
autofs                 13620   1  (autoclean)
e1000                  75808   1
iptable_filter          2412   1  (autoclean)
ip_tables              16544   1  [iptable_filter]
floppy                 57520   0  (autoclean)
sg                     37228   2  (autoclean)
microcode               6848   0  (autoclean)
loop                   12696   0  (autoclean)
lvm-mod                64864   0
keybdev                 2976   0  (unused)
mousedev                5624   0  (unused)
hid                    22276   0  (unused)
input                   6144   0  [keybdev mousedev hid]
usb-uhci               26956   0  (unused)
usbcore                80928   1  [hid usb-uhci]
ext3                   89960   7
jbd                    55060   7  [ext3]
qla2300               311580   4
aic79xx               187420   7
sd_mod                 13360  20
scsi_mod              112680   5  [emcpmpap emcpmpaa emcpmpc emcpmp emcp emcpsf sg qla2300 aic79xx sd_mod]

             total       used       free     shared    buffers     cached
Mem:       4124056     460168    3663888          0      63128     153260
-/+ buffers/cache:     243780    3880276
Swap:      2040212          0    2040212
Linux myserver 2.4.21-15.0.4.ELsmp #1 SMP Sat Jul 31 01:25:25 EDT 2004 i686 i686 i386 GNU/Linux

Typical loads during imap test crashes:
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   60.0%    0.0%    6.8%   0.4%     6.4%   58.4%  266.4%
           cpu00   10.9%    0.0%    1.3%   0.5%     4.9%    8.3%   73.8%
           cpu01   13.9%    0.0%    1.7%   0.0%     0.9%   22.0%   61.2%
           cpu02   15.6%    0.0%    1.3%   0.0%     0.0%    7.5%   75.3%
           cpu03   19.8%    0.0%    2.3%   0.0%     0.7%   20.8%   56.1%

Machine is a dual Xeon machine running with Hyper-Threading enabled.

Comment 1 Larry Woodman 2004-11-24 02:46:07 UTC
Ryan, please get the system into the state you describe, then get me
AltSysrq-M, AltSysrq-T and AltSysrq-W outputs so I can see where the
memory is located and what each process and CPU is doing.

Larry Woodman
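
When the keyboard combination is not practical (for example on a remote
console), the same dumps can usually be requested by writing the key
letters to the kernel's sysrq trigger file. The sketch below assumes
that the running kernel provides /proc/sysrq-trigger, that it is run as
root, and that the system is still responsive; it cannot help once the
box is fully hung. The function name is hypothetical.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"  # assumed to exist on this kernel


def request_sysrq_dumps(keys=("m", "t", "w"), trigger_path=SYSRQ_TRIGGER):
    """Write each sysrq key letter to the trigger file, one per open().

    Returns the list of keys that were written.
    """
    sent = []
    for key in keys:
        # The trigger file takes a single command character per write.
        with open(trigger_path, "w") as f:
            f.write(key)
        sent.append(key)
    return sent
```

The resulting output goes to the kernel log and console, not to the
caller, so it still has to be collected from there.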

Comment 2 Tom Coughlan 2004-11-24 13:01:01 UTC
Also, please reproduce this without PowerPath, or any other modules
that taint the kernel.
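
One way to check whether a running kernel is tainted is to read the
taint value from /proc/sys/kernel/tainted and decode its bits. The
sketch below is an illustration only: the helper names are hypothetical,
and the three bit meanings are the ones I understand 2.4-era kernels to
use. Later kernels define more bits, so verify against the kernel
documentation for the version in question.

```python
# Assumed bit meanings for 2.4-era kernels; check your kernel's docs.
TAINT_FLAGS = {
    0: "proprietary module loaded",
    1: "module force-loaded",
    2: "SMP kernel on CPUs not rated for SMP",
}


def decode_taint(value):
    """Return the descriptions of the known taint bits set in value."""
    return [name for bit, name in sorted(TAINT_FLAGS.items())
            if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the kernel's taint value as an integer."""
    with open(path) as f:
        return int(f.read().strip())
```

A value of 0 decodes to an empty list, meaning none of these known
flags is set; `decode_taint(read_taint())` lists the ones that are.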

Comment 3 Larry Woodman 2004-11-30 15:02:00 UTC
Ryan, any luck reproducing this problem without PowerPath?

Sorry, but I missed the fact that AltSysrq doesn't work.  If those keys
don't even work, we really need to reproduce this without PowerPath so
we can see what is causing the system to hang.

Larry Woodman

Comment 4 Ryan Linn 2004-11-30 18:40:47 UTC
Sorry for the delay; I wanted to make sure that this ran for at least
48 hours before I said that it didn't happen again.  I thought that in
the past I'd re-created this without PowerPath running, but that may
have been with an earlier kernel.  When I tried a fresh install with no
PowerPath installed, the problem did not come back.  Sorry about that.
Any idea how long it will be until the U4 kernel is finished, so that
we can encourage EMC to look at the problem?

Comment 5 Larry Woodman 2004-11-30 18:56:14 UTC
The RHEL3-U4 kernel is due to be released in about one month. 
However, we certainly can get EMC a kernel right away so they can
start looking into this issue.

Larry Woodman
