Bug 166974 - System hangs: problems with shared memory?
Summary: System hangs: problems with shared memory?
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dave Anderson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-08-29 10:35 UTC by Terje Rosten
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-19 18:55:09 UTC
Target Upstream Version:
Embargoed:


Description Terje Rosten 2005-08-29 10:35:27 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; nb-NO; rv:1.7.10) Gecko/20050720 Fedora/1.0.6-1.1.fc4 Firefox/1.0.6

Description of problem:
While testing a High Availability database, we have seen kernel-related problems with shared memory.

The database is a heavy user of SVr4 shared memory.

While taking down the database and freeing shared memory, we have seen complete hangs of the system. On 4-way systems (2 Xeons with HT) we see two processes, each eating up one CPU.

The commands

$ ipcs -s
$ ipcrm

then just hang; however,

$ ipcs -m 

works. The two processes eating CPU can't be killed, and we have to reboot to recover the machine.
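
The asymmetry described above (the semaphore query hangs while the shared memory query returns) can be checked without tying up an interactive shell. What follows is only a minimal sketch, not part of the original report: it assumes a modern Python 3 interpreter is available on the machine, and `probe` is a hypothetical helper name. It runs each `ipcs` query with a time limit and reports whether the command returned. It deliberately does not wait for a timed-out child after killing it, since a process stuck in the kernel may never exit.

```python
import subprocess
import sys


def probe(cmd, timeout=5.0):
    """Run cmd and classify the result as 'ok', 'failed', 'hang' or 'missing'."""
    try:
        # Discard output so a full pipe can never be mistaken for a hang.
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        return "missing"
    try:
        rc = proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Best-effort kill. Do not wait again: a process stuck in the
        # kernel may be unkillable, and waiting would hang this script too.
        proc.kill()
        return "hang"
    return "ok" if rc == 0 else "failed"


if __name__ == "__main__":
    # -m: shared memory segments, -s: semaphore arrays, -q: message queues
    for flag in ("-m", "-s", "-q"):
        print("ipcs", flag, "->", probe(["ipcs", flag]))
```

On a machine in the state described above, one would expect the `-s` query to be reported as `hang` and the `-m` query as `ok`; that expectation comes from the symptoms in this report, not from a run of the script.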

On two-way systems the hangs are more complete: it is not possible to log in, although the machine still answers ping. We have to reboot to get the machine back in shape.

We wonder if this is a timing/race problem in the shared memory kernel code, since we cannot reproduce the problem at will; it only happens from time to time.

These machines are running RHEL AS 3 U4 with kernel-smp-2.4.21-27.EL, the hardware is Xeon 2.8 GHz and 1 GB RAM.

BTW: We have not seen these problems on Solaris 9 and 10 and Windows 2003 Enterprise Edition.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-27.EL

How reproducible:
Sometimes

Steps to Reproduce:
See Description.


Additional info:

Comment 1 Dave Anderson 2005-08-29 19:40:51 UTC
Obviously we'll need more information to go on.

Preferably, can you set the machine up with netdump and/or diskdump,
and then forcibly crash the machine with alt-sysrq-c when the hang
occurs?

Short of that, when it's in that state, can you capture the output
of alt-sysrq-w?

Comment 2 RHEL Program Management 2007-10-19 18:55:09 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
