Bug 426048

Summary: Frequent reports of BUG: soft lockup detected
Product: Red Hat Enterprise Linux 5
Reporter: Ben Webb <ben>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED NOTABUG
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: low
Version: 5.1
CC: ursula
Target Milestone: rc
Target Release: ---
Hardware: ia64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-12-26 08:08:58 UTC
Attachments:
Excerpt of /var/log/messages (flags: none)

Description Ben Webb 2007-12-18 00:45:07 UTC
Description of problem:
We get frequent kernel bug errors in the logs on a newly-installed ia64 system
with 4 physical CPUs. These look like:
kernel: BUG: soft lockup detected on CPU#3!

See attached logfile for the full call traces. Each lockup results in the system
becoming unresponsive for a while, so this is obviously problematic for our servers.
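
To see how the reports are spread across CPUs, the log can be tallied with a short script. This is only a minimal sketch: the pattern is taken from the message quoted above, the function name count_soft_lockups is made up for illustration, and the rest of each syslog line (timestamp, hostname) is assumed and not parsed.

```python
import re
from collections import Counter

# Pattern copied from the message quoted in this report. Anything before
# "BUG:" on the line (timestamp, hostname, "kernel:") is ignored.
LOCKUP_RE = re.compile(r"BUG: soft lockup detected on CPU#(\d+)!")


def count_soft_lockups(lines):
    """Return a Counter mapping CPU number -> number of soft lockup reports.

    `lines` is any iterable of log lines, for example an open file object.
    The function name is hypothetical; it is not part of any existing tool.
    """
    counts = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

It could be run against the system log with something like count_soft_lockups(open("/var/log/messages", errors="replace")), which needs read access to that file. A count concentrated on one CPU would point in a different direction than reports spread evenly across all of them.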

Version-Release number of selected component (if applicable):
kernel-2.6.18-53.1.4.el5

How reproducible:
Occurs apparently randomly, every 10-15 minutes. The system is not heavily
loaded (CPU or disk IO) - in fact, it was only recently installed and is not yet
running any production services.

Steps to Reproduce:
1. Boot system normally.
  
Actual results:
System reports kernel bugs in the logs.

Expected results:
System boots and does not report kernel bugs in the logs.


Additional info:
We didn't see this problem with the originally installed kernel (2.6.18-53.el5) -
it only started once we ran 'yum update' to EL5.1. Our temporary workaround will
be to downgrade the kernel - I'll follow up on this bug report to say whether
that fixes the problem for us.

This particular machine was previously running RHAS3 with no problems. (We
reinstalled, not upgraded, for RHEL5.)

Comment 1 Ben Webb 2007-12-18 00:45:07 UTC
Created attachment 289835
Excerpt of /var/log/messages

Comment 2 Ursula Pieper 2007-12-18 22:44:09 UTC
We have since added the kernel option "nosoftlockups", which resulted in
frequent spontaneous reboots of the server. 

We then downgraded to 2.6.18-53.el5. The server still has soft lockups when
there is no load on the machine, but they are less frequent.

Comment 3 Luming Yu 2007-12-25 00:58:16 UTC
Any chance you could also try the base kernel?

Comment 4 Ben Webb 2007-12-25 03:51:00 UTC
Sorry - I guess we forgot to update this. It turned out that some new memory had
recently been installed in that server, and it wasn't correctly seated. Once it
was correctly installed, the soft lockup problem disappeared. So I guess you can
close this bug report, unless you think that this is still a genuine kernel bug
in that the kernel should have reacted differently to this faulty hardware.

Comment 5 Luming Yu 2007-12-26 08:08:58 UTC
Thanks for the update. It sounds more like the firmware's responsibility to
correctly report the physical memory map...