Description of problem:
We are seeing frequent kernel BUG messages in the logs on a newly installed ia64
system with 4 physical CPUs. They look like this:
kernel: BUG: soft lockup detected on CPU#3!
See the attached log file for the full call traces. Each lockup leaves the system
unresponsive for a while, which is clearly a problem for our servers.
Version-Release number of selected component (if applicable):
How reproducible:
It occurs apparently at random, roughly every 10-15 minutes. The system is not
heavily loaded (CPU or disk I/O); in fact, it was only recently installed and is
not yet running any production services.
Steps to Reproduce:
1. Boot system normally.
Actual results:
System reports kernel bugs in the logs.
Expected results:
System boots and does not report kernel bugs in the logs.
Additional info:
We didn't see this problem with the originally installed kernel (2.6.18-53.el5);
it only started after we ran 'yum update' to move to EL5.1. As a temporary
workaround we will downgrade the kernel, and I'll follow up on this bug report
with whether that fixes the problem for us.
This particular machine was previously running RHAS3 with no problems. (We
reinstalled, not upgraded, for RHEL5.)
Created attachment 289835
Excerpt of /var/log/messages
We have since added the kernel option "nosoftlockups", which resulted in
frequent spontaneous reboots of the server.
We then downgraded to 2.6.18-53.el5. The server still has soft lockups when there
is no load on the machine, but they occur less frequently.
Any chance you could try the base kernel as well?
Sorry, I guess we forgot to update this. It turned out that some new memory had
recently been installed in that server and wasn't seated correctly. Once it was
properly installed, the soft lockup problem disappeared. So I guess you can close
this bug report, unless you think this is still a genuine kernel bug, in that the
kernel should have reacted differently to the misbehaving hardware.
Thanks for the update. It sounds more like it is the firmware's responsibility to
correctly report the physical memory map...