Red Hat Bugzilla – Bug 176997
kernel-smp-2.6.14-1.1653_FC4smp locks up hard
Last modified: 2015-01-04 17:24:04 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050923 Fedora/1.7.12-1.5.1
Description of problem:
I am observing hard kernel hangs/lockups with kernel-smp-2.6.14-1.1653_FC4 shortly (<1min) after booting up the system.
Symptoms: shortly after booting, the system no longer reacts. Neither console input/logins nor remote logins are possible.
Unfortunately /var/log/messages doesn't provide any helpful information related to the breakdown. No oops nor other indication of what might be going wrong.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
2. wait for a minute
=> system is inaccessible.
This is an old dual PII/266MHz, SCSI-only system.
All previous FC4 kernel-smp kernels I had installed, up to kernel-smp-2.6.14-1.1644_FC4, seemed to have worked flawlessly (currently running 2.6.14-1.1644_FC4).
can you try booting with pci=noacpi ? (or if that doesn't work, acpi=off)
any chance you can boot (making sure 'quiet' is removed from the boot command
line) and let us know the last few things that appear on the screen ? (Even a
digital camera pic would be useful).
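When testing boot options like the ones requested above, it can help to confirm that the running kernel really received them. The following is a minimal sketch, assuming the options can be read back from /proc/cmdline; the function names parse_cmdline, check_boot_options and read_running_cmdline are made up for illustration and are not part of any existing tool.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as 'quiet' map to None. This is a simplification:
    quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def check_boot_options(cmdline, wanted, unwanted=("quiet",)):
    """Return a list of problems; an empty list means everything matched."""
    params = parse_cmdline(cmdline)
    problems = []
    for token in wanted:
        name, sep, value = token.partition("=")
        # A wanted 'name=value' must match exactly; a bare flag must be present.
        if name not in params or (sep and params[name] != value):
            problems.append("missing: " + token)
    for name in unwanted:
        if name in params:
            problems.append("still present: " + name)
    return problems


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the currently running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

For example, check_boot_options(read_running_cmdline(), ["pci=noacpi"]) should return an empty list once pci=noacpi is in effect and 'quiet' has been removed.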
(In reply to comment #1)
> can you try booting with pci=noacpi ?
> (or if that doesn't work, acpi=off)
Nope, neither pci=noacpi nor acpi=off seems to help.
> any chance you can boot (making sure 'quiet' is removed from the boot command
> line, and let us know the last few things that appear on the screen ? (Even a
> digital camera pic would be useful).
Well, I would have done so, if there was anything useful.
In most cases, the system boots up normally and ends up with a normal console
login. Then, after some time of "seemingly normal operation", the system becomes
completely non-responsive. Everything seems "frozen", not even a "3 finger
In some cases (less frequently), the system hangs while booting.
After rebooting into an older kernel, /var/log/messages doesn't show anything
unusual concerning the "hanger", no oops, no errors, no warnings - just normal logs.
However, meanwhile I am suspecting (beware: wild guess!) something related to
networking, because these "hangers" always seem to occur during network access.
In those cases where it hangs while booting, the last boot message is usually
autofs's, or some nfs or yp daemon's startup message.
In those cases where it hangs after a successful bootup, I can (almost)
deterministically cause the system to hang by logging in from remote and running
"yum update" as root.
Network driver problem? Compiler miscompiling driver?
have you tried running memtest86 over this box for a while ? A large percentage
of hangs we get reported turn out to be bad ram.
something that may trigger a backtrace when it hangs is booting with nmi_watchdog=1
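Whether nmi_watchdog=1 can help depends on the watchdog actually ticking on the box. One common check is to watch the NMI line in /proc/interrupts and see whether the per-CPU counters keep growing. The following is a rough sketch of that check, not an official tool: the helper names are hypothetical, and the sample text in the comment is made up to show the expected layout.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts text.

    Example of the layout this expects (illustrative values only):
               CPU0       CPU1
      NMI:       42         40
    Returns an empty list if there is no NMI line.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only numeric columns; some kernels append a description.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


def watchdog_ticking(before, after):
    """True if every CPU's NMI counter increased between two samples."""
    if not before or len(before) != len(after):
        return False
    return all(b > a for a, b in zip(before, after))
```

Taking two samples of /proc/interrupts a few seconds apart and passing the results of nmi_counts to watchdog_ticking gives a quick yes/no. If the counters stay flat, the watchdog is probably not active and a hang would not produce a backtrace.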
(In reply to comment #3)
> have you tried running memtest86 over this box for a while ?
Not recently. However, this machine (ca. 8 years old) has had almost every
Fedora and RHL kernel since RH-8.0 installed, and (except for occasional kernel
bugs) so far has been rock-solid.
It currently is running 2.6.14-1.1644_FC4smp without any problems ;)
> something that may trigger a backtrace when it hangs is booting with
I can give this a try.
For the record: 2.6.14-1.1656_FC4smp exposes this issue, too.
could you try out the kernel just pushed out to updates-testing too please ?
(In reply to comment #5)
> could you try out the kernel just pushed out to updates-testing too please ?
Initial results (uptime 1 hour) look promising: The box survived several
bootups, an e2fsck during bootup, a "yum update" and shoveling around several megs
of data over the network.
Diffing the dmesg of *1644, *1653 and 1824 kernels shows some presumably
noteworthy differences related to DMA and APIC (This box is known to have a
"broken" APIC implementation - Yes, I mean APIC not ACPI).
The only thing related to the NIC that I can spot is this:
8139too Fast Ethernet driver 0.9.27
-eth0: RealTek RTL8139 at 0xe800, 00:0b:2b:00:c0:9d, IRQ 185
+eth0: RealTek RTL8139 at 0xd0818000, 00:0b:2b:00:c0:9d, IRQ 185
Any explanation for the lockups with 165* ?
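For anyone wanting to repeat the dmesg comparison, here is a minimal sketch using Python's standard difflib module. It assumes each kernel's dmesg output has been saved as text; the function names are hypothetical, and stripping the leading "[  12.345678]" timestamps is an assumption that only matters for kernels that print them.

```python
import difflib
import re

# Optional printk timestamp prefix, e.g. "[   12.345678] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(dmesg_text):
    """Strip trailing whitespace and any leading timestamp from each line."""
    return [_TIMESTAMP.sub("", line.rstrip()) for line in dmesg_text.splitlines()]


def diff_dmesg(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified diff (no context lines) of two dmesg dumps."""
    return list(
        difflib.unified_diff(
            normalize(old_text),
            normalize(new_text),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
            n=0,
        )
    )
```

Feeding it the two 8139too excerpts quoted above yields the same -/+ pair for the eth0 line, with the unchanged driver banner left out.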
the RTL8139 diff is because we enabled memory mapped IO, which is faster, and
should be stable, I'd be surprised if that was causing lockups, though its
could you paste the other diffs from the two dmesg's ? There have been some
changes in the area of APIC, but unless you're passing boot command line
options, they should make no difference.
Created attachment 123221 [details]
dmesg of booting with 2.6.14-1.1644_FC4smp
Created attachment 123222 [details]
dmesg of booting 2.6.14-1.1653_FC4 with acpi=off
Created attachment 123223 [details]
dmesg of booting with 2.6.15-1.1824_FC4smp
(In reply to comment #7)
> could you paste the other diffs from the two dmesg's ?
I've added the dmesg's of booting the system with those different kernels being
discussed. Attachment #123222 [details] contains the dmesg of a boot that hung shortly
afterwards; both others worked without major problems.
[BTW: Uptime with *1824_FC4smp now: 2.5 days]
This is a mass-update to all currently open kernel bugs.
A new kernel update has been released (Version: 2.6.15-1.1830_FC4)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
Sounds like it was fixed with 2.6.15-1.1824_FC4smp.