Red Hat Bugzilla – Bug 212777
frequent spontaneous lockups
Last modified: 2008-01-30 22:51:14 EST
Description of problem:
After upgrading to FC6, I get frequent, spontaneous, complete lockups, both while
the desktop is idle and while using applications (browser, etc.). No information
about the lockups ever appears in the system log.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Wait 0.5-20 hours and it will happen
Complete lockup: the screen freezes, and there is no response to keyboard, mouse, or
... no lockup...
Tyan S2895 board, recently upgraded to two dual-core Opterons; memory was upgraded
as well, to two 2 GB DIMMs (4 GB total). The system appeared to work well after the
hardware upgrade with FC5's 2.6.18-1.2200 kernel, although it wasn't run for very long.
Memory has been thoroughly tested with Memtest86+, without problems. I tried both
the Nvidia graphics driver and the Xorg nv driver, with the same lockups.
Unfortunately there is not much to go on other than the OS and hardware changes.
I've observed lockups on a 2.4 GHz Core 2 Duo Intel machine, also with an Nvidia
graphics card (using only the Xorg driver). There seems to be no problem when idle
(I've gone a couple of days with no programs running and no freeze), but with
activity, e.g., adding RPMs with pirut, it rarely goes as long as an hour without
freezing. Interestingly, I've seen it freeze while I was connected via ssh. When
that happens, ordinary commands such as top or less give "Input/output error" or
"Bus error" messages instead of any output. Trying to ssh in while it is frozen
gives the message "ssh_exchange_identification: Connection closed by remote host".
It still responded properly to ping requests, even though it was frozen. When I
went to the console after the freeze, no keystrokes were recognized (typing
produced no output in the login window), and switching to a text console with
CTRL-ALT-F2 did nothing. Only a hardware reset let me reboot.
I've run memtest86 to rule out a problem with the CPU, motherboard, or RAM.
I have observed this behavior as well using 2.6.18-1.2798.fc6.i686. It also
happened with the 2.6.18-1.2798.fc6.i586 kernel installed. My hardware is an
Athlon-XP 2500+, an Nvidia graphics card, and an Asus A7N8X-Deluxe nForce2
motherboard. Thinking this was an ACPI problem, I tried booting with acpi=off,
with no improvement. The only hint I have seen came the other morning when I
woke up my display to check email: for some reason httpd had eaten all the
memory on the machine and other processes were being killed. I run httpd on a
local home LAN, with only me accessing it occasionally. This never happened
with FC5.
Went to the trouble of recovering FC5's 2.6.18-1.2200, and got a lockup there
as well. Installed 2.6.17-1.2187 from FC5 and have not had any lockups/freezes
so far. But it _has_ taken longer than this to freeze before. We shall see.
Is this due to the problem 2.6.18 had during part of September, where the
backtrace search performed on a kernel bug (x86_64 only?) was itself buggy? I.e.,
is the lockup really caused by the reporting system, but triggered by another
kernel bug I can't see? Or is 2.6.18-1.2200/1.2187 already patched for that?
I wouldn't think this is x86_64-specific, since I am seeing it with an Athlon-XP
2500+. Then again, there could be two separate bugs with similar symptoms.
It's hard to tell with no evidence to go on besides the outcome :(
Is there any way to force the kernel to run in uniprocessor mode instead of SMP
mode? Under FC5, I had a few hyper-threaded machines that I wanted to run under
SMP, but they crashed every few days when I used SMP. When I forced them to boot
in uniprocessor mode, they never crashed.
(In reply to comment #3)
> Went to the trouble of recovering FC5's 2.6.18-1.2200, and acheived lockup there
> as well. Installed 2.6.17-1.2187 from FC5 and have not had any lockups/freezes
> so far. But it _has_ taken longer to freeze before. We shall see.
2.6.17-1.2187 is still running. I feel pretty confident now in concluding that
the 2.6.18 kernel is the lockup culprit, and not my hardware or other software.
It looks like the kernel shipped with FC6 is _not stable_, and a new release is
required before I can use it, and for FC5 as well, since it has moved to 2.6.18.
I'm going to try installing the latest kernel from kernel.org, which is
2.6.19-rc3-git8 at the time of this writing, to see if it still locks up on me.
Created attachment 139863
Output of lspci
I had 2.6.19-rc3-git8 lock up on me this morning, so the problem seems to still
exist in the latest kernel. I will now slowly try to determine where this bug
entered the kernel by testing various versions. Could some people who are
having this problem post their lspci output so we can see whether we have
hardware in common? Mine is attached.
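Narrowing down where the bug entered the kernel by testing versions is a search over an ordered range, and a bisection needs only about log2(n) test runs instead of n. Below is a minimal Python sketch, assuming the candidate versions are listed oldest to newest, the oldest is known good, the newest is known bad, and the bug stays present once introduced. `find_first_bad` and the `is_bad` callback are hypothetical names; in practice `is_bad` stands for "boot this kernel and run it for some test window". With a lockup that can take anywhere from half an hour to a couple of days to appear, a "good" verdict is only provisional, and one wrong verdict sends the search into the wrong half. For kernels built from a git tree, `git bisect` automates the same search.

```python
from typing import Callable, Sequence


def find_first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version in `versions` for which is_bad() is True.

    Assumes `versions` is ordered oldest to newest, versions[0] is known
    good, versions[-1] is known bad, and the bug persists once introduced.
    `is_bad` is a hypothetical test: boot the kernel, wait, report a lockup.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # bug already present at mid, look earlier
        else:
            lo = mid  # mid survived the test window, treat it as good
    return versions[hi]
```

Each iteration halves the remaining range, so roughly ten candidate kernels need only three or four test boots instead of nine.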
Created attachment 139871
Output of lspci
(In reply to comment #9)
> I had 2.6.19-rc3-git8 lockup on me this morning, so the problem seems to still
> exist in the latest kernel. I will slowly now try to determine where this bug
> entered the kernel by testing various versions. Could some people who are
> having this problem post the output of lspci to try to determine if we have
> common hardware. Mine is attached.
Other than the motherboard chipset being _made_ by NVIDIA, there is very little
in common, as one would expect given the different processor families supported.
I guess the Tyan S2895 has a TI FireWire chip instead of an NVIDIA one...
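Comparing lspci listings by eye gets tedious with more than two machines. A minimal sketch, assuming the plain (non-verbose) `lspci` output format of one "slot class: vendor device" line per device: each listing is reduced to a set of device descriptions with the machine-specific bus address and "(rev xx)" tag removed, and the sets are intersected. The function names are hypothetical. Matching whole descriptions only finds identical devices; to find a shared vendor, as with the NVIDIA chipsets here, the comparison key would need to be the vendor part alone.

```python
import re
from typing import Iterable, Set

# Trailing revision tag such as " (rev a3)", which may differ between boards.
_REV = re.compile(r"\s*\(rev [0-9a-fA-F]+\)\s*$")


def device_set(lspci_output: str) -> Set[str]:
    """Turn plain `lspci` output into a set of device descriptions.

    Drops the leading bus/slot address (e.g. "00:01.0"), which differs
    between machines, and the trailing revision tag.
    """
    devices = set()
    for line in lspci_output.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(" ", 1)  # [slot, "class: description"]
        if len(parts) < 2:
            continue
        devices.add(_REV.sub("", parts[1]))
    return devices


def common_devices(outputs: Iterable[str]) -> Set[str]:
    """Return the device descriptions present in every lspci listing."""
    sets = [device_set(o) for o in outputs]
    return set.intersection(*sets) if sets else set()
```

Dropping the revision tag is a deliberate loosening; keeping it would only match boards with the exact same silicon revision.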
I have now installed 2.6.18-rc5. Maybe this kernel isn't haunted like the
others. We'll see...
Andrew/Delamart, can you attach your dmesg output and /var/log/Xorg.0.log files?
Created attachment 139923
My Xorg.0.log file
Created attachment 139925
Here is my dmesg output; however, this is from booting a 2.6.18-rc5 kernel
that I compiled (using the .config from 2.6.18-1.2798). So far (3.5 hours), I
haven't had a lockup with this kernel. If you need me to reboot into
2.6.18-1.2798 and get that dmesg, let me know.
Created attachment 139988
Created attachment 139990
Xorg.0.log under 2.6.17-2187_FC5
Xorg.0.log under 2.6.17-2187_FC5 (some bad devices)
(In reply to comment #6)
> (In reply to comment #3)
> > Went to the trouble of recovering FC5's 2.6.18-1.2200, and acheived lockup there
> > as well. Installed 2.6.17-1.2187 from FC5 and have not had any lockups/freezes
> > so far. But it _has_ taken longer to freeze before. We shall see.
> 2.6.17-1.2187 is still running. I feel pretty confident now in concluding that
> the 2.6.18 kernel is the lockup culprit, and not my hardware or other software.
Looks like I must eat those words, since 2.6.17-1.2187 finally locked up after
approximately 48 hours of running without trouble or reported errors. Then it
locked up again, and again. It seems less likely to freeze when idle, as I am
usually doing something on the desktop in a web browser or terminal when it
happens.
I have now found that my particular board had problems about a year ago (i.e.,
August 2005) involving the first Ethernet port, possibly related to BIOS APIC
settings.
It is _hardly_ conclusive given the spotty information, but those who had more
trouble seem to have had faster processors (>1.8 GHz), which is a transition I made in the
Various complaints about the "forcedeth" Ethernet driver causing lockups/freezes
float around the net, along with some reports of success after bug fixes, but it
is difficult to pin down versions and dates.
I put a _little_ stress on the network port, but with my current setup I am
unable to force a system lockup. I've also forced the processor clocks high
(performance governor), but that doesn't reliably cause a lockup, although the
system did lock up _once_ while I was setting the governor.
It's interesting that you mention forcedeth may cause problems, since I have that
hardware as well. Although I don't have it connected (I use a wireless rt61 PCI
card), the module for it is loaded.
Well, I rebooted into 2.6.18-1.2798 and removed the forcedeth module. I now
have an uptime of 9 hours, which is a pretty good sign, especially since I've
had it under high load at some points today.
Well, of course, no more than 10 minutes after I wrote the previous reply, I had
a lockup. I'll go back to trying to find the most recent kernel that doesn't
lock up on me.
I just had a 2.6.17 kernel lock up on me as well. That is odd, because I'm quite
sure I ran a 2.6.17 kernel in FC5 with no problems.
Created attachment 140366
My updated .config file
Well, I recompiled 2.6.19-rc3-git8 (which had locked up with the default FC6
.config) after removing a lot of config options I don't need. I now have an
uptime of 1 day 20 hours, so maybe the problem is solved. I have attached my
.config. I plan to slowly restore the default options to figure out which of
them is causing the problems.
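Before adding options back one at a time, it can help to list exactly which options differ between the default and the trimmed .config. A minimal Python sketch, assuming the usual .config line forms `CONFIG_FOO=value` and `# CONFIG_FOO is not set`; a symbol missing from one file is treated as "n" there, which is an assumption (a missing symbol usually means its dependencies were not enabled). `parse_config` and `diff_configs` are hypothetical helpers, not part of the kernel tree.

```python
import re
from typing import Dict, Tuple

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")            # CONFIG_FOO=y / =m / ="str"
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")  # explicitly disabled option


def parse_config(text: str) -> Dict[str, str]:
    """Map each CONFIG_ symbol to its value; unset options map to "n"."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
        # Any other line (plain comments, blanks) is ignored.
    return options


def diff_configs(old: str, new: str) -> Dict[str, Tuple[str, str]]:
    """Return {symbol: (old_value, new_value)} for every option that differs.

    Assumption: a symbol absent from one file counts as "n" in that file.
    """
    a, b = parse_config(old), parse_config(new)
    return {
        key: (a.get(key, "n"), b.get(key, "n"))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, "n") != b.get(key, "n")
    }
```

Passing it the text of the two files returns a dict mapping each differing symbol to its (old, new) value pair, in symbol-name order, which gives a checklist for restoring options one by one.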
I installed Kubuntu 6.10 (Edgy), and I have exactly the same problem. Maybe this
is Xorg-related? Both FC6 and Kubuntu Edgy use Xorg 7.1.
Might I suggest configuring kdump to see whether you can get a vmcore when these
problems hit? That might better show where the problem really is...
I've tried booting with the "noapic" flag and so far this seems to have solved
the problem for me.
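When testing a boot flag like noapic, it is worth confirming that the running kernel actually received it, since editing the wrong kernel stanza in grub.conf leaves the flag silently unused. On Linux, /proc/cmdline holds the command line the running kernel was booted with. A minimal Python sketch; `parse_cmdline` and `booted_with` are hypothetical helper names, and the simple whitespace split does not handle quoted values containing spaces.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {param: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # only the first "=" separates
        params[key] = value if sep else None
    return params


def booted_with(flag: str, cmdline_path: str = "/proc/cmdline") -> bool:
    """True if the running kernel was booted with `flag` on its command line."""
    return flag in parse_cmdline(Path(cmdline_path).read_text())
```

For example, `booted_with("noapic")` is true only if noapic actually reached the running kernel; the same check works for acpi=off via `parse_cmdline(...).get("acpi")`.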
(In reply to comment #25)
> Might I suggest configuring kdump to see if you can get a vmcore when these
> problems hit? Might better help to illustrate where the problem really is...
This sounds promising, but unfortunately I can't get it to work after following
the instructions. I'm using the FC6 kdump kernel. Forcing a crash produces the
syslog message "Kexec: Warning: crash image not loaded", and when I then run
kexec manually to load the crash image (kexec -p ...), I get:
Invalid memory segment 0x1000000 - 0x1324fff
This seems to be a recent problem that many posters are having (maybe with
Please tell me if there's presently a way to get around this difficulty!
Adding the noapic flag in grub.conf *seemed* to solve the problem for me for a
while; I had this machine up for over 24 hours without a problem. Now, however,
the lockups are back, sometimes less than an hour apart.
Created attachment 142082
.config for Kubuntu 6.10 that locks up like fc6
Created attachment 142083
.config for Kubuntu 6.10 that doesn't lock up
I am also getting frequent lockups. I have a Core 2 Duo E6300 @ 2.8 GHz, a
Gigabyte 965P-DS3 motherboard, and an nVidia 7900GS. I get the lockup with the
CPU running both at default speed and overclocked, and with both the nv and nvidia
The issue went away when I replaced the RAM with a completely new, uniform set of
DIMMs. Memtest86+ repeatedly passed the bad memory, so reaching this conclusion
took an expensive guess.