Red Hat Bugzilla – Bug 242385
complete system halt (unpredictable from my perspective)
Last modified: 2007-12-13 13:45:48 EST
Description of problem:
F7 hangs after some indeterminate time period (sometimes very quickly and
sometimes I get an hour or so before the complete hang - keyboard, screen, caps
lock light frozen on, num lock light frozen on)
no core dump in any obvious places
Version-Release number of selected component (if applicable):
f7 - brand new install - complete install (removed previous partitions)
reinstalled three times now -
since I don't know what's wrong, I can't predictably reproduce it - but it does
reproduce itself well enough
Steps to Reproduce:
1. Turn on machine
2. log in at the Session login screen
3. perform regular work till complete system halt - right now, all I'm trying to
do is get updates and install software via add/remove menu item
hangs every single time - I've almost thrown the machine out the window - this
is my production laptop - the only reason I upgraded was because f7 installed
very smoothly on my desktop and ran very nicely
a working system
attached are two files - the output of dmesg and lspci -vvv
I am more than willing to help out - just tell me what to do
Created attachment 156038 [details]
dmesg > dmesg.txt
Created attachment 156039 [details]
lspci -vvv > lspci.txt
I have a similar problem on a Toshiba laptop that seems to relate to a broken
ACPI in the 2.6.21 kernel.
adding the 'acpi=off' kernel parameter seems to correct the problem, but
obviously deactivates ACPI.
Can you try nohz=off instead and see if that helps?
clocksource=acpi_pm may also be interesting.
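For anyone following along: nohz=off turns off the new tickless (dynamic tick) feature, and clocksource=acpi_pm makes the kernel use the ACPI power management timer as its time source. The sketch below is one way to confirm that a parameter actually reached the running kernel and to see which clocksource is in use. The helper name has_kernel_param is made up for illustration, and the sysfs clocksource path is assumed to be present on the kernel in question.

```sh
# has_kernel_param CMDLINE PARAM  (hypothetical helper, not a system tool)
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Report on the running system, if the usual Linux interfaces are present.
if [ -r /proc/cmdline ]; then
    if has_kernel_param "$(cat /proc/cmdline)" "nohz=off"; then
        echo "nohz=off is active"
    else
        echo "nohz=off is not set"
    fi
fi

# Assumption: the clocksource sysfs directory exists on this kernel.
cs=/sys/devices/system/clocksource/clocksource0
if [ -r "$cs/current_clocksource" ]; then
    echo "current clocksource:    $(cat "$cs/current_clocksource")"
    echo "available clocksources: $(cat "$cs/available_clocksource")"
fi
```

The parameter itself still has to be added to the kernel line in the boot loader; this only checks the result after a reboot.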
I will try nohz=off but it would be nice to know what this is accomplishing.
I'm starting to wonder if it has anything to do with the ATI video card - this
card has always been a major hassle.
However, it was extremely gratifying to see that F7 identified it correctly and
seems to have installed a nicely working driver - that even gets the 1440x900
screen ratio right
As soon as I reboot I'll report back - well, after an hour or two to see if
anything is different.
OK - I added nohz=off and the system seems to be running just fine.
Now, could someone explain why this was a solution?
I've also been affected by seemingly random freezes, on the same system which
worked perfectly on previous Fedora releases (and hasn't changed hardware since
FC6). It's not an immediate lockup, either - it's a slow death that takes about
15 seconds before everything finally seizes up. Nothing in the logs to indicate
Running newer test kernels didn't improve the situation (last tried
2.6.21-1.3209). Nor did using the untainted 'nv' display driver vs the closed
The only variable which seemed to affect crash frequency was CPU load. After
disabling my Folding@Home client from starting up (about 2 days after
installation) my uptime went from an annoying random couple of hours, to days...
until just a moment ago, which prompted me to search bugzilla again, and I'm
glad I did.
I will definitely test the nohz=off option upon my next boot, with F@H running
to stress it... just as soon as my raid resync is complete. So, hopefully
disabling the new tickless kernel feature does the trick - i'll report back
sooner rather than later if it doesn't. :)
And here's my smolt profile, if it helps to identify the thing in common:
nohz=off seems to have worked. it's been an "abnormally long" 10 hours of uptime
under full CPU load without the expected lockup (knock on wood).
I'll be keeping an eye on the davej kernel changelogs for fixes for this new
I've also had random freeze problems. Yesterday I rebooted 3 times in an hour!
I've an nvidia card with closed driver but the problem seems unrelated (see
Kevin's comment on ATI).
In the evening I'll try the nohz option and I'll report here the results.
The IDE/SATA LED has been lit since boot even though the hard disk is idle. When I open
and then close the DVD drive the LED turns off. I don't know if this is related to
the same issue: the system hangs whether the light is on or off.
[dome@mozart ~]$ uptime
14:10:05 up 3:31, 2 users, load average: 2.99, 3.00, 2.82
with the nohz option: no freezes.
The LED still stays on.
nohz=off is still necessary with kernel-2.6.21-1.3228.fc7 as I only managed a ~5
hour uptime without it.
The upgrade to kernel-2.6.21-1.3228.fc7 broke the system again. I am now
getting the random hangs again. The system works for a while then "total
What other information can I provide to help resolve this?
I have the same problems.
I have tried different kernels (2.6.21-1.3232.fc7, 2.6.21-1.3228.fc7,
2.6.21-1.3194.fc7) and different boot options (noacpi, nohz=off, noapic,
highres=off) but nothing helps.
I think the xen kernel (2.6.20-2925.9.fc7xen) is the only kernel that
works without any problems, but I have not tested it in much detail.
my CPU: Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz stepping 06
this is disillusioning. I'll post my smolt profile (although I thought it was
already sent in)
here's the smolt profile of the machine that hangs:
here's the smolt profile of another machine that doesn't hang (even though they
both have F7 installed)
The system still comes to a screeching halt sporadically. The nohz=off has
helped but not solved the problem.
What else can I do to help?
(In reply to comment #17)
> The system still comes to a screeching halt sporadically. The nohz=off has
> helped but not solved the problem.
> What else can I do to help?
Have you tried acpi=off
On my machine (Toshiba Satellite L10) the system hangs when the fan is activated
- hence the seemingly random nature of the hang.
Resetting the thermal trip points to higher values does give some relief, but as
soon as the machine hits the new trip point it hangs again.
Disabling ACPI prevents this, but obviously disables all ACPI functions.
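To check whether hangs line up with a thermal trip point (and so with the fan starting), it can help to log the current temperature next to the configured trip points shortly before a hang. The sketch below assumes the sysfs thermal layout under /sys/class/thermal, with values in millidegrees Celsius. F7-era kernels may expose this information elsewhere, so treat the paths as an assumption. show_trip_points is a made-up helper name.

```sh
# show_trip_points [ROOT]  (hypothetical helper)
# Print the current temperature and trip points of each thermal zone
# found under ROOT (default: /sys/class/thermal, an assumed layout).
show_trip_points() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        echo "$(basename "$zone"): temp=$(cat "$zone/temp")"
        for trip in "$zone"/trip_point_*_temp; do
            [ -r "$trip" ] || continue
            # The matching *_type file names the kind of trip point.
            type_file="${trip%_temp}_type"
            kind="unknown"
            [ -r "$type_file" ] && kind="$(cat "$type_file")"
            echo "  $(basename "$trip" _temp): $kind at $(cat "$trip")"
        done
    done
}
```

Running show_trip_points with no argument reads the live system. Passing a directory makes it easy to try the function against copied files.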
tickless kernel-2.6.21-1.3255.fc7 still locks up. managed an ~8hr uptime.
back to using "nohz=off" for stability.
kernel-2.6.22.1-27.fc7 in updates-testing seems to have fixed the lockups; my
current uptime is 1 day, 10 hours. On all previous F7 2.6.21 kernels I needed
the "nohz=off" in order to prevent a random lockup after a couple hours - no longer.
I declare the .21 kernels the worst I've run into in a long while (on my
hardware) with the new libata & tickless ticking me off.
I've gone back to 2.6.20-1.2962.fc6 because I'm having problems.
I have to use the nohz option. The new libata can't drive my HDD LED correctly,
I have problems reading DVDs and I can't burn DVDs!
The juju patch for firewire can't drive my camera!
I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.
I am CC'ing myself to this bug and will try and assist you in resolving it if I can.
There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel? If so, you might want to
try the following:
# For boot related issues we need as much info as possible, so remove quiet
from the boot flags.
# Slowing down the speed of text output with boot_delay=1000 (the number may
need to be tweaked higher/lower to suit) may allow the user to take a digital
camera photo of the last thing on screen.
# Booting with vga=791 (or even just vga=1 if the video card won't support 791)
will put the framebuffer into high resolution mode to get more lines of text on
screen, allowing more context for bug analysis.
# initcall_debug lets you see the last thing the kernel tried to initialise
before it hung.
# There are numerous switches which at times have proven useful for
diagnosing failures by disabling various features.
* acpi=off is a big hammer, and if that works, narrowing down by trying
pci=noacpi instead may yield clues
* nolapic and noapic are sometimes useful
* Given it's new and still seeing quite a few changes, nohz=off may be worth
testing. (Though this is F7 and above only)
# If you get no output at all from the kernel, booting with
earlyprintk=vga can sometimes yield something of interest.
# If the kernel locks up with a 'soft lockup' report, booting with nosoftlockup
will disable this check allowing booting to continue.
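As a rough illustration of the first few suggestions above, this sketch takes an existing kernel command line, drops quiet, and appends boot_delay=1000, vga=791 and initcall_debug. The function name debug_cmdline is made up for this example. The values are the ones suggested above and may need tweaking, and the feature switches (acpi=off, nohz=off and so on) are deliberately left out so they can be tried one at a time.

```sh
# debug_cmdline CMDLINE  (hypothetical helper)
# Print CMDLINE with "quiet" removed and the triage flags appended.
debug_cmdline() {
    out=""
    set -f                      # no filename expansion while splitting words
    for word in $1; do
        case "$word" in
            quiet) ;;           # drop: it hides the boot messages we want
            *) out="$out $word" ;;
        esac
    done
    set +f
    out="$out boot_delay=1000 vga=791 initcall_debug"
    echo "${out# }"             # strip the leading space
}
```

For example, debug_cmdline "ro root=/dev/sda1 quiet" prints ro root=/dev/sda1 boot_delay=1000 vga=791 initcall_debug, which can then be typed on the kernel line from the boot loader menu for a single boot.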
If the problem has gone away then please close this bug or I'll do so in a few
days if there is no additional information lodged.
Closing INSUFFICIENT_DATA as indicated. Please re-open as required.