Bug 242385 - complete system halt (unpredictable from my perspective)
Product: Fedora
Classification: Fedora
Component: kernel
Hardware: i686 Linux
Priority: low
Severity: urgent
Assigned To: Kernel Maintainer List
QA Contact: Brian Brock
Reported: 2007-06-03 17:44 EDT by Kevin Crocker
Modified: 2007-12-13 13:45 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-12-13 13:45:48 EST

Attachments
dmesg > dmesg.txt (24.37 KB, text/plain)
2007-06-03 17:44 EDT, Kevin Crocker
lspci -vvv > lspci.txt (10.84 KB, text/plain)
2007-06-03 17:46 EDT, Kevin Crocker

Description Kevin Crocker 2007-06-03 17:44:34 EDT
Description of problem:
F7 hangs after some indeterminate time period (sometimes very quickly and
sometimes I get an hour or so before the complete hang - keyboard, screen, caps
lock light frozen on, num lock light frozen on)

no core dump in any obvious places

Version-Release number of selected component (if applicable):
f7 - brand new install - complete install (removed previous partitions)
reinstalled three times now

How reproducible:
since I don't know what's wrong, I can't predictably reproduce it - but it does
reproduce itself well enough

Steps to Reproduce:
1. Turn on machine
2. log in at Session login screen
3. perform regular work till complete system halt - right now, all I'm trying to
do is get updates and install software via add/remove menu item
Actual results:
hangs every single time - I've almost thrown the machine out the window - this
is my production laptop - the only reason I upgraded was because f7 installed
very smoothly on my desktop and ran very nicely

Expected results:
a working system

Additional info:
attached are two files - the output of dmesg and lspci -vvv

I am more than willing to help out - just tell me what to do
Comment 1 Kevin Crocker 2007-06-03 17:44:34 EDT
Created attachment 156038 [details]
dmesg > dmesg.txt
Comment 2 Kevin Crocker 2007-06-03 17:46:00 EDT
Created attachment 156039 [details]
lspci -vvv > lspci.txt
Comment 3 Mike Thompson 2007-06-05 08:56:56 EDT
I have a similar problem on a Toshiba laptop that seems to relate to a broken 
ACPI in the 2.6.21 kernel.
adding the 'acpi=off' kernel parameter seems to correct the problem, but 
obviously deactivates ACPI.
Comment 4 Dave Jones 2007-06-05 17:29:23 EDT
Can you try nohz=off instead and see if that helps?
clocksource=acpi_pm may also be interesting.
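
To confirm after a reboot that these options were actually picked up, the parameters the running kernel was booted with can be read from /proc/cmdline. A minimal Python sketch follows; parse_cmdline is a hypothetical helper written for this illustration (not part of any Fedora tool), and it assumes no quoted values containing spaces:

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as "quiet" map to None. Quoted values that contain
    spaces are not handled; that is a simplification of this sketch.
    """
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so root=LABEL=/ keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_running_cmdline(path="/proc/cmdline"):
    """Parse the command line of the currently running kernel."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

On the affected machine, read_running_cmdline().get("nohz") should return "off" once the option is in effect, and None if the boot loader entry was not updated.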
Comment 5 Kevin Crocker 2007-06-05 18:42:10 EDT
I will try nohz=off, but it would be nice to know what this is accomplishing.

I'm starting to wonder if it has anything to do with the ATI video card - this 
card has always been a major hassle.

However, it was extremely gratifying to see that F7 identified it correctly and 
seems to have installed a nicely working driver - that even gets the 1440x900 
screen ratio right

As soon as I reboot I'll report back - well, after an hour or two to see if 
anything is different.
Comment 6 Kevin Crocker 2007-06-06 00:02:33 EDT
OK - I added nohz=off and the system seems to be running just fine

Now, could someone explain why this was a solution?
Comment 7 Jason Farrell 2007-06-06 04:40:24 EDT
I've also been affected by seemingly random freezes, on the same system which
worked perfectly on previous Fedora releases (and hasn't changed hardware since
FC6). It's not an immediate lockup, either - it's a slow death that takes about
15 seconds before everything finally seizes up. Nothing in the logs to indicate
what happened.

Running newer test kernels didn't improve the situation (last tried
2.6.21-1.3209). Nor did using the untainted 'nv' display driver vs the closed
'nvidia' kmod.

The only variable which seemed to affect crash frequency was CPU load. After
disabling my Folding@Home client from starting up (about 2 days after
installation) my uptime went from an annoying random couple of hours, to days...
until just a moment ago, which prompted me to search bugzilla again, and I'm
glad I did.

I will definitely test the nohz=off option upon my next boot, with F@H running
to stress it... just as soon as my raid resync is complete. So, hopefully
disabling the new tickless kernel feature does the trick - I'll report back
sooner rather than later if it doesn't. :)
Comment 8 Jason Farrell 2007-06-06 04:42:37 EDT
And here's my smolt profile, if it helps to identify the thing in common:

Comment 9 Jason Farrell 2007-06-06 13:51:56 EDT
nohz=off seems to have worked. it's been an "abnormally long" 10 hours of uptime
under full CPU load without the expected lockup (knock on wood).

I'll be keeping an eye on the davej kernel changelogs for fixes to this new
dyntick stuff.
Comment 10 Domenico Ferrari 2007-06-08 03:56:11 EDT
I've also had random freeze problems. Yesterday I rebooted 3 times in an hour!
I've an nvidia card with closed driver but the problem seems unrelated (see
Kevin's comment on ATI).
This evening I'll try the nohz option and report the results here.

The IDE/SATA LED has been on since boot even though the HD is not busy. When I
open and then close the DVD reader the LED turns off. I don't know if this
depends on the same issue: the system hangs whether the light is on or off.
Comment 11 Domenico Ferrari 2007-06-09 08:13:11 EDT
[dome@mozart ~]$ uptime
 14:10:05 up  3:31,  2 users,  load average: 2.99, 3.00, 2.82

with the nohz option: no freezes.

The LED still remains on.
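
For anyone comparing time-to-hang across kernels, the same figure that uptime prints can be read from /proc/uptime and logged periodically, so the last value written before a freeze survives the reboot. A rough Python sketch; both function names are made up for this illustration:

```python
def parse_proc_uptime(text):
    """Return seconds since boot from the contents of /proc/uptime.

    The file holds two numbers; the first is the uptime in seconds.
    """
    return float(text.split()[0])


def format_uptime(seconds):
    """Format seconds as 'D days, H:MM', dropping leftover seconds."""
    minutes = int(seconds // 60)
    days, minutes = divmod(minutes, 24 * 60)
    hours, minutes = divmod(minutes, 60)
    return f"{days} days, {hours}:{minutes:02d}"
```

Appending format_uptime(parse_proc_uptime(open("/proc/uptime").read())) to a file every minute or so would be one way to record how long each kernel survived.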
Comment 12 Jason Farrell 2007-06-14 09:55:39 EDT
nohz=off is still necessary with kernel-2.6.21-1.3228.fc7 as I only managed a ~5
hour uptime without it.
Comment 13 Kevin Crocker 2007-06-16 17:09:23 EDT
The upgrade to kernel-2.6.21-1.3228.fc7 broke the system again. I am now 
getting the random hangs again. The system works for a while then "total 

What other information can I provide to help resolve this?
Comment 14 Johannes Stummer 2007-06-17 11:44:00 EDT
I have the same problems.
I have tried different kernels (2.6.21-1.3232.fc7, 2.6.21-1.3228.fc7,
2.6.21-1.3194.fc7) and different boot options (noacpi, nohz=off, noapic,
highres=off) but nothing helps.

I think the xen kernel (2.6.20-2925.9.fc7xen) is the only kernel that works
without any problems, but I have not tested it in much detail.

my CPU: Intel(R) Core(TM)2 CPU 6400  @ 2.13GHz stepping 06
Comment 15 Kevin Crocker 2007-06-17 11:53:29 EDT
this is disillusioning. I'll post my smolt profile (although I thought it was 
already sent in)
Comment 16 Kevin Crocker 2007-06-17 12:19:20 EDT
here's the smolt profile of the machine that hangs:


here's the smolt profile of another machine that doesn't hang (even though they 
both have F7 installed)

Comment 17 Kevin Crocker 2007-06-25 20:27:02 EDT
The system still comes to a screeching halt sporadically. The nohz=off has
helped but not solved the problem.

What else can I do to help?
Comment 18 Mike Thompson 2007-06-26 02:19:20 EDT
(In reply to comment #17)
> The system still comes to a screeching halt sporadically. The nohz=off has
> helped but not solved the problem.
> What else can I do to help?

Have you tried acpi=off?
On my machine (Toshiba Satellite L10) the system hangs when the fan is activated
- hence the seemingly random nature of the hang.
Resetting the thermal trip points to higher values does give some relief, but as
soon as the machine hits the new trip point it hangs again.
Disabling ACPI prevents this, but obviously disables all ACPI functions.
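
For reference, the trip points in question can be listed before changing them. On the 2.6.21 kernels discussed here they were exposed under /proc/acpi/thermal_zone/*/trip_points; the Python sketch below instead assumes a later kernel that provides the sysfs thermal interface (trip_point_N_temp files in millidegrees Celsius). read_trip_points and its base argument are hypothetical names for this illustration:

```python
import glob
import os


def read_trip_points(base="/sys/class/thermal"):
    """Return {zone_name: [(trip_type, temp_celsius), ...]}.

    Assumes the sysfs thermal layout of later kernels, where each zone
    directory holds trip_point_N_temp (millidegrees C) and
    trip_point_N_type files. Unreadable entries are skipped.
    """
    zones = {}
    for zone_dir in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        trips = []
        pattern = os.path.join(zone_dir, "trip_point_*_temp")
        for temp_path in sorted(glob.glob(pattern)):
            type_path = temp_path[: -len("temp")] + "type"
            try:
                with open(temp_path) as f:
                    millideg = int(f.read().strip())
                with open(type_path) as f:
                    kind = f.read().strip()
            except (OSError, ValueError):
                continue
            trips.append((kind, millideg / 1000.0))
        zones[os.path.basename(zone_dir)] = trips
    return zones
```

Comparing these values with the temperature at which the fan starts may help confirm whether a hang coincides with a trip point being crossed.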
Comment 19 Jason Farrell 2007-07-07 15:21:26 EDT
tickless kernel-2.6.21-1.3255.fc7 still locks up. managed an ~8hr uptime.

back to using "nohz=off" for stability.
Comment 20 Jason Farrell 2007-07-21 02:18:33 EDT
kernel- in updates-testing seems to have fixed the lockups; my
current uptime is 1 day, 10 hours. On all previous F7 2.6.21 kernels I needed
the "nohz=off" in order to prevent a random lockup after a couple hours - no longer.

I declare the .21 kernels the worst I've run into in a long while (on my
hardware) with the new libata & tickless ticking me off.
Comment 21 Domenico Ferrari 2007-07-21 04:05:26 EDT
I've turned back to 2.6.20-1.2962.fc6 'cause I'm having problems.
I've to use the nohz option. The new libata can't drive correctly my hdd led,
I've problems reading DVD and I can't burn DVD disks!
The juju patch for firewire can't drive my camera!

Comment 22 Christopher Brown 2007-09-13 16:12:52 EDT

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.


I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel? If so, you might want to
try the following:

# For boot-related issues we need as much info as possible, so remove quiet
from the boot flags.

# Slowing down the speed of text output with boot_delay=1000 (the number may
need to be tweaked higher/lower to suit) may allow the user to take a digital
camera photo of the last thing on screen.

# Booting with vga=791 (or even just vga=1 if the video card won't support 791)
will put the framebuffer into high resolution mode to get more lines of text on
screen, allowing more context for bug analysis.

# initcall_debug will let you see the last thing the kernel tried to initialise
before it hung.

# There are numerous switches which have at times proven useful for diagnosing
failures by disabling various features.

* acpi=off is a big hammer, and if that works, narrowing down by trying
pci=noacpi instead may yield clues
* nolapic and noapic are sometimes useful
* Given it's new and still seeing quite a few changes, nohz=off may be worth
testing. (Though this is F7 and above only) 

# If you get no output at all from the kernel, booting with earlyprintk=vga
can sometimes yield something of interest.

# If the kernel locks up with a 'soft lockup' report, booting with nosoftlockup
will disable this check allowing booting to continue.
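
As an illustration of how the flags above combine, here is a small Python sketch that takes an existing kernel command line, drops quiet, and appends the debugging options. debug_cmdline and DEBUG_FLAGS are hypothetical names for this example, the default values are the ones suggested in this comment, and the result still has to be placed in the boot loader entry by hand:

```python
# Debug options suggested above; adjust the values to suit the machine.
DEBUG_FLAGS = ["boot_delay=1000", "vga=791", "initcall_debug"]


def debug_cmdline(cmdline, extra=DEBUG_FLAGS):
    """Return cmdline with 'quiet' removed and debug flags appended.

    A flag is only appended if no option with the same name is already
    present, so an existing vga=1 is left alone.
    """
    tokens = [t for t in cmdline.split() if t != "quiet"]
    present = {t.split("=", 1)[0] for t in tokens}
    for flag in extra:
        if flag.split("=", 1)[0] not in present:
            tokens.append(flag)
    return " ".join(tokens)
```

The feature-disabling switches (acpi=off, pci=noacpi, nolapic, noapic, nohz=off) are deliberately not in the default list, since they are best tried one at a time; they can be passed through extra when needed.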

If the problem has gone away then please close this bug or I'll do so in a few
days if there is no additional information lodged.

Comment 23 Christopher Brown 2007-12-13 13:45:48 EST
Closing INSUFFICIENT_DATA as indicated. Please re-open as required.
