Bug 536721 - 'nohz=off' required on rawhide with 2.6.31.5-127.fc12.i686
Summary: 'nohz=off' required on rawhide with 2.6.31.5-127.fc12.i686
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-11-11 03:29 UTC by Michal Jaegermann
Modified: 2013-04-05 16:49 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-04-05 16:49:52 UTC
Type: ---
Embargoed:


Attachments
syslog from a normal boot (37.94 KB, text/plain)
2009-11-11 03:29 UTC, Michal Jaegermann
syslog from a boot with added 'nohz=off' (38.31 KB, text/plain)
2009-11-11 03:30 UTC, Michal Jaegermann
dmesg from boot of 2.6.33.6-147.2.4.fc13.i686 without 'nohz=off' (30.07 KB, text/plain)
2010-08-18 22:10 UTC, Michal Jaegermann
dmesg from 2.6.36-0.7.rc2.git0.fc15.i686 without 'nohz=off' (36.14 KB, text/plain)
2010-08-23 15:13 UTC, Michal Jaegermann

Description Michal Jaegermann 2009-11-11 03:29:35 UTC
Created attachment 368985 [details]
syslog from a normal boot

Description of problem:

A test with the current rawhide anaconda images using 2.6.31.5-127.fc12.i686 shows that, on the test hardware, installation progress requires constant pounding on the keyboard to get things "unstuck".  This was definitely not the case with images from the end of October, when the 2.6.31.5-96.fc12.i686 kernel was used.  Moreover, a "keyboard beep" becomes a siren a few seconds long.  An attempt to reboot gets stuck on "waiting for mdraid sets to become clean", and at that point any further progress becomes impossible.

In syslog one will find
 <4>Fast TSC calibration failed
 <6>TSC: PIT calibration matches PMTIMER. 1 loops
instead of
 <4>Fast TSC calibration using PIT

With 'nohz=off' added to the boot options these nasties go away.
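
As a rough aid for comparing logs, the two calibration outcomes quoted above can be told apart with a few lines of Python.  This is only a sketch: the function name `tsc_calibration_status` and the labels it returns are made up for illustration, and it matches nothing beyond the message strings quoted in this report.

```python
def tsc_calibration_status(log_text):
    """Classify TSC calibration from kernel log text.

    Returns "fast-pit" when the fast PIT calibration message is present,
    "fast-failed" when the fast path reported a failure, and "unknown"
    when neither message appears.  The labels are hypothetical; only the
    matched message strings come from the bug report.
    """
    for line in log_text.splitlines():
        if "Fast TSC calibration failed" in line:
            return "fast-failed"
        if "Fast TSC calibration using PIT" in line:
            return "fast-pit"
    return "unknown"
```

It could be fed the text of either attached syslog, for example `tsc_calibration_status(open("syslog.txt").read())`, where `syslog.txt` stands for a locally saved copy of an attachment.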

Version-Release number of selected component (if applicable):
anaconda 12.46
2009-11-10 images

How reproducible:
every time

Additional info:
Acer TravelMate 230 laptop used in testing

Comment 1 Michal Jaegermann 2009-11-11 03:30:27 UTC
Created attachment 368986 [details]
syslog from a boot with added 'nohz=off'

Comment 2 Chris Lumens 2009-11-11 03:40:41 UTC
These messages likely have nothing to do with any problem in anaconda, and are more likely caused by a change in kernel versions.  Reassigning.

Comment 3 Adam Williamson 2009-11-11 19:27:26 UTC
can you please provide the same log from the last working kernel for comparison? thanks.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 4 Adam Williamson 2009-11-11 19:44:54 UTC
Vedran: why did you silently change my settings on this bug? mjg has told me in the past that he is interested in timer bugs and wishes them to be assigned to him, it _is_ a regression as far as the user's concerned (it did not happen in kernel 96). severity is debatable, but I don't see that you have the right to override my call with no justification.

Comment 5 Michal Jaegermann 2009-11-11 20:06:06 UTC
> can you please provide the same log from the last working kernel for
> comparison?

Not from the same hardware unfortunately.  It will likely look similar to what you see with 'nohz=off' and is attached.

The attached outputs result from testing anaconda installation images, which I try to do from time to time.  The laptop in question currently still runs Fedora 10 (where 'clocksource=jiffies' is required or you are practically a goner; cf. bug 476609, still NEW).  I overwrote the older images and they are no longer on the mirrors either.

As for a "debatable severity", this is not a very big deal for _me_.  It is likely a different story for a "newbie" attempting to install on similarly affected hardware.  At the end of October this "just worked".

Comment 6 Adam Williamson 2009-11-11 20:19:23 UTC
output from different hardware is worthless. I wanted to see the log from the affected system with -96 because it may _not_ be identical to the log with nohz=off ; it's not like we suddenly enabled the tickless timer between -96 and -127 or anything, so I want to know what _else_ has changed which has suddenly exposed this issue on your machine. The logs might tell us that.

Comment 7 Michal Jaegermann 2009-11-11 20:32:09 UTC
> output from different hardware is worthless.

Well, yes.  So I did not provide it. :-)

> ... it may _not_ be identical ...

I agree, but I have no way to provide it.  Still, both kernels were 2.6.31.5-<something>.  I know that for sure because I happen to have logs from different hardware, which serve as a reference point.

Comment 8 Bug Zapper 2009-11-16 15:25:23 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 9 Michal Jaegermann 2009-11-19 04:32:37 UTC
I just got some .iso files and found that 2.6.31.5-127.fc12, which has this bug, was used on the distribution images for Fedora 12.  Oh, great!  Some will have extra "fun" when trying installs/updates.  http://fedoraproject.org/wiki/Bugs/Common does not mention the issue.

Comment 10 Adam Williamson 2009-11-19 07:00:52 UTC
well, yeah, we'd already decided on the final package set when you reported this, it was far too late to change anything. this isn't a 'common' bug, since it's precisely hardware-specific, and it's not considered a blocker (there have been issues of this kind with every release since the tickless timer was enabled by default).

Comment 11 Michal Jaegermann 2009-12-13 02:32:36 UTC
Current Fedora 12 kernels, and 2.6.31.6-166.fc12.i686 in particular, suffer from the same affliction.  This replaces the requirement of 'clocksource=jiffies' for Fedora 10 kernels (cf. bug 476609).

Comment 12 Aram Agajanian 2010-02-23 17:41:59 UTC
I have encountered the same bug on a Dell Optiplex 760 PC.

The computer seems to be running OK with the nohz=off kernel argument.

Comment 13 Adam Williamson 2010-02-23 21:03:44 UTC
aram: please file a new bug. Each instance of this problem is hardware-specific and needs to be tracked separately.

Comment 14 Michal Jaegermann 2010-03-06 20:58:25 UTC
I checked whether it would be possible to drop nohz=off with the latest updates kernel, kernel-2.6.32.9-67.fc12.i686, on the hardware from this report (Acer TravelMate 230 notebook).  If anything, things got even worse.  The boot moved forward until the initramfs was loaded.  At that moment everything stopped and nothing happened for many minutes, until I lost patience and powered down the whole thing.

With nohz=off luckily it does boot.
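
When collecting logs it is easy to lose track of whether a given boot actually had the workaround applied.  A minimal sketch for checking that follows; `has_boot_option` and `read_cmdline` are hypothetical helper names, and the parsing simply splits on whitespace, so quoted values containing spaces are not handled.

```python
def has_boot_option(cmdline, name, value=None):
    """Return True if a kernel command line contains `name` (or `name=value`).

    Simplification: tokens are split on whitespace only, so quoted
    parameter values that contain spaces are not supported.
    """
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key != name:
            continue
        if value is None or (sep and val == value):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

On a running system, `has_boot_option(read_cmdline(), "nohz", "off")` would then tell whether the current boot used the workaround.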

Comment 15 Scott Robbins 2010-03-15 23:11:52 UTC
The other workaround is to hit enter a few times when it seems to freeze, as discovered by someone on the forums.

http://forums.fedoraforum.org/showthread.php?t=242122

While, in theory, a relatively unimportant bug, it's already driven one person away on the testing list.  

In my case, it only affected one machine, an Acer 4720z with an integrated Intel Mobile GM965/960, the same or nearly the same card as the person on the forums.

Comment 16 Adam Williamson 2010-03-16 00:39:41 UTC
Doesn't have to be enter, any key will do (I use space bar). The graphics card has nothing to do with it. Please file a new bug for each system affected by this problem, the fix can be different for every system even if the symptom is the same.



Comment 17 Michal Jaegermann 2010-03-16 01:46:39 UTC
(In reply to comment #15)
> The other workaround is to hit enter a few times when it seems to freeze, as
> discovered by someone on the forums.

It does not help very much on my machine.  That is, I can move forward with an installation if I constantly generate keyboard interrupts, but it would take a few days to complete the process, the results would be unusable, and the system clock would be entirely on the wild side.

In any case, as noted in comment #14, kernel-2.6.32.9-67.fc12.i686 does not improve the situation.  Without 'nohz=off' my Acer simply does not boot; it gets stuck inside the initramfs.

Comment 18 Michal Jaegermann 2010-08-16 16:31:18 UTC
'nohz=off' is still required with current Fedora 13 kernels, specifically 2.6.33.6-147.2.4.fc13.i686.  Strictly speaking, it is now sometimes possible to "somewhat" boot without this parameter, even without pounding on the keyboard too much, but then X will not start, or the machine will decide that it is overheating and shut itself off, or both, or something else of that sort.  Even without an automatic shutdown one can hear the fans from time to time trying to commit suicide by over-revving.  In any case the boot is not reliable and is prone to hangs in udev.  Nothing of that sort happens if 'nohz=off' is used.

The following fragment from dmesg (from a boot with 'nohz=off') is possibly related:

ACPI: Core revision 20091214
Enabling APIC mode:  Flat.  Using 1 I/O APICs
..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
..... (found apic 0 pin 0) ...
....... works.

I am afraid that I already have the latest BIOS I was able to find for that particular mobo.

Comment 19 Adam Williamson 2010-08-16 18:08:12 UTC
i'm starting to wonder if we need to file these upstream or pay someone to look at them or something, they don't seem to get any traction :( mine's been open for ages now.

Comment 20 Chuck Ebbert 2010-08-17 11:13:46 UTC
(In reply to comment #19)
> i'm starting to wonder if we need to file these upstream or pay someone to look
> at them or something, they don't seem to get any traction :( mine's been open
> for ages now.

Which bug is that?

Comment 21 Adam Williamson 2010-08-17 14:53:16 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=516870



Comment 22 Michal Jaegermann 2010-08-18 22:10:16 UTC
Created attachment 439524 [details]
dmesg from boot of 2.6.33.6-147.2.4.fc13.i686 without 'nohz=off'

The dmesg output here is from a boot to runlevel 1.  When trying to boot to runlevel 3 or 5 something invariably gets firmly stuck and the boot never finishes.

During these attempts not only do the fans hit maximum revs, quite soon during boot although intermittently, but a "scientific" test of holding fingers behind the fan exhaust indicates that really hot air is indeed being expelled.  That never happens when 'nohz=off' is used.

Even if the laptop does boot that way, shutting it down cleanly is really difficult, and likely impossible without quite a few extra keyboard interrupts.

/sys/devices/system/clocksource/clocksource0/{available,current}_clocksource
both give 'acpi_pm', regardless of whether the machine was booted with or without 'nohz=off'.
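
For anyone who wants to record the same information on another affected machine, the two sysfs files named above can be read as sketched here.  The helper name `read_clocksources` is an assumption made for illustration; the directory path is the one quoted in this comment.

```python
import os

# Directory quoted in the comment above.
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (available, current): a list of names and the active name."""
    # available_clocksource holds a space-separated list of names.
    with open(os.path.join(base, "available_clocksource")) as f:
        available = f.read().split()
    # current_clocksource holds the single name currently in use.
    with open(os.path.join(base, "current_clocksource")) as f:
        current = f.read().strip()
    return available, current
```

On the laptop from this report, both elements of the result would presumably contain only 'acpi_pm', matching the observation above.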

Comment 23 Chuck Ebbert 2010-08-23 13:12:44 UTC
Can you try 2.6.36-rc2 from rawhide?

Comment 24 Michal Jaegermann 2010-08-23 15:11:17 UTC
(In reply to comment #23)
> Can you try 2.6.36-rc2 from rawhide?

In three attempts to boot 2.6.36-0.7.rc2.git0.fc15.i686 without 'nohz=off', two got stuck early, in "Starting udev", and one later, in "Retrigger failed udev events".  In this latter case it was possible to force progress with keyboard interrupts (although the start of various services was really slow).  In that one case, when I reached a shell prompt, 'reboot' was not really going anywhere without constant "help" from the keyboard, and it actually powered the laptop down instead of restarting it.  The dmesg from the one case when the boot finished is attached (although the only notable thing seems to be the infamous "rcu_dereference_check() without protection").

Booting the same kernel with 'nohz=off' shows none of the symptoms above, and 'reboot' really acts as a reboot.

Does plymouth have different requirements for 2.6.36?  With 'rhgb quiet' the graphical background is progressively erased by what looks like scrolling blocks of black-on-black text.  This is just cosmetic, but it does not happen when booting a "regular" 2.6.33.6-147.2.4.fc13.i686.

Comment 25 Michal Jaegermann 2010-08-23 15:13:00 UTC
Created attachment 440409 [details]
dmesg from 2.6.36-0.7.rc2.git0.fc15.i686 without 'nohz=off'

Comment 26 Bug Zapper 2010-11-04 06:34:01 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 27 Michal Jaegermann 2010-11-04 07:28:10 UTC
(In reply to comment #26)
> This message is a reminder that Fedora 12 is nearing its end of life.

Well, comment 24 described problems with 2.6.36-0.7.rc2.git0.fc15.i686.  The last "release" kernel I had an opportunity to try was 2.6.33.6-147.2.4.fc13.i686, and it required 'nohz=off' or weird things happened.  Currently the machine which displayed that behaviour is "out-in-the-field"; it should return at the end of December.

Comment 28 Aram Agajanian 2012-08-03 17:00:58 UTC
I had been using nohz=off on a Dell Optiplex 760 PC since 2010 but it no longer prevented stuttering boots with a new EL6.3 kernel.  I found that the problem went away after I updated the BIOS.  nohz=off is no longer required on that computer.

Comment 29 Fedora End Of Life 2013-04-03 18:42:07 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

