Red Hat Bugzilla – Bug 476609
kernels from Fedora 10 unusable unless 'clocksource=jiffies' supplied
Last modified: 2013-01-13 08:51:48 EST
Created attachment 327061
dmesg from 188.8.131.52-134.fc10 with 'acpi_pm_good clocksource=jiffies' parameters
Description of problem:
After an upgrade, an "Acer TravelMate 230" laptop became unusable. User space would not start, beginning with udev, without a key press on the keyboard for every step. Even when booted into runlevel 3 with a default boot it was possible to lose about 40 clock minutes during one hour. With acpi_pm_good (AFAICT acpi_pm should be good) and hpet=force added, the situation improved somewhat. It was still necessary to coax the kernel from the keyboard through the boot sequence, but time looked a bit more stable, for a while. The machine may still lose big chunks of time, on the order of tens of minutes, without any apparent reason.
Also, after a relatively short time some commands stop responding. For example, after 'ls' is typed it may never return, and attempts to kill it do not work, although there is nothing unusual in the process table, if you can still get 'ps' to work; 'ls' will likely be in "S" or "Ss" state and that is about it. The same 'ls' may still work from a network login, although the situation keeps deteriorating and after a while nothing works but the power switch. /sys/devices/system/clocksource/clocksource0/available_clocksource shows 'tsc acpi_pm jiffies', and also 'hpet' if 'hpet=force' is used.
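For anyone comparing machines, the two sysfs files mentioned in this report can be read with a few lines of Python. This is only a minimal sketch: the helper name read_clocksources and its 'base' parameter are made up for illustration, and the directory is the one quoted above.

```python
from pathlib import Path

# sysfs directory quoted in the report; pass another path to test offline.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) clocksource names read from sysfs.

    Hypothetical helper for illustration; it only reads the two files.
    """
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

Calling read_clocksources() with no arguments on the affected laptop should return the active clocksource together with the list shown above.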
After some experiments it turned out that F10 kernels can be used only if 'clocksource=jiffies' is added to the boot parameters. I also have 'acpi_pm_good' there and I do not know if skipping it would have bad effects. In any case, in this setup the wall clock behaves normally and none of the bad effects described above occur.
The upgrade itself took well over 12 hours and did not really finish (it was impossible to examine what really happened), even though it displayed "Reboot". yum-complete-transaction later had around 500 steps on its list. As I understand it now, this was courtesy of the kernel used by anaconda.
Attached is dmesg from 184.108.40.206-134.fc10 in a "good" configuration. There does not seem to be much of a difference when booting "bad", although I then found lines like:
Marking TSC unstable due to TSC halts in idle
Clocksource tsc unstable (delta = 107801214414 ns)
intel8x0_measure_ac97_clock: measured 85204 usecs
while for a "good" configuration that last line will be
intel8x0_measure_ac97_clock: measured 51000 usecs
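To put the watchdog message in perspective, the reported delta of 107801214414 ns is about 107.8 seconds. The sketch below pulls such lines out of dmesg text and converts the delta to seconds. The regular expression is modelled only on the single line quoted above, and the helper name is made up for illustration.

```python
import re

# Pattern modelled on the one dmesg line quoted in this report (an assumption;
# other kernels may word the message differently).
UNSTABLE_RE = re.compile(r"Clocksource (\w+) unstable \(delta = (-?\d+) ns\)")


def find_unstable_clocksources(dmesg_text):
    """Return a list of (clocksource_name, delta_in_seconds) tuples."""
    return [
        (m.group(1), int(m.group(2)) / 1e9)
        for m in UNSTABLE_RE.finditer(dmesg_text)
    ]


sample = """Marking TSC unstable due to TSC halts in idle
Clocksource tsc unstable (delta = 107801214414 ns)
"""
for name, seconds in find_unstable_clocksources(sample):
    print(f"{name}: delta of {seconds:.1f} s")
```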
If 'hpet=force' is used then lines like these will show up too:
hpet clockevent registered
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz
but attempts to use that clocksource also came to naught.
/proc/cpuinfo shows all the time:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Mobile Intel(R) Celeron(R) CPU 2.00GHz
stepping : 7
cpu MHz : 1999.755
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe up pebs bts cid xtpr
bogomips : 3999.51
clflush size : 64
even when user-space starts to "die" (but 'cat' still works).
I do not know if this is a more extreme manifestation of bug 476051 or this is another issue.
Version-Release number of selected component (if applicable):
always on that particular hardware
The laptop in question did not need any "clock magic" to run just fine
with all kernels from F8 distro.
I have seen F10 running on another, quite different, i686-class SMP machine, and nothing like the above happens there. Only tsc is used there as the default clocksource, "available" reports the same 'tsc acpi_pm jiffies', and the tsc-related items in dmesg look like this:
TSC: PIT calibration confirmed by PMTIMER.
TSC: using PMTIMER calibration value
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
However, the first two "TSC" lines appear exactly like that for all F10 kernels on the stricken laptop, regardless of the options used.
Happens for me, too. FC10 is unusable without "clocksource=jiffies" kernel parameter. Thanks Michal for this workaround!
Which hardware information do you need for a useful bug report?
I seem to have hit this bug too... Sometimes my machine will stall until I move the mouse or hit the keyboard, and then things will progress for a few seconds. This, of course, causes all sorts of problems and in general means that the system is unusable. So far, I'm not sure what has caused it to happen. I've seen it once in a while earlier, but the past few days have been more problematic than usual. clocksource=jiffies has worked around my problems for a few hours now.
Nice... In the very second I hit "Commit" for the last comment, my entire system froze again, until I moved the mouse around. Bummer!
I have the opposite experience: I needed the 'tsc' clocksource for my systems. Several F10 systems (Dell hardware, one Intel 82Q35, running kernel 220.127.116.11; the other Intel 82Q963/Q965 based, running 18.104.22.168) exhibit a slow clock. Even though they both run NTP, the clock as given by 'date' goes very slowly or not at all.
/sys/devices/system/clocksource/clocksource0/current_clocksource was 'jiffies', out of the available sources 'jiffies' and 'tsc'. I more or less fixed the problem with 'echo tsc > ....current_clocksource', but this a) is volatile (won't survive a reboot) and b) does not address the root cause. I am not sure why the HPET timer is not available (it wasn't even mentioned in the boot messages).
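The runtime switch described in this comment can also be scripted with a check that the requested source is actually offered by the kernel. This is a rough sketch only: set_clocksource is a made-up name, the directory is the full sysfs path used elsewhere in this bug, and writing there requires root.

```python
from pathlib import Path

# Full sysfs directory as given elsewhere in this bug report.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def set_clocksource(name, base=SYSFS_DIR):
    """Select clocksource `name` if it is listed as available.

    Hypothetical helper for illustration. Needs root on a real system,
    and the setting is lost on reboot, just like the 'echo' approach.
    """
    base = Path(base)
    available = (base / "available_clocksource").read_text().split()
    if name not in available:
        raise ValueError(f"{name!r} is not one of {available}")
    (base / "current_clocksource").write_text(name + "\n")
    # Read the file back so the caller sees what is now recorded there.
    return (base / "current_clocksource").read_text().strip()
```

The availability check is there so that a typo in the name raises an error in the script instead of being passed on to the kernel.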
> ... but this is a) volatile (won't survive reboot)
I would imagine that adding 'clocksource=tsc' to the boot options in /etc/grub.conf would pick the right clock for you and make it stick. Kernel updates preserve such things.
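As an illustration of that suggestion, the sketch below appends a parameter to the 'kernel' lines of GRUB legacy configuration text. It assumes each boot entry has a line starting with the word 'kernel'; the function name and the sample entry (including the kernel file name and root device) are made up, and it works on a string so nothing is written to /etc/grub.conf.

```python
def add_kernel_param(grub_conf_text, param):
    """Append `param` to every 'kernel' line that does not already have it.

    Hypothetical helper. It only detects an exact duplicate of `param`;
    an existing 'clocksource=...' with another value is left in place.
    """
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel" and param not in words[1:]:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample entry, not taken from any machine in this report.
sample_conf = """title Fedora (example)
\troot (hd0,0)
\tkernel /vmlinuz-EXAMPLE ro root=/dev/EXAMPLE rhgb quiet
\tinitrd /initrd-EXAMPLE.img
"""
print(add_kernel_param(sample_conf, "clocksource=tsc"))
```

Keep a backup of the real file before applying the result by hand; a broken kernel line can leave the machine unbootable.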
> I am not sure why the HPET timer is not available
BIOS? In the case from the original report it showed up only with 'hpet=force' boot option but trying to use that was not helpful.
A nasty thing is that in the past this was not an issue.
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '10'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 10's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 10 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Kernel 22.214.171.124-166.fc12.i686 from Fedora 12 does not require 'clocksource=jiffies' anymore. OTOH this needs to be replaced by 'nohz=off' or the laptop in question becomes useless, but that is a different story (bug 536721).