Red Hat Bugzilla – Bug 630781
systemd hangs on "Clocksource tsc unstable" error and causes the system to freeze after cpu-scaling detection
Last modified: 2010-09-10 17:16:52 EDT
Description of problem:
Version-Release number of selected component (if applicable): Branched-14, linux 220.127.116.11-12.fc14.x86_64
How reproducible:
100% on specific hardware. I could only reproduce it on an AMD Turion X2 laptop, not on any Intel CPU.
Some reports indicate that it may be caused by wifi drivers or graphics/input drivers, but here it seems to be directly CPU-related. (See also bug 605430, although this one seems worse.)
Steps to Reproduce:
1. Install Fedora (DVD or netinstall, gnome or KDE, whatever)
2. Boot newly installed system.
3. Wait for about 20 seconds. (Or a couple hours, for that matter.)
Actual results:
System hangs just after detecting the available CPU frequencies, with the "Clocksource tsc unstable" message. System is left unusable: no tty, no rescue shell, no X (obviously). Keyboard unresponsive, SysRq keys don't work either.
- Adding clocksource=acpi_pm boot option does NOT solve the bug, nor does any other clocksource option. The only workaround I could find was adding "notsc noapic nolapic" (*all three*, otherwise the bug still happens).
- Of course, this does also disable any sort of frequency scaling, so it is not a long-term solution.
- And since it happens on laptops, fiddling with the BIOS is unfortunately not an option.
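Since the workaround only holds when all three options are present, it can help to confirm what the running kernel actually received. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, and it assumes the command line is read from `/proc/cmdline` on a Linux system.

```python
from pathlib import Path

# The three options the reporter needed together (taken from this report).
WORKAROUND_PARAMS = ("notsc", "noapic", "nolapic")


def missing_params(cmdline, required=WORKAROUND_PARAMS):
    """Return the required boot options that are absent from cmdline.

    Options are compared as whole whitespace-separated words, so
    "nolapic" is never mistaken for "noapic".
    """
    present = set(cmdline.split())
    return [p for p in required if p not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    return Path(path).read_text().strip()
```

Calling `missing_params(read_cmdline())` on the affected machine returns an empty list only when all three options are active.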
Expected results:
Well, it should boot...
A newly-installed system should at least present the user with a command line prompt. Of course, F14 is in alpha stage, but this should be addressed soon. Current systemd bugs leave the user with a rescue shell, which is not ideal but much preferable to a complete system freeze.
Other distros (e.g. Mandriva Cooker) boot fine with the same kernel, and cpu-scaling works on the same hardware. Previous versions of Fedora (F10-F12) used to work fine as well.
(In reply to comment #0)
> - Adding clocksource=acpi_pm boot option does NOT solve the bug, nor does any
> other clocksource option. The only workaround I could find was adding "notsc
> noapic nolapic" (*all three*, otherwise the bug still happens).
OK, further investigation established that it's actually (yet another) systemd-related bug.
When using upstart instead of systemd, the "Clocksource tsc unstable" error message is still printed, but the system doesn't freeze and proceeds with booting without further annoyances.
I'm updating the description accordingly; if systemd integration doesn't make it into F14, I guess that won't be much of a problem.
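To compare a systemd boot with an upstart boot, one option is to save the kernel log from each and check whether the warning appears in both. This is only a sketch under assumptions: the function name is hypothetical, and the log text is expected to come from wherever the boot messages were captured (for example the output of `dmesg`).

```python
import re

# Matches the warning quoted in this report, ignoring case, since it is
# written both as "Clocksource ..." and "clocksource ..." in the thread.
TSC_UNSTABLE = re.compile(r"clocksource tsc unstable", re.IGNORECASE)


def find_tsc_warnings(log_text):
    """Return every line of log_text that contains the TSC warning."""
    return [line for line in log_text.splitlines() if TSC_UNSTABLE.search(line)]
```

If both logs contain the warning but only one boot hangs, the message alone is unlikely to be the cause of the freeze.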
Do you get any further output if you remove 'rhgb quiet' from the boot arguments?
(In reply to comment #2)
> Do you get any further output if you remove 'rhgb quiet' from the boot
> arguments?
That (and disabling kms) was my first guess as well, but the messages still look pretty much the same.
When booting using /sbin/upstart, the "clocksource tsc unstable" warning is still printed, but it doesn't prevent the system from booting (it doesn't even cause any noticeable delay).
When using systemd, it seems like it's waiting for something to complete but doesn't catch any return code or whatever, leaving the system frozen (it doesn't even react to Sysrq keys, even though I've enabled these using sysctl).
Now, since we're talking AMD and cpu-scaling, perhaps it's the powernow-k8 module that doesn't get along well with systemd?
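One way to test the powernow-k8 theory is to check whether the module is loaded at all on a boot that works. The sketch below uses hypothetical helper names and parses text in the `/proc/modules` format, where the module name is the first field on each line and is written with underscores (`powernow_k8`). If the driver is built into the kernel rather than loaded as a module, it will not show up here.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_module_loaded(name, proc_modules_text):
    """Check for a module by name, treating '-' and '_' as equivalent."""
    return name.replace("-", "_") in loaded_modules(proc_modules_text)
```

For example, `is_module_loaded("powernow-k8", open("/proc/modules").read())` reports whether the module is present on the running system.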
(In reply to comment #3)
> Now, since we're talking AMD and cpu-scaling, perhaps it's the powernow-k8
> module that doesn't get along well with systemd?
Then disable it using "chkconfig cpuspeed off" to test this theory.
Also try booting with "systemd.log_level=debug" to get more information about what systemd is doing.
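The debugging suggestions made so far both amount to editing the kernel command line: drop `rhgb quiet` and add `systemd.log_level=debug`. A small sketch of that transformation follows. The function name and its defaults are illustrative only, and the resulting string still has to be entered by hand at the boot loader prompt or in its configuration.

```python
def debug_cmdline(cmdline,
                  drop=("rhgb", "quiet"),
                  add=("systemd.log_level=debug",)):
    """Return cmdline without the `drop` options and with `add` appended.

    Options already present are not appended a second time, and the
    order of the remaining options is preserved.
    """
    kept = [opt for opt in cmdline.split() if opt not in drop]
    for opt in add:
        if opt not in kept:
            kept.append(opt)
    return " ".join(kept)
```

Given a command line such as `ro root=/dev/sda1 rhgb quiet`, this yields `ro root=/dev/sda1 systemd.log_level=debug`.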
Well, systemd does not do anything weird with clocks, so I'd assume if there's a problem here, then it's probably just coincidence that this msg is printed and not an indication of the error.
Could you please try to boot into single user mode? Does that work?
Could you add "systemd.log_level=debug" to your kernel command line and boot with that? Could you please attach the output it generates here?
If you install sshd, can you log in remotely to your machine during that hang?
Discussed at 2010-09-10 blocker review meeting. We cannot determine if this is a blocker without the information requested by Lennart. Reporter, if you could provide that ASAP, it would be appreciated. We will likely drop this bug from Beta blocker consideration if more details are not available at the next meeting.
Adam: you're absolutely right. Sorry for not having answered sooner but I was no longer able to reproduce the bug, and needed time to investigate (with a new, fresh install and the log_level=debug option).
As it turns out, the issue is a lot less serious than what I feared. Long story short: it's a bit of bad luck, and something of a coincidence as Lennart suspected.
The system *does* still freeze upon first boot. However, it seems to be a hardware-related problem (probably something to do with a defective integrated USB device; this is a known issue in every distro out there).
- Using upstart instead of systemd, or disabling cpu-scaling altogether, does improve the situation.
- That being said, after I applied Lennart's trick from http://article.gmane.org/gmane.linux.redhat.fedora.devel/137291 I could safely go back to using systemd without any annoyances and without disabling cpufreq-scaling anymore.
My guess is: systemd performs as it should (well, minus the missing symlinks caveat), but the weird clocksource-sync-thingy on my laptop actually prevents systemd from leaving me with a root shell like it would do when it "crashes".
Whilst annoying (and reproducible on my particular hardware), it certainly shouldn't block the beta release.
Seeing your computer become unresponsive is always somewhat alarming, and I must have overreacted. Sorry for the noise!
(BTW: kudos on systemd. With the target symlinks fixed, it works like a charm and is actually quite impressive!)