Bug 630781

Summary:	systemd hangs on "Clocksource tsc unstable" error and causes the system to freeze after cpu-scaling detection
Product:	[Fedora] Fedora
Reporter:	Valentin Villenave <valentin>
Component:	systemd
Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED NOTABUG
QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	high
Priority:	low
Version:	14
CC:	anton, aquini, awilliam, dougsland, gansalmon, itamar, jonathan, kernel-maint, lpoetter, madhu.chinakonda, metherid, mschmidt, notting, plautrba, tomek
Hardware:	x86_64
OS:	Linux
Doc Type:	Bug Fix
Last Closed:	2010-09-10 21:16:52 UTC
Bug Blocks:	611991

Description Valentin Villenave 2010-09-06 23:40:50 UTC

Description of problem:

Version-Release number of selected component (if applicable): Branched-14, linux 2.6.35.4-12.fc14.x86_64

How reproducible:
100% on specific hardware. I could only reproduce it with an AMD Turion-X2 laptop, but not with any Intel CPU.
Some reports indicate that it may be caused by wifi drivers or graphic input drivers, but here it seems to be directly CPU-related. (See also bug 605430, although this one seems worse.)

Steps to Reproduce:
1. Install Fedora (DVD or netinstall, gnome or KDE, whatever)
2. Boot newly installed system.
3. Wait for about 20 seconds. (Or a couple hours, for that matter.)

Actual results:
The system hangs just after detecting the available CPU frequencies, with the "Clocksource tsc unstable" message. It is left unusable: no tty, no rescue shell, no X (obviously). The keyboard is unresponsive, and SysRq keys don't work either.

- Adding clocksource=acpi_pm boot option does NOT solve the bug, nor does any other clocksource option. The only workaround I could find was adding "notsc noapic nolapic" (*all three*, otherwise the bug still happens).

- Of course, this also disables any sort of frequency scaling, so it is not a long-term solution.

- And since it happens on laptops, fiddling with the BIOS is unfortunately not an option.
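
On a boot that does come up (for instance with the workaround options above), it may help to confirm which clocksource the kernel actually selected. The following is a minimal sketch, assuming the standard sysfs clocksource directory is present; the function name read_clocksources is only illustrative.

```python
from pathlib import Path

# Standard sysfs location for clocksource state on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base = Path(base)
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # sysfs not mounted, or the kernel does not expose these files.
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    print("current clocksource:", current)
    print("available clocksources:", ", ".join(available) or "(none found)")
```

If the reported current clocksource differs from the one requested with clocksource=, the kernel did not accept the requested source, which would be worth noting in the report.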

Expected results:
Well, it should boot...

Additional info:
A newly-installed system should at least present the user with a command line prompt. Of course, F14 is in alpha stage, but this should be addressed soon. Current systemd bugs leave the user with a rescue shell, which is not ideal but much preferable to a complete system freeze.

Other distros boot fine and cpu-scaling works on the same hardware (with the same kernel, e.g. Mandriva Cooker). Previous versions of Fedora (F10-F12) used to work fine as well.

Comment 1 Valentin Villenave 2010-09-07 12:55:13 UTC

(In reply to comment #0)
>  - Adding clocksource=acpi_pm boot option does NOT solve the bug, nor does any
> other clocksource option. The only workaround I could find was adding "notsc
> noapic nolapic" (*all three*, otherwise the bug still happens).

OK, further investigation established that it's actually (yet another) systemd-related bug.

When using upstart instead of systemd, the "Clocksource tsc unstable" error message is still printed, but the system doesn't freeze and proceeds with booting without further annoyances.

I'm updating the description accordingly; if systemd integration doesn't make it into F14, I guess that won't be much of a problem.

Comment 2 Bill Nottingham 2010-09-07 17:26:28 UTC

Do you get any further output if you remove 'rhgb quiet' from the boot arguments?

Comment 3 Valentin Villenave 2010-09-07 23:24:29 UTC

(In reply to comment #2)
> Do you get any further output if you remove 'rhgb quiet' from the boot
> arguments?

That (and disabling kms) was my first guess as well, but the messages still look pretty much the same.

When booting using /sbin/upstart, the "clocksource tsc unstable" warning is still printed, but it doesn't prevent the system from booting (it doesn't even cause any noticeable delay).
When using systemd, it seems to be waiting for something to complete without ever catching a return code, which leaves the system frozen (it doesn't even react to SysRq keys, even though I've enabled them using sysctl).
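
To double-check what the sysctl setting actually permits, the kernel.sysrq value can be decoded. This is a rough sketch based on the bit meanings listed in the kernel's SysRq documentation; the helper names describe_sysrq and read_sysrq are made up for illustration.

```python
# Bit meanings for kernel.sysrq values greater than 1, as listed in the
# kernel's SysRq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value):
    """Return the list of SysRq functions allowed by a kernel.sysrq value."""
    if value == 0:
        return []                  # SysRq disabled entirely
    if value == 1:
        return ["all functions"]   # everything enabled
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting; returns None if the file is unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

For example, describe_sysrq(read_sysrq()) on a working boot shows whether the key combinations being tried are among the permitted ones. If they are and the keys still do nothing during the hang, that would suggest the kernel itself is no longer servicing keyboard input.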

Now, since we're talking AMD and cpu-scaling, perhaps it's the powernow-k8 module that doesn't get along well with systemd?

Comment 4 Michal Schmidt 2010-09-08 06:43:46 UTC

(In reply to comment #3)
> Now, since we're talking AMD and cpu-scaling, perhaps it's the powernow-k8
> module that doesn't get along well with systemd?

Then disable it using "chkconfig cpuspeed off" to test this theory.
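
After disabling the service and rebooting, one way to verify that the driver is really out of the picture is to look at /proc/modules and at the cpufreq scaling driver in sysfs. A minimal sketch, assuming the usual procfs/sysfs layout; module_loaded and scaling_driver are illustrative names, and a driver built into the kernel would not show up in /proc/modules.

```python
def module_loaded(name, modules_path="/proc/modules"):
    """Return True if a kernel module is listed in /proc/modules.

    /proc/modules uses underscores, so 'powernow-k8' is normalised first.
    Built-in drivers do not appear there.
    """
    wanted = name.replace("-", "_")
    try:
        with open(modules_path) as f:
            return any(line.split()[0] == wanted for line in f if line.strip())
    except OSError:
        return False


def scaling_driver(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the active cpufreq driver for a CPU, or None if none is bound."""
    path = f"{sysfs_root}/cpu{cpu}/cpufreq/scaling_driver"
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("powernow-k8 loaded:", module_loaded("powernow-k8"))
    print("cpu0 scaling driver:", scaling_driver(0))
```

If the module is reported as not loaded and no scaling driver is bound, yet the hang persists, the powernow-k8 theory becomes less likely.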

Comment 5 Michal Schmidt 2010-09-08 06:51:51 UTC

Also try booting with "systemd.log_level=debug" to get more information about what systemd is doing.
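
When collecting such a log, it is easy to edit the wrong boot entry, so it can be worth confirming that the options really reached the kernel. The sketch below parses a kernel command line (as found in /proc/cmdline) and checks for the options discussed in this report; parse_cmdline and check_debug_boot are hypothetical helper names, and quoted values containing spaces are not handled.

```python
def parse_cmdline(text):
    """Split a kernel command line into {key: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not supported.
    """
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def check_debug_boot(text):
    """Report whether the boot matches the debugging setup requested here."""
    params = parse_cmdline(text)
    return {
        "systemd_debug": params.get("systemd.log_level") == "debug",
        "rhgb_removed": "rhgb" not in params,
        "quiet_removed": "quiet" not in params,
    }


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(check_debug_boot(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

All three values should be True for a boot that is expected to produce the verbose systemd output.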

Comment 6 Lennart Poettering 2010-09-08 23:06:50 UTC

Well, systemd does not do anything weird with clocks, so I'd assume if there's a problem here, then it's probably just coincidence that this msg is printed and not an indication of the error.

Could you please try to boot into single user mode? Does that work?

Could you add "systemd.log_level=debug" to your kernel command line and boot with that? Could you please attach the output it generates here?

If you install sshd, can you log in remotely to your machine during that hang?
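
A quick first check from a second machine, before attempting a full login, is whether the SSH port still accepts TCP connections. This is only a sketch: port_open is an illustrative helper, and the host name in the usage note is a placeholder.

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, port_open("laptop.example") run during the hang. A successful connection mainly indicates that the kernel's network stack is still alive, since the TCP handshake is completed by the kernel; only an actual login shows that userspace still responds. No connection at all would be consistent with a complete freeze.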

Comment 7 Adam Williamson 2010-09-10 16:54:38 UTC

Discussed at 2010-09-10 blocker review meeting. We cannot determine if this is a blocker without the information requested by Lennart. Reporter, if you could provide that ASAP, it would be appreciated. We will likely drop this bug from Beta blocker consideration if more details are not available at the next meeting.

Comment 8 Valentin Villenave 2010-09-10 21:16:52 UTC

Adam: you're absolutely right. Sorry for not having answered sooner but I was no longer able to reproduce the bug, and needed time to investigate (with a new, fresh install and the log_level=debug option).

As it turns out, the issue is a lot less serious than what I feared. Long story short: it's a bit of bad luck, and something of a coincidence as Lennart suspected.

The system *does* still freeze upon first boot. However, it seems to be a hardware-related problem (probably something to do with a defective integrated USB device; this is a known issue in every distro out there).

- Using upstart instead of systemd, or disabling cpu-scaling altogether, does improve the situation.

- That being said, after I applied Lennart's trick from http://article.gmane.org/gmane.linux.redhat.fedora.devel/137291 I could safely go back to using systemd without any annoyances and without disabling cpufreq-scaling anymore.

My guess is: systemd performs as it should (well, minus the missing symlinks caveat), but the weird clocksource-sync-thingy on my laptop actually prevents systemd from leaving me with a root shell like it would do when it "crashes".

Whilst annoying (and reproducible on my particular hardware), it certainly shouldn't block the beta release.

Seeing your computer become unresponsive is always somewhat alarming and I must have overreacted. Sorry for the noise!

(BTW: kudos about systemd, after having fixed the target symlinks, it does work like a charm, and is actually quite impressive!)