Bug 619889

Summary: X.org pegged at 100% CPU usage after booting with systemd
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: xorg-x11
Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED ERRATA
Severity: urgent
Priority: low
Version: rawhide
CC: airlied, darrellpf, drjohnson1, GoinEasy9, jlaska, lpoetter, mattdm, metherid, mschmidt, notting, pebolle, poelstra, tomek
Hardware: All
OS: Linux
Whiteboard: AcceptedBlocker
Doc Type: Bug Fix
Last Closed: 2010-08-11 19:16:18 UTC
Bug Blocks: 611990

Attachments:
systemd log messages from an affected boot (flags: none)

Description Adam Williamson 2010-07-30 19:27:33 UTC
As a couple of reporters noted in https://bugzilla.redhat.com/show_bug.cgi?id=618315 , after booting current Rawhide with systemd - systemd-4-4.fc14.x86_64 - X.org is pegged at 100% CPU usage on one core. This does not happen when booting with upstart. I also note the Scroll Lock LED on my keyboard seems to turn on and off periodically, and I can't control it with the scroll lock key.

I'm attaching the output of grep "init\[1\]" /var/log/messages . Note how the tty1.service messages seem to cycle over and over (they're still going on as I write this) every minute or so. (This doesn't coincide with the scroll lock cycle, I checked).

Comment 1 Adam Williamson 2010-07-30 19:28:10 UTC
Created attachment 435627 [details]
systemd log messages from an affected boot

Comment 2 Adam Williamson 2010-07-30 19:41:56 UTC
Discussed as a blocker out-of-channel with poelcat, jlaska and oxf13, all ACK for this to be an Alpha blocker as it infringes "In most cases, the installed system must boot to a functional graphical environment without user intervention" and probably "Bug hinders execution of required Alpha test plans or dramatically reduces test coverage".



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 3 GoinEasy9 2010-07-30 21:57:38 UTC
I mentioned that one core of my AMD Phenom II X4 955 was at 99% in the comments of https://bugzilla.redhat.com/show_bug.cgi?id=618315.  Systemd log messages were similar to those in comment #1, so there is no need to post them unless someone feels they would be helpful.  Everyone I've spoken to who has used the boot fix in #618315 to access their machine using systemd seems to have the same "cpu gone wild" effect.

Comment 4 darrell pfeifer 2010-07-31 00:54:45 UTC
The 100% CPU is somewhat inconsistent.

1) I was running KDE initially with the 100% CPU. I booted to gnome, still with systemd and the problem went away.

2) After another reboot a day later, I was getting 100% CPU under gnome (it switched me to gnome 3.0 for some strange reason). After attempting to reset to compiz, I was logged out and saw weird screen artifacts at the login screen. After logging in I was still using the gnome 3 interface but the CPU was normal. I then switched to compiz and the screen is still normal.

Note: I haven't done any X updates during the process.

Comment 5 Matthew Miller 2010-07-31 15:39:39 UTC
And what's X doing? This, over and over:

setitimer(ITIMER_REAL, {it_interval={0, 20000}, it_value={0, 20000}}, NULL) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
select(256, [1 3 6 8 9 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 38 39 40 41 44 48 49], NULL, NULL, {4227512, 486000}) = 1 (in [6], left {4227512, 485985})
ioctl(6, TCFLSH, 0x2)                   = -1 EIO (Input/output error)
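
The pattern in the trace above (select() reports fd 6 readable, the follow-up ioctl fails with EIO, nothing clears the condition, repeat) can be reproduced in miniature. This is only an analogy, not the X server's code: it assumes a pipe whose writer has closed as a stand-in for the contested tty, and the function name and parameters are made up for illustration.

```python
import os
import select


def spin_on_dead_fd(max_iterations, stop_on_failure):
    """Count select() wakeups for a descriptor whose handler never makes progress.

    Stand-in for the trace: a pipe whose write end is closed is reported
    readable on every select() call, and every read() yields nothing.
    """
    read_end, write_end = os.pipe()
    os.close(write_end)  # peer is gone: read_end now stays "ready" forever
    wakeups = 0
    try:
        for _attempt in range(max_iterations):
            ready, _w, _x = select.select([read_end], [], [], 1.0)
            if not ready:
                break  # select() timed out, so there is no busy loop
            wakeups += 1
            data = os.read(read_end, 4096)
            if data == b"" and stop_on_failure:
                break  # "fail cleanly": give up on the descriptor
            # otherwise ignore the failure and go around again
    finally:
        os.close(read_end)
    return wakeups


print(spin_on_dead_fd(1000, stop_on_failure=False))  # 1000: every iteration wakes up
print(spin_on_dead_fd(1000, stop_on_failure=True))   # 1: stops at the first failure
```

The iteration cap only exists so the sketch terminates; without it, the first variant would spin for as long as the process lives, which matches the 100% CPU symptom.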

Comment 6 darrell pfeifer 2010-07-31 15:46:10 UTC
Logging out and logging back in again stops the looping.

Comment 7 Michal Schmidt 2010-08-01 08:09:51 UTC
Does it still happen if you delete the symlink /etc/systemd/system/getty.target.wants/getty and reboot?

Comment 8 Paul Bolle 2010-08-01 09:45:48 UTC
(In reply to comment #7)
> Does it still happen if you delete the symlink
> /etc/systemd/system/getty.target.wants/getty and reboot?    

After booting with that symlink deleted X doesn't seem to try to use 100% of CPU anymore (on a single core i686).

Comment 9 Bill Nottingham 2010-08-02 15:43:08 UTC
Right, we defaulted to having X on tty1 in recent releases, with logic in upstart & gdm to make sure that X and mingetty aren't arguing over the tty.

Comment 10 Lennart Poettering 2010-08-02 15:44:03 UTC
I am pretty sure this is a race between X and mingetty, and X needs to be fixed. However, I am not entirely sure about this. This is probably triggered only because we parallelize bootup more drastically than sysv did.

Anyway, I will now reassign this to X, in hope for comments from the X people. Should it turn out that systemd is at fault we can reassign it back.

These two bugs are probably the result of the same problem:

https://bugzilla.redhat.com/show_bug.cgi?id=614454
https://bugzilla.redhat.com/show_bug.cgi?id=620043

Comment 11 Michal Schmidt 2010-08-02 15:54:45 UTC
Lennart, what about something like this?:
getty could be treated specially. Instead of being just another instantiation of the getty@.service template, it could be a separate unit, with:
ExecStartPre=/usr/bin/plymouth quit

It would be WantedBy multi-user.target.

And graphical.target would Conflict with getty.

Comment 12 Lennart Poettering 2010-08-02 15:59:02 UTC
Hmm, so on upstart getty@tty1 was only started in runlevel != 5 and gdm in runlevel == 5. 

Currently systemd by default configures "graphical.target" (which kinda replaces runlevel 5) as a true superset of "multi-user.target". I wonder how we can make that compatible.

/me needs to think about this.

But anyway, there are two issues here:

1) I think that X needs to be fixed to fail cleanly when this happens instead of just busy looping

2) We somehow need to fix the systemd default config not to start X and getty on the same screen. One option would be to introduce "text.target" or so, which would be a superset of multi-user.target and adds that one getty

Comment 13 Lennart Poettering 2010-08-02 16:01:36 UTC
Michal: graphical.target pulls in multi-user.target, and hence with what you suggest we'd both "want" and "conflict" getty from the same transaction, which would make the transaction invalid. systemd would then go on and fix that to make sure you can still boot, and actually would do the right thing, but I am not sure it is nice to ship something where the default boot transaction actually contradicts itself.
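
To make the "want and conflict in the same transaction" point concrete, here is a toy model. It is not how systemd computes transactions; the dictionaries, function names, and the bare unit name "getty" (used as written in this thread) are assumptions for illustration. It compares the layout proposed in comment 11 with the "text.target" idea from comment 12.

```python
def closure(unit, wants):
    """Return every unit pulled in, directly or transitively, by `unit`."""
    seen = set()
    stack = [unit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(wants.get(current, ()))
    return seen


def contradictions(unit, wants, conflicts):
    """List (a, b) pairs where a and b are both pulled in but a conflicts with b."""
    pulled = closure(unit, wants)
    return sorted(
        (a, b)
        for a in pulled
        for b in conflicts.get(a, ())
        if b in pulled
    )


# Comment 11 layout: getty is wanted by multi-user.target,
# and graphical.target conflicts with it.
wants_a = {
    "graphical.target": ["multi-user.target"],
    "multi-user.target": ["getty"],
}
conflicts_a = {"graphical.target": ["getty"]}

# Comment 12 layout: a separate text.target carries the getty instead.
wants_b = {
    "graphical.target": ["multi-user.target"],
    "text.target": ["multi-user.target", "getty"],
}
conflicts_b = {}

print(contradictions("graphical.target", wants_a, conflicts_a))
print(contradictions("graphical.target", wants_b, conflicts_b))
print("getty" in closure("text.target", wants_b))
```

In this model the first layout reports the pair ("graphical.target", "getty") because graphical.target reaches getty through multi-user.target while also conflicting with it; the second layout reports nothing, and getty is only reachable from text.target.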

Comment 14 Lennart Poettering 2010-08-02 16:02:18 UTC
*** Bug 614454 has been marked as a duplicate of this bug. ***

Comment 15 Lennart Poettering 2010-08-02 16:02:30 UTC
*** Bug 620043 has been marked as a duplicate of this bug. ***

Comment 16 Matthew Miller 2010-08-02 16:39:31 UTC
> Currently systemd by default configures "graphical.target" (which kinda
> replaces runlevel 5) as true superset of "multi-user.target". I wonder how we
> can make that compatible.
> 
> /me needs to think about this.

There are several other cases where one might want services in a multi-user text-only mode to differ from GUI mode. For example, the gpm service, or simply having 12 getty sessions in text but going down to only a few in GUI.

Comment 17 Michal Schmidt 2010-08-03 11:28:51 UTC
Lennart,
since this bug is an accepted Alpha blocker, it should be fixed as soon as possible. Waiting for the real fix in X may take a long time.
The idea with "text.target" should work.
Even a very minimal fix (just removing the default alias getty.target.wants/getty from getty@.service) might be acceptable at this point (the absence of a getty on tty1 could be put in Alpha release notes), with a better fix planned for later.

Comment 18 Adam Williamson 2010-08-03 17:36:11 UTC
Right, as Michal says, we need a fix for this - even if it's only a temporary one - ASAP. It's blocking the Alpha release, and we need to do the Alpha RC compose tomorrow if we're not going to delay the Alpha. So please throw in something that'll fix it in the short term for now, and you can get the 'proper' fix in for Beta :). Thanks!




Comment 19 Adam Williamson 2010-08-03 17:37:15 UTC
btw, Michal's suggested fix would indeed be fine for Alpha, not having a getty on tty1 would not invalidate any Alpha release criteria.




Comment 20 Dave Airlie 2010-08-03 20:23:50 UTC
Not sure what X could do: the OS has started a getty on the same tty. Yes, it could probably crash, but I assume that isn't the effect our users want to see either.

At least pegging the CPU at 100% made the bug traceable to systemd, instead of the X server just mysteriously dying randomly.

Comment 21 Lennart Poettering 2010-08-04 01:42:48 UTC
Dave, in contrast to X, the getty properly fails when it cannot get possession of the tty. X, however, just enters a busy loop. It should instead behave correctly and fail too. That would be the right thing, even if it would then still be a race over who actually owns tty1 in the end.

I have now uploaded a fix for systemd to F14 that makes prefdm conflict with the getty on tty1. That should have the effect that only prefdm is started, and the getty isn't, and should fix the main problem. This should be enough to drop the F14Blocker, though we might want to wait for verification from testers of my upload.

But in the end I do believe that X should be fixed, too, to fail cleanly if misconfiguration like this happens instead of making the keyboard unusable.

Comment 22 Matthew Miller 2010-08-04 02:29:16 UTC
(In reply to comment #21)
> But in the end I do believe that X should be fixed, too, to fail cleanly if
> misconfiguration like this happens instead of making the keyboard unusable.    

That sounds right to me. (Although note that in my case, the keyboard worked fine; the system was just silently eating all of one cpu core.)

Could systemd be made to take some sort of defensive action in this sort of case? With X, we have the luxury of being able to have a civil conversation about the right thing to do upstream, and that's not necessarily the case with all the software that systemd will be in charge of, when it's in charge of everything.

Comment 23 Jens Petersen 2010-08-04 02:43:11 UTC
I installed F14 Alpha TC2 and got upstart, upstart-sysvinit, and systemd-units installed.

Comment 24 Bill Nottingham 2010-08-04 15:34:17 UTC
Jens: that will be fixed in a post-TC2 compose.

Comment 25 James Laska 2010-08-04 16:36:05 UTC
(In reply to comment #21)

> I have now uploaded a fix for systemd to F14 that makes prefdm conflict with
> the getty on tty1. That should have the effect that only prefdm is started, and
> the getty isn't, and should fix the main problem. This should be enough to drop
> the F14Blocker, though we might want to wait for verification from testers of
> my upload.

This appears to be provided in the updated systemd-5-2.fc14 (see https://admin.fedoraproject.org/updates/dbus-1.3.2-0.1.885483.fc14,systemd-5-2.fc14)?

Comment 26 John Poelstra 2010-08-05 16:45:24 UTC
Moving to MODIFIED based on comment #25 which implies a fix for this bug is in.

Comment 27 Adam Williamson 2010-08-05 20:16:44 UTC
Making this depend on 621200, a bug that showed up in the updated systemd which fixes *this* bug. This is to make sure we get a systemd that fixes both this and 621200 in f14 alpha.

Comment 28 Adam Williamson 2010-08-05 21:43:29 UTC
scratch the above, I tested 5-2 and it does succeed in booting without 100% CPU usage problems, so that build is okay.




Comment 29 James Laska 2010-08-06 13:20:53 UTC
Both dbus-1.3.2-0.1.885483.fc14 and systemd-5-2.fc14 are included in F-14-Alpha-RC1.

Moving to ON_QA.  So far the bodhi update has mixed test feedback (net karma of 0).  Anyone who was able to reproduce this issue, please provide test feedback at https://admin.fedoraproject.org/updates/dbus-1.3.2-0.1.885483.fc14,systemd-5-2.fc14

Thanks!

Comment 30 Michal Schmidt 2010-08-06 15:47:55 UTC
James,

There are currently 2 Bodhi updates with systemd:
https://admin.fedoraproject.org/updates/dbus-1.3.2-0.1.885483.fc14,systemd-5-2.fc14
https://admin.fedoraproject.org/updates/systemd-6-1.fc14

For a working Alpha we need dbus from the first one and systemd-6-1 from the other one.

Comment 31 Adam Williamson 2010-08-08 04:09:47 UTC
jlaska: both are only in RC2 because they were cherrypicked. Unless the update submission is fixed, we're still liable to break the actual F14 repo by updating systemd but not dbus. The systemd 6-2 update submission needs to be updated to include the dbus package.




Comment 32 Adam Williamson 2010-08-11 19:16:18 UTC
the fixed packages are now in stable (and alpha rc3) and have been verified by multiple people as working, so let's close this.


