Bug 805254

Summary: /etc/X11/prefdm throws away display manager's stdout/stderr
Product: [Fedora] Fedora Reporter: Bob Gustafson <bobgus>
Component: initscriptsAssignee: Bill Nottingham <notting>
Status: CLOSED DEFERRED QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: unspecified    
Version: 16CC: andres.hans, iarlyy, johannbg, jonathan, lnykryn, metherid, mschmidt, notting, plautrba, rvokal, systemd-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 805507 805517 (view as bug list) Environment:
Last Closed: 2012-03-22 18:30:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
This is the output of the ssh session on the initiating system.
none
Terminal output including control C and restart and log out. none

Description Bob Gustafson 2012-03-20 18:20:02 UTC
Description of problem:

When I boot, it stops showing a full Fedora splash screen, but does not go
further to the login.

I had recently installed F16 and was apparently still living on the initial boot. I needed to reboot and found myself in this situation (again - see bug# 803574)

Version-Release number of selected component (if applicable):

This is on a fully updated F16 x86_64 running on a quad core processor.

How reproducible:

  This is my first reboot of the newly installed F16 system.

Steps to Reproduce:
1. Start with running F16 system
2. reboot
3.
  
Actual results:

  Hang at full Fedora splash, no login user choices

Expected results:

  Should see login choices.

Additional info:

Fortunately, the system comes up with networking active and a live ssh server.

From another system, I can ssh into the machine successfully. I picked a user
name (not root) which was available on both systems. When I ssh into the hung
system and then issue the command:

startx

And then look at the console screen of the hung system, voila, I am logged in as that user and I have everything available.

Even though I did a full replacement install of F16, including rewriting the partition map, When I log in through ssh from another machine and give the command 'startx' (my workaround from the previous bug# 803574), the Gnome console screen comes up with 'Applications Places' at the top left of the screen and there are little tabs at the screen bottom for each of the windows open on my screen.

When I click on my name on the top right, I see 'Log Out..' as the bottom entry, but the Alt key does not change it to 'Shutdown..' as in the Gnome screen I had when the system originally booted up. Perhaps 'startx' does different things.

[user1@hoho6 ~]$ systemctl --failed
UNIT           LOAD   ACTIVE SUB    JOB DESCRIPTION
prefdm.service loaded failed failed     Display Manager

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

1 units listed. Pass --all to see inactive units, too.

[user1@hoho6 ~]$ systemctl --list-jobs
systemctl: unrecognized option '--list-jobs'
[user1@hoho6 ~]$ systemctl list-jobs
 JOB UNIT                      TYPE            STATE  
   1 graphical.target          start           waiting
   2 multi-user.target         start           waiting
 118 plymouth...t-wait.service start           running
 121 systemd-...nlevel.service start           waiting
 128 systemd-...ead-done.timer start           waiting

5 jobs listed.
[user1@hoho6 ~]$ 

[root@hoho6 log]# grep prefdm messages | tail
Mar 20 12:33:35 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 20 12:33:35 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 20 12:33:35 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 20 12:33:35 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 20 12:33:35 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 20 12:33:35 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 20 12:33:35 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 20 12:33:35 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 20 12:33:35 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 20 12:33:35 hoho6 systemd[1]: prefdm.service start request repeated too quickly, refusing to start.
[root@hoho6 log]# 

Looks like a race condition..

Comment 1 Bob Gustafson 2012-03-20 18:35:52 UTC
Created attachment 571501 [details]
This is the output of the ssh session on the initiating system.

This initiating system is also a F16 system, but on a slower dual processor. It does not have the hang symptoms on reboot.

Comment 2 Bob Gustafson 2012-03-20 18:38:19 UTC
When I rebooted again (using /sbin/shutdown -r now), the system hung again at the Fedora splash.

Comment 3 Bob Gustafson 2012-03-20 18:44:08 UTC
Hmm, after ssh and startx on alternative system, console on affected system DID have the login dialog box. (but before the ssh and startx, it was hung at splash screen).

Comment 4 Bob Gustafson 2012-03-20 18:53:09 UTC
Another experiment - I went back to my alternative initiating system and hit the control C. The 'startx' session quit. When I went back to the problem system, I see the Fedora splash screen. The open gnome session I had previously was gone.

I went back to the alternative system and typed 'startx' again. Coming back to the problem system, it is logged on, but with no application windows - just the same as if I had logged out on the problem system.

So, another experiment - will log out of the problem system (just log out, not reboot). Results coming soon to a screen near you.

Comment 5 Bob Gustafson 2012-03-20 18:59:30 UTC
The result of logging out - the hung fedora splash screen. I am attaching the output from the terminal of the alternative initiating system. Perhaps there are some clues.

Comment 6 Bob Gustafson 2012-03-20 19:03:51 UTC
Created attachment 571503 [details]
Terminal output including control C and restart and log out.

This terminal output includes the previous terminal output, but also the control C and restart and log out from the affected system side.

hoho6 is the affected system (with ssh server)

hoho0 is the slower initiating system with the ssh client.

Comment 7 Bob Gustafson 2012-03-20 19:12:29 UTC
This time, after giving 'startx' on initiating system ssh terminal session, the affected system console screen showed the user already logged in - no login dialog window.

Also, in Comment #4, paragraph 2, I hadn't actually logged out, but had returned to the blank Gnome console screen (logged in as user), after the startx was given on the alternative initiating system ssh terminal session.

Comment 8 Bob Gustafson 2012-03-20 21:03:43 UTC
If my screen times out (as opposed to clicking on the Log Out..), the Password dialog does appear (not the Login dialog).

Comment 9 Bob Gustafson 2012-03-20 21:09:43 UTC
(In reply to comment #3)
> Hmm, after ssh and startx on alternative system, console on affected system DID
> have the login dialog box. (but before the ssh and startx, it was hung at
> splash screen).

This could have been the 'Password' dialog box, not the Login dialog window. It may have timed out before I looked at the console screen after the startx.

Comment 10 Bob Gustafson 2012-03-20 22:34:10 UTC
Following up on the hypothesis of a race condition, how can I 'turn off' a couple of cores in my quad-core processor? The turn off would have to survive a reboot.

I could take the processor from hoho0 and put it into hoho6 - they are both 775 sockets - but that would be a bit too invasive. New heat sink grease and all that.

Comment 11 Bob Gustafson 2012-03-21 00:15:49 UTC
Some commands from an old Leonnart Poettering email in systemd-devel

[root@hoho6 user1]# cp /lib/systemd/system/prefdm.service /etc/systemd/system
[root@hoho6 user1]# cd /etc/systemd/system
[root@hoho6 system]# ls
basic.target.wants                           graphical.target.wants
bluetooth.target.wants                       multi-user.target.wants
dbus-org.freedesktop.Avahi.service           prefdm.service
dbus-org.freedesktop.NetworkManager.service  printer.target.wants
default.target                               sockets.target.wants
default.target.wants                         sysinit.target.wants
getty.target.wants
[root@hoho6 system]# vim prefdm.service
-- Add StandardOutput=syslog to [Service] section

[root@hoho6 system]# systemctl status prefdm.service
prefdm.service - Display Manager
	  Loaded: loaded (/lib/systemd/system/prefdm.service; static)
	  Active: failed since Tue, 20 Mar 2012 14:27:04 -0500; 4h 39min ago
	 Process: 1169 ExecStart=/etc/X11/prefdm -nodaemon (code=exited, status=2)
	  CGroup: name=systemd:/system/prefdm.service

[root@hoho6 system]# systemctl daemon-reload
[root@hoho6 system]# systemctl start prefdm.service
[root@hoho6 system]# systemctl status prefdm.service
prefdm.service - Display Manager
	  Loaded: loaded (/lib/systemd/system/prefdm.service; static)
	  Active: failed since Tue, 20 Mar 2012 19:07:08 -0500; 16s ago
	 Process: 3116 ExecStart=/etc/X11/prefdm -nodaemon (code=exited, status=2)
	  CGroup: name=systemd:/system/prefdm.service
[root@hoho6 system]# 

[root@hoho6 log]# tail -30 messages
Mar 20 19:04:30 hoho6 dbus-daemon[1032]: dbus[1032]: [system] Activating service name='net.reactivated.Fprint' (using servicehelper)
Mar 20 19:04:30 hoho6 dbus-daemon[1032]: Launching FprintObject
Mar 20 19:04:30 hoho6 dbus[1032]: [system] Successfully activated service 'net.reactivated.Fprint'
Mar 20 19:04:30 hoho6 dbus-daemon[1032]: dbus[1032]: [system] Successfully activated service 'net.reactivated.Fprint'
Mar 20 19:04:30 hoho6 dbus-daemon[1032]: ** Message: D-Bus service launched with name: net.reactivated.Fprint
Mar 20 19:04:30 hoho6 dbus-daemon[1032]: ** Message: entering main loop
Mar 20 19:05:00 hoho6 dbus-daemon[1032]: ** Message: No devices in use, exit
Mar 20 19:06:54 hoho6 systemd[1]: Reloading.
Mar 20 19:07:07 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 20 19:07:07 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 20 19:07:07 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 20 19:07:07 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 20 19:07:07 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 20 19:07:07 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 20 19:07:07 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 20 19:07:07 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 20 19:07:07 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 20 19:07:07 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 20 19:07:07 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 20 19:07:07 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 20 19:07:08 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 20 19:07:08 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 20 19:07:08 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 20 19:07:08 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 20 19:07:08 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 20 19:07:08 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 20 19:07:08 hoho6 systemd[1]: prefdm.service start request repeated too quickly, refusing to start.
Mar 20 19:07:14 hoho6 systemd[1]: plymouth-quit-wait.service operation timed out. Terminating.
Mar 20 19:07:14 hoho6 systemd[1]: Unit plymouth-quit-wait.service entered failed state.
Mar 20 19:07:14 hoho6 systemd[1]: Startup finished in 3s 153ms 655us (kernel) + 2s 745ms 58us (initrd) + 4h 40min 17s 271ms 325us (userspace) = 4h 40min 23s 170ms 38us.
[root@hoho6 log]# 

Anything interesting?

Comment 12 Bob Gustafson 2012-03-21 00:18:49 UTC
I am running rvm

Perhaps this has a bearing on the problem? Or perhaps this problem has a bearing on the rvm problem.

https://github.com/wayneeseguin/rvm/issues/835

Comment 13 Michal Schmidt 2012-03-21 11:53:54 UTC
The issue certainly looks similar. Does it help if you move /etc/profile.d/rvm.sh away?

Comment 14 Bob Gustafson 2012-03-21 13:35:22 UTC
Yes, that is it.

cd /etc/profile.d

mv rvm.sh rvm.sh.old

Reboot - and almost everything is fine.

Comment 15 Michal Schmidt 2012-03-21 13:53:19 UTC
Good to know. To summarize the problem:
1) /etc/X11/prefdm executes /usr/sbin/gdm
2) /usr/sbin/gdm is a shell script running in POSIX mode (it has #!/bin/sh)
3) /usr/sbin/gdm sources /etc/profile
4) /etc/profile sources /etc/profile.d/rvm.sh
5) rvm.sh depends on non-POSIX bash functionality, so a syntax error is hit.

The whole thing quits with error code 2.

I see at least 2 issues that made this problem needlessly difficult to debug:
 - /etc/X11/prefdm executes the display manager with ">/dev/null 2>&1",
   thus hiding the error messages.
 - /etc/profile sources the files /etc/profile.d/*.sh with:
   . "$i" >/dev/null 2>&1
   again hiding the error messages.

prefdm is owned by initscripts. Reassigning.

Comment 16 Michal Schmidt 2012-03-21 14:03:37 UTC
I cloned this for the packages setup and gdm as bug 805507 and bug 805517.

Comment 17 Bill Nottingham 2012-03-21 14:12:51 UTC
This is more or less intentional; it's executing, in order, fallbacks from a list of display managers. If we don't throw the error messages out, everyone will get error messages about display managers that don't exist on the system in question, in cases where them not existing isn't an error.

Comment 18 Michal Schmidt 2012-03-21 14:21:03 UTC
But:
 - it does the redirection even when running $preferred.
 - for the fallbacks it could do:
   [ -x /usr/sbin/gdm ] && exec /usr/sbin/gdm "$@"
   and similar.

Comment 19 Bill Nottingham 2012-03-21 14:32:03 UTC
It could, although that means it's tied into whatever directories the DM happens to live in, even if it moves.

Obviously, this should go away with systemd-ification of display managers in F18.

Comment 20 Michal Schmidt 2012-03-21 22:21:17 UTC
(In reply to comment #19)
> It could, although that means it's tied into whatever directories the DM
> happens to live in, even if it moves.

Good point. You might like this then:
type -P gdm >/dev/null && exec gdm "$@"

> Obviously, this should go away with systemd-ification of display managers in
> F18.

Yes, hopefully.
You can decide whether to use the "type -P" guards, or just WONTFIX this for F<18.

Comment 21 Bill Nottingham 2012-03-22 18:30:46 UTC
I think I'm going to WONTFIX this - trying to catch an error in profile.d shell scripts this way isn't a case to go out of the way to rearchitect and fix.

Comment 22 Andres Noce 2012-06-05 03:53:04 UTC
what's the conntent of /etc/sysconfig/desktop ?
i got this issue also, but i fixed, cuz i allways install Fedora from minimal packages options on install stage and every time i forgot to config /etc/sysconfig/desktop, i think this skips firstlogin and other setup which could be the issue of not having /etc/sysconfig/desktop created which could lead to this "bug")
[root@lxvz9mv901 ~]# uname -a
Linux lxvz9mv901 3.3.7-1.fc17.i686.PAE #1 SMP Mon May 21 22:42:05 UTC 2012 i686 i686 i386 GNU/Linux
[root@lxvz9mv901 ~]# rpm -q openbox lxdm
openbox-3.5.0-5.fc17.i686
lxdm-0.4.1-1.fc17.i686
[root@lxvz9mv901 ~]# 

and i got solved as soon as created /etc/sysconfig/desktop
here you have mine:
[root@lxvz9mv901 ~]# cat /etc/sysconfig/desktop 
PREFERRED=/usr/bin/openbox-session
DISPLAYMANAGER=/usr/sbin/lxdm
[root@lxvz9mv901 ~]#