Bug 803574

Summary: After update, F15 boots to Fedora splash and hangs - workaround though
Product: [Fedora] Fedora Reporter: Bob Gustafson <bobgus>
Component: gdmAssignee: Ray Strode [halfline] <rstrode>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 15CC: johannbg, metherid, mschmidt, notting, plautrba, rstrode, systemd-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-03-20 16:07:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
This is the output of the ssh session on the initiating system.
none
grep prefdm /var/log/messages none

Description Bob Gustafson 2012-03-15 05:45:45 UTC
Description of problem:

When I boot, it stops showing a full Fedora splash screen, but does not go further to the login.

This occurred because of an update. The system was working properly, I went on a two week trip, did an update, and now am in a degraded state.

Version-Release number of selected component (if applicable):

[user1@hoho6 ~]$ uname -a
Linux hoho6.chidig.com 2.6.42.9-1.fc15.x86_64 #1 SMP Thu Mar 1 23:30:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
[user1@hoho6 ~]$ 

How reproducible:

100%

Steps to Reproduce:
1. start with running system F15
2. update with Software Update
3. reboot
  
Actual results:

  Hang at full Fedora splash, no login user choices

Expected results:

  Boot sequence should go through Fedora splash to login user choices

Additional info:

Fortunately, the system comes up with networking active and a live ssh server.

From another system, I can ssh into the machine successfully. I picked a user name (not root) which was available on both systems. When I ssh into the hung system and then issue the command:

startx

And then look at the screen of the hung system, voila, I am logged in as that user and I have everything available. It seems a little slower, but that may be my imagination.

Comment 1 Michal Schmidt 2012-03-15 13:35:19 UTC
(In reply to comment #0)
> Fortunately, the system comes up with networking active and a live ssh server.

That should make debugging easier. What do these commands say?:
systemctl --failed
systemctl list-jobs

Comment 2 Bob Gustafson 2012-03-15 15:30:02 UTC
[user1@hoho6 ~]$ systemctl --failed
UNIT                      LOAD   ACTIVE SUB    JOB DESCRIPTION
httpd.service             loaded failed failed     LSB: start and stop Apache HT
prefdm.service            loaded failed failed     Display Manager

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

2 units listed. Pass --all to see inactive units, too.
[user1@hoho6 ~]$ 
[user1@hoho6 ~]$ systemctl list-jobs
  95 getty.target              start           waiting
  96 getty        start           waiting
  98 getty        start           waiting
 100 getty        start           waiting
 102 getty        start           waiting
 107 getty        start           waiting
 111 plymouth...t-wait.service start           running
[user1@hoho6 ~]$ 

I can also give you the output from the ssh session on the other system.

Comment 3 Bob Gustafson 2012-03-15 15:36:34 UTC
Created attachment 570329 [details]
This is the output of the ssh session on the initiating system.

This is the output of the ssh session on the initiating system.

It is still open, so more output might be added as time goes on. If there is an interesting debug command that I can give from the initiating system, let me know.

Comment 4 Michal Schmidt 2012-03-15 15:53:07 UTC
(In reply to comment #2)
> prefdm.service            loaded failed failed     Display Manager

It would be interesting to know why exactly prefdm failed. What does "systemctl status prefdm.service" say? And are there errors about it logged in /var/log/messages?
What do you use as your graphical login manager? gdm, kdm, xdm, ...?

> [user1@hoho6 ~]$ systemctl list-jobs
>   95 getty.target              start           waiting
>   96 getty        start           waiting
>   98 getty        start           waiting
>  100 getty        start           waiting
>  102 getty        start           waiting
>  107 getty        start           waiting
>  111 plymouth...t-wait.service start           running

It is the responsibility of prefdm.service to tell plymouth to quit. Apparently it failed to do even that. Running "plymouth quit" will likely allow the boot finish.

Comment 5 Bob Gustafson 2012-03-16 07:29:13 UTC
[root@hoho6 user1]# systemctl status prefdm.service
prefdm.service - Display Manager
	  Loaded: loaded (/lib/systemd/system/prefdm.service)
	  Active: failed since Wed, 14 Mar 2012 13:36:52 -0500; 1 day and 12h ago
	 Process: 1573 ExecStart=/etc/X11/prefdm -nodaemon (code=exited, status=2)
	  CGroup: name=systemd:/system/prefdm.service
[root@hoho6 user1]# 


This might be interesting to someone:

[root@hoho6 user1]# ps ax | grep X
 6947 pts/3    S+     0:00 xinit /etc/X11/xinit/xinitrc -- /usr/bin/X :0 -auth /home/user1/.serverauth.6931
 6948 tty3     Ss+    3:14 /usr/bin/X :0 -auth /home/user1/.serverauth.6931
 6952 ?        Ss     0:00 /usr/bin/ck-xinit-session /usr/bin/ssh-agent /etc/X11/xinit/Xclients
 7075 ?        Ss     0:00 /usr/bin/ssh-agent /etc/X11/xinit/Xclients
18900 pts/4    S+     0:00 grep --color=auto X
[root@hoho6 user1]# 

Graphical login manager is whatever is the default for:

[root@hoho6 user1]# uname -a
Linux hoho6.chidig.com 2.6.42.9-1.fc15.x86_64 #1 SMP Thu Mar 1 23:30:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@hoho6 user1]# 

What command should I use to give you the information?

Comment 6 Bob Gustafson 2012-03-16 07:33:39 UTC
Note that I never used the login manager to get into this system. I logged in using ssh on a terminal on another system. When I gave the command startx on this terminal command line, the main monitor screen came alive with that user's stuff.

Essentially, I stepped over the problem, so what you see in some of the diagnostic probes might not be that informative.

I'm up for any experiments. Should I shut down and come up again with the same procedure, but instead of saying startx, use plymouth quit ?

Comment 7 Bob Gustafson 2012-03-16 07:38:58 UTC
Created attachment 570519 [details]
grep prefdm /var/log/messages

In addition, this file contains prefdm messages from before I was able to log in through ssh. Check the dates. Multiple reboot attempts..

Comment 8 Michal Schmidt 2012-03-16 09:35:44 UTC
(In reply to comment #5)
> Graphical login manager is whatever is the default for:
> 
> [root@hoho6 user1]# uname -a
> Linux hoho6.chidig.com 2.6.42.9-1.fc15.x86_64 #1 SMP Thu Mar 1 23:30:23 UTC
> 2012 x86_64 x86_64 x86_64 GNU/Linux

A kernel version is not related to the login manager. What login manager you have depends on which Fedora live spin you installed, or what packages you selected if you installed using the classical installer.

> What command should I use to give you the information?

You use Gnome, so chances are you have gdm. Does /usr/sbin/gdm exist? Does /etc/sysconfig/desktop exist, and if yes, what is in it?

(In reply to comment #6)
> I'm up for any experiments. Should I shut down and come up again with the same
> procedure, but instead of saying startx, use plymouth quit ?

Yes, you can try that and see what happens.

You can also try running /etc/X11/prefdm as root and see what you get.

(In reply to comment #7)
> Created attachment 570519 [details]
> grep prefdm /var/log/messages
>
> In addition, this file contains prefdm messages from before I was able to log
> in through ssh. Check the dates. Multiple reboot attempts..

Are there any interesting lines before each occurrence of "prefdm.service: main process exited"? Note that they may be interesting without containing the "prefdm" string.

If you indeed have gdm, you'll find more logs in /var/log/gdm/.

Comment 9 Bob Gustafson 2012-03-16 13:45:12 UTC
Before shutting down and rebooting:

[root@hoho6 log]# ls -lZ /usr/sbin/gdm
-rwxr-xr-x. root root system_u:object_r:xdm_exec_t:s0  /usr/sbin/gdm
[root@hoho6 log]# 

[root@hoho6 log]# ls -lZ /etc/sysconfig/desktop
ls: cannot access /etc/sysconfig/desktop: No such file or directory
[root@hoho6 log]# 

[root@hoho6 log]# cd gdm
[root@hoho6 gdm]# ls -lt
total 1244
-rw-r--r--. 1 root root 536309 Mar 14 01:49 :0.log
-rw-r--r--. 1 root root    185 Mar 14 01:49 :0-slave.log
-rw-r--r--. 1 gdm  gdm    2281 Jan 12 16:13 :0-greeter.log
-rw-r--r--. 1 root root 461293 Jan 12 16:12 :0.log.1
-rw-r--r--. 1 gdm  gdm    3229 Nov 14 14:16 :0-greeter.log.1
-rw-r--r--. 1 root root    340 Nov 14 14:16 :0-slave.log.1
-rw-r--r--. 1 root root 102888 Nov 14 14:16 :0.log.2
-rw-r--r--. 1 root root    185 Nov 14 14:16 :0-slave.log.2
-rw-r--r--. 1 gdm  gdm    2281 Nov  6 15:39 :0-greeter.log.2
-rw-r--r--. 1 root root    185 Nov  6 14:45 :0-slave.log.3
-rw-r--r--. 1 root root  36208 Nov  6 14:45 :0.log.3
-rw-r--r--. 1 gdm  gdm    2281 Nov  6 14:31 :0-greeter.log.3
-rw-r--r--. 1 root root  37875 Nov  6 14:30 :0.log.4
-rw-r--r--. 1 root root    185 Nov  6 14:30 :0-slave.log.4
-rw-r--r--. 1 gdm  gdm    2281 Nov  6 14:09 :0-greeter.log.4
-rw-r--r--. 1 root root  27696 May 31  2011 :1.log
-rw-r--r--. 1 root root    342 May 31  2011 :1-slave.log
-rw-r--r--. 1 gdm  gdm    2938 May 31  2011 :1-greeter.log
[root@hoho6 gdm]# 

[root@hoho6 gdm]# tail :0.log
(II) Power Button: Close
(II) Unloading evdev
(II) Power Button: Close
(II) Unloading evdev
(II) AT Translated Set 2 keyboard: Close
(II) Unloading evdev
(II) ImExPS/2 Generic Explorer Mouse: Close
(II) Unloading evdev
(II) NOUVEAU(0): NVLeaveVT is called.
(II) NOUVEAU(0): Closed GPU channel 2
[root@hoho6 gdm]# 

Here is one 'vim -R messages' window starting with the last but one 'prefdm failed' into the next batch of prefdm messages.

Mar 14 02:53:13 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 14 02:53:13 hoho6 avahi-daemon[900]: Joining mDNS multicast group on interface virbr0.IPv4 with address 192.168.122.1.
Mar 14 02:53:13 hoho6 kernel: [   20.355996] ADDRCONF(NETDEV_UP): virbr0: link is not ready
Mar 14 02:53:13 hoho6 avahi-daemon[900]: New relevant interface virbr0.IPv4 for mDNS.
Mar 14 02:53:13 hoho6 avahi-daemon[900]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
Mar 14 02:53:13 hoho6 dnsmasq[1550]: started, version 2.52 cachesize 150
Mar 14 02:53:13 hoho6 dnsmasq[1550]: compile time options: IPv6 GNU-getopt DBus no-I18N DHCP TFTP
Mar 14 02:53:13 hoho6 dnsmasq-dhcp[1550]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Mar 14 02:53:13 hoho6 dnsmasq[1550]: reading /etc/resolv.conf
Mar 14 02:53:13 hoho6 dnsmasq[1550]: using nameserver 192.168.49.41#53
Mar 14 02:53:13 hoho6 dnsmasq[1550]: read /etc/hosts - 2 addresses
Mar 14 02:53:13 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 14 02:53:13 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 14 02:53:13 hoho6 systemd[1]: Unit prefdm.service entered failed state.

I will do the reboot now and do your other experiments

Comment 10 Bob Gustafson 2012-03-16 18:09:40 UTC
OK, I rebooted the problematic F15 system. Courtesy of the new Gnome, there is no 'restart' command available, even with the Alt key depressed, so I had to do '/sbin/shutdown -r now' in a terminal..

The system came up with the full Fedora logo on the console screen.

Going to my alternative (F16) system, I logged in using ssh as before:


[root@hoho0 ~]# ssh 192.168.49.51
root.49.51's password: 
Last login: Wed Mar 14 17:24:34 2012 from hoho6.chidig.com
[root@hoho6 ~]# systemctl status prefdm.service
prefdm.service - Display Manager
	  Loaded: loaded (/lib/systemd/system/prefdm.service)
	  Active: failed since Fri, 16 Mar 2012 12:44:41 -0500; 4min 32s ago
	 Process: 1499 ExecStart=/etc/X11/prefdm -nodaemon (code=exited, status=2)
	  CGroup: name=systemd:/system/prefdm.service
[root@hoho6 ~]# 
[root@hoho6 ~]# systemctl --failed
UNIT                      LOAD   ACTIVE SUB    JOB DESCRIPTION
httpd.service             loaded failed failed     LSB: start and stop Apache HT
prefdm.service            loaded failed failed     Display Manager

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

2 units listed. Pass --all to see inactive units, too.
[root@hoho6 ~]# 
[root@hoho6 ~]# plymouth quit
[root@hoho6 ~]# 


What I saw on the console screen after doing 'plymouth quit' was a blank screen with a blinking underline cursor at the top left. I typed 'Alt-F2' and was rewarded with a login prompt. I put in my user name and password and now have a command line active on the console screen.

No graphical interface though. No Gnome3 (is that bad..), No gdm.



From the ssh terminal, I typed:
[root@hoho6 ~]# /etc/X11/prefdm
[root@hoho6 ~]#

It seemed to run and then immediately exit.

There is no change on the console screen. When I log out of my user on the console screen, an immediate login request line comes back.

What next to do?

Comment 11 Bob Gustafson 2012-03-16 18:15:24 UTC
Looking at the bottom lines of:

[root@hoho6 log]# grep prefdm messages

Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 16 12:44:41 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 16 12:44:41 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 16 12:44:41 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service start request repeated too quickly, refusing to start.
[root@hoho6 log]# date
Fri Mar 16 13:10:45 CDT 2012
[root@hoho6 log]# 

It appears that the most recent '/etc/X11/prefdm' doesn't appear in the log/messages.

I will try again to see if that is correct

[root@hoho6 log]# date
Fri Mar 16 13:13:13 CDT 2012
[root@hoho6 log]# /etc/X11/prefdm
[root@hoho6 log]# 
[root@hoho6 log]# grep prefdm messages | tail
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 16 12:44:41 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 16 12:44:41 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service: main process exited, code=exited, status=2
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service holdoff time over, scheduling restart.
Mar 16 12:44:41 hoho6 systemd[1]: Unit prefdm.service entered failed state.
Mar 16 12:44:41 hoho6 systemd[1]: prefdm.service start request repeated too quickly, refusing to start.
[root@hoho6 log]# 

Yes, the terminal '/etc/X11/prefdm' doesn't add another prefdm line to messages

Comment 12 Bob Gustafson 2012-03-17 05:38:39 UTC
Duplicating the commands shown in Comment #10 on a satisfactorily running F16 system to see what the output 'should be' (more or less)

[root@hoho0 user1]# systemctl status prefdm.service
prefdm.service - Display Manager
	  Loaded: loaded (/lib/systemd/system/prefdm.service; static)
	  Active: active (running) since Fri, 16 Mar 2012 17:03:06 -0500; 7h ago
	Main PID: 966 (gdm-binary)
	  CGroup: name=systemd:/system/prefdm.service
		  ├  966 /usr/sbin/gdm-binary -nodaemon
		  ├  978 /usr/libexec/gdm-simple-slave --display-id /org/gn...
		  ├  982 /usr/bin/Xorg :0 -background none -verbose -auth /...
		  └ 1201 /usr/bin/gnome-keyring-daemon --daemonize --login
[root@hoho0 user1]#
 
[root@hoho0 user1]# systemctl --failed
UNIT                LOAD   ACTIVE SUB    JOB DESCRIPTION
httpd.service       loaded failed failed     The Apache HTTP Server (prefork MPM
udev-settle.service loaded failed failed     udev Wait for Complete Device Initi

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

2 units listed. Pass --all to see inactive units, too.
[root@hoho0 user1]# 


I wonder what is the effect of the failed udev-settle.service.

I may just skip the F15 system with a problem and backup everything and go to F16 from F15. (There is an agony with RAID1 on /boot, but there is always a RAID problem at the beginning of each new release - I can deal with it, but a broken systemd is another thing).

Comment 13 Michal Schmidt 2012-03-19 13:09:45 UTC
(In reply to comment #10)
> From the ssh terminal, I typed:
> [root@hoho6 ~]# /etc/X11/prefdm
> [root@hoho6 ~]#
> 
> It seemed to run and then immediately exit.

Try running /usr/sbin/gdm directly. Hopefully it will be more verbose.

systemd did everything it could to spawn the service. That it fails is not systemd's fault. Reassigning to gdm.

Comment 14 Bob Gustafson 2012-03-20 15:59:37 UTC
I don't have that F15 anymore, went to F16.

As I had mentioned before, installing F on RAID has always been a problem, and one that I was trying to postpone by hanging on to F15 for awhile longer.

I was able to overcome the RAID1 installation problems. The key (I think) is to create a new partition table, making sure it is gpt instead of mdos. Anaconda will give up easily with gpt and go to mdos if there is anything like a mdos partition table on the disks. This will doom further effort.

I should write this up in a separate bug report, not here.

Comment 15 Michal Schmidt 2012-03-20 16:07:06 UTC
(In reply to comment #14)
> I don't have that F15 anymore, went to F16.

No way to debug this further then. Closing.

Comment 16 Bob Gustafson 2012-03-20 17:55:36 UTC
Not so fast. The problem now exists on my newly installed F16 system.

I was able to do stuff for a day, then rebooted. Now have same symptoms as originally reported with the F15 system.

[user1@hoho6 ~]$ systemctl status prefdm.service
prefdm.service - Display Manager
	  Loaded: loaded (/lib/systemd/system/prefdm.service; static)
	  Active: failed since Tue, 20 Mar 2012 12:33:35 -0500; 9min ago
	 Process: 1230 ExecStart=/etc/X11/prefdm -nodaemon (code=exited, status=2)
	  CGroup: name=systemd:/system/prefdm.service
[user1@hoho6 ~]$ 

[user1@hoho6 ~]$ systemctl --failed
UNIT           LOAD   ACTIVE SUB    JOB DESCRIPTION
prefdm.service loaded failed failed     Display Manager

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

1 units listed. Pass --all to see inactive units, too.
[user1@hoho6 ~]$ 

[user1@hoho6 ~]$ systemctl list-jobs
 JOB UNIT                      TYPE            STATE  
   1 graphical.target          start           waiting
   2 multi-user.target         start           waiting
 118 plymouth...t-wait.service start           running
 121 systemd-...nlevel.service start           waiting
 128 systemd-...ead-done.timer start           waiting

5 jobs listed.
[user1@hoho6 ~]$ 

I will create a new bug for F16 and reference this one.