Bug 967521 - /var/log/journal breaks system startup
Summary: /var/log/journal breaks system startup
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: plymouth
Version: 27
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Ray Strode [halfline]
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker RejectedFreezeException
: 970801 983796 985038 988231 996370 1016137 (view as bug list)
Depends On:
Blocks: heisenbug 1021605
TreeView+ depends on / blocked
 
Reported: 2013-05-27 10:50 UTC by Dong Chen
Modified: 2018-11-30 22:19 UTC (History)
76 users (show)

Fixed In Version: plymouth-0.8.9-0.2013.03.26.1.fc19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-30 22:19:46 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
my gdm.service file (354 bytes, text/plain)
2013-05-27 10:50 UTC, Dong Chen
no flags Details
journalctl last boot (192.69 KB, text/plain)
2013-06-08 21:33 UTC, Igor Gnatenko
no flags Details
jorunalctl last boot (216.66 KB, text/plain)
2013-07-19 19:15 UTC, Emiliano Barrionuevo
no flags Details
script modified and working (6.25 KB, text/plain)
2013-08-08 14:43 UTC, Lorenzo Mastrogiacomi
no flags Details
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal (271.91 KB, text/plain)
2013-09-06 13:49 UTC, Tom London
no flags Details
Output of "systemd-analyze plot" of system with /var/log/journal (92.29 KB, image/svg+xml)
2013-09-06 14:07 UTC, Tom London
no flags Details
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal (debug_level=info) (271.52 KB, text/plain)
2013-09-07 21:12 UTC, Tom London
no flags Details
Output of "journalctl -o short-monotonic -ab" of boot with no /var/log/journal (debug_level=info) (271.57 KB, text/plain)
2013-09-07 21:13 UTC, Tom London
no flags Details
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal and with auditd disabled (debug_level=debug) (932.93 KB, text/plain)
2013-09-11 02:18 UTC, Tom London
no flags Details
output of systemd-analyze plot (176.14 KB, application/xml)
2013-09-21 11:54 UTC, Robert Muil
no flags Details
Output of journalctl -o short-monotonic -ab with no /var/log/journal (180.73 KB, text/plain)
2013-09-21 12:12 UTC, Robert Muil
no flags Details
Output of "journalctl -o short-monotonic -ab" with "systemd.log_level=debug" on the kernel command line (356.87 KB, text/plain)
2013-10-23 16:41 UTC, Germano Massullo
no flags Details
Output of journalctl -o short-monotonic -ab booted with systemd.log_level=debug (391.48 KB, text/plain)
2013-10-26 21:15 UTC, David Wilkins
no flags Details
Output of journalctl -o short-monotonic -ab (systemd.log_level=debug) with kdm kill (435.75 KB, text/plain)
2013-10-26 23:13 UTC, David Wilkins
no flags Details
Output of journalctl -o short _COMM=yum _COMM=kdm | egrep 'kde-workspace|Quitting Plymouth' (8.58 KB, text/plain)
2013-10-29 10:37 UTC, David Wilkins
no flags Details
Output of journalctl -o short-monotonic _COMM=systemd _COMM=kdm | egrep '[Pp]lymouth|System Initialization|kdm' (199.19 KB, text/plain)
2013-10-29 11:12 UTC, David Wilkins
no flags Details
Proposed patch to the kde-workspace-4.11.1-kdm-logind-multiseat.patch (862 bytes, patch)
2013-10-30 10:32 UTC, David Wilkins
no flags Details | Diff
Output of journalctl -o short-monotonic -ab with kdm patched to generate journal messages (183.92 KB, text/plain)
2013-10-30 22:44 UTC, David Wilkins
no flags Details
Log file Xorg.0.log from computer failing to transition to X on boot (27.00 KB, text/x-log)
2013-10-31 14:26 UTC, David Wilkins
no flags Details
Log file Xorg.0.log.old from same computer and time as Comment 129 (27.50 KB, text/plain)
2013-10-31 14:45 UTC, David Wilkins
no flags Details
Output of journalctl -o short-monotonic -ab for boot with Xorg.0.log attached to Comments 129 and 130 (2.68 MB, text/plain)
2013-10-31 14:54 UTC, David Wilkins
no flags Details
Xorg.0.log.dated-2013-10-31-1513.txt, no transition to X, no subsequent kdm restart (27.50 KB, text/plain)
2013-10-31 16:17 UTC, David Wilkins
no flags Details
Output of journalctl -o short-monotonic -ab to accompany Xorg.0.log of Comment 134 (192.56 KB, text/plain)
2013-10-31 16:22 UTC, David Wilkins
no flags Details
Xorg.0.log.dated-2013-10-31-1548.txt, to virtual terminals, before kdm restart (24.58 KB, text/plain)
2013-10-31 16:32 UTC, David Wilkins
no flags Details
Output of journalctl -o short-monotonic -ab to accompany Xorg.0.log of Comment 136 (223.73 KB, text/plain)
2013-10-31 16:36 UTC, David Wilkins
no flags Details
Output of journalctl -o short-monotonic -ab after update to plymouth-0.8.9-0.2013.03.26.1 (214.97 KB, text/plain)
2013-10-31 22:44 UTC, David Wilkins
no flags Details

Description Dong Chen 2013-05-27 10:50:03 UTC
Created attachment 753554 [details]
my gdm.service file

Description of problem:
every f19 boot time, the boot message show "Failed to start Wait for Plymouth Boot Screen to Quit", it take a long time to reach gnome login from booting.

Version-Release number of selected component (if applicable):


How reproducible:
every boot time

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
my boot.log:        
         Starting Wait for Plymouth Boot Screen to Quit...
         Starting GNOME Display Manager...
         Starting Kernel Samepage Merging (KSM) Tuning Daemon...
[[32m  OK  [0m] Started Machine Check Exception Logging Daemon.
[[32m  OK  [0m] Started Install ABRT coredump hook.
[[32m  OK  [0m] Started Kernel Samepage Merging (KSM) Tuning Daemon.
[[32m  OK  [0m] Started LSB: Start the iprupdate utility.
[[32m  OK  [0m] Started LSB: Start the ipr init daemon.
         Starting LSB: Start the ipr dump daemon...
[[32m  OK  [0m] Started LSB: Start the ipr dump daemon.
[[32m  OK  [0m] Started Collect UEFI-saved oopses for ABRT.
[[32m  OK  [0m] Started Login Service.
[[32m  OK  [0m] Started Bluetooth service.
[[32m  OK  [0m] Reached target Bluetooth.
[[32m  OK  [0m] Started RealtimeKit Scheduling Policy Service.
[[32m  OK  [0m] Started GNOME Display Manager.
[[0m[31m*     [0m] (1 of 7) A start job is running for firewalld - dynamic firewall daemon
[K[[1;31m*[0m[31m*    [0m] (1 of 7) A start job is running for firewalld - dynamic firewall daemon
[K[[31m*[1;31m*[0m[31m*   [0m] (1 of 7) A start job is running for firewalld - dynamic firewall daemon
[K[ [31m*[1;31m*[0m[31m*  [0m] (2 of 7) A start job is running for Avahi mDNS/DNS-SD Stack
[K[  [31m*[1;31m*[0m[31m* [0m] (2 of 7) A start job is running for Avahi mDNS/DNS-SD Stack
[K[[32m  OK  [0m] Started LSB: Automatically install DKMS modules for new kernels.
[   [31m*[1;31m*[0m[31m*[0m] (2 of 6) A start job is running for Avahi mDNS/DNS-SD Stack
[K[    [31m*[1;31m*[0m] (3 of 6) A start job is running for Restorecon maintaining path file context
[K[     [31m*[0m] (3 of 6) A start job is running for Restorecon maintaining path file context
[K[[1;31mFAILED[0m] Failed to start Wait for Plymouth Boot Screen to Quit.
See 'systemctl status plymouth-quit-wait.service' for details.

Comment 1 ell1e 2013-06-04 22:41:13 UTC
Same here, also standard Gnome 3/gdm. Output for systemctl status plymouth-quit-wait.service:

plymouth-quit-wait.service - Wait for Plymouth Boot Screen to Quit
   Loaded: loaded (/usr/lib/systemd/system/plymouth-quit-wait.service; disabled)
   Active: failed (Result: timeout) since Wed 2013-06-05 00:32:51 CEST; 2min 47s ago
 Main PID: 734
   CGroup: name=systemd:/system/plymouth-quit-wait.service
 
Jun 05 00:32:51 jth.lan systemd[1]: plymouth-quit-wait.service operation timed out. Terminating.
Jun 05 00:32:51 jth.lan systemd[1]: Failed to start Wait for Plymouth Boot Screen to Quit.
Jun 05 00:32:51 jth.lan systemd[1]: Unit plymouth-quit-wait.service entered failed state.

Comment 2 Igor Gnatenko 2013-06-08 21:33:19 UTC
Created attachment 758633 [details]
journalctl last boot

journalctl -b

Comment 3 Himanshu Garg 2013-07-04 03:08:44 UTC
I too have the exact problem with Fedora 19 Final release & Gnome 3.8.2.

Comment 4 Himanshu Garg 2013-07-05 03:10:48 UTC
(In reply to Himanshu Garg from comment #3)
> I too have the exact problem with Fedora 19 Final release & Gnome 3.8.2.


After upgrading to Fedora 19, I see during boot up the following things which have increased the boot time.

(1 of 4) A start job is running for firewalld - dynamic firewall daemon

(2 of 4) A start job is running for Wait for Plymouth Boot Screen to Quit

(3 of 4) A start job is running for LSB: Automatically install DKMS modules for new kernels

(4 of 4) A start job is running for Accounts Service

Failed to start Wait for Plymouth Boot Screen to Quit.
See 'systemctl status plymouth-quit-wait.service' for details

Comment 5 Garve 2013-07-07 09:05:53 UTC
I have the same problem. F19, Gnome, upgraded from F18 with FedUp.

Comment 6 James 2013-07-07 17:59:48 UTC
Same problem here. F19_x64, Gnome 3.8.2 upgraded from F18_x64 using Fedup. 

But, after login in I can start "plymouth-quit-wait.service" from the terminal without any error message using the command:

systemctl status plymouth-quit-wait.service

Comment 7 neronio 2013-07-08 08:30:45 UTC
Same problem. Fresh install of Fedora 19 x86_64

Comment 8 Toby Ovod-Everett 2013-07-09 00:15:11 UTC
I'm seeing the problem with plymouth-quit-wait.service when using the Fedora 19 Desktop Edition (i.e. GNOME-based) x86_64 Live DVD (i.e. http://download.fedoraproject.org/pub/fedora/linux/releases/19/Live/x86_64/Fedora-Live-Desktop-x86_64-19-1.iso as downloaded on July 5th).

It took me a bunch of troubleshooting to discover that if I simply waited long enough, X would start up!  I kept hearing the DVD drive stop thrashing, and after 15 or 30 seconds of waiting assumed there was a problem.  I did some testing with rhgb disabled (using 'e' at the GRUB screen) and while looking through various log files, X eventually started.  I subsequently did some testing with an unmodified boot and discovered that if I was simply patient enough everything would be fine.

The system I'm testing on has an ASUS P8H67-I Deluxe w/ a Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (fam: 06, model: 2a, stepping: 07).  I'm using the onboard video (i.e. Intel HD Graphics 3000).

Comment 9 Couret Charles-Antoine 2013-07-09 15:57:22 UTC
Same problem with Fedora 19 x86_64 Gnome edition.
I use this version for Alpha release !

Comment 10 Toby Ovod-Everett 2013-07-19 02:34:30 UTC
For whatever it's worth, I'm not seeing this problem after doing a full install of Fedora 19 x86_64 using the "GNOME Desktop" Base Environment.  I did see it when booting from the Live DVD in order to do final backups of my Fedora 17 system prior to wiping the SSD and doing my Fedora 19 install.

Comment 11 Himanshu Garg 2013-07-19 03:47:31 UTC
I had fresh installed Fedora 19 with Gnome and no problem like this.  It was a problem after upgrade.

Comment 12 tojamarcin 2013-07-19 15:07:09 UTC
Same problem.  Fedora 19 x86_64

Comment 13 Emiliano Barrionuevo 2013-07-19 19:15:29 UTC
Created attachment 775913 [details]
jorunalctl last boot

My journal from the last boot.

Comment 14 Wei Mingzhi 2013-07-21 12:55:59 UTC
Same problem.  Fedora 19 x86_64

Comment 15 tathanhdinh 2013-07-26 22:28:24 UTC
I have the same problem since the last 2 days.

Comment 16 Chris 2013-07-30 22:55:29 UTC
I have the same problem, but a bit different behavior. Usually system boots fast, goes well, but says it failed, but sometimes it boots that long that I lost my patience (id est, it usually was booting fast so I lost it after 10 seconds) and forced computer restart, then booting was fine. It says
"See 'systemctl status plymouth-quit-wait.service' for details."
When asked it says:
[krzysztof@localhost ~]$ systemctl status plymouth-quit-wait.service
plymouth-quit-wait.service - Wait for Plymouth Boot Screen to Quit
   Loaded: loaded (/usr/lib/systemd/system/plymouth-quit-wait.service; disabled)
   Active: failed (Result: timeout) since Wed 2013-07-31 00:43:24 BST; 1min 14s ago
 Main PID: 550
   CGroup: name=systemd:/system/plymouth-quit-wait.service

When it refuses to boot fast I will see if it boots after longer time. Should you need any more data, please let me know.

Comment 17 Theodore Lee 2013-07-31 00:33:31 UTC
With regard to the comment above, I have seen similar behaviour (graphical boot failing to complete entirely) starting around the same time this bug started occurring, although I'm not certain it's directly related to this bug. GDM will start but fail to display the login screen, instead leaving the system stuck at the last Plymouth display. Switching into tty2 and checking gdm.service's status with systemctl works, and reports an apparently successful GDM start. I have noticed this log entry shows up most of the time when I run into this issue:

gdm[463]: GdmLocalDisplayFactory: Failed to issue method call: Timeout was reached

Running 'systemctl restart gdm.service' brings up the login screen properly without any problems.

Looking through the boot log snippets in this bug, it seems like a few of the people running into the bug (myself included) are using DKMS. Could that be a possible common factor?

Comment 18 chen haochuan 2013-07-31 02:15:53 UTC
According to this post: http://www.linuxquestions.org/questions/fedora-35/fedora-boot-problem-4175468468/
This problem can be workarounded by moving /var/log/journal to /var/log/journal.log
And also here: http://forums.fedoraforum.org/showpost.php?p=1660377&postcount=10

Comment 19 tathanhdinh 2013-07-31 10:48:41 UTC
This may come from an upgrade of KDE ?. I have reinstalled my Fedora x64, everything goes well, no more plymouth-quit-wait error. 
But after I upgraded to kde-4.10.97 (from kde-unstable repos), the error came back.

Comment 20 GWL 2013-07-31 16:34:01 UTC
Confirm mv /var/log/journal /var/log/journal.org solves the problem.

Comment 21 Peque 2013-08-01 17:31:46 UTC
I confirm as well that:

  # mv /var/log/journal /var/log/journal.org

Solves the problem.

Comment 22 Randy 2013-08-03 14:21:43 UTC
The way I fixed it:

yum -y erase plymouth

To be honest, seems like a cleaner method than renaming a log file for something that doesn't actually DO anything productive.

Comment 23 Germán Racca 2013-08-04 08:42:06 UTC
I also have the very same problem (Fedora 19) and I can confirm that doing

# mv /var/log/journal /var/log/journal.org

solves the problem (though I don't know why).

Comment 24 Mark Harig 2013-08-07 07:24:34 UTC
If the '--debug' switch is added to /usr/bin/plymouth in /usr/lib/systemd/system/plymouth-quit-wait.service, and the service is restarted:

$ sudo systemctl restart plymouth-quit-wait.service

Then the following messages are written to the system log:

[./ply-boot-client.c]                       ply_boot_client_connect:could not connect to /org/freedesktop/plymouthd: Connection refused
[./ply-boot-client.c]                       ply_boot_client_connect:trying old fallback path /ply-boot-protocol
[./ply-boot-client.c]                       ply_boot_client_connect:could not connect to /ply-boot-protocol: Connection refused
[./plymouth.c]                                          main:daemon not running
[./plymouth.c]                                          main:no need to wait

Comment 25 Lorenzo Mastrogiacomi 2013-08-08 14:43:04 UTC
Created attachment 784431 [details]
script modified and working

There are problems with this script:

[root@lof19 ~]# /sbin/plymouth-set-default-theme -R
/usr/share/plymouth/themes//.plymouth does not exist

I've done this modifications:

[root@lof19 ~]# diff /sbin/plymouth-set-default-theme.bak0 /sbin/plymouth-set-default-theme
7,8c7,8
< [ -z "$PLYMOUTH_CONFDIR" ] && PLYMOUTH_CONFDIR="/etc/plymouth/"
< [ -z "$PLYMOUTH_POLICYDIR" ] && PLYMOUTH_POLICYDIR="/usr/share/plymouth/"
---
> [ -z "$PLYMOUTH_CONFDIR" ] && PLYMOUTH_CONFDIR="/etc/plymouth"
> [ -z "$PLYMOUTH_POLICYDIR" ] && PLYMOUTH_POLICYDIR="/usr/share/plymouth"
73,74d72
< 
<         echo $THEME_NAME
149d146
< 
161a159
>         echo $THEME_NAME
174a173,174
> 
> [ -z "$THEME_NAME" ] && get_default_theme

And then:

plymouth-set-default-theme -R

The problem seems solved

Comment 26 Taunus 2013-08-12 11:48:37 UTC
I tried Lorenzos suggestion but it didn't help. Removing the /var/log/journal directory solved the problem.

Comment 27 Lukas Marek 2013-08-13 12:43:16 UTC
This issues sounds like systemd-journald problem. I've "disabled" journald (by moving the logs) on two of my systems and boot is suddenly 1.5-2x faster then before. Can someone poke systemd guys to look at this?

Comment 28 Tomas Vanderka 2013-09-02 18:08:41 UTC
I also had a problem with plymouth-quit-wait.service failing and long boot times.

Even after I masked it with "systemctl mask plymouth-quit-wait.service" the system was still taking over 20s getting to graphical login screen.

I found following errors in the logs

gdm[606]: Failed to set slave bus name on parent: Timeout was reached

and that led me to this

https://mail.gnome.org/archives/gdm-list/2013-February/msg00003.html

Apparently it was stuck in gdm and caused by my changes to dbus config I did a while ago. After reverting all dbus config files to distro defaults the system boots nicely in 2 seconds.

You can try this (YMMV) to check if you got any weird stuff in /etc/dbus-1, it should return nothing

"find /etc/dbus-1|xargs rpm -qf|sort|uniq|xargs rpmverify 2>&1|grep dbus"

Comment 29 Garve 2013-09-03 21:30:54 UTC
Hello shadof. Your command gives me the following:

"..?......    /lib64/dbus-1/dbus-daemon-launch-helper"

Does anyone know if this is ok? And if not, what can I do about it?

Comment 30 Scott Schmit 2013-09-04 03:20:51 UTC
(In reply to Garve from comment #29)
> Hello shadof. Your command gives me the following:
> 
> "..?......    /lib64/dbus-1/dbus-daemon-launch-helper"
> 
> Does anyone know if this is ok? And if not, what can I do about it?

You need to run the command as root, or you won't have permissions to read all the files.

Comment 31 Zbigniew Jędrzejewski-Szmek 2013-09-06 12:41:48 UTC
*** Bug 970801 has been marked as a duplicate of this bug. ***

Comment 32 Zbigniew Jędrzejewski-Szmek 2013-09-06 12:42:40 UTC
*** Bug 985038 has been marked as a duplicate of this bug. ***

Comment 33 Zbigniew Jędrzejewski-Szmek 2013-09-06 12:45:51 UTC
*** Bug 988231 has been marked as a duplicate of this bug. ***

Comment 34 Zbigniew Jędrzejewski-Szmek 2013-09-06 12:48:04 UTC
Hm, can someone knowledgable about plymouth tell it to log what it is attempting to do and why it is failing more verbosely? We have a *lot* of reports of it failing to quit, and no idea why.

Comment 35 Hin-Tak Leung 2013-09-06 13:06:47 UTC
Anybody looked at why there are the oddly named "/var/log/gdm/(null).log"'s yet?

And what graphic hardware are people failing on?

$ lspci | grep VGA
01:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RS690M [Radeon Xpress 1200/1250/1270]

Comment 36 Zbigniew Jędrzejewski-Szmek 2013-09-06 13:09:57 UTC
(In reply to Hin-Tak Leung from comment #35)
> Anybody looked at why there are the oddly named "/var/log/gdm/(null).log"'s
> yet?
It would be great if you could file a separate bug against gdm, if you can reproduce it.

> And what graphic hardware are people failing on?
> 
> $ lspci | grep VGA
> 01:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> RS690M [Radeon Xpress 1200/1250/1270]
I saw some nvidias in the other logs. So I don't think this is hardware specific.

Comment 37 Tom London 2013-09-06 13:35:35 UTC
From BZ: https://bugzilla.redhat.com/show_bug.cgi?id=988231#c12

 Harald Hoyer 2013-09-06 00:53:44 EDT

(In reply to Tom London from comment #10)
> Created attachment 794548 [details]
> Output of "systemd-analyze plot" of system with /var/log/journal

Wow.. please boot with "systemd.log_level=debug" on the kernel command line.
Then, when you are booted attach the output of:

# journalctl -o short-monotonic -ab

I cannot do this: my system freezes at boot just after the "Welcome to Fedora 21 (Rawhide)" message. It is completely unresponsive.

I attempted this several times, once waiting over 5 minutes at the point of the freeze before hard power cycling.

I checked and there was nothing in the journal for these attempted boots.

Comment 38 Hin-Tak Leung 2013-09-06 13:41:37 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #36)
> (In reply to Hin-Tak Leung from comment #35)
> > Anybody looked at why there are the oddly named "/var/log/gdm/(null).log"'s
> > yet?
> It would be great if you could file a separate bug against gdm, if you can
> reproduce it.

A new one ( "/var/log/gdm/(null).log.N", N a number) is created every time it fails to boot for me. I mentioned this in my bug 985038 , but nobody seems to corroborate it here? It should be easy, just look under /var/log/gdm/ whenever, as old logs are left behind.

Comment 39 Tom London 2013-09-06 13:49:27 UTC
Created attachment 794752 [details]
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal

I'm not able to boot with "systemd.log_level=debug" (see above).

This is the requested output of journalctl from a boot without the enhanced logging.

What else would be helpful?

Comment 40 Tom London 2013-09-06 14:07:13 UTC
Created attachment 794765 [details]
Output of "systemd-analyze plot" of system with /var/log/journal

BTW, the boot freeze also occurs with systemd-206-11.fc21.x86_64.

Attaching fresh plot from boot with /var/log/journal (and systemd-206-11.fc21.x86_64).

Comment 41 Bruno Wolff III 2013-09-06 15:01:41 UTC
The gdm issue should be discussed in a separate bug. I have the plymouth quit problem on a machine that uses kdm, so I doubt gdm issue is causing the plymouth issue.

Comment 42 Tom London 2013-09-07 16:05:06 UTC
Booting with "systemd.log_level=debug" freezes boot similarly even if there is no /var/log/journal directory.

systemd-206-11.fc21.x86_64
kernel-3.12.0-0.rc0.git12.2.fc21.x86_64

Comment 43 Tom London 2013-09-07 21:11:12 UTC
Not sure if it is helpful, but I noticed that I can successfully boot with "systemd.log_level=info".

I booted that way, both with and without /var/log/journal.

Here is a snippet from "journalctl -o short-monotonic -ab" from booting with /var/log/journal:

[   16.061940] tlondon.localhost.org systemd[1]: Started Mark the need to relabel after reboot.
[   16.062238] tlondon.localhost.org systemd[1]: Started Reconfigure the system on administrator request.
[   16.059159] tlondon.localhost.org kernel: SELinux: initialized (dev sda1, type ext4), uses xattr
[   16.062873] tlondon.localhost.org systemd[1]: Started Relabel all filesystems, if necessary.
[   16.063157] tlondon.localhost.org systemd[1]: Starting Recreate Volatile Files and Directories...
[   16.069045] tlondon.localhost.org systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
[   16.070759] tlondon.localhost.org systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
[   16.071157] tlondon.localhost.org systemd[1]: Starting Security Auditing Service...
[   16.877299] tlondon.localhost.org systemd-journal[293]: Permanent journal is using 30.8M (max 4.0G, leaving 4.0G of free 84.6G, current limit 4.0G).
[   42.104035] tlondon.localhost.org systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
[   42.144998] tlondon.localhost.org systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
[   42.150919] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
[   42.151789] tlondon.localhost.org systemd[1]: Started Recreate Volatile Files and Directories.
[   42.155190] tlondon.localhost.org systemd[1]: Starting Update UTMP about System Reboot/Shutdown...

Notice the jump from 16.8 seconds -> 42.1 seconds after "systemd-journal[293]: Permanent journal is using 30.8M (max 4.0G, leaving 4.0G of free 84.6G, current limit 4.0G)." and before "Started Trigger Flushing of Journal ...".

[   15.945369] tlondon.localhost.org systemd[1]: Started Mark the need to relabel after reboot.
[   15.945633] tlondon.localhost.org systemd[1]: Started Reconfigure the system on administrator request.
[   15.945899] tlondon.localhost.org systemd[1]: Started Relabel all filesystems, if necessary.
[   15.946175] tlondon.localhost.org systemd[1]: Starting Recreate Volatile Files and Directories...
[   15.946461] tlondon.localhost.org systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
[   15.963542] tlondon.localhost.org systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
[   15.963843] tlondon.localhost.org systemd[1]: Starting Security Auditing Service...
[   15.964569] tlondon.localhost.org systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
[   15.965675] tlondon.localhost.org systemd-journal[288]: Runtime journal is using 1.4M (max 192.3M, leaving 288.4M of free 1.8G, current limit 192.3M).
[   16.006991] tlondon.localhost.org systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
[   16.129848] tlondon.localhost.org auditd[424]: Started dispatcher: /sbin/audispd pid: 431
[   16.130742] tlondon.localhost.org audispd[431]: priority_boost_parser called with: 4
[   16.131075] tlondon.localhost.org audispd[431]: max_restarts_parser called with: 10
[   16.131343] tlondon.localhost.org audispd[431]: audispd initialized with q_depth=150 and 1 active plugins

"systemd-journal[288]: Runtime journal is using 1.4M ..." is only .1 seconds after the previous line.

Looks like "Starting Trigger Flushing of Journal to Persistent Storage" is a tad earlier.

For completeness, I attached both outputs of "journalctl -o short-monotonic -ab" below.

Also, here is what "ls -l /var/log/journal/Funny-System-Number/" says:

[root@tlondon 6f397d582f6f6691f48a1fa74b338324]# ls -l
total 34072
-rw-r-----+ 1 root systemd-journal 25440256 Sep  7 14:06 system.journal
-rw-r-----+ 1 root systemd-journal  5505024 Sep  7 14:02 user-1000.journal
-rw-r-----+ 1 root systemd-journal  3891200 Sep  7 13:57 user-42.journal
[root@tlondon 6f397d582f6f6691f48a1fa74b338324]#

Comment 44 Tom London 2013-09-07 21:12:41 UTC
Created attachment 795243 [details]
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal (debug_level=info)

Comment 45 Tom London 2013-09-07 21:13:36 UTC
Created attachment 795244 [details]
Output of "journalctl -o short-monotonic -ab" of boot with no /var/log/journal (debug_level=info)

Comment 46 Scott Schmit 2013-09-08 05:29:32 UTC
I too have a (null).log file, though it hasn't been touched in a while--possibly because I haven't had an issue with boot (other than being very slow) recently?

Relevent-looking contents:
(EE) 
Fatal server error:
(EE) xf86OpenConsole: VT_WAITACTIVE failed: Input/output error
(EE) 
(EE) 
Please consult the Fedora Project support 
         at http://wiki.x.org
 for help. 
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE) 
(EE) 
FatalError re-entered, aborting
(EE) xf86CloseConsole: VT_ACTIVATE failed: Input/output error
(EE)

Comment 47 Christopher Meng 2013-09-08 22:24:02 UTC
Lifting need info now.

Comment 48 Zbigniew Jędrzejewski-Szmek 2013-09-10 15:14:46 UTC
(In reply to Tom London from comment #43)
> Not sure if it is helpful, but I noticed that I can successfully boot with
> "systemd.log_level=info".
[snip]

This indeed doesn't say much. Can you boot with the following file:

# /etc/systemd/system/systemd-journald.service.d/debug.conf
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug SYSTEMD_LOG_TARGET=kmsg

This will make systemd-journald log what it is doing. The output will be in dmesg.
Might help us to understand what is going on.

Comment 49 Zbigniew Jędrzejewski-Szmek 2013-09-10 15:17:28 UTC
(In reply to Tom London from comment #43)
> Not sure if it is helpful, but I noticed that I can successfully boot with
> "systemd.log_level=info".
[snip]

This indeed doesn't say much. It is possible that systemd-journald is
running fine long before the "Started Trigger Flushing" message appears.
Can you boot with the following file:

# /etc/systemd/system/systemd-journald.service.d/debug.conf
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug SYSTEMD_LOG_TARGET=kmsg

This will make systemd-journald log what it is doing. The output will be in dmesg.
Might help us to understand what is going on.

Comment 50 Tom London 2013-09-10 16:28:13 UTC
In frustration, I figured out that adding "debug" in addition to "systemd.log_info=debug" unfreezes the boot and makes debugging output work.

I've posted the log outputs and plot outputs to a Rawhide/systemd tt: https://bugzilla.redhat.com/show_bug.cgi?id=1006386

Unless I hear that it is unnecessary, I will do the above this evening when I get back to my system...

Comment 51 Tom London 2013-09-11 02:18:20 UTC
Created attachment 796210 [details]
Output of "journalctl -o short-monotonic -ab" of boot with /var/log/journal and with auditd disabled (debug_level=debug)

I managed to make 'systemd.log_level=debug' work.

Does this file suffice?

I've also posted this to https://bugzilla.redhat.com/show_bug.cgi?id=1006386

Comment 52 Mark F. Liu 2013-09-18 15:38:26 UTC
Same problem here with Fedora 19_x64.

Comment 53 Michael 2013-09-19 12:52:15 UTC
Same problem here with Fedora 19_x64.

+ Bug 1009891 (more infos, maybe that helps)

Comment 54 Robert Muil 2013-09-21 11:52:12 UTC
I am experiencing this issue, and I have not yet been able to work around it: I cannot get my graphical session up.

I'm on Fedora 19, with the KDE spin, Dell latitude E6420 with NVidia video. I recently ran a yum update, and on reboot this problem came up.

The systemd messages indicate that it has
 - 'Reached target Multi-user System'
 - and 'Reached target Graphical Interface;
But I have no login screen.

I have tried moving /var/log/journal to /var/log/journal.org
And I have tried the suggested init 3 -> init 5 from tty2.

Neither worked.

I will attach my systemd-analyze plot and journalctl -b logs in case they help.

Comment 55 Robert Muil 2013-09-21 11:54:13 UTC
Created attachment 800893 [details]
output of systemd-analyze plot

Comment 56 Robert Muil 2013-09-21 12:12:20 UTC
Created attachment 800915 [details]
Output of journalctl -o short-monotonic -ab with no /var/log/journal

Comment 57 Robert Muil 2013-09-21 14:43:16 UTC
Update: after much mucking around diagnosing the plymouth-quit-wait issue, I realised that the failure to bring up the GUI was a failing nvidia driver caused by recent kernel update to 3.11.

In case anyone else also runs into the nvidia driver failure and is confused by the unrelated Plymouth quite wait bug, this is the fix for the driver issue that resolved the GUI failure for me: http://www.forums.fedoraforum.org/showthread.php?p=1669791#post1669791

The original problem with the plymouth-quit-wait service timing out still exists on my machine, regardless of the debug flags in the kernel command, or the presence of the /var/log/journal directory.

Comment 58 Christopher Meng 2013-09-21 14:47:50 UTC
(In reply to Robert Muil from comment #57)
> Update: after much mucking around diagnosing the plymouth-quit-wait issue, I
> realised that the failure to bring up the GUI was a failing nvidia driver
> caused by recent kernel update to 3.11.


This issue already happened in 3.9 era, I don't think so.

Comment 59 M Frost 2013-09-25 23:04:08 UTC
(In reply to Christopher Meng from comment #58)
> (In reply to Robert Muil from comment #57)
> > Update: after much mucking around diagnosing the plymouth-quit-wait issue, I
> > realised that the failure to bring up the GUI was a failing nvidia driver
> > caused by recent kernel update to 3.11.
> 
> 
> This issue already happened in 3.9 era, I don't think so.

(In reply to Robert Muil from comment #57)
> Update: after much mucking around diagnosing the plymouth-quit-wait issue, I
> realised that the failure to bring up the GUI was a failing nvidia driver
> caused by recent kernel update to 3.11.
> 
> In case anyone else also runs into the nvidia driver failure and is confused
> by the unrelated Plymouth quite wait bug, this is the fix for the driver
> issue that resolved the GUI failure for me:
> http://www.forums.fedoraforum.org/showthread.php?p=1669791#post1669791
> 
> The original problem with the plymouth-quit-wait service timing out still
> exists on my machine, regardless of the debug flags in the kernel command,
> or the presence of the /var/log/journal directory.

I have (on different computers) both the lack of login screen and the failure of the nvidia driver by the recent kernel update. The lack of login screen was fixed by moving /var/log/journal (see comments 20,21 and 23). The nvidia error was fixed by reverting to a kernel where it worked and only updating the kernel when the appropriate kmod-nvidia is available for the kernel update.

As a result I am on kernel 3.10.11 and haven't installed 3.11.1 yet.

Comment 60 Hin-Tak Leung 2013-09-25 23:46:24 UTC
I do believe it is hardware/3d/mesa related, but not to specific driver, and not nvidia specific. Fwiw, I have an ATI card, and had the prob even with f19 on an f18 kernel, 3.9.x. 

I suspect it was when mesa got llvmpipe.

Comment 61 Braden 2013-10-01 00:15:22 UTC
Started having this problem with the latest kernel update.

# mv /var/log/journal /var/log/journal.org

Solves the problem!

Comment 62 Ahmed Noor Kader Mustajir Md Eusoff 2013-10-04 03:13:15 UTC
I have the same issues, but using 

# mv /var/log/journal /var/log/journal.org

solved my problem..

Comment 63 Hin-Tak Leung 2013-10-04 17:45:35 UTC
(In reply to Ahmed Noor Kader Mustajir B. Md Eusoff from comment #62)
> I have the same issues, but using 
> 
> # mv /var/log/journal /var/log/journal.org
> 
> solved my problem..

I wish people stop spreading such voodoo/mis-information. As far as I understand it, if systemd encounters a serious error condition, it would stop the graphic boot screen and automatically goes back to the text console to displace the error/warning. And I think 'oops, the log directory disappeared, cannot write to log' is a serious error condition.
So, any "serious" boot errors or warnings, if you can intentionally *make* one, would work.

Failure to write logs is not a fatal error, so it continues in the text console until the gui log-in takes over. Does this sounds more reasonable? IMHO, removing the plymouth package so that it always does text console boot, or manually remembering to press ESC to get out of graphic boot (like I do), sounds more reasonable.

FWIW, I was very surprised to see my system going directly from graphic boot to auto-login live user, when booted with the F19 live USB. (I don't usually do that - just trying to fix a disk error which auto-fsck doesn't want to fix)

So - on my system,
  - text boot + gdm WORKS
  - graphic boot + gdm FAILS
  - graphic boot + live-user auto-login WORKS

This suggests that the problem is between graphic boot and the authentication GUI.

So can anybody disprove this - i.e. is anybody failing on a "graphic boot + single-user auto-login" configuration?

(I assume nobody is failing on "text boot + auto-login". That configuration is 
rare probably nobody does it, or failing over it)

Comment 64 Brent R Brian 2013-10-05 13:12:34 UTC
f19 x86_64 

1) LAPTOP updated with FEDUP from 18, dual core, 4G RAM, INTEL HD Graphics
2) DESKTOP fresh installed (kept /home), dual core, 4G RAM, AMD Radeon

Both system fine, fast, working good until recently updates.  Both system became EXTREMELY slow to boot/shutdown.

Both systems system stopped at F splash screen, plymouth-quit-wait.service fails
     soft reboot normally got it past the issue

DESKTOP I enabled "test updates" and updated ... reboot ... even SLOWER.

Read above that wiping /var/log/journal sub-tree would help.

DESKTOP works.  Fast boot/shutdown.  No Plymouth failure.

Comment 65 Ahmed Noor Kader Mustajir Md Eusoff 2013-10-07 02:34:46 UTC
(In reply to Hin-Tak Leung from comment #63)
> (In reply to Ahmed Noor Kader Mustajir B. Md Eusoff from comment #62)
> > I have the same issues, but using 
> > 
> > # mv /var/log/journal /var/log/journal.org
> > 
> > solved my problem..
> 
> I wish people stop spreading such voodoo/mis-information. As far as I
> understand it, if systemd encounters a serious error condition, it would
> stop the graphic boot screen and automatically goes back to the text console
> to displace the error/warning. And I think 'oops, the log directory
> disappeared, cannot write to log' is a serious error condition.
> So, any "serious" boot errors or warnings, if you can intentionally *make*
> one, would work.
> 
> Failure to write logs is not a fatal error, so it continues in the text
> console until the gui log-in takes over. Does this sounds more reasonable?
> IMHO, removing the plymouth package so that it always does text console
> boot, or manually remembering to press ESC to get out of graphic boot (like
> I do), sounds more reasonable.
> 
> FWIW, I was very surprised to see my system going directly from graphic boot
> to auto-login live user, when booted with the F19 live USB. (I don't usually
> do that - just trying to fix a disk error which auto-fsck doesn't want to
> fix)
> 
> So - on my system,
>   - text boot + gdm WORKS
>   - graphic boot + gdm FAILS
>   - graphic boot + live-user auto-login WORKS
> 
> This suggests that the problem is between graphic boot and the
> authentication GUI.
> 
> So can anybody disprove this - i.e. is anybody failing on a "graphic boot +
> single-user auto-login" configuration?
> 
> (I assume nobody is failing on "text boot + auto-login". That configuration
> is 
> rare probably nobody does it, or failing over it)

so your sugestion to solve this problem?

Comment 66 Brent R Brian 2013-10-07 10:48:36 UTC
f19 x86_64 

1) LAPTOP updated with FEDUP from 18, dual core, 4G RAM, INTEL HD Graphics
2) DESKTOP fresh installed (kept /home), dual core, 4G RAM, AMD Radeon

Both system fine, fast, working good until recently updates.  Both system became EXTREMELY slow to boot/shutdown.

Both systems system stopped at F splash screen, plymouth-quit-wait.service fails
     soft reboot normally got it past the issue

LAPTOP - I did not enable test updates.  Blew away /var/log/journal sub-tree and the laptop no longer stops at splash screen "forever".

BOTH THE DESKTOP SYSTEM AND LAPTOP SYSTEM continue to have FAILURES with plymouth-quit-wait.service ... so the /var/log/journal must just be teasing some race condition.

If you need data ... please advise.

Comment 67 glad08 2013-10-08 08:14:55 UTC
*** Bug 1016137 has been marked as a duplicate of this bug. ***

Comment 68 Stijn Hoop 2013-10-10 08:49:47 UTC
(In reply to Hin-Tak Leung from comment #38)
> (In reply to Zbigniew Jędrzejewski-Szmek from comment #36)
> > (In reply to Hin-Tak Leung from comment #35)
> > > Anybody looked at why there are the oddly named "/var/log/gdm/(null).log"'s
> > > yet?
> > It would be great if you could file a separate bug against gdm, if you can
> > reproduce it.
> 
> A new one ( "/var/log/gdm/(null).log.N", N a number) is created every time
> it fails to boot for me. I mentioned this in my bug 985038 , but nobody
> seems to corroborate it here? It should be easy, just look under
> /var/log/gdm/ whenever, as old logs are left behind.

I noticed occasional waiting on plymouth-quit-wait.service for some time now, but never bothered to look up the cause. Now that I have I can tell that I do not have any

/var/log/gdm/(null).log

or numbered file.

Comment 69 Riku Seppala 2013-10-12 07:55:21 UTC
Today for the first time I kept hitting this bug even after several reboots. So I finally tried rm -rf journal on /var/log and rebooted. Everything is working now. This time I'm not even seeing bug 983110 either.

Comment 70 Frank McCormick 2013-10-13 18:44:05 UTC
I have had this problem for several months now - but I am using lightdm instead of GDM.

Intel 965 on-board video -

It takes 20-25 sec before lightdm comes up --- usually right after I see the ether lights flicker on the computer. Could it be network connected somehow ?

Comment 71 Rui Mota 2013-10-14 09:12:36 UTC
It happens frequently and it really really really anoying.
Despite all reports in bugzilla I've not seen any effort to resolve this.
I consider it now a HIGH SEVERITY BUG, that is living for too long in the system of Fedora lovers.
This boot issue MUST be corrected as soon as possible as it's a very bad image for Fedora product.
I was showing the system to friends and stops at boot? That is ridiculous and everyone laughts at my face and most tell me to use a REAL Linux as Ubuntu.
Is that what you want? 
To pass this bad image to the world right on boot?
Could be somewhere else and no one would really mind or notice, but at boot?

It's like a full bus and everyone happy, singing and willing for the trip and then it doesn't start.

Comment 72 Ruben Guerra Marin 2013-10-16 21:51:06 UTC
I have the same problem,., but only happens once in a while =/

Comment 73 Germano Massullo 2013-10-17 14:20:48 UTC
I just closed a my bugreport about nouveau drivers after finding out that it was a KDM trouble: https://bugzilla.redhat.com/show_bug.cgi?id=1018825

Comment 74 Sergio Basto 2013-10-17 16:44:10 UTC
> I solved by doing rm -rf /var/log/journal/* ; reboot

why is a KDM problem , what KDM have to do with /var/log/journal/ ?

Comment 75 Rex Dieter 2013-10-17 16:48:46 UTC
It doesn't.  I think the comment was in error.

Comment 76 Tad Bilby 2013-10-18 23:52:48 UTC
I just had this problem. Fedora has been upgraded from FC17 with fedup to both FC18 and then fedup to FC19.  ctrl-alt-del did not force an "init 6", pressing power button did not reboot, there were no active vtty consoles (ctrl-alt-F2).  I had to hold the power switch and trust the ext4 journal..

[root@linux986 ~]# journalctl -b
-- Logs begin at Thu 2013-07-25 22:18:18 EDT, end at Fri 2013-10-18 19:20:42 EDT. --
Oct 18 19:12:24 PC systemd-journal[156]: Allowing runtime journal files to grow to 297.9M.
Oct 18 19:12:24 PC: kernel: Initializing cgroup subsys cpuset
Oct 18 19:12:24 PC: kernel: Initializing cgroup subsys cpu
Oct 18 19:12:24 PC: Initializing cgroup subsys cpuacct
Oct 18 19:12:24 PC: Linux version 3.11.4-201.fc19.x86_64 (mockbuild.fedoraproject.org) (gcc version 4.8.
Oct 18 19:12:24 PC: Command line: BOOT_IMAGE=/vmlinuz-3.11.4-201.fc19.x86_64 root=/dev/mapper/vg-lv00 ro rd.md=0 rd.dm=
Oct 18 19:12:24 PC kernel: e820: BIOS-provided physical RAM map:
<snip>
Oct 18 19:12:24 PC kernel: DMI: /DX58SO2, BIOS SOX5820J.86A.0920.2013.0729.0042 07/29/2013
<snip>
Oct 18 19:12:46 PC avahi-daemon[744]: Successfully called chroot().
Oct 18 19:12:46 PC abrtd[760]: '/var/tmp/abrt/oops-2013-01-17-21:55:22-2609-1' is not a problem directory
Oct 18 19:12:46 PC dbus[747]: [system] Rejected send message, 1 matched rules; type="method_call", sender=":1.5" (uid=0 pid=75 
Oct 18 19:12:46 PC avahi-daemon[744]: Successfully dropped remaining capabilities. 
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860/type': No such file or directory
Oct 18 19:12:46 PC avahi-daemon[744]: Loading service file /services/udisks.service.
<snip>
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860/type': No such file or directory
Oct 18 19:12:46 PC systemd[1]: Starting Authorization Manager...
Oct 18 19:12:46 PC colord[842]: Using config file /etc/colord.conf
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860/type': No such file or directory
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860/type': No such file or directory
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860/type': No such file or directory
Oct 18 19:12:46 PC polkitd[869]: Started polkitd version 0.112 
Oct 18 19:12:46 PC colord[842]: Using mapping database file /var/lib/colord/mapping.db
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860/type': No such file or directory
Oct 18 19:12:46 PC bluetoothd[735]: bluetoothd[735]: Adapter /org/bluez/735/hci0 has been enabled
Oct 18 19:12:46 PC bluetoothd[735]: Adapter /org/bluez/735/hci0 has been enabled
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860/type': No such file or directory
Oct 18 19:12:46 PC colord[842]: Using device database file /var/lib/colord/storage.db
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860/type': No such file or directory
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860/type': No such file or directory
Oct 18 19:12:46 PC abrtd[760]: '/var/tmp/abrt/ccpp-2012-09-21-22:08:09-860' is not a problem directory
Oct 18 19:12:46 PC abrtd[760]: Can't open file '/var/tmp/abrt/ccpp-2013-08-01-19:44:25-1940/type': No such file or directory
Oct 18 19:12:46 PC colord[842]: loaded plugin libcd_plugin_scanner.so
<snip>
[root@ ~]# systemctl status plymouth-quit-wait.service
plymouth-quit-wait.service - Wait for Plymouth Boot Screen to Quit
   Loaded: loaded (/usr/lib/systemd/system/plymouth-quit-wait.service; disabled)
   Active: inactive (dead) since Fri 2013-10-18 19:12:49 EDT; 33min ago
  Process: 755 ExecStart=/usr/bin/plymouth --wait (code=exited, status=0/SUCCESS)

Oct 18 19:12:49 PC systemd[1]: Started Wait for Plymouth Boot Screen to Quit.

Comment 77 Karel Volný 2013-10-20 16:44:52 UTC
as multiple users (including me(*)) can confirm that this problem gets "solved" by (re)moving /var/log/journal, I don't understand what are we still talking about here - this is clearly systemd problem, reassigning

(*) I've just verified the workaround on two different systems with different software, one of them don't having plymouth installed at all (see bug #983796 comment #42), but otherwise showing the same symptomes

Comment 78 Karel Volný 2013-10-20 16:50:30 UTC
*** Bug 983796 has been marked as a duplicate of this bug. ***

Comment 79 Rex Dieter 2013-10-20 18:58:29 UTC
*** Bug 996370 has been marked as a duplicate of this bug. ***

Comment 80 Tom London 2013-10-20 19:18:32 UTC
The best I've been able to narrow this down...

Below is snippet from  'journalctl -o short-monotonic -ab' showing an "unfortunate bracket" of startup activity.

The first "left bracket" is 'Starting Recreate Volatile Files and Directories' (systemd-tmpfiles-setup.service @ 17.801). Its "right bracket" occurs at 35.47.

Next is 'Starting to Tell Plymouth to Write Out Runtime Data...' (plymouth-read-write.service @ 17.806). Its "right bracket" occurs at 35.47.

Next is 'Starting to Trigger Flushing of Journal to Persistent Storage...' (systemd-journal-flush.service @ 17.809). Its "right bracket" occurs at 35.429, although there is a journal-related message at 25.461.

Finally, 'Starting Security Auditing Service...' (auditd.service @ 17.81). Its "right bracket" occurs at 35.471, although there is a auditd startup message at 24.75.

[   17.801999] tlondon.localhost.org systemd[1]: Starting Recreate Volatile Files and Directories...
[   17.806450] tlondon.localhost.org systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
[   17.809044] tlondon.localhost.org systemd[1]: Starting Trigger Flushing of Journal to Persistent Storage...
[   17.813061] tlondon.localhost.org systemd[1]: Starting Security Auditing Service...
[   24.755820] tlondon.localhost.org kernel: type=1305 audit(1382198994.414:9): audit_pid=638 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[   25.461388] tlondon.localhost.org systemd-journal[379]: Permanent journal is using 13.0M (max 4.0G, leaving 4.0G of free 76.2G, current limit 4.0G).
[   35.429729] tlondon.localhost.org systemd[1]: Started Trigger Flushing of Journal to Persistent Storage.
[   35.470976] tlondon.localhost.org systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
[   35.471516] tlondon.localhost.org systemd[1]: Started Security Auditing Service.
[   35.471793] tlondon.localhost.org systemd[1]: Started Recreate Volatile Files and Directories.

Comment 81 Brent R Brian 2013-10-20 20:07:20 UTC
I have two systems that get booted and shutdown several times a day, desktop and laptop.  The slower of the two (and most frequently rebooted) had this problem RETURN after the /var/log/journal was removed.  Repeating the removal fixed the problem again.

Comment 82 Rui Mota 2013-10-20 21:48:51 UTC
Deleting journal files will NOT resolve the problem. 
It will come back after days or weeks.
Its a matter of time.
I believe its a GDM service related.
For some reason it doesnt start maybe because Plymouth service fails to stop.

Comment 83 chen haochuan 2013-10-21 08:17:06 UTC
(In reply to Rui Mota from comment #82)
> Deleting journal files will NOT resolve the problem. 
> It will come back after days or weeks.
> Its a matter of time.
> I believe its a GDM service related.
> For some reason it doesnt start maybe because Plymouth service fails to stop.

You must delete the whole directory(not just the file in it) and do it again after updating systemd.

Comment 84 Karel Volný 2013-10-21 08:25:55 UTC
(In reply to Rui Mota from comment #82)
> Deleting journal files will NOT resolve the problem. 

that's why I'm using quotation marks around "solved" and talk about _workaround_

> It will come back after days or weeks.
> Its a matter of time.

yes, because the directory gets recreated on each systemd update (it is owned by systemd package), and then something in the logs will trigger the error condition again

if you dive deep into the forum/mailinglist posts linked in some comments to this bug and its duplicates, there are some other advices how to deal with journal, which may have more permanent effect

> I believe its a GDM service related.

it can hardly be GDM related on systems with KDM and without GDM that behave the same

while several problems had been uncovered along the way, the common cause here lies with journal - neither of the components affected like plymouth, gdm, kdm, sendmail, networkmanager and whatever else I've seen slowed down touches the logfiles directly, so there is no way how (re)moving the files could affect operation of these services

Comment 85 Rui Mota 2013-10-21 13:40:57 UTC
Well, I said it's a GDM service related but I'm wrong, or part of my point is wrong.
GDM or KDM does not start but it does not mean it's GDM service issue.
It's more like Plymouth service issue because it fails to quit, it seems.

Anyway, thanks for your tip, I just removed the folder and will wait of it happens again.
I don't agree with these kind of solutions because the folder is there for some purpose.
Anyway it is accepted as a workaround.

Thanks again.

Comment 86 Sergio Basto 2013-10-21 17:59:10 UTC
Oct 21 17:17:18  kdm[637]: plymouth is running
Oct 21 17:17:18  kdm[637]: plymouth is active on VT 1, reusing for :0
Oct 21 17:17:18  kdm[637]: plymouth should quit after server startup
Oct 21 17:17:18  kdm[637]: X server startup timeout, terminating
Oct 21 17:17:18  kdm[637]: X server for display :0 cannot be started, session disabled
Oct 21 17:17:18  kdm[637]: Quitting Plymouth without transition
Oct 21 17:17:18  kdm[637]: plymouth is NOT running

kdm doesn't start on boot 


Oct 21 18:39:41  kdm[627]: plymouth is running
Oct 21 18:39:41  kdm[627]: plymouth is active on VT 1, reusing for :0
Oct 21 18:39:41  kdm[627]: plymouth should quit after server startup
Oct 21 18:39:49  kdm[627]: Quitting Plymouth with transition
Oct 21 18:39:49  kdm[627]: Is Plymouth still running? no

kdm started on boot 

my workaround, add Before=graphical.target to /usr/lib/systemd/system/plymouth-quit.service , as we see doesn't work .

I think we have 2 or 3 different bugs ...

Comment 87 Hin-Tak Leung 2013-10-21 21:54:42 UTC
(In reply to Karel Volný from comment #77)
> as multiple users (including me(*)) can confirm that this problem gets
> "solved" by (re)moving /var/log/journal, I don't understand what are we
> still talking about here - this is clearly systemd problem, reassigning
> 
> (*) I've just verified the workaround on two different systems with
> different software, one of them don't having plymouth installed at all (see
> bug #983796 comment #42), but otherwise showing the same symptomes

Look, changing the subject from to "plymouth-quit-wait.service fails" "/var/log/journal breaks system startup" is craziness.

I thought I already explained back in comment 63: I believe that systemd automatically disable graphic boot/plymouth and write message to the console if systemd encounters an error. Not finding the log directory is one such error.

So moving "/var/log/journal" is simply *intentionally breaking systemd* to force it to disable graphic boot.

You could achieve the same result by either (1) press ESC (per reboot), (2) removing plymouth (permanent).

Either of those are better than intentionally breaking systemd to force it to dump text to console.

Comment 88 basjetimmer 2013-10-21 22:15:27 UTC
(In reply to Hin-Tak Leung from comment #87)
> 
> Look, changing the subject from to "plymouth-quit-wait.service fails"
> "/var/log/journal breaks system startup" is craziness.
> 
> I thought I already explained back in comment 63: I believe that systemd
> automatically disable graphic boot/plymouth and write message to the console
> if systemd encounters an error. Not finding the log directory is one such
> error.
> 
> So moving "/var/log/journal" is simply *intentionally breaking systemd* to
> force it to disable graphic boot.
> 
> You could achieve the same result by either (1) press ESC (per reboot), (2)
> removing plymouth (permanent).
> 
> Either of those are better than intentionally breaking systemd to force it
> to dump text to console.

I just wanted to say this info is not correct. Did you actually try it? To work around the slow bootup and failing plymouth quit service I delete /var/log/journal and the systemd does *not* disable graphical boot. The system does the entire plymouth splash screen and boots much more quickly to a working kdm. No errors appear in the boot.log either.

Comment 89 Hin-Tak Leung 2013-10-21 23:27:26 UTC
(In reply to basjetimmer from comment #88)
> I just wanted to say this info is not correct. Did you actually try it? To
> work around the slow bootup and failing plymouth quit service I delete
> /var/log/journal and the systemd does *not* disable graphical boot. The
> system does the entire plymouth splash screen and boots much more quickly to
> a working kdm. No errors appear in the boot.log either.

Do you routinely give/take advice from strangers along the line of: "move/remove one of your important system files, it will solve your problems"??!!

Do you know how ridiculous this bug report subject sounds: "<important_system _files_located_where_they_should_be> breaks system starts up". It is also ridiculous for trying to assign blame and blindly guess, instead of just describing the problem and the symptom. "<important_system _files_located_where_they_should_be> breaks system starts up" is not a symptom.

My current chosen work-around is to press ESC while booting, whenever I need to reboot. At least it is an action with its documented intended effect.

Comment 90 Sergio Basto 2013-10-21 23:52:05 UTC
Hi, 

Just to more confusion, I always have plymouth disable in kernel boot configuration, removing rhbg. 
and my bug is more "kdm sometimes fails to properly start during boot"
and I'm seeing this in F19 .

Well could be related with journal, but haven't he a better way to disable it ?

Comment 91 Karel Volný 2013-10-22 10:42:01 UTC
(In reply to Hin-Tak Leung from comment #87)
> Look, changing the subject from to "plymouth-quit-wait.service fails"
> "/var/log/journal breaks system startup" is craziness.

you are welcome to come up with better description

this one is a result of my investigations so far - if there is something rotten in /var/log/journal then systemd's journal goes crazy, leading to multiple service failures (plymouth, gdm, kdm) and problems like resource hogging (avahi, networkmanager, firewalld, sendmail ...), for which "breaks system startup" seems to me 100% appropriate

and as it seems to be related to the contents of /var/log/journal rather than to journal being enabled (it doesn't happen on all machines that have it enabled, and to stop it happening you don't have to disable journald altogether), I find it appropriate to mention that concrete directory rather than journald itself - subject line "journald breaks ..." wouldn't be any more helpful, IMO (/var/log/journal implies journald is involved and adds at least a bit of specification which part is involved)

> I thought I already explained back in comment 63: I believe that systemd
> automatically disable graphic boot/plymouth and write message to the console
> if systemd encounters an error. Not finding the log directory is one such
> error.

could be that things work that way, but I do not understand how does that explain the actual behaviour tons of users are seeing ...

> So moving "/var/log/journal" is simply *intentionally breaking systemd* to
> force it to disable graphic boot.

the core of this problem has nothing to do with graphic boot, it happens both when I'm staring at the fedora logo that flashes in the middle of the screen and both when I'm reading all the boot messages in text

> You could achieve the same result by either (1) press ESC (per reboot), (2)
> removing plymouth (permanent).

no, I cannot

if you've read the reffered bug #983796 comment #42, you'd know that removing plymouth does not help the core issue - it just removes one of its symptomes which was the original subject of this bug ... but the plymouth failure is a consequence, not a cause

> Either of those are better than intentionally breaking systemd to force it
> to dump text to console.

none of these solve or workaround the problem while "breaking" systemd helps, so I just fail to see how "either of those" could be better

Comment 92 Karel Volný 2013-10-22 11:04:25 UTC
(In reply to Sergio Monteiro Basto from comment #90)
> Well could be related with journal, but haven't he a better way to disable
> it ?

I believe

# systemctl mask systemd-journald.service

could do the trick, for example

but I prefer the abovementioned workaround, as it is not permanent (the deleted directory gets recreated on update), so you don't have to remember to cancel your temporary changes once the bug gets fixed

Comment 93 Rex Dieter 2013-10-22 12:44:01 UTC
'man journald.conf'

See Storage= key, in particular, default mode is 'auto' where a persistent /var/log/journal will be used if it exists, else a volatile (non-persistent between boots) /var/run/journal will used.

For *me*, I set
Storage=volatile
as a (hopefully temporary) workaround at my site.

Now please, we're almost up to 100 comments here, so only comment if you have something constructive to add (e.g. evidence that may help toward a solution), thanks.

Comment 94 Hin-Tak Leung 2013-10-22 17:17:07 UTC
(In reply to Karel Volný from comment #84)
> (In reply to Rui Mota from comment #82)
<snipped>
> > I believe its a GDM service related.
> 
> it can hardly be GDM related on systems with KDM and without GDM that behave
> the same
<snipped>

I was quite surprised that though I routinely get stuck during boot if I forget to press ESC, using the liveUSB (which runs a gnome session, but auto-login) went okay when I needed to do some file-system repair.

For me, the issue looks like a problem transitioning from graphic boot to authentication.
(I do not ever get stuck whenever I remember press ESC, nor any "wait" message appears; textconsole to GDM always works, no waits and no strange log messages).

Somebody might want to comment on how /var/log/journal is configured on liveUSB - as I am sure it is different from on-disk installation.

And I'd also like to hear whether people ever get stuck in an auto-login config.

Comment 95 David Wilkins 2013-10-22 23:03:48 UTC
Reporting on my experiences with the plymouth-quit-wait.service bug.

Note I am not a system administrator, and am unfamiliar with a lot of stuff.  I have been installing, updating and upgrading Fedora for about a decade, but I have never had problems with the boot sequence before now.

Recently I turned three destop machines into triple or quadruple boot machines.  All three have Fedora 19, Debian 7.1 or 7.2 and FreeBSD 9.1.  Two of the desktops have been happily booting all three operating systems.  However I have been experiencing problems with just one of the desktops: the one that I am using at the moment.  The three desktops have very similar packages or compiled ports installed.  It seems to me that the factor that distinguishes this computer is the graphics.  Moreover X is clearly problematic under FreeBSD on this computer alone of the three.

According to KInfoCenter, the Driver renderer is recorded as

    Mesa DRI Intel(R) Sandybridge Desktop

with OpenGL/ES version 3.0 Mesa 9.2.0.

This machine has always booted normally under both Debian and Windows 7.  (Windows 7 is the fourth operating system - I don't boot into it very often.)

It seems to me that my problems with the Fedora boot resulted either from an upgrade, or else were caused as the result of the machine being left in an abnormal state after booting FreeBSD (and not recovering gracefully).

After I compiled the FreeBSD XOrg port, I found that it has problems.  I subsequently compiled KDE and Gnome2 ports.  For the record, the OpenGL/ES driver under FreeBSD is 2.1 Mesa 7.6.1.  It seems to function OK, provided that I don't switch virtual terminals.

I decided to do some investigation into the Fedora boot problem today.  I thought that I could probably trigger the problem by booting first into FreeBSD, and that is what happened.  Indeed I booted FreeBSD to arrive at the KDE greeter, logged in, then Ctrl+Alt+F2 for a virtual terminal, then Ctrl+Alt+F9.  As before, this caused screen and keyboard to freeze, with the frozen KDE desktop displayed.  The computer seemed completely dead, and indeed was unreachable over a private 'network' that I have set up joining the affected desktop to another about 4 meters away via a Cat 5e cable.

I disconnected the electricity, plugged back in, and then booted into Fedora.  The graphical boot went into text mode after 'Failed to start Wait for Plymouth Boot Screen to Quit', and appeared to halt.  Guided by previous experience, I then used Ctrl+Alt+F2 to get a virtual terminal, and logged in as root.  I then did what I have done on previous occasions to get a KDE login screen:

   # telinit 3
   # telinit 5

On one previous occasion that has brought up the KDE login screen.  On at least two occasions that has frozen the display and keyboard whilst leaving the computer running, but has nevertheless cleared the Plymouth Boot problem, in that the following boot was OK.  This time the following boot was not OK, so I tried

   # telinit 1

getting the 'Welcome to rescue mode!' message, inviting me to enter the root password, or type Ctrl-D to continue.  So I typed Ctrl-D, and brought up the KDE login screen.

I have been chasing up blog posts and videos relating to systemd over the past 24 hours or so, and taken on board that runlevels are a legacy feature.  So in subsequent experiments, when the boot has failed to arrive at the KDE login screen, I have done the following to get to the login screen

   Use Ctrl+Alt+F2

to get a virtual terminal, then log in as root, and then

    # systemctl rescue

to bring me to 'Welcome to rescue mode!'.  I then type Ctrl+D, and this has brought up the KDE login screen on all attempts so far.

    ===

But on occasions today where the boot has failed to bring up the KDE login screen, and in particular when my actions have apparently frozen to console and the keyboard, I have tried logging in remotely over the 'private network' from the other nearby desktop.  I have regularly been able to get in with ssh -Y, and run X applications.  For example I have been able to view PDF files with evince.  When logging in remotely the desktop seems perfectly normal.

Also, when logged in remotely, I have run systemctl without arguments, and this gives the impression that the boot completed normally, with services and targets like

   polkit.service
   graphical.target
   multi-user.target
   network.target
   sysinit.target

recorded as 'loaded active active' or 'loaded active running' (polkit).  Indeed the only two services generally recorded as having failed are

   plymouth-quit-wait.service
   rngd.service

However, whilst logging in remotely shows that all these services are up and running, the virtual terminal display on the console suggests that the boot halted at some previous stage, such as 

   Starting Authorization Manager

or

   Started Virtualization Daemon

or

   Reached Graphical Target.

From this, I conclude that one would be misled by assuming that the fact that the console display records the boot process proceeding to a certain point and then apparently halting is not evidence that the boot process actually halted.  I suggest that the bug, whatever it is, is putting the first virtual terminal (i.e., the one obtained with Ctrl+Alt+F1) into a state where it does not accept or display input, and moreover does not transition to graphical mode.

Thus, although I am way out of my depth here, I would suggest the bug I am experiencing relates to the graphics.  Displays are freezing, or are failing to transition successfully to some graphical mode, keyboards are becoming dead,  Nevertheless, in other respects, the boot process seems to complete to a point where remote users can log in and run graphical applications using X.

Apologies if the above is overlong, or lacking in technical knowledge etc.

Comment 96 Germano Massullo 2013-10-23 16:41:42 UTC
Created attachment 815474 [details]
Output of "journalctl -o short-monotonic -ab" with "systemd.log_level=debug" on the kernel command line

Comment 97 Germano Massullo 2013-10-23 17:16:41 UTC
I can add a (perhaps) useful info:
When the system boot procedure hangs, if I open a tty console, I login as root and I
# systemctl restart kdm.service
the hanged X session, starts correctly and I can normally use the computer. I only noticed that monitor ICC color profile disappeared.

Comment 98 Sergio Basto 2013-10-23 17:30:33 UTC
(In reply to Germano Massullo from comment #96)
> Created attachment 815474 [details]
> Output of "journalctl -o short-monotonic -ab" with "systemd.log_level=debug"
> on the kernel command line

Received SIGRTMIN+20 from PID 129 (plymouthd).

will try that  with "systemd.log_level=debug" on the kernel command line, but can we get even more debug , seems not enough . 

(In reply to Rex Dieter from comment #93)
> 'man journald.conf'
> 
> See Storage= key, in particular, default mode is 'auto' where a persistent
> /var/log/journal will be used if it exists, else a volatile (non-persistent
> between boots) /var/run/journal will used.
> 
> For *me*, I set
> Storage=volatile
> as a (hopefully temporary) workaround at my site.
> 
> Now please, we're almost up to 100 comments here, so only comment if you
> have something constructive to add (e.g. evidence that may help toward a
> solution), thanks.

I try it , and seems that workaround , at least in my test.

Comment 99 David Wilkins 2013-10-23 21:03:14 UTC
Is it possible the Fedora 19 update

   kde-workspace-4.11.1-3.fc19

resulted in the problems with failure of graphical boot that we have been experiencing over the past month?

Actually I appear to have experienced the problem some hours before the update found its way into the repositories, and what I have written below is probably just a red herring.  (I am adding this paragraph after putting together most of the remarks below.  These remarks might however provide an 'innocent' explanation as to why the plymouth-quit-wait service is failing.)

It seems to me possible that the problem might arise from failure of plymouth to hand over VT1 to X.  I have also been generating extracts of log files with journalctl, and checking them for references to plymouth etc.


===
Oct 23 17:44:34 mta106032.mta.tcd.ie kdm[372]: plymouth is running
Oct 23 17:44:34 mta106032.mta.tcd.ie kdm[372]: plymouth is active on VT 1, reusing for :0
Oct 23 17:44:37 mta106032.mta.tcd.ie kdm[372]: plymouth should quit after server startup
Oct 23 17:44:41 mta106032.mta.tcd.ie kdm[372]: Quitting Plymouth without transition
Oct 23 17:44:42 mta106032.mta.tcd.ie kdm[372]: plymouth is NOT running
Oct 23 17:44:58 mta106032.mta.tcd.ie systemd[1]: plymouth-quit-wait.service operation timed out. Terminating.
Oct 23 17:44:58 mta106032.mta.tcd.ie systemd[1]: Unit plymouth-quit-wait.service entered failed state.
===

Does this imply that KDE is forcing Plymouth to quit independently of plymouth-quit-wait service?  And that plymouth-quit-wait service fails because Plymouth is no longer around, having been killed off by kdm?

Anyway, I did a web search on 'plymouth should quit after server startup', and that brought me to Bug 975079 at the following URL:

   https://bugzilla.redhat.com/show_bug.cgi?id=975079

The discussion apparently concerned a 'kde multiseat patch'.  Comment 11 describes a 'plymouth patch', apparently involving KDE querying Plymouth for its VT, and then using that for the X server.  There was discussion of backporting the patch to Fedora 18 and Fedora 19 in Comments 21 and 22.  The string I was searching on occurs in Comment 25.  Comment 28 on 2013-08-15 includes the following:

   I have added additional logic to the plymouth patch
   which forces starting the server which matches
   plymouth' VT first, otherwise VT handover from
   plymouth to X fails.

Comment 34 on 2013-09-25 from the Fedora Update Service reports that

   kde-workspace-4.11.1-3.fc19 has been submitted as an update for Fedora 19

Comment 36 on 2013-09-27 shows that the update had by then been pushed to the stable repository.  The 'critical path bugfix update' is recorded here:

https://admin.fedoraproject.org/updates/kde-workspace-4.11.1-3.fc19

The release date is recorded as 2013-09-26 23:12:58.

This is about the time that I first experienced a problem with Fedora not booting all the way to graphical mode.  I remember doing a clean reinstall of Fedora 19 when that happened.  But I would appear to have done so earlier on September 26th.  However my root partition seems to have been created around 17:19 on that day, which would seem to be before the update hit the repositories.

Maybe, having eliminated the kde-workspace as the apparent cause of the current failures of graphical boot, I shouldn't be posting this comment.  But maybe it might provide pointers, or at least show that the failure of plymouth-quit-wait.service is not in itself the cause of the failure to achieve a graphical boot.

Comment 100 Brent R Brian 2013-10-25 16:48:40 UTC
Latest round of updates seems to remove the "boot" failures with plymouth ... boot is much faster now.

Comment 101 David Wilkins 2013-10-25 18:21:24 UTC
Here the affected machine has booted normally into KDM since updates on the evening of October 23.  Most packages updated then related to Eclipse, MATE and Cinnamon, and thus would not affect booting into KDM.  There were four kernel-related packages:

   kernel-devel-3.11.6-200.fc19.x86_64
   kernel-headers-3.11.6-200.fc19.x86_64
   kernel-3.11.6-200.fc19.x86_64
   kernel-modules-extra-3.11.6-200.fc19.x86_64

Thus, if the update has solved the problem, then I suggest that one of the above packages must have resolved the boot problem.

I have used journalctl to save my journal in various formats.  The journal starts with a clean install on September 26.  There is a history of both successful and unsuccessful boots, and these can be distinguished through the kdm messages.

The kdm messages can be generated by the superuser with

   # journalctl _COMM=kdm

Here are the kdm logging messages from a normal boot:

===
Oct 23 22:51:14 mta106032.mta.tcd.ie kdm[366]: plymouth is running
Oct 23 22:51:15 mta106032.mta.tcd.ie kdm[366]: plymouth is active on VT 1, reusing for :0
Oct 23 22:51:15 mta106032.mta.tcd.ie kdm[366]: plymouth should quit after server startup
Oct 23 22:51:17 mta106032.mta.tcd.ie kdm[366]: Quitting Plymouth with transition
Oct 23 22:51:17 mta106032.mta.tcd.ie kdm[366]: Is Plymouth still running? no
Oct 23 22:52:15 mta106032.mta.tcd.ie kdm[934]: :0[934]: pam_unix(kdm:session): session opened for user dwilkins by (uid=0)
Oct 24 00:54:16 mta106032.mta.tcd.ie kdm[934]: :0[934]: pam_unix(kdm:session): session closed for user dwilkins
===

And here are the kdm logging messages from a failed boot:

===
Oct 23 17:44:34 mta106032.mta.tcd.ie kdm[372]: plymouth is running
Oct 23 17:44:34 mta106032.mta.tcd.ie kdm[372]: plymouth is active on VT 1, reusing
 for :0
Oct 23 17:44:37 mta106032.mta.tcd.ie kdm[372]: plymouth should quit after server startup
Oct 23 17:44:38 mta106032.mta.tcd.ie kdm[372]: X server startup timeout, terminating
Oct 23 17:44:40 mta106032.mta.tcd.ie kdm[372]: X server for display :0 cannot be started, session disabled
Oct 23 17:44:41 mta106032.mta.tcd.ie kdm[372]: Quitting Plymouth without transition
Oct 23 17:44:42 mta106032.mta.tcd.ie kdm[372]: plymouth is NOT running
Oct 23 17:46:15 mta106032.mta.tcd.ie kdm[1507]: plymouth is running
Oct 23 17:46:16 mta106032.mta.tcd.ie kdm[1507]: plymouth is active on VT 1, reusing for :0
Oct 23 17:46:16 mta106032.mta.tcd.ie kdm[1507]: plymouth should quit after server startup
Oct 23 17:46:16 mta106032.mta.tcd.ie kdm[1507]: Quitting Plymouth with transition
Oct 23 17:46:16 mta106032.mta.tcd.ie kdm[1507]: Is Plymouth still running? no
Oct 23 17:50:35 mta106032.mta.tcd.ie kdm[1566]: :0[1566]: pam_unix(kdm:session): session opened for user dwilkins by (uid=0)
===

The logging messages from kdm with PID=372 are from the failed boot, which would have left the machine with virtual terminals but no KDM greeter.  I would have logged in as root on VT2 as root and given the command

   systemctl rescue

to bring the machine down into rescue mode, and then typed Ctrl-D to bring the machine back up into graphical mode.  This clearly restarted kdm with PID=1507, which then proceeded to note the virtual terminal used by plymouth, terminate plymouth and start X.

===

I get the impression that plymouth-quit-wait service doesn't do much besides test whether plymouth has quit.  In the most recent boots, since the updates on October 23, the plymouth-quit-wait service has 'started' immediately once kdm has terminated plymouth.

On some earlier boots, plymouth-quit-wait has timed out, and thereby 'failed', yet kdm has subsequently succeeded in terminating plymouth and starting X some seconds later, leading to a successful boot.

Comment 102 David Wilkins 2013-10-25 23:14:55 UTC
In Comment 101 above, I quoted a section of the output of journalctl _COMM=kdm representing a boot.  Below is an extract from the short-monotonic journal for that boot.  (A bit of a palaver to obtain.  I consulted the verbose dump of the journal to discover the _BOOT_ID for the timestamp, and then generated the journal dump for that boot using journalctl _BOOT_ID=foo -o short-monotonic.)

So at 65.491474, kdm has pinged plymouth, presumably via

   plymouth --ping

Then by 66.936596, kdm determines that plymouth is using virtual terminal 1 (VT1), presumably via

   plymouth --has-active-vt

kdm is therefore going to use VT1 for the :0 X server.  Presumably it then sets about starting up the X server.  (I assume that kdm cannot start up the X server until it knows which virtual terminal to use.)  Plymouth is due to be terminated after the X server is up and running (66.941094).

At time 69.357665, about 2.4 seconds after kdm sought to start the X server on VT1, the process of initializing the X server times out.  Nevertheless, kdm terminates Plymouth without transitioning to X, presumably via

   plymouth --wait quit --retain-splash

(The plymouth commands above are to be found for kdm in the journal back in September, presumably before the kde-workspace update.  A description of plymouth options can be obtained with the command plymouth -h.)

I have experienced how this appears to the user.  Once the plymouth-quit-wait service has failed, the console transitions from splash screen to text mode, with messages about starting or started services being written to the console screen.  But, after a few seconds, the computer stops writing messages to the console, presumably at the moment when kdm kills off Plymouth.  It appears to the user as though the boot process has stopped.  The boot process continues however.

So what was happening in the 2.4 seconds or less between the time when kdm 
set about starting the X server on VT1 and the subsequent timeout?  Services PolicyKit, Avahi and Network Manager started up during that period.  There is a gap of over 0.5 of a second between 66.941094 and 67.491140, very shortly after the start of PolicyKit and Avahi when no messages were being written to the journal.  The Network Manager started at 67.492850, and, for over a second, no messages were written to the journal until rtkit-daemon got itself running at 68.672938.  PolicyKit started loading rules at 68.979083, and nothing was written to the journal until kdm called time on the X server startup at 69.357665.

This could surely explain why kdm was not able to get the X server on VT1 up and running in the 2.4 seconds or so allotted for that task.

My understanding of the principles of systemd, based on blog posts, videos etc. that I have been reading and watching over the past few days, is that daemons are not started up in a predefined sequence.  Rather they are initiated in parallel, and become active as the relevant sockets become serviced.  So presumably it is happenstance that, in some boots, a number of daemons might become ready to start in a period when a display manager is trying to get X up and running on the console.  Presumably moving /var/log/journal to /var/log/journal.org changes the timings, because the journal is written to the /run filesystem in memory rather than to the hard disk, so processes will start up in a different order, giving kdm (or gdm) the opportunity to get the X server up and running in the allotted time and complete the boot successfully.

The extract from the failed October 23 boot follows below:

======
[   65.491474] mta106032.mta.tcd.ie kdm[372]: plymouth is running
[   65.580546] mta106032.mta.tcd.ie rtkit-daemon[345]: Successfully called chroot.
[   65.728343] mta106032.mta.tcd.ie dbus[364]: [system] Activating via systemd: service name='org.freedesktop.PolicyKit1' unit='polkit.service'
[   66.470836] mta106032.mta.tcd.ie smartd[334]: Configuration file /etc/smartmontools/smartd.conf was parsed, found DEVICESCAN, scanning devices
[   66.591430] mta106032.mta.tcd.ie chronyd[344]: hz=100 shift_hz=7 freq_scale=1.00000000 nominal_tick=10000 slew_delta_tick=833 max_tick_bias=1000 shift_pll=2
[   66.592274] mta106032.mta.tcd.ie avahi-daemon[362]: Successfully dropped root privileges.
[   66.935574] mta106032.mta.tcd.ie udev-configure-printer[325]: MFG:HP MDL:Deskjet F4200 series SERN:CN8BK4G16K05BR serial:CN8BK4G16K05BR
[   66.936596] mta106032.mta.tcd.ie kdm[372]: plymouth is active on VT 1, reusing for :0
[   66.936844] mta106032.mta.tcd.ie rtkit-daemon[345]: Successfully dropped privileges.
[   66.937307] mta106032.mta.tcd.ie dbus[364]: [system] Failed to activate service 'org.freedesktop.PolicyKit1': timed out
[   66.937683] mta106032.mta.tcd.ie polkitd[438]: Started polkitd version 0.112
[   66.937981] mta106032.mta.tcd.ie smartd[334]: Device: /dev/sda, type changed from 'scsi' to 'sat'
[   66.939446] mta106032.mta.tcd.ie chronyd[344]: Frequency -2.249 +/- 1.104 ppm read from /var/lib/chrony/drift
[   66.940035] mta106032.mta.tcd.ie avahi-daemon[362]: avahi-daemon 0.6.31 starting up.
[   66.941094] mta106032.mta.tcd.ie kdm[372]: plymouth should quit after server startup
[   67.491140] mta106032.mta.tcd.ie crond[370]: (CRON) INFO (running with inotify support)
[   67.491585] mta106032.mta.tcd.ie rtkit-daemon[345]: Successfully limited resources.
[   67.492016] mta106032.mta.tcd.ie dbus[364]: [system] Activating via systemd: service name='org.freedesktop.ColorManager' unit='colord.service'
[   67.492850] mta106032.mta.tcd.ie NetworkManager[441]: <info> NetworkManager (version 0.9.8.2-9.git20130709.fc19) is starting...
[   68.672938] mta106032.mta.tcd.ie rtkit-daemon[345]: Running.
[   68.714166] mta106032.mta.tcd.ie rtkit-daemon[345]: Watchdog thread running.
[   68.714417] mta106032.mta.tcd.ie rtkit-daemon[345]: Canary thread running.
[   68.852801] mta106032.mta.tcd.ie NetworkManager[441]: <info> Read config file /etc/NetworkManager/NetworkManager.conf
[   68.853238] mta106032.mta.tcd.ie NetworkManager[441]: <info> WEXT support is enabled
[   68.853775] mta106032.mta.tcd.ie smartd[334]: Device: /dev/sda [SAT], opened
[   68.978459] mta106032.mta.tcd.ie avahi-daemon[362]: WARNING: No NSS support for mDNS detected, consider installing nss-mdns!
[   68.979083] mta106032.mta.tcd.ie polkitd[438]: Loading rules from directory /etc/polkit-1/rules.d
[   69.357665] mta106032.mta.tcd.ie kdm[372]: X server startup timeout, terminating

[More messages]

[   70.269373] mta106032.mta.tcd.ie kdm[372]: X server for display :0 cannot be started, session disabled

[More messages]

[   70.928439] mta106032.mta.tcd.ie kdm[372]: Quitting Plymouth without transition

[More messages]

[   71.885519] mta106032.mta.tcd.ie kdm[372]: plymouth is NOT running
======

Comment 103 David Wilkins 2013-10-26 21:15:45 UTC
Created attachment 816442 [details]
Output of journalctl -o short-monotonic -ab booted with systemd.log_level=debug

Attachment is the output of journalctl -o short-monotonic from a boot that failed to start an X server in VT1 with kdm.  The output has been sorted to put the records in true chronological order (so that the times after boot are genuinely monotonically increasing).  Also IP addresses have been redacted to xxx.xxx.xxx.xxx etc.

As noted in Comment 101, the machine had booted normally several times in the past few days.  However graphical boot failed again this morning.  Moreover I was able to add boot option systemd.log_level=debug to the grub entry before booting.

Once the machine had booted, I logged in to VT2 as root, generated the attached journal dump, and dumps in other output format (short, verbose and json).  I then started the display manager with

   systemctl restart kdm.service

(see Comment 97).

plymouth-quit-wait.service
==========================

The plymouth-quit-wait service does not seem to do much.  The unit file is

   /usr/lib/systemd/plymouth-quit-wait.service

The command /usr/bin/plymouth --wait is issued.  It seems that this process waits till plymouth quits, when it will terminate with (presumably) exit code 0.  There is a timeout of 20 seconds.  I think the expectation is that the display manager (kdm or gdm) will terminate plymouth before the 20 seconds have expired.  In the attached boot, this 'service' was activated at 17.496196, with kdm.service was set running just half a millisecond later, at 17.496648.  After twenty seconds, at 37.520540, the plymouth-quit-wait service timed out and was terminated and changed to a failed state.  However kdm did not show any activity for another 27 seconds.

kdm.service
===========

===
[    5.342876] xxxxxxxxx.xxx.tcd.ie systemd[1]: Looking at job kdm.service/start conflicted_by=no
[    5.342883] xxxxxxxxx.xxx.tcd.ie systemd[1]: Looking at job kdm.service/stop conflicted_by=no
[    5.342889] xxxxxxxxx.xxx.tcd.ie systemd[1]: Fixing conflicting jobs by deleting job kdm.service/stop
[    5.343615] xxxxxxxxx.xxx.tcd.ie systemd[1]: Installed new job kdm.service/start as 204
[   17.496648] xxxxxxxxx.xxx.tcd.ie systemd[1]: kdm.service changed dead -> running
[   17.496658] xxxxxxxxx.xxx.tcd.ie systemd[1]: Job kdm.service/start finished, result=done
[   64.466615] xxxxxxxxx.xxx.tcd.ie kdm[368]: plymouth is running
[   65.004628] xxxxxxxxx.xxx.tcd.ie kdm[368]: plymouth is active on VT 1, reusing for :0
[   65.004968] xxxxxxxxx.xxx.tcd.ie kdm[368]: plymouth should quit after server startup
[   65.482411] xxxxxxxxx.xxx.tcd.ie kdm[368]: X server startup timeout, terminating
[   65.991525] xxxxxxxxx.xxx.tcd.ie kdm[368]: X server for display :0 cannot be started, session disabled
[   66.323467] xxxxxxxxx.xxx.tcd.ie kdm[368]: Quitting Plymouth without transition
[   66.439419] xxxxxxxxx.xxx.tcd.ie kdm[368]: plymouth is NOT running
===

There is very little relating to kdm in the journal, and virtually nothing relating to X.  In particular there are no references to PID 368 other than the kdm entries reproduced above.

For the record, the unit for kdms.service is at

   /usr/lib/systemd/system/kdm.service

and reveals that the executable command is

   /usr/bin/kdm vt1

and that this service conflicts with getty and plymouth-quit.service.

I presume that the task of starting up the X server is the responsibility of kdm.  Presumably the configuration of systemd ultimately influences what other processes are active around the time that kdm is active.  It appears that, once kdm showed signs of activity, it had no more than half a second (or perhaps a second) before the X server startup was timed out.  What sets the timeout period?  kdm or systemd?  I note that avahi-daemon started up at 65.092857 and Network Manager started up at 65.115414

Comment 104 David Wilkins 2013-10-26 22:00:48 UTC
Here is something really weird.

I just rebooted, and kdm failed to bring up the login screen.

So I logged in as root on VT2 and gave the command

   systemctl status kde.service

===
kdm.service - The KDE login manager
   Loaded: loaded (/usr/lib/systemd/system/kdm.service; enabled)
   Active: active (running) since Sat 2013-10-26 22:24:14 IST; 5min ago
 Main PID: 365 (kdm)
   CGroup: name=systemd:/system/kdm.service
           └─365 /usr/bin/kdm vt1

Oct 26 22:24:47 mta106032.mta.tcd.ie kdm[365]: plymouth is running
Oct 26 22:24:47 mta106032.mta.tcd.ie kdm[365]: plymouth is active on VT 1, reusing for :0
Oct 26 22:24:48 mta106032.mta.tcd.ie kdm[365]: plymouth should quit after server startup
Oct 26 22:24:51 mta106032.mta.tcd.ie kdm[365]: X server startup timeout, terminating
Oct 26 22:24:52 mta106032.mta.tcd.ie kdm[365]: X server for display :0 cannot be started, session disabled
Oct 26 22:24:52 mta106032.mta.tcd.ie kdm[365]: Quitting Plymouth without transition
Oct 26 22:24:53 mta106032.mta.tcd.ie kdm[365]: plymouth is NOT running
===

This is as one would expect.  I also made copies of the journal for the boot.

Then it decided that I would do something different.  I thought I would see what would happen if I started up kdm directly with /usr/bin/kdm vt1.  But I did not get that far.  I thought that, before starting up a new kdm, I had better kill off the old one.  So I gave the command

   kill -9 365

Instantly the kdm login screen flashed up, and I was able to log in as normal.  I have subsequently found that I have a kdm process running, but it is not 365, Here is the output of ps:

===
[dwilkins@mta106032 ~]$ ps aux | grep kdm
root      1380  0.0  0.0  33064  1356 ?        Ss   22:32   0:00 /usr/bin/kdm vt1
root      1383  0.5  0.6 173108 26316 tty1     Ss+  22:32   0:06 /usr/bin/X :0 vt1 -background none -nolisten tcp -seat seat0 -auth /var/run/kdm/A:0-9g3Htc
dwilkins  2367  0.0  0.0 112652   924 pts/2    S+   22:50   0:00 grep --color=auto kdm
===

Actually, maybe this is not so surprising.  I have just looked at the full output of ps aux, and I note that I logged in as root at 22:26.  The kdm and X processes started at 22:32.  So it seems that killing the existing kdm process caused the system to generate another display manager.  (I did not realise that when I started this comment.)

Comment 105 David Wilkins 2013-10-26 23:13:53 UTC
Created attachment 816466 [details]
Output of journalctl -o short-monotonic -ab (systemd.log_level=debug) with kdm kill

Following my previous comment (Comment 104) about the behaviour of the console on killing the non-functioning kdm process, I set about to analyse and record what happens, so I rebooted, adding systemd.log_level=debug to the boot entry in the grub menu.  The boot failed to bring up the KDM graphical login screen.  To correct previous observations on these boots, I should say that, if one does not press ESC during the graphical boot, it switches to the text-mode messages when plymouth is terminated by kdm, so you don't see new messages being added to the screen.  (You only see new messages if you press ESC before Plymouth is terminated.)

I logged into VT2 as root, identified the PID of kdm, which was 376, and gave the command

   kill -9 376

What happened then was not as instantaneous as I suggested in the previous comment.  First of all the screen switched to VT1, with its dead boot messages.  Then it started up kdm (with the fedora splash putting in a brief appearance, if I remember correctly, then transitioned to the kdm login screen.

Below, I have gathered the KDM-generated messages from the initial boot, and the subsequent messages which I killed process 376 and systemd handled the fallout, generating a new kdm process.

Mulling over this, it seems that kdm knows when it is terminating Plymouth without a transition to X, because it logs a message to that effect.  If kdm were to communicate this to systemd (perhaps by creating or killing a process in the kdm.service cgroup), then presumably systemd would be able to respond by spawning a new kdm process (as it does when the existing process is killed manually), and the bug might thereby be fixed, insofar as it affects kdm boots.

===
-- KDM-related messages in the initial boot

[   68.216681] xxxxxxxxx.xxx.tcd.ie kdm[376]: plymouth is running
[   68.976814] xxxxxxxxx.xxx.tcd.ie kdm[376]: plymouth is active on VT 1, reusing for :0
[   69.112282] xxxxxxxxx.xxx.tcd.ie kdm[376]: plymouth should quit after server startup
[   69.577671] xxxxxxxxx.xxx.tcd.ie kdm[376]: X server startup timeout, terminating
[   71.465347] xxxxxxxxx.xxx.tcd.ie kdm[376]: X server for display :0 cannot be started, session disabled
[   71.666219] xxxxxxxxx.xxx.tcd.ie kdm[376]: Quitting Plymouth without transition
[   71.668924] xxxxxxxxx.xxx.tcd.ie kdm[376]: plymouth is NOT running

-- KDM-related messages on killing off kdm with kill -9 376

[  219.022473] xxxxxxxxx.xxx.tcd.ie systemd[1]: Received SIGCHLD from PID 376 (kdm).
[  219.022890] xxxxxxxxx.xxx.tcd.ie systemd[1]: Got SIGCHLD for process 376 (kdm)
[  219.023208] xxxxxxxxx.xxx.tcd.ie systemd[1]: Child 376 died (code=killed, status=9/KILL)
[  219.023509] xxxxxxxxx.xxx.tcd.ie systemd[1]: Child 376 belongs to kdm.service
[  219.023818] xxxxxxxxx.xxx.tcd.ie systemd[1]: kdm.service: main process exited, code=killed, status=9/KILL
[  219.024145] xxxxxxxxx.xxx.tcd.ie systemd[1]: kdm.service changed running -> failed
[  219.024713] xxxxxxxxx.xxx.tcd.ie systemd[1]: Unit kdm.service entered failed state.
[  219.025188] xxxxxxxxx.xxx.tcd.ie systemd[1]: kdm.service changed failed -> auto-restart
[  219.026426] xxxxxxxxx.xxx.tcd.ie systemd[1]: kdm.service: cgroup is empty
[  219.123077] xxxxxxxxx.xxx.tcd.ie systemd[1]: kdm.service holdoff time over, scheduling restart.
[  219.123559] xxxxxxxxx.xxx.tcd.ie systemd[1]: Trying to enqueue job kdm.service/restart/fail
[  219.152674] xxxxxxxxx.xxx.tcd.ie systemd[1]: Installed new job kdm.service/restart as 1205
[  219.152987] xxxxxxxxx.xxx.tcd.ie systemd[1]: Enqueued job kdm.service/restart as 1205
[  219.153298] xxxxxxxxx.xxx.tcd.ie systemd[1]: kdm.service scheduled restart job.
[  219.153920] xxxxxxxxx.xxx.tcd.ie systemd[1]: kdm.service changed auto-restart -> dead
[  219.154206] xxxxxxxxx.xxx.tcd.ie systemd[1]: Job kdm.service/restart finished, result=done
[  219.154606] xxxxxxxxx.xxx.tcd.ie systemd[1]: Converting job kdm.service/restart -> kdm.service/start
[  219.155198] xxxxxxxxx.xxx.tcd.ie systemd[1]: About to execute: /usr/bin/kdm vt1
[  219.155482] xxxxxxxxx.xxx.tcd.ie systemd[1]: Forked /usr/bin/kdm as 1389
[  219.155752] xxxxxxxxx.xxx.tcd.ie systemd[1]: kdm.service changed dead -> running
[  219.156864] xxxxxxxxx.xxx.tcd.ie kdm[1389]: plymouth is NOT running
[  219.403347] xxxxxxxxx.xxx.tcd.ie systemd[1]: Job kdm.service/start finished, result=done
===

Comment 106 David Wilkins 2013-10-28 13:41:41 UTC
Yesterday I looked over journal entries for a month of boots.  The machine uses KDM as its display manager.  There is a mixture of boots that succeeded in starting an X server on VT1, and others that did not.

The impression I got is that the best predictor as to whether X will be started is the time elapsed after boot at which the transition from Plymouth to X on VT1 is initiated.  This usually happens between 25 and 70 seconds into the boot.  If the transition is initiated around 25 to 35 seconds into the boot it will generally succeed.  It will often succeed around 45 seconds into the boot.  But if the transition is attempted at around 65 seconds into the boot, which is the case for many if not most of the boots, then the transition from Plymouth to X will generally fail, and the machine is left with VT1 in a dead state where Plymouth has been terminated but the X server has not been started.

Also I noted that kdm has a PID that has a value that is typically 370 plus or minus 10, and there is no apparent distinction betweeen successful and failed boots in this regard.

   ===

I also did web searches on the kdm log messages, and tracked down the corresponding source code.  In particular, I downloaded and unpacked

   kde-workspace-4.11.1.tar.xz

and found

   kde-workspace-4.11.1-kdm_plymouth081.patch

built on Github.  I don't know how to apply patches mechanically so I applied the patch to the files

   kde-workspace-4.11.1/kdm/backend/dm.c
   kde-workspace-4.11.1/kdm/backend/server.c

by manually editing those files with vim.

The code that has been patched appears to be mature code for xdm written by Keith Packard.

It seems that KDM is a daemon that, which called, processes its arguments and calls mainLoop(), which should run until KDM is shut down at the time that the system is turned off or rebooted.

The mainLoop() of KDM calls startDisplays() to initiate X on the relevant virtual terminal.  Here is the source code for startDisplays() with the Plymouth patch applied:

===
startDisplays(void)
{
    forEachDisplay(checkDisplayStatus);
    closeGetter();
#ifdef HAVE_VTS
    activeVTs = -1;
    forEachDisplayRev(allocateVT);
    replacePlymouth();
#endif
    forEachDisplay(kickDisplay);
}
===

When allocateVT(d) is called on the display d that represents VT1 then a number of logging messages are written to the systemd journal that constitute the first actions of kdm that are logged on that journal.  So here is the section of code that is called when booting with Plymouth:

===
/* check for plymouth using newer methods */
        d->plymouth_vt = -1;
        if (plymouth_is_running ()) {
            if (plymouth_has_active_vt ()) {
                int vt = get_active_vt ();
                if (allowedVTs[vt]) {
                    logWarn ("plymouth is active on VT %d, reusing for %s\n", vt, d->name);
                    d->serverVT = vt;
                    d->plymouth_vt = vt;
                    return;
                }
            }
===

The function plymouth_is_running() gives the system command

   status = system ("/usr/bin/plymouth --ping");

and generates warnings

   plymouth is running
   plymouth is NOT running

dependent on exit status.  The relevant message is written to the journal, and is the always first message written by kdm to the journal.  The function plymouth_has_active_vt() gives the system command

   status = system ("/usr/bin/plymouth --has-active-vt");

but generates no log messages.  The function get_active_vt() opens /dev/tty0, interrogates it with ioctl to place the console state in console_state, and returns console_state.v_active.  Thus allocateVT(d) checks with Plymouth whether or not it is active on display d, and, if so, finds out from the kernel which virtual terminal is active.  This succeeds, irrespective of whether the server is ultimately started.  Indeed, whether successful or failed, all boots write the log message

   plymouth is active on VT 1, reusing for :0

to the systemd journal when allocateVT(d) is called on VT1.  The number 1 of the virtual terminal running plymouth is stored in both d->serverVT and d->plymouth_vt, where d is the structure that encodes the display.

Once allocateVT(d) has been called on each display, the function replacePlymouth() is called.  This function iterates over displays until it finds the display with Plymouth running.  If that display is not running X, then the function tells Plymouth to quit and kicks the display to initiate the X server for it.

Here is the relevant code snippet:

===
    for (cd = displays; cd; cd = cd->next) {
        if ((cd->serverVT > 0) && (cd->serverVT == cd->plymouth_vt)) {
            if (cd->status == notRunning) {
                /* tell plymouth to quit when server has started */
                logWarn ("plymouth should quit after server startup\n");
                plymouth_prepare_for_transition ();
                kickDisplay(cd);
                return;
            } else if (cd->status == running) {
                /* replacing server is starting up, do nothing */
                return;
            }
        }
    }
===

The warning

   plymouth should quit after server startup

is written to the systemd journal on both normal and failed boots, proving that this section of code is reached with d->status == notRunning on all boots.  Plymouth is prepared for transition with the system command

   status = system ("/usr/bin/plymouth deactivate");

but the code does not write any warnings to the journal.  The function plymouth_prepare_for_transition() returns the exit status of the plymouth deactivate command, but, as we see above, this is not checked.  

The function kickDisplay() is unpatched, and probably derives from xdm.  It starts the display if the display status is notRunning, and it starts the X server on that display.

===
kickDisplay(struct display *d)
{
    if (d->status == notRunning)
        startDisplay(d);
    if (d->serverStatus == awaiting && !startingServer)
        startServer(d);
}
===

Now startDisplay(d) does stuff related to authorizations and then calls
startDisplayP2(d).  It is not clear to me from the code what startDisplayP2 does, except that it forks, and the resultant processes may represent a sub-daemon and a greeter for the display.  The code for startDisplay(d) is in dm.c, whereas the code for startServer(d) is in server.c.  The other observation I would make is that the heavy lifting for opening the display seems to be carried out by calling XOpenDisplay(d->name), and that this is done only in server.c, not in dm.c.

The startServer(d) function in the (patched) server.c calls startServerOnce(d):

===
void
startServer(struct display *d)
{
    startingServer = d;
    d->startTries = 0;
    startServerOnce();
}
===

The integer field d->startTries records how many attempts have been made at starting the server, so is initialized to zero before the first attempt.

The execution startServerOnce() involves a fork and exec, and apparently determines whether or not the attempt to start the X server succeeds.  Here is the code:

===
static void
startServerOnce(void)
{
    struct display *d = startingServer;
    char **argv;

    debug("startServerOnce for %s, try %d\n", d->name, ++d->startTries);

    d->serverStatus = starting;
    switch (Fork(&d->serverPid)) {
    case 0:
        argv = prepareServerArgv(d, d->serverArgsLocal);
        if (d->authFile) {
            if (!(argv = addStrArr(argv, "-auth", 5)) ||
                !(argv = addStrArr(argv, d->authFile, -1)))
                exit(47);
        }
        debug("exec %\"[s\n", argv);
        /*
         * give the server SIGUSR1 ignored,
         * it will notice that and send SIGUSR1
         * when ready
         */
        (void)Signal(SIGUSR1, SIG_IGN);
        (void)execv(argv[0], argv);
        logError("X server %\"s cannot be executed\n", argv[0]);
        exit(47);
    case -1:
        logError("X server fork failed\n");
        startServerFailed();
        break;
    default:
        debug("X server forked, pid %d\n", d->serverPid);
        serverTimeout = d->serverTimeout + now;
        break;
    }
}

===

If this is successful, this fork presumably produces the X process running on display :0 that shows up in the kde.service CGroup?

===
[dwilkins@mta106032 kde_source]$ systemctl status kdm.service
kdm.service - The KDE login manager
   Loaded: loaded (/usr/lib/systemd/system/kdm.service; enabled)
   Active: active (running) since Sun 2013-10-27 16:16:04 GMT; 18h ago
 Main PID: 1376 (kdm)
   CGroup: name=systemd:/system/kdm.service
           ├─1376 /usr/bin/kdm vt1
           └─1379 /usr/bin/X :0 vt1 -background none -nolisten tcp -seat seat0 ...
===

The parent process of the fork just resets the serverTimeout by adding d->serverTimeout to the current time.  My failed boots do not apparently fail at the fork, because if they did, a message "X server fork failed" would be written to the systemd journal and I have never seen this.  The child process does the exec.  First of all the arguments are prepared:

   argv = prepareServerArgv(d, d->serverArgsLocal);

The zero position argument is the command d->serverCmd, which presumably has value "/usr/bin/X".  Then the name d->name of the display is appended, which is presumably ":0".  Then (assuming that kdm has been compiled with HAVE_VTS) the argument vt1 is appended, where the numeral 1 is the value of d->serverVT.  And then it would seem that d->serverArgsLocal is appended to the args array, which presumably includes the values of options such as -background, -nolisten, -seat etc.  Also the child process in startServerOnce() adds the -auth option with the value found in d->authFile.  So it seems clear that the argument string for my process 1379 in the kdm CGroup was generated by prepareServerArgv and startServerOnce and executed by the child process using execv.  Were this process to continue beyond this point, I would be seeing messages

   X server ... cannot be executed

Since I don't see such messages on failed boots, it seems that the execv succeeds.

   ---

It would therefore appear that, on my machine at least, and on other people's machines as reported here, the kdm process has executed normally up to this point, has successfully detected that Plymouth is running on VT1 and has accordingly decided to start the X server on this virtual terminal, and has forked a child process with /usr/bin/X to start the X server.  The imponderable seems to be the exit status of the

   /usr/bin/plymouth deactivate

since this is not checked, and there is no code to write out a status report with either logWarn or debug.  So control passes to /usr/bin/X

Now X does not appear to log its success or failure in the system journal directly.  I haven't seen any messages emanating from X.  However there are callbacks in server.c patched for Plymouth which generate warnings that appear in the systemd journal.

So suppose that the X server is successfully started.  The callback

   startServerSuccess()

in the patched version of server.c is called.  This generates the warning

   Quitting Plymouth with transition

which characterizes normal boots.  The function plymouth_quit_with_transition() in dm.c is called.  This function gives the system command

   /usr/bin/plymouth --wait quit --retain-splash

which presumably exits with exit status 1 (indicating that plymouth is not running), causing startServerSuccess() to write the warning

   Is Plymouth still running? no

These are warnings that are written to the systemd journal during normal boots.

   ---

So much for normal boots where the X server is successfully started.  Now for unsuccessful boots.  The callback

   startServerFailed()

is called.  Here is the code in the version patched for Plymouth:

===
void
startServerFailed()
{
    struct display *d = startingServer;
    if (!d->serverAttempts || d->startTries < d->serverAttempts) {
        d->serverStatus = pausing;
        serverTimeout = d->openDelay + now;
    } else {
        d->serverStatus = ignore;
        serverTimeout = TO_INF;
        startingServer = 0;
        logError("X server for display %s cannot be started,"
                 " session disabled\n", d->name);
        if ((d->serverVT > 0) && (d->serverVT == d->plymouth_vt)) {
            logWarn ("Quitting Plymouth without transition\n" );
            plymouth_quit_without_transition ();
        }
        stopDisplay(d);
    }
}
===

There is code for change the server status to pausing and to reset the serverTimeout in cases where the number of start tries is less than
d->serverAttempts.  However I cannot determine whether this is called or not.  Indeed this function seems to be the only place within the kde-workspace codebase that references a field serverAttempts:

===
[dwilkins@mta106032 kde_source]$ find kde-workspace-4.11.1 -type f -exec fgrep -H 'serverAttempts' {} \;
kde-workspace-4.11.1/kdm/backend/server.c:    if (!d->serverAttempts || d->startTries < d->serverAttempts) {
===

(However there is a similarly-named field d->startAttempts in the exitDisplay function of dm.c.)

Maybe this section of code is executed and leads to retries, or maybe not.  But at some point the value of d-startTries exceeds the permitted number of server attempts and the other case is executed.  The messages

   X server for display :0 cannot be started, session disabled
   Quitting Plymouth without transition

are generated.  The function plymouth_quit_without_transition() kills off plymouth with the system command

   status = system ("/usr/bin/plymouth --wait quit");

and the console is left in a state where there is no X server running on any virtual terminal, and where VT1 is left frozen with the boot messages that had accumulated up to the time that Plymouth was terminated.

   ---

So, to summarize.  In the case where KDM is the display manager, the kdm process correctly determines that Plymouth is running on VT1 and spawns a child process running /usr/bin/X with appropriate arguments.  What happens then is determined by the X server.  If the X server calls startServerSuccess() then we see the systemd journal entries that characterize boots that bring up the greeter.  If the X server calls startServerFailed() then we see the systemd journal entries that characterize boots that fail to get the X server running on VT1.

So the bug kicks in when the system attempts to transition from Plymouth to the X server.  And, in some mysterious fashion, success or failure seems to depend on the timings of the processes running at boot time.  Indeed if the kdm.service is restarted after boot then the X server is started and the greeter is brought up.  Also if the kdm process is killed with SIGKILL (I haven't tried more polite methods of process termination) then systemd has no problem in detecting that kdm has terminated and accordingly restarting it successfully.

    ---

Finally I note that it should be apparent to systemd whether or not kdm has got the X server working on the display.  Because the successful initiation of an X server adds a child process to the kdm CGroup.  So would it not be possible for some service or target like 

   /usr/lib/systemd/system/graphical.target

to check whether or not an X child process has been added to the kdm CGroup and, if not, restart kdm?

Comment 107 David Wilkins 2013-10-28 23:06:34 UTC
Follow-up to Comment 106.  I have making a further study the source code of kdm, and I think I understand better how it works (and how UNIX works, for that matter).

It seems that, in boots where VT1 fails to make the transition to X and bring up the kdm greeter, the error messages in the systemd journal demonstrate that a child process was created for the X server on VT1, but the child process exited with exit status 47.  Moreover, looking at the source code of server.c, this may well have happened after the fork but before the execv.  Because errors in the child process or the function that it calls before the execv cause that child to exit with a status code of 47.

The reasoning goes as follows.  The error messages that we see in the systemd journal from kdm on boots that fail to transition to X are generated when the function

   startServerFailed()

in (the patched)

   kde-workspace-4.11.2/kdm/backend/server.c

Specifically the fact that the logs include

   X server for display :0 cannot be started, session disabled
   Quitting Plymouth without transition

proves that startServerFailed() was called, because these are the error messages which that function generates (see Comment 106 above).  Now the function startServerFailed() was not called by startServerOnce(), because if it had been then I would have found the error message

   X server fork failed

in my systemd journal.  But this error message is not present.

Similarly I presume startServerFailed() was not called by startServerTimeout(), because if it were, I would have come across the message

   X server is stuck in D state; leaving it alone

in the journal (assuming that logInfo(messages) make it to the journal).  But this error message is not present.  A search with the find command shows that server.c, dm.h and dm.c are the only source files in the kde-workspace package that refer to startServerFailed().  Moreover

   kde-workspace-4.11.2/kdm/backend/dm.c

calls that function in just one place, in the function reapChildren().  This surely proves that function reapChildren() of dm.c was called on those boots that failed to transition to X.

Now reapChildren() is called in only one place in the mainLoop() of dm.c.  Specifically it is called when the parent process kdm receives a SIGCHLD signal.  This proves that the kdm process must have received a SIGCHLD signal as a result of the termination of the child process it spawned to run the X server.

Now the source code for reapChildren() seems to yield more information.  Here is the relevant code snippet:

===
            case running:
                if (startingServer == d && d->serverStatus != ignore) {
                    if (d->serverStatus == starting && waitCode(status) != 47)
                        logError("X server died during startup\n");
                    startServerFailed();
                    break;
                }
                logError("X server for display %s terminated unexpectedly\n",
                         d->name);
                if (d->pid != -1) {
                    debug("terminating session pid %d\n", d->pid);
                    terminateProcess(d->pid, SIGTERM);
                }
                break;
===

Now this is the code that is executed when kdm receives a SIGCHILD when d-status=running.  The function waitCode(status) returns the exit code of the process that was spawned by the

   startServerOnce()

function of server.c to run the X server.  If I am reading the code snippet above correctly, it would seem that an error message

   X server died during startup

would be written to the systemd journal if the child process died with an exit status distinct from 47.  But this error message does not appear in my journal for the past month.  Therefore, it seems to me that the child process could only have terminated with an exit code of 47.  Now exit(47) is called once within the main body of startServerOnce(), in a context connected with authorization, but it is called several times by the function 

   prepareServerArgv(struct display *d, const char *args)

of server.c that is used to prepare the arguments of the execv for running the X server.

   ---

But I have just discovered another source of exit(47) calls.  Since my previous comment, I have downloaded the source rpm

   kde-workspace-4.11.2-1.fc19.src.rpm

and exploded the files contained in it using rpm2cpio.  Grepping on the files, I find that there is a patch that has a lot of exit(47) calls in patches to the prepareServerArgv function.  This patch is the following:

   
   kde-workspace-4.11.1-kdm-logind-multiseat.patch

This patch introduces 8 new exit(47) calls.

One could speculate whether the X server itself exits with status code 47.  It would surely seem more plausible to assume that 47 was chosen for the exit status resulting from errors occurring after the child process was forked but before the execv, so that these could be distinguished from errors resulting from crashes of the X server itself.

   ===

I should add some further observations and corrections to previous comments.  First I note that, in order to induce systemd to restart kdm, it suffices to use kill to send SIGTERM.  It is not necessary, and perhaps not appropriate, to use kill -9, thereby send SIGKILL, in order to terminate the existing kdm process.

Next I should describe more accurately what I see after killing kdm.  The display changes to the dead VT1 with the boot messages.  Then a small whirligig appears in the centre of the display for a second or so, and then the kdm login screen is brought up.

I am guessing that the whirligig is generated by X or by kdm code, since Plymouth is presumably out of the picture at that stage.

And moreover I guess that, when I see the whirligig during boots, which I do quite often, it is an indication that kdm is trying to start up the X server.

Comment 108 Zbigniew Jędrzejewski-Szmek 2013-10-29 04:07:59 UTC
(In reply to David Wilkins from comment #102)
> Below is an extract from the short-monotonic
> journal for that boot.  (A bit of a palaver to obtain.  I consulted the
> verbose dump of the journal to discover the _BOOT_ID for the timestamp, and
> then generated the journal dump for that boot using journalctl _BOOT_ID=foo
> -o short-monotonic.)
Not a palaver any more, at least in the git tree:
http://cgit.freedesktop.org/systemd/systemd/commit/?id=f118807

Comment 109 David Wilkins 2013-10-29 10:37:05 UTC
Created attachment 817040 [details]
Output of journalctl -o short _COMM=yum _COMM=kdm | egrep 'kde-workspace|Quitting Plymouth'

What it says on the tin...

Output of

   journalctl -o short _COMM=yum _COMM=kdm | egrep 'kde-workspace|Quitting Plymouth'

A short way of distinguishing normal boots from those that fail to start the X server is to output the journal with _COMM=kdm and grep on 'Quitting Plymouth'.

I wanted to see how the history of normal and failed boots reflected updates of the kde-workspace package.

Here is the time period:

   -- Logs begin at Thu 2013-09-26 18:06:53 IST, end at Tue 2013-10-29 09:03:10 GMT.

As I remember, I did a clean install on Thursday September 26 after the machine first failed to boot into graphical mode with a KDM greeter.  I did this because, at that time, I couldn't think of anything else to do.  I carried out a yum update the following morning.  This updated to

   kde-workspace-libs-4.11.1-2.fc19.x86_64

The following five boots were normal.  Then I updated to

   kde-workspace-libs-4.11.1-3.fc19.x86_64

on September 30.  There were two normal boots, and then there was a load of boots where the boots failed to make the transition to plymouth.

I am thinking of concocting other ways of selecting and presenting relevant material from my systemd journal.  I don't want to upload the entire journal, assuming it were small enough to be uploaded, especially as the Avahi-related log messages contain a lot of IP addresses of hundreds of hosts on the local network that live behind the institutional firewall.

Comment 110 David Wilkins 2013-10-29 11:12:02 UTC
Created attachment 817053 [details]
Output of journalctl -o short-monotonic _COMM=systemd _COMM=kdm | egrep '[Pp]lymouth|System Initialization|kdm'

Output of

   journalctl -o short-monotonic _COMM=systemd _COMM=kdm | egrep '[Pp]lymouth|System Initialization|kdm'

I wanted to dump information on the time elapsed after boot before kdm becomes active.  So I filtered the journal entries first to show only those originating with systemd and kdm.  Looking over this, it seemed that kdm regularly started a few seconds after reaching target System Initialization.  So I have filtered the output of journalctl to capture those related to Plymouth or kdm, together with those related to System Initialization.  A glance at this indicates that kdm seems to be come active within a few seconds of reaching target System Initialization after the change of root.

Comment 111 David Wilkins 2013-10-30 00:26:11 UTC
Does it cause a runtime error if a variable is supplied as an argument to a function that has been compiled with the const keyword, and if the value stored in that variable changes between the time that the program is initialized and the function is called?

I am looking at the

   char **
   prepareServerArgv(struct display *d, const char *args)

function of

   kde-workspace-4.11.1/kdm/backend/server.c

patched with the

   kde-workspace-4.11.1-kdm-logind-multiseat.patch

applied.  In Comment 107 above, I argued that either this function is calling exit(47) or else startServerOnce() is calling exit(47) in the child process, after the fork but before the execv that is intended to start the X server on VT1.  I also noted that there were a number of exit(47) calls in the logind-multiseat patch.

Today, I manually applied the logind-multiseat patch to the main source files for kdm (the server.c, dm.h and dm.c files).  Here (assuming that I have applied the patch correctly, is the patched code for the prepareServerArgv function:

============================================================================
char **
prepareServerArgv(struct display *d, const char *args)
{
    char **argv;
#ifdef HAVE_VTS
    char vtstr[8];
#endif

#if WITH_SYSTEMD
    FILE *tmpFile = NULL;
    if ((tmpFile = fopen(SYSTEMD_X_WRAPPER, "rb")) != NULL && fclose(tmpFile) == 0) {
        if (!(argv = parseArgs(0, SYSTEMD_X_WRAPPER)) || !(argv = addStrArr(argv, d->name, -1))) {
            exit(47);
        }
    }
    else
#endif
    if (!(argv = parseArgs(0, d->serverCmd)) || !(argv = addStrArr(argv, d->name, -1))) {
        exit(47);
    }
#ifdef HAVE_VTS
    if (d->serverVT &&
        !(argv = addStrArr(argv, vtstr,
                           sprintf(vtstr, "vt%d", d->serverVT))))
        exit(47);
#endif
    if (!(argv = parseArgs(argv, args)))
        exit(47);

    if (!changeUser(d->serverUID, d->authFile))
        exit(47);

#ifdef WITH_SYSTEMD
    if (d->systemd_seat) {
        if (!(argv = parseArgs(argv, "-seat")))
            exit(47);
        if (!(argv = parseArgs(argv, d->systemd_seat)))
            exit(47);
        if (!(argv = parseArgs(argv, "-layout")))
            exit(47);
        if (!(argv = parseArgs(argv, d->systemd_seat)))
            exit(47);
    }
    else {
        if (!(argv = parseArgs(argv, "-seat")))
            exit(47);
        if (!(argv = parseArgs(argv, "seat0")))
            exit(47);
    }
#endif

    return argv;
}
============================================================================

Now this function depends in turn on low-level string array manipulation functions parseArgs and addStrArr that are included in

   kde-workspace-4.11.1/kdm/backend/util.c

Here is the signature of parseArgs:

   char **
   parseArgs(char **argv, const char *string)

and here is the signature of addStrArr:

   char **
   addStrArr(char **arr, const char *str, int len)

Now the second argument of both is *declared* as a const char, but I note that, in the original code, parseArgs is used to append string constants to the argument list, whereas addStrArr is used to add string variables to the argument list.  Thus, presuming at, as in the unpatched code, d->serverCmd, is "/usr/bin/X", this is not going to change whilst xdm is running.  But where the server name d->name is to be appended to the argument array, this is done using

   addStrArr(argv, d->name, -1)

Similarly, when, in the #ifdef HAVE_VTS context, the virtual terminal is to be added to the argument string, the following is used:

   addStrArr(argv, vtstr, sprintf(vtstr, "vt%d", d->serverVT)))

Here the sprintf function returns the length of the string that is placed in memory starting at position pointed to by vtstr.

Now here is a snippet of code that came in with the multiseat patch:

    #ifdef WITH_SYSTEMD
        if (d->systemd_seat) {
            if (!(argv = parseArgs(argv, "-seat")))
                exit(47);
            if (!(argv = parseArgs(argv, d->systemd_seat)))
                exit(47);
            if (!(argv = parseArgs(argv, "-layout")))
                exit(47);
            if (!(argv = parseArgs(argv, d->systemd_seat)))
                exit(47);
        }
        else {
            if (!(argv = parseArgs(argv, "-seat")))
                exit(47);
            if (!(argv = parseArgs(argv, "seat0")))
                exit(47);
    }
    #endif

I suggest that, if the pattern of the existing code were copied, the lines involving s->systemd_seat should have read

   if (!(argv = addStrArr(argv, d->systemd_seat, -1)))

Possibly the string array pointed to by argv is properly extended if the value of d->systemd_seat when parseArgs( ) is called is identical to the value of d->systemd_seat when the kdm process is initialized.

I might learn from studying in detail all the string array manipulation functions in util.c work.  These functions are built around malloc() and free().  But I must admit I gave up working through the code soon after I started looking at it this evening.
 
   ===

In my successful boots, kdm typically became active soon after it was initialized, whereas on failed boots kdm typically became active about half a minute after it was initialized.

Comment 112 Hin-Tak Leung 2013-10-30 02:56:50 UTC
(In reply to David Wilkins from comment #111)

I hate to be a kill-joy, but detailed discussion of source code should really be in one of the developer's mailing lists (kde's, or gnome's, or xorg's).

The sort of details you supply is IMHO only appropriate if you are going to propose a patch for RFC at the end. And judging from your confession of not knowing how to apply a patch, it seems unlikely that you are going to supply one.

FWIW, "man patch" should tell you how. Try "patch -pN < thepatchfile" where N is 0, or 1 usually works; you might also like to use 
"patch -p1 --dry-run < thepatchfile" for a dry-run to see if the patch will apply cleanly being doing so; "-R" is for reverse-apply a patch (if your apply failed).

"const" is a hint to the compiler to generate optimized code - it will not generate a runtime error (if you mean "runtime exception" - C is not C++, nor java!); rather, gcc should warn you *during compilation*. Thereafter, it just means the program behaves wrongly. Read a book on C. Even the classic Kernigan+Richie covers this topic, I think.

From a source rpm (or http://pkgs.fedoraproject.org/cgit/kde-workspace.git/, fedora's packaging devel repo, where you can get the distro patches separate frm the source tar ball), you can usually do "rpm -i x.src.rpm", then "rpmbuild -bp x.spec", where x.spec is for package x, to unpack the source tar ball as well as apply all the distro patches. read "man rpm" and "man rpmbuild" for details.

To generate a patch, typically you need to do "diff -u orig new > patchfile". Read "man diff" for what the unified diff option means.

Now, I'd prefer that the next time you want to post your thoughts on the source code, please end it with a proposed patch. Or keep those thoughts to yourself.

Or simply propose a patch and let the patch speak for itself.

Comment 113 Karel Volný 2013-10-30 07:12:42 UTC
(In reply to David Wilkins from comment #111)

thanks for trying to analyze the problem

however, I see two mistakes within your approach

1) instead of diving deep into kdm, the focus should be on what have kdm, gdm and other services somehow affected by this issue (plymouth, avahi, networkmanager, sendmail ... see comment #84) in common

2) you focus on what happens at during boot around 65 seconds after starting the system - but it should be investigated why it happens at 65th second while the whole boot process is expected to take about 15 seconds on average non-prehistoric system

note also, if some set of updates changed the behaviour for you, then it'd be nice to list the whole set of updates, not just something like "oh, I think it could have been kernel"

(sorry if I've overlooked something which would indicate the necessity to put listings of kdm code here, I haven't read everything to the last character, as already said, a bit lenghty and programmer focused for a bugzilla comments ...)

Comment 114 David Wilkins 2013-10-30 10:32:59 UTC
Created attachment 817386 [details]
Proposed patch to the kde-workspace-4.11.1-kdm-logind-multiseat.patch

Apologies in advance if I am doing this incorrectly.

I am proposing a patch to a patch.  As people will have realised, I am not experienced in developing code in a collaborative project.  (What programming I do is almost all in Python, never (for about two decades) in C or any statically-typed language (apart from exercises relating to a Java textbook), and relates to a codebase that I solely create and maintain.  So I have no experience discussing code with other people, proposing patches etc., or submitting comments or patches to bugzillas.  Indeed my posts above are the very first time that I have ever contributed comments or files to any bugzilla.  So I am probably doing everything wrong.  I ask for your indulgence.

The patch is at least short, changing two lines of an existing patch, so if this is not the appropriate way to submit an amendment then I apologize in advance.

   ===

The patch is motivated by the idiom of the kdm (originally xdm) code to which the patch is applied.

The purpose of the patch is to try to ensure that an error does not ensue when the data pointed at by the field d->systemd_seat of the display object mutates when kdm is running, resulting in the function prepareServerArg exiting with status code 47, thereby failing to start the X server on VT1 and in consequence generating the pattern of warnings that I have found in my systemd journal regularly over the past month.  It seems clear from study of the xdm code that parseArgs is used when the appended argument is a fixed string, whereas addStrArr is used when the argument points to a mutating character array.  The kdm-logind-multiseat patch does not appear to respect this distinction.

But I haven't yet had time to study the internals of

   kde-workspace-4.11.1/kdm/backend/util.c

I may well do so for my own education, though presumably discussion of them would not be welcome here.  So my proposed patch is not based on any study of the internal workings of either parseArgs or addStrArr.

  ===

Finally some short comments and hints on why kdm might be starting later in the boot process than it used to.  I have not yet studied what the multiseat patch does when its is applied to

   kde-workspace-4.11.1/kdm/backend/dm.c

but one thing seems abundantly clear from manually applying the patch: the multiseat patch introduces a dependency of kdm upon systemd that did not exist before.  Maybe this might imply that kdm cannot become akde-workspace-4.11.1ctive as early as it used to, because it has to wait for systemd-related sockets to become active, or for systemd-related libraries to be installed.  My limited understanding of systemd indicates that processes are started in parallel, but block until the relevant sockets become available for reading or writing.  So it seems at least plausible to me that introducing a dependency of kdm upon systemd in the

   kde-workspace-4.11.1-3

Fedora 19 update might change the time at which kdm becomes active in the boot process.

Comment 115 David Wilkins 2013-10-30 22:44:46 UTC
Created attachment 817653 [details]
Output of journalctl -o short-monotonic -ab with kdm patched to generate journal messages

Hoping this is a useful contribution.

Well the problems I am experiencing do not stem from the child process generated by kdm exiting with status 47, as I suggested in Comment 111 and Comment 114.

Today I went to the web to find out how to build an RPM from an SRPM.  Then I rebuilt RPMs from

   kde-workspace-4.11.2-1.src.rpm

to verify that I could build them.  Then I prepared another build, went in and edited a couple of kdm source files

   kde-workspace-4.11.2/kdm/backend/server.c
   kde-workspace-4.11.2/kdm/backend/dm.c

to get them to write log messages to the journal at strategic points, including the point just after kdm has forked to generate a child process for the X server, and the point just before that child process calls execv to become the X server process.  I also made sure that a warning would be generated just before each call to exit(47).

Then I created a patch with diff -Npru, as explained on a CentOS wiki page, went back and edited my kde-workspace.spec, reprepared the build, checking that my warnings patch had been applied, and then rebuilt the RPMs with my patch, and installed them.

Only trouble is that I did not know what to put as the appropriate release, so I now have a number of kde*-4.11.2-2.fc19 RPMs on my computer that are not the ones that should come from the Fedora project.  I hope that yum downgrade does what it says on the tin!

For the record, here is the function startServerOnce() from server.c, with my warnings added as logWarn( ... ) /* DRW */

==============================================================================

static void
startServerOnce(void)
{
    struct display *d = startingServer;
    char **argv;

    debug("startServerOnce for %s, try %d\n", d->name, ++d->startTries);

    d->serverStatus = starting;
    switch (Fork(&d->serverPid)) {
    case 0:
        logWarn("In Child process of fork before calling X server\n"); /* DRW */
        argv = prepareServerArgv(d, d->serverArgsLocal);
        if (d->authFile) {
            if (!(argv = addStrArr(argv, "-auth", 5)) ||
                !(argv = addStrArr(argv, d->authFile, -1))) {
                logWarn ("Exit 47: prepareServerArgv, point L\n"); /* DRW */
                exit(47);
            }
        }
        logWarn("About to call execv for X server\n"); /* DRW */
        debug("exec %\"[s\n", argv);
        /*
         * give the server SIGUSR1 ignored,
         * it will notice that and send SIGUSR1
         * when ready
         */
        (void)Signal(SIGUSR1, SIG_IGN);
        (void)execv(argv[0], argv);
        logError("X server %\"s cannot be executed\n", argv[0]);
        logWarn ("Exit 47: prepareServerArgv, point M\n"); /* DRW */
        exit(47);
    case -1:
        logError("X server fork failed\n");
        startServerFailed();
        break;
    default:
        debug("X server forked, pid %d\n", d->serverPid);
        serverTimeout = d->serverTimeout + now;
        break;
    }
}

==============================================================================

There were quite a number of warnings added to the prepareServerArgv(...) function, one before each exit(47), but these were not triggered.

Here is a relevant logWarn(...) added to dm.c, with some context:

==============================================================================
            case running:
                if (startingServer == d && d->serverStatus != ignore) {
                    if (d->serverStatus == starting && waitCode(status) != 47)
                        logError("X server died during startup\n");
                    logWarn("DRW: About to call startServerFailed(), point P;\n"); /* DRW */
                    startServerFailed();
                    break;
                }
==============================================================================

And here, extracted from the attached journalctl output, are the KDE related messages
==============================================================================
[   51.589137] xxxxxxxxx.xxx.tcd.ie kdm[359]: plymouth is running
[   51.664470] xxxxxxxxx.xxx.tcd.ie kdm[359]: plymouth is active on VT 1, reusing for :0
[   52.372671] xxxxxxxxx.xxx.tcd.ie kdm[359]: plymouth should quit after server startup
[   52.484504] xxxxxxxxx.xxx.tcd.ie kdm[359]: DRW: About to try to start server
[   52.485088] xxxxxxxxx.xxx.tcd.ie kdm[641]: In Child process of fork before calling X server
[   52.485348] xxxxxxxxx.xxx.tcd.ie kdm[641]: About to call execv for X server
[   53.119282] xxxxxxxxx.xxx.tcd.ie kdm[359]: X server startup timeout, terminating
[   53.121973] xxxxxxxxx.xxx.tcd.ie kdm[359]: DRW: About to call startServerFailed(), point P;
[   53.173496] xxxxxxxxx.xxx.tcd.ie kdm[359]: X server for display :0 cannot be started, session disabled
[   53.173810] xxxxxxxxx.xxx.tcd.ie kdm[359]: Quitting Plymouth without transition
[   53.176305] xxxxxxxxx.xxx.tcd.ie kdm[359]: plymouth is NOT running
[  134.579248] xxxxxxxxx.xxx.tcd.ie kdm[359]: DRW: KDM is entering mainLoop();
[  134.602796] xxxxxxxxx.xxx.tcd.ie kdm[1377]: plymouth is NOT running
[  134.629061] xxxxxxxxx.xxx.tcd.ie kdm[1377]: DRW: About to try to start server
[  134.629362] xxxxxxxxx.xxx.tcd.ie kdm[1380]: In Child process of fork before calling X server
[  134.629651] xxxxxxxxx.xxx.tcd.ie kdm[1380]: About to call execv for X server
[  136.002493] xxxxxxxxx.xxx.tcd.ie kdm[1377]: DRW: About to call startServerSuccess(), point Q;
[  210.349352] xxxxxxxxx.xxx.tcd.ie kdm[1384]: :0[1384]: pam_unix(kdm:session): session opened for user dwilkins by (uid=0)

==============================================================================

It really was a pain booting in.  The first two attempts left the display with no virtual terminals, responding only to Ctrl-Alt-Del.  On the third attempt I pressed ESC to exit graphical boot.  Nevertheless the display did not transition to X, but left the machine with virtual terminals, so I was able to log in as root on VT2 and do

   systemctl restart kdm.service

Hoping that this is helpful.

Finally, I would note that kdm with Fedora patches does not spawn /usr/bin/X directly, but instead spawns

   /lib/systemd/systemd-multi-seat-x

So if anyone else wants to focus on where the problems lie, the above executable would seem to be the place to start.  This executable belongs to package

   systemd-204-17.fc19.x86_64

Comment 116 Hin-Tak Leung 2013-10-31 00:32:26 UTC
(In reply to David Wilkins from comment #115)

Again, way too much information - regarding how you learn and where you learn from - and irrelevant ones at that. We really don't need t know, and not interested to know.

Please try to summarise your thought in one sentence (say, 20 words), and keep irrelevant details out of it.

> Only trouble is that I did not know what to put as the appropriate release,
> so I now have a number of kde*-4.11.2-2.fc19 RPMs on my computer that are
> not the ones that should come from the Fedora project.  I hope that yum
> downgrade does what it says on the tin!

"yum downgrade" then "upgrade" should restore to where you start from. It is fairly dumb - look for the "Version:" and "Release:" line towards the top of the spec file. upgrade/downgrade is mostly just a numeric/string/dictionary-order-comparison, nothing too sophisticated. I usually edit the release field to put my initials at the end of the release tag, if I am making a slightly modified rpm - this way I know it is mine just from "rpm -q" and it would be overwritten when the next official upgrade comes out. An official upgrade increments the numeric part of the release tag at least, so putting my initials at the end means mine is only marginally higher than the current release.

You could also download from say, http://koji.fedoraproject.org/ rpm packages and use "rpm -Uvh --force" to go back and forth. (koji is fedora's "beta/release-build farm", in a nut shell).

Again, we should not be talking about general development practises/packaging issues, here. This advice is really just to help you experiment (and restore to where you start from).

Comment 117 Fedora Blocker Bugs Application 2013-10-31 00:58:05 UTC
Proposed as a Blocker for 20-beta by Fedora user scorpionit using the blocker tracking app because:

 This is a serious bug that compromise the use of the distribution

Comment 118 Chris Murphy 2013-10-31 01:08:51 UTC
Set this bug to block 963210 "why we boot so slow" rather than making it a beta blocker. Also there is no cited criteria being violated by this bug.

Comment 119 Elia Devito 2013-10-31 01:16:38 UTC
(In reply to Chris Murphy from comment #118)
> Set this bug to block 963210 "why we boot so slow" rather than making it a
> beta blocker. Also there is no cited criteria being violated by this bug.


sometimes this bug prevents booting the system also block the bug  967521 that compromise the user experience

Comment 120 Elia Devito 2013-10-31 01:26:53 UTC
(In reply to Elia Devito from comment #119)
> (In reply to Chris Murphy from comment #118)
> > Set this bug to block 963210 "why we boot so slow" rather than making it a
> > beta blocker. Also there is no cited criteria being violated by this bug.
> 
> 
> sometimes this bug prevents booting the system also block the bug  967521
> that compromise the user experience

block the bug 983110 (sorry)

Comment 121 Chris Murphy 2013-10-31 01:38:33 UTC
It sometimes prevents booting Fedora 20 beta installation media? Does it prevent software updates from fixing this bug?

"compromise the user experience" does not make this a beta blocking bug. Please read Fedora 20 Beta Release Criteria.

http://fedoraproject.org/wiki/Fedora_20_Beta_Release_Criteria

Comment 122 David Wilkins 2013-10-31 09:47:26 UTC
The X wrapper

   /lib/systemd/systemd-multi-seat-x

is in the clear.  I moved this file, and then tried rebooting, but the display failed to make the transition from Plymouth to X.

   ===

Without boring people with the details (which are documented in comments above), kdm with the logind multiseat patch applied will test for the existence of the above file, by trying to opening it for reading.  If it is able to open the file, it will use it in place of /usr/bin/X in order to start the X server.

The relevant code is reproduced in Comment 111 above, noting in addition that SYSTEMD_X_WRAPPER is defined as follows:

   #define SYSTEMD_X_WRAPPER "/lib/systemd/systemd-multi-seat-x"

Without going into the details, which are established in comments above, I have proved that, on all boots, normal and failed, kdm succeeds in spawning a child process for the X server, and that it reaches the execv command used to start the X server process.

   ===

Here is a conundrum for anyone willing to delve into the internals of kdm source.  Why do I not see the error message

   X server died during startup

in my journal output?  (See Comment 115 and the code snippets therein.)
The code to generate this error message is reached, as proved above (Comment

This message is generated with logError(message), not logWarn(message), but I do find
   
   X server for display :0 cannot be started, session disabled

and this is generated with logError(message).  Therefore the logError function does write out to the journal.  From the code snippets in Comment 115 it would appear that either d->serverStatus changes from the value 'starting' set in the startServerOnce() function of server.c or else the X server process exits with status code 47 AFTER it has been started with started with either /usr/bin/X or indirectly via /lib/systemd/systemd-multi-seat-x.  (Comment 115 proves that either the execv call to start /usr/bin/X or the System call before it is reached.)

Comment 123 Harald Hoyer 2013-10-31 13:10:15 UTC
From what I can see, is that some services like network, dkms, firewalld are still running, causing plymouth-quit-wait.service to timeout.

/usr/lib/systemd/system/plymouth-quit-wait.service has:
TimeoutSec=20

So, if "/usr/bin/plymouth --wait" takes longer than 20 seconds, it will fail.

You can either experiment with increasing the timeout, or we have a dead lock here.

What is the status of the plymouth-quit.service ?

Another thing, what makes me wonder:

Why is there a distinction between plymouth-quit-wait.service and plymouth-quit.service?

And why don't gdm.service and kdm.service have:
After=plymouth-quit-wait.service

Comment 124 Harald Hoyer 2013-10-31 13:16:48 UTC
I miss the "Started Terminate Plymouth Boot Screen" message of plymouth-quit.service in the logs.

Comment 125 Harald Hoyer 2013-10-31 13:20:23 UTC
Here we go:

[    5.196367] xxxxxxxxx.xxx.tcd.ie systemd[1]: Trying to enqueue job graphical.target/start/isolate
[    5.196622] xxxxxxxxx.xxx.tcd.ie systemd[1]: Looking at job plymouth-quit.service/stop conflicted_by=yes
[    5.196630] xxxxxxxxx.xxx.tcd.ie systemd[1]: Looking at job plymouth-quit.service/start conflicted_by=no
[    5.196637] xxxxxxxxx.xxx.tcd.ie systemd[1]: Fixing conflicting jobs by deleting job plymouth-quit.service/start

Comment 126 Ray Strode [halfline] 2013-10-31 13:37:22 UTC
I chatted with harald a bit about this on irc (log below).  For KDM and GDM, plymouth-quit-wait is supposed to be used, not plymouth-quit.  Can someone who is seeing this bug try putting TimeoutSec=0 instead of TimeoutSec=20 in /usr/lib/systemd/system/plymouth-quit-wait.service ?

(irc log follows)

<halfline> haraldh: so the scoop with plymouth-quit versus plymouth-wait-quit is there are two ways in which plymouth gets quit
<halfline> the first is before the display manager is started
<halfline> the second is from the display manager itself
<halfline> the first happens for display managers that don't support flicker free boot
<halfline> the second happens for display managers that do
<halfline> plymouth-quit is for when systemd quits plymouthd
<halfline> plymouth-wait-quit is for when the display manager quits plymouth
<halfline> do you think the timeout=20 is what's causing the problem?
<halfline> i don't know why that timeout was put there. i didn't write the service files initially
<haraldh> ok
<halfline> we could try putting TimeoutSec=0
<halfline> and see if that "fixes" things
<halfline> i'll ask on the bug report

Comment 127 Ray Strode [halfline] 2013-10-31 13:44:34 UTC
actually, i'll just do a test build, that's probably easier.

Comment 128 Harald Hoyer 2013-10-31 14:03:36 UTC
maybe also look at /var/log/Xorg.*.log why the X server start fails

Comment 129 David Wilkins 2013-10-31 14:26:13 UTC
Created attachment 817927 [details]
Log file Xorg.0.log from computer failing to transition to X on boot

Comment 130 David Wilkins 2013-10-31 14:45:55 UTC
Created attachment 817945 [details]
Log file Xorg.0.log.old from same computer and time as Comment 129

Comment 131 Fedora Update System 2013-10-31 14:47:59 UTC
plymouth-0.8.9-3.2013.08.14.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/plymouth-0.8.9-3.2013.08.14.fc20

Comment 132 David Wilkins 2013-10-31 14:54:41 UTC
Created attachment 817956 [details]
Output of journalctl -o short-monotonic -ab for boot with Xorg.0.log attached to Comments 129 and 130

Comment 133 Fedora Update System 2013-10-31 14:58:36 UTC
plymouth-0.8.9-0.2013.03.26.1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/plymouth-0.8.9-0.2013.03.26.1.fc19

Comment 134 David Wilkins 2013-10-31 16:17:31 UTC
Created attachment 817972 [details]
Xorg.0.log.dated-2013-10-31-1513.txt, no transition to X, no subsequent kdm restart

Attached is an Xorg.0.log file from a failed boot.  An accompanying file of journalctl output to follow.

History of this boot.

Booted into Fedora.  Did not type ESC to leave the graphical boot.  White expanding central animation eventually gave way to blue fedora logo in the centre for a few seconds.  Then the screen went black, and stayed black and dead thereafter.  It was impossible to get a virtual terminal with Ctrl+Alt+F2 etc.

Affected computer has Intel Sandybridge graphics.

I logged in remotely with ssh -Y from a neighbouring desktop over a private network.  Neighbouring desktop has Intel Ivybridge graphics.  The password was accepted at 273.681579 seconds after the start of the boot.  I saved

   /var/log/Xorg.0.log

(attached), ran journalctl -o short-monotonic -ab to get a journal dump establishing chronology etc., then viewed a PDF file from the remote computer to verify that the X client was functioning, then logged out of the session on the remote computer.  Finally I went back to the affected computer and typed Ctrl+Alt+Del to power down.

Comment 135 David Wilkins 2013-10-31 16:22:19 UTC
Created attachment 817974 [details]
Output of journalctl -o short-monotonic -ab to accompany Xorg.0.log of Comment 134

The log file is for the same boot as the Xorg.0.log file of Comment 134.  IP addresses on the local network have been redacted.

Comment 136 David Wilkins 2013-10-31 16:32:36 UTC
Created attachment 817988 [details]
Xorg.0.log.dated-2013-10-31-1548.txt, to virtual terminals, before kdm restart

The attached /var/log/Xorg.0.log file is from the boot following that of Comment 134.  This proceeded differently, I typed ESC early on to get out of graphical boot.  The messages flashed past until the final 'Reached target Graphical Boot', though the final four messages did not scroll down the virtual terminal.  When VT1 went dead I waited for the boot to complete, then typed Ctrl+Alt+F2 to get a virtual terminal.  I logged in as root.  I then copied the new Xorg.0.log (attached) and generated journalctl output for the boot.  Having done that, I used

   systemctl restart kdm.service

as root to bring up the KDM greeter.  Accompanying journalctl to follow.

Comment 137 David Wilkins 2013-10-31 16:36:31 UTC
Created attachment 817990 [details]
Output of journalctl -o short-monotonic -ab to accompany Xorg.0.log of Comment 136

Comment 138 Zbigniew Jędrzejewski-Szmek 2013-10-31 16:43:55 UTC
Hi David,
it'd be great if you could test the update that was just posted. Results with earlier versions aren't interesting if a there's a tentative solution.

Comment 139 Zbigniew Jędrzejewski-Szmek 2013-10-31 16:44:42 UTC
...and if you post anything, please be very clear what versions of plymouth, systemd, and other relevant packages are being used.

Comment 140 Fedora Update System 2013-10-31 17:39:50 UTC
Package plymouth-0.8.9-3.2013.08.14.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing plymouth-0.8.9-3.2013.08.14.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-20391/plymouth-0.8.9-3.2013.08.14.fc20
then log in and leave karma (feedback).

Comment 141 Mike Ruckman 2013-10-31 17:51:38 UTC
Discussed in 2013-10-31 Go/No-Go meeting [1]. Voted as a RejectedBlocker as well as a RejectedFreezeException. Slow bootup is not a blocker or FreezeException worthy issue, especially for an issue that has been around as long as this one has. Please re-propose with a better justification for why this bug should block release if there is more going on than we understand.

[1] http://meetbot.fedoraproject.org/meetbot/meetbot/fedora-meeting-2/2013-10-31/

Comment 142 David Wilkins 2013-10-31 18:50:58 UTC
I am currently running

   systemd-204-17.fc19.x86_64
   kde-workspace-4.11.2-1.fc19.x86_64
   xorg-x11-server-utils-7.7-1.fc19.x86_64
   plymouth-0.8.9-0.2013.03.26.0.fc19.x86_64

I may test the update to plymouth in a few days, if still relevant.  However I think I now know what failed in the boot referenced in Comments 136 and Comments 137 (which is the current boot).  And I claim that the problem lies in the internals of kdm, in old xdm code.

Compare the chronology of the Xorg.0.log file attached in Comment 136 with the journalctl output attached in Comment 137.

kdm became active at around 61.711659, and by 62.979688 it had determined that Plymouth was active on VT1 and set about starting the X server.  The X server duly started up at 62.989, and proceeded apparently normally, if slowly, to complete successfully at 77.981.

However some other process spawned by kdm terminated at some time just before 64.630813.  That process may well have been Plymouth itself, which was called around 64.587147 to prepare to quit.  As a result, kdm received a SIGCHLD signal.

So what does kdm do when it receives a SIGCHLD signal?  To see that, install the kde-workspace-4.11.2-1 source RPM and prepare the build with rpmbuild -bp.
As I understand it, the BUILD directory should then contain the source that is actually compiled.  In what follows, I refer to the source of kdm with all patches applied.

So take a look at

   kde-workspace-4.11.2/kdm/backend/dm.c

lines 1297 to 1300.  Now I spent some time studying the internals of kdm.  The code is written on the assumption that the only process spawned by kdm (originally by xdm) is the child process that runs the X server.  So when kdm receives a SIGCHLD signal, it takes that as a sign that the X server process has terminated.  So then it immediately calls a function reapChildren() to clean up.  (The name of the function says it all.)  The reapChildren() process shuts things down, and in the process calls the function startServerFailed() of

      kde-workspace-4.11.2/kdm/backend/server.c

(line 181).  This function then sets a server timeout (reducing the value of serverTimeout from TO_INF (timeout infinity) to a time slightly ahead of now.  Once the timeout period is elapsed things start shutting down, and messages such as those at 62.882904 and 63.219255 are generated.

Actually things look a little more interesting, because I have just noticed that the X server process starting up at 62.979688 must have been the second such process, and must have been started up after the first one was killed.  But by then it was too late.  Plymouth was on course to quit without transition.

So, on my reading, whether or not the transition from Plymouth to X takes place depends on which process finishes first: the Plymouth process (or some other process started by kdm) or the X server process.  Because if some other process terminates before X then the X process will get killed.

    ===

If I am wrong in the above, it will prove that I don't understand the internals of kdm as well as I think I do, and that is very probable.  Nevertheless I will offer this up as what I still think is the most likely explanation of what has been happening on my machine.

I think I will still leave my machine with the latest plymouth patch uninstalled for a bit.

Comment 143 Hin-Tak Leung 2013-10-31 19:07:36 UTC
(In reply to Ray Strode [halfline] from comment #127)
> actually, i'll just do a test build, that's probably easier.

(In reply to Fedora Update System from comment #133)
> plymouth-0.8.9-0.2013.03.26.1.fc19 has been submitted as an update for
> Fedora 19.
> https://admin.fedoraproject.org/updates/plymouth-0.8.9-0.2013.03.26.1.fc19

I grabbed the source rpm from koji and did a local rpm rebuild, then rpm -Fvh *.
I think you have fixed the issue! Have reboot about 6 times and just let it go
from graphic boot to gdm without pressing ESC, and it went okay.

$ rpm -q plymouth
plymouth-0.8.9-0.2013.03.26.1.fc19.x86_64

$ rpm -q --changelog plymouth |head -4
* Thu Oct 31 2013 Ray Strode <rstrode> 0.8.9-0.2013.03.26.1
- Don't timeout plymouth quit waiting
  Related: #967521

# tac /var/log/messages | grep -i plymouth | more
Oct 31 18:20:43 localhost systemd[1]: Started Wait for Plymouth Boot Screen to Quit.
Oct 31 18:20:39 localhost systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
Oct 31 18:20:34 localhost systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
Oct 31 18:18:52 localhost systemd[1]: Starting Show Plymouth Power Off Screen...
Oct 31 18:18:32 localhost systemd[1]: Started Wait for Plymouth Boot Screen to Quit.
Oct 31 18:18:28 localhost systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
Oct 31 18:18:23 localhost systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
Oct 31 18:16:41 localhost systemd[1]: Starting Show Plymouth Reboot Screen...
Oct 31 18:16:03 localhost systemd[1]: Started Wait for Plymouth Boot Screen to Quit.
Oct 31 18:16:00 localhost systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
Oct 31 18:15:53 localhost systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
Oct 31 18:14:05 localhost systemd[1]: Starting Show Plymouth Reboot Screen...
Oct 31 16:11:00 localhost systemd[1]: Started Wait for Plymouth Boot Screen to Quit.
Oct 31 16:10:54 localhost systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
Oct 31 16:10:48 localhost systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
Oct 31 16:09:02 localhost systemd[1]: Starting Show Plymouth Power Off Screen...
Oct 31 16:08:01 localhost systemd[1]: Started Wait for Plymouth Boot Screen to Quit.
Oct 31 16:07:48 localhost systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
Oct 31 16:07:41 localhost systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
Oct 31 16:06:21 localhost systemd[1]: Starting Show Plymouth Reboot Screen...
Oct 31 16:04:01 localhost systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
Oct 31 16:03:52 localhost systemd[1]: Started Tell Plymouth To Write Out Runtime Data.

FWIW, I can see my system takes about 6 seconds to get from "Starting Wait"
to "Started Wait", and it is a 5-year-old machine. There are a few things
which seems to take a while, e.g. firewalld, NetworkManager, chrony, so it
is quite possible 20 seconds is reached. According to dmesg, it takes about 80 seconds
to get to NetworkManager (the lowest is about 60s from /var/log/messages).

Would be interesting to know where that magic number 20s comes from, though.

In any case, it seems this problem is fixed for me, so well-done!

Comment 144 Hin-Tak Leung 2013-10-31 19:47:36 UTC
FWIW, it takes 23 sec to get from plymouth wait quit to gdm:

#journalctl -o short-monotonic -ab | grep -E '(Plymouth|Graphical Interface)'
[    1.927718] localhost.localdomain systemd[1]: Starting Show Plymouth Boot Screen...
[    3.808197] localhost.localdomain systemd[1]: Started Show Plymouth Boot Screen.
[    3.809951] localhost.localdomain systemd[1]: Starting Forward Password Requests to Plymouth Directory Watch.
[    3.810410] localhost.localdomain systemd[1]: Started Forward Password Requests to Plymouth Directory Watch.
[    3.820684] localhost.localdomain systemd[1]: Starting Plymouth switch root service...
[    3.910892] localhost.localdomain systemd[1]: Started Plymouth switch root service.
[   23.417587] localhost.localdomain systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
[   63.270925] localhost.localdomain systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
[   67.838854] localhost.localdomain systemd[1]: Starting Wait for Plymouth Boot Screen to Quit...
[   72.202762] localhost.localdomain systemd[1]: Started Wait for Plymouth Boot Screen to Quit.
[   95.535900] localhost.localdomain systemd[1]: Starting Graphical Interface.
[   95.540223] localhost.localdomain systemd[1]: Reached target Graphical Interface.

So this is just a confirmation that 20 sec is too low. This is for
reboot between no-upgrades. With the typical reason for a reboot
being new kernel or new software, there is the read-ahead target
as well as the dkms target which slows it down a bit, so I can really
see it typically being a fair bit above 20 sec.

Comment 145 David Wilkins 2013-10-31 20:37:44 UTC
I did install the Plymouth updates.

They did not cure my problem.  Crashed out of the graphical boot into a frozen VT1 around 'Starting Avahi' as usual.

   journalctl _COMM=kdm -ab

showed the same symptoms - X server timeout etc.

I saved a copy of /var/log/Xorg.0.log and did a journalctl dump before restarting kdm.

I will perhaps make a second journalctl dump, post the kdm restart and see if there is anything interesting.

But it suspect that the '/var/log/journal breaks system startup' may combine several bugs (e.g., gdm-related and kdm-related), the common theme being either a slow boot or a failed boot.

Comment 146 David Wilkins 2013-10-31 20:45:06 UTC
P.S.,

There was one difference.  I didn't see the

   'Failed to start Wait for Plymouth Boot Screen to Quit'

That particular bug may be resolved.  I should try the updates out on a laptop which previously showed that message but subsequently booted.

Comment 147 David Wilkins 2013-10-31 22:44:52 UTC
Created attachment 818079 [details]
Output of journalctl -o short-monotonic -ab after update to plymouth-0.8.9-0.2013.03.26.1

This is several boots following installation of

   plymouth-0.8.9-0.2013.03.26.1.fc19.x86_64

A summary of timings related to plymouth and kdm follows:

================================================================================
[    0.706188] xxxxxxxxx.xxx.tcd.ie systemd[1]: Starting System Initialization.
[    0.706345] xxxxxxxxx.xxx.tcd.ie systemd[1]: Reached target System Initialization.
[    0.706527] xxxxxxxxx.xxx.tcd.ie systemd[1]: Starting Show Plymouth Boot Screen...
[    1.026219] xxxxxxxxx.xxx.tcd.ie systemd[1]: Started Show Plymouth Boot Screen.
[    1.026872] xxxxxxxxx.xxx.tcd.ie systemd[1]: Starting Forward Password Requests to Plymouth Directory Watch.
[    1.027056] xxxxxxxxx.xxx.tcd.ie systemd[1]: Started Forward Password Requests to Plymouth Directory Watch.
[    1.583491] xxxxxxxxx.xxx.tcd.ie systemd[1]: Starting Plymouth switch root service...
[    1.632069] xxxxxxxxx.xxx.tcd.ie systemd[1]: Started Plymouth switch root service.
[   13.728409] xxxxxxxxx.xxx.tcd.ie systemd[1]: Starting Tell Plymouth To Write Out Runtime Data...
[   63.258528] xxxxxxxxx.xxx.tcd.ie systemd[1]: Started Tell Plymouth To Write Out Runtime Data.
[   63.369520] xxxxxxxxx.xxx.tcd.ie systemd[1]: Starting System Initialization.
[   63.397507] xxxxxxxxx.xxx.tcd.ie systemd[1]: Reached target System Initialization.
[   65.570558] xxxxxxxxx.xxx.tcd.ie kdm[364]: plymouth is running
[   66.112521] xxxxxxxxx.xxx.tcd.ie kdm[364]: plymouth is active on VT 1, reusing for :0
[   66.271167] xxxxxxxxx.xxx.tcd.ie kdm[364]: plymouth should quit after server startup
[   66.488908] xxxxxxxxx.xxx.tcd.ie kdm[364]: X server startup timeout, terminating
[   66.736694] xxxxxxxxx.xxx.tcd.ie kdm[364]: X server for display :0 cannot be started, session disabled
[   66.737287] xxxxxxxxx.xxx.tcd.ie kdm[364]: Quitting Plymouth without transition
[   67.576153] xxxxxxxxx.xxx.tcd.ie kdm[364]: plymouth is NOT running
[   70.019150] xxxxxxxxx.xxx.tcd.ie systemd[1]: Started Wait for Plymouth Boot Screen to Quit.
[  310.740575] xxxxxxxxx.xxx.tcd.ie kdm[1482]: plymouth is NOT running
[  349.555253] xxxxxxxxx.xxx.tcd.ie kdm[1489]: :0[1489]: pam_unix(kdm:session): session opened for user dwilkins by (uid=0)
================================================================================

I would have restarted kdm.service from VT2 around 310 seconds after boot.

The about is the usual pattern of log messages, apart from the fact that the plymouth-quit-wait.service 'started'.  However the X server process had been killed off long before that.

There is an accompanying Xorg.0.log file.  It documents an X process beginning at 65.607 and completing successfully at 66.483:

   [    66.483] (EE) Server terminated successfully (0). Closing log file.

There is presumably no point in uploading this.  The only (EE) is the final successful termination!

My theory, described in Comment 142, and based on stuff documented in previous comments, is that there is a race between the X server process and the

   /usr/bin/plymouth deactivate

process launched by kdm (by function plymouth_prepare_for_transition() of dm.c) immediately after 66.271167.  If X were to complete before plymouth exited and sent a SIGCHLD signal to kdm (dm.c), then the X server on VT1 would be left in peace.  But if plymouth exited before X finished, then plymouth's SIGCHLD would be treated by kdm as though it were X terminating and the reapChildren() function would be called, with the consequences that we see documented in the summary above.  It seems to have been a close race this time round, if I am correct.  X had already completed successfully at 66.483 before function startServerTimeout() of server.c was called at 66.488908 to send a SIGTERM signal to the X server process.

   ===

See Comment 86 above for another example with journalctl entries corresponding to mine above.

Comment 148 Ray Strode [halfline] 2013-11-01 02:00:04 UTC
So it could be there is more than one issue at play here, in which case it may make sense to follow up in another bug.  this one is already getting quite long.

To be clear, though, the order is supposed to be 

1) "plymouth deactivate" synchronously runs stalling kdm until it finishes
2) kdm starts X server 
3) kdm calls "plymouth quit"

so, assuming it's implemented that way (which i haven't looked, but it is a good bet), then in that case, step 1 is going to generate a SIGCHLD before X is even started, and in step 3 a SIGCHLD is going to be generated after X is already started. X shouldn't be "completing" at all, only starting (in step 2).  The X server is needed for the login screen, of course, so if it completed at startup, then there would be nothing to show the login screen on.

Could it be the race is between plymouth finishing and X starting (indicated by SIGUSR1)? AKA plymouth quit is getting called as soon as the X server is spawned, but before the SIGUSR1 to indicate it started okay?  at any rate, if the problem is just an errant SIGCHLD from an unexpected plymouth process, surely the fix can be to just verify the pid of the child reaped is the X server.

Comment 149 David Wilkins 2013-11-01 10:31:48 UTC
The manner in which kdm and plymouth interact may have changed recently (i.e., within the last two months).  See the discussion in Bug 975079 from Comment 23 of that bug onwards:

   https://bugzilla.redhat.com/show_bug.cgi?id=975079#c23

(Might it be appropriate to reopen that bug for follow-up?   Is this the way things are done?))

I refer in particular to my journalctl output attached to Comment 147 above, and to the times documented there in relation to the start and finish of the X server startup, as determined by the corresponding Xorg.0.log file (not uploaded).  I have uploaded a number of journalctl output files that partially document how kdm has been operating on my machine over the past month.  But it was a comment yesterday suggesting that Xorg.*.log output might be useful that drew my attention to the existence of those log files.

   ---

I am aborting the rest what I was writing in this comment.  Now that I have read Comment 148 and am looking more carefully at the dm.c code with patches applied, things are more complicated than I thought.

This is what is now bugging me.  Referring to the boot of Comment 147, if kdm was stalled, or at least trapped in the function allocateVT(d) of dm.c, what started the X server process that apparently started at 65.607?  If the function plymouth_is_running() were called more than once, it would be easier to fathom what was going on.

Maybe I should move /var/log/journal and go hunting for the current boot journal in /run?  I moved /var/log/journal for a couple of boots, and this 'cured' the problem for those boots.

Comment 150 Harald Hoyer 2013-11-01 12:26:44 UTC
(In reply to David Wilkins from comment #149)
> The manner in which kdm and plymouth interact may have changed recently
> (i.e., within the last two months).  See the discussion in Bug 975079 from
> Comment 23 of that bug onwards:
> 
>    https://bugzilla.redhat.com/show_bug.cgi?id=975079#c23
> 
> (Might it be appropriate to reopen that bug for follow-up?   Is this the way
> things are done?))
> 
> I refer in particular to my journalctl output attached to Comment 147 above,
> and to the times documented there in relation to the start and finish of the
> X server startup, as determined by the corresponding Xorg.0.log file (not
> uploaded).  I have uploaded a number of journalctl output files that
> partially document how kdm has been operating on my machine over the past
> month.  But it was a comment yesterday suggesting that Xorg.*.log output
> might be useful that drew my attention to the existence of those log files.
> 
>    ---
> 
> I am aborting the rest what I was writing in this comment.  Now that I have
> read Comment 148 and am looking more carefully at the dm.c code with patches
> applied, things are more complicated than I thought.
> 
> This is what is now bugging me.  Referring to the boot of Comment 147, if
> kdm was stalled, or at least trapped in the function allocateVT(d) of dm.c,
> what started the X server process that apparently started at 65.607?  If the
> function plymouth_is_running() were called more than once, it would be
> easier to fathom what was going on.
> 
> Maybe I should move /var/log/journal and go hunting for the current boot
> journal in /run?  I moved /var/log/journal for a couple of boots, and this
> 'cured' the problem for those boots.

why not file a bug against kdm describing the problem?

Comment 151 David Wilkins 2013-11-01 16:43:37 UTC
(In reply to Harald Hoyer from comment #150)

> why not file a bug against kdm describing the problem?

Having been thus prompted, I plan to do so once my computer fails to boot to a KDM greeter.

However about three boots since the plymouth update that failed to get to the greeter have been succeeded by four boots that eventually reached the greeter.  The Xorg.0.log file of the first of these starts at 32.397 and ends at 84.621 with an enormous gap from 33.444 to 80.715.  The Xorg.0.log file of the third starts at 29.838 and ends at 81.143, with an enormous gap from 31.503 to 77.328.  I have also saved journalctl output corresponding to these Xorg log files.  I plan to examine the latest stuff more carefully.

Maybe others experiencing slow boots or failure to boot to a greeter might take a look at their /var/log/Xorg.0.log files and maybe compare with journalctl output?

Now I look at these in more detail, I can spot a common factor.  The two lines before the jump are

   No input driver specified, ignoring this device.
   This device may have been added with another device file.

in the Xorg.0.log files with big jumps that I have so far looked at.
   
===
[    31.503] (**) USB Optical Mouse: (accel) acceleration factor: 2.000
[    31.503] (**) USB Optical Mouse: (accel) acceleration threshold: 4
[    31.503] (II) config/udev: Adding input device USB Optical Mouse (/dev/input
/mouse0)
[    31.503] (II) No input driver specified, ignoring this device.
[    31.503] (II) This device may have been added with another device file.
[    77.328] (II) intel(0): EDID vendor "DEL", prod id 41035
[    77.328] (II) intel(0): Using EDID range info for horizontal sync
[    77.328] (II) intel(0): Using EDID range info for vertical refresh

===

I have just spotted another common factor: there can be a large gap of (e.g., about 40 seconds) after the optical mouse in a lot of my saved Xorg.0.log files.

(It seems to be the mouse rather than the Intel device: I have noticed an example where the jump is followed by the unloading of the "evdev" module.)
Anyway here is the complete Optical Mouse entry, with a couple of following lines, put here just in case the mouse is in part responsible for the problems:

===
[    33.443] (II) config/udev: Adding input device USB Optical Mouse (/dev/input/event4)
[    33.443] (**) USB Optical Mouse: Applying InputClass "evdev pointer catchall"
[    33.443] (II) Using input driver 'evdev' for 'USB Optical Mouse'
[    33.443] (**) USB Optical Mouse: always reports core events
[    33.443] (**) evdev: USB Optical Mouse: Device: "/dev/input/event4"
[    33.443] (--) evdev: USB Optical Mouse: Vendor 0x461 Product 0x4d81
[    33.443] (--) evdev: USB Optical Mouse: Found 3 mouse buttons
[    33.443] (--) evdev: USB Optical Mouse: Found scroll wheel(s)
[    33.443] (--) evdev: USB Optical Mouse: Found relative axes
[    33.443] (--) evdev: USB Optical Mouse: Found x and y relative axes
[    33.443] (II) evdev: USB Optical Mouse: Configuring as mouse
[    33.443] (II) evdev: USB Optical Mouse: Adding scrollwheel support
[    33.443] (**) evdev: USB Optical Mouse: YAxisMapping: buttons 4 and 5
[    33.443] (**) evdev: USB Optical Mouse: EmulateWheelButton: 4, EmulateWheelInertia: 10, EmulateWheelTimeout: 200
[    33.443] (**) Option "config_info" "udev:/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0/input/input4/event4"
[    33.443] (II) XINPUT: Adding extended input device "USB Optical Mouse" (type: MOUSE, id 10)
[    33.443] (II) evdev: USB Optical Mouse: initialized for relative axes.
[    33.443] (**) USB Optical Mouse: (accel) keeping acceleration scheme 1
[    33.444] (**) USB Optical Mouse: (accel) acceleration profile 0
[    33.444] (**) USB Optical Mouse: (accel) acceleration factor: 2.000
[    33.444] (**) USB Optical Mouse: (accel) acceleration threshold: 4
[    33.444] (II) config/udev: Adding input device USB Optical Mouse (/dev/input/mouse0)
[    33.444] (II) No input driver specified, ignoring this device.
[    33.444] (II) This device may have been added with another device file.
[    80.715] (II) intel(0): EDID vendor "DEL", prod id 41035
[    80.715] (II) intel(0): Using EDID range info for horizontal sync
===

Comment 152 Germano Massullo 2013-11-02 07:59:53 UTC
My vote on proposed patch is negative, because X, KDM don't start. I still have to restart kdm service to have a KDM login session

Comment 153 Hin-Tak Leung 2013-11-02 15:51:00 UTC
(In reply to David Wilkins from comment #151)

Again, way too long.

Regarding possible source of delay from evdev and probing for plug-able mouses. That evdev system in X was to allow usb mouses - which can be plugged and unplugged - to work. Traditionally X refuses to start if a pointer device is not present. There are also other recent changes in that area for multi-touch devices (i.e. touch screens).

It is a possible source of delay if you have a complicated USB chain, or other heavy-duty/slow devices on your USB chain.

In any case, if you are not going to plug/unplug your mouse at all, nor running X on a multi-touch device, you could just hard-code your mouse setting in your xorg.conf . That should make X starts up faster. Read the manual for how to do that.

(In reply to Germano Massullo from comment #152)
> My vote on proposed patch is negative, because X, KDM don't start. I still
> have to restart kdm service to have a KDM login session

Which one is that? The new plymouth rpm packages from Ray Strode or the kde mod from David Wilkins?

The new plymouth packages seem to fix the problem for me, so gnome users should give it a go, at least.

Comment 154 David Wilkins 2013-11-02 20:23:42 UTC
Two boots today failed to reach the kdm greeter.  I have saved log files for those boots, and also for other boots that did reach the greeter.  Therefore I plan to file a bug against kdm with regard to kdm-specific issues.  I believe there are several issues at play here, and plan to summarize the position that I currently understand it, in the hope that some of what follows will be useful to somebody.  Also it includes my considered test report on the proposed plymouth update (which I was asked to test).

   1. Update to plymouth
      ==================

      The plymouth update of Comment 140 under consideration seems to have
      resolved one specific problem: on my machine, booting into kdm, it has
      ensured that the plymouth-quit-wait.service does not fail to 'start'.
      Moreover the evidence of log files is that this service now 'starts'
      once plymouth has actually quit.

      This eliminates a phenomenon that is a distraction and a red herring.
      In that regard, the update is useful.  It may do more for those
      booting into gdm, or some other display manager.  But I believe that
      those booting into kdm are afflicted by another bug, and moreover
      analysis of log files extracted with journalctl seems to demonstrate
      that the likelihood of kdm achieving a transition from plymouth to
      X did not appear to depend on whether the transition is attempted
      before or after the old plymouth-quit-wait timeout.  However, the
      earlier that kdm attempts the transition, the more likely it is that
      the transition will succeed.

      Changing the timeout in seconds from 20 to the current value of zero
      in itself is not going to resolve the issues underlying this bug.
      But if there are other improvements that ensure that services are
      started more promptly and predictably, then such improvements
      will certainly be positive.

   2. Sources of relevant evidence
      ============================

         (Certainly relevant for kdm, perhaps for gdm.)

         A. Journalctl output
            -----------------

            Non-root users will save themselves hassle if they
            belong to group systemd-journal.  Suggest

               journalctl -o short-monotonic -ab
               journalctl -o short -ab
           
            Both give useful information, the first probably more
            useful.  For a good and useful summary of the most important
            kdm-related information, I recommend

               journalctl -o short-monotonic _COMM=kdm -ab

         B. Xorg.0.log files
            ----------------

            These give very pertinent information not obtainable
            with journalctl, but, to be useful, the log files need
            to be saved before being overwritten.

            In particular, suppose the boot takes you to a black
            screen or to virtual terminals.  The Xorg.0.log contains
            information on the startup (and shutdown) of the X server
            on the display, but it needs to be saved before restarting
            kde.service or starting an X session remotely, or it will
            get overwritten.

            Xorg.0.log.old contains information about the previous boot,
            but the timestamp corresponds to the current boot.

         C. Relevant source files with all relevant patches applied
            -------------------------------------------------------

   3. Interpretation of the evidence - time delays
      ============================================

      The information provided by 2 (A, B and C.) seems irreconcilable
      unless intepreted on the basis that there is a time delay in logging
      events, and that this time delay can be significantly greater for
      journal output in comparison with Xorg logs.  One should not presume
      that the monotonic times in the journal are consistent with those
      in Xorg log files.

   4. The kdm boot bug
      ================

      (Almost certainly not relevant to gdm.)

      Afflicts: kdm >= 4.11.1 patched for Fedora

      The records in log files (2 above), interpreted in accordance with
      3 above, demonstrates that, in attempting to make the transition
      from X to kdm, an X server is started, initializes in a relatively
      short period of time and, on failed boots, is shut down cleanly
      on receipt of an erroneous termination signal from kdm.

      This should be discussed in a separate bug report, but, in summary,
      I now consider that the problem arises because a particular function,
      kickDisplay(), is called twice during the first iteration of the main
      loop of kdm.  This starts the X server the first time it is called.
      I suggest that the second call to kickDisplay() may cause that X
      server to be terminated, but that whether or not it does so is
      determined by the time difference between the successive calls to
      kickDisplay().  This is presumably an issue for a separate bug
      filed against kdm, but I wish to record this summary here.  It is
      incontrovertible that the X server is started and is then cleanly
      shut down (with unloading of kernel modules) shortly afterwards.

   5. Issues related specifically to gdm
      ==================================

      I cannot comment on these.

   6. Apparent large gaps in timings in Xorg.0.log files
      ==================================================

      See Comment 151 above.

      A red herring.

      Some thought, plus examination of log files, shows that the
      large time gaps of approximately 40 seconds cited in Comment 151
      represent the time interval between the computer bringing up the
      kdm greeter and the initialization of a KDE session following
      entry of username and password.  The switching state of the monitor,
      and the consequent blackouts, flickers and jumps, are presumably
      relevant for some bug, but not this bug.

   7. Mouse Issue
      ===========

      Another red herring, in the light of 6.

      (The mouse is a bog-standard Dell N889, and functions perfectly well,
      and always has.) 

Too long?  I know.  Sorry!  But if any of the above is useful to someone charged with resolving the bugs, or seeking to gather and provide evidence to enable the bug to be resolved, then it may be justified.

Comment 155 Hin-Tak Leung 2013-11-02 21:13:17 UTC
(In reply to David Wilkins from comment #154)
<snipped>
>          B. Xorg.0.log files
>             ----------------
> 
>             These give very pertinent information not obtainable
>             with journalctl, but, to be useful, the log files need
>             to be saved before being overwritten.
> 
>             In particular, suppose the boot takes you to a black
>             screen or to virtual terminals.  The Xorg.0.log contains
>             information on the startup (and shutdown) of the X server
>             on the display, but it needs to be saved before restarting
>             kde.service or starting an X session remotely, or it will
>             get overwritten.
> 
>             Xorg.0.log.old contains information about the previous boot,
>             but the timestamp corresponds to the current boot.
<snipped>

That's a lot of rubbish there.

(1) if you start X remotely, the log will have a different display number, i.e. Xorg.1.log, Xorg.2.log, etc. In fact if you switch user (i.e. what the multi-seat patch does), you will have a a different log with a different display number, as well.

(2) Xorg.0.log.old has the "current" timestamp because it is renamed to, when the new one is created. The timestamp is correct for what it indicates. It is just your failure to understand it.

<snipped>
>    7. Mouse Issue
>       ===========
...
>       (The mouse is a bog-standard Dell N889, and functions perfectly well,
>       and always has.) 
<snipped>

You write too much, and do not read.

Your old log indicates that you have a USB optical mouse; moreover, the delay comes from *probing* the USB bus, and probing *every device* on the USB bus - it is valid to config to have two mouses, and many people on laptop indeed have both the internal touchpad and an external mouse. So it does not matter what "bog-standard" mouse you have, the delay is about *everything else*
connected by USB. e.g. you have a USB printer which is plugged into your USB port but not to mains - it will respond (poorly) to a USB probe, because the USB port itself can supply a small amount of power as well, which may be sufficient to power a slow device for probing, but rather poorly.

The correct way to speed up device-probing delay is to create an xorg.conf and specify your device configuration in it. Read the manual.

BTW, the USB specification mandates that the whole USB bus should operates
at the speed of the *slowest* device.

Comment 156 Sergio Basto 2013-11-02 22:36:36 UTC
(In reply to Hin-Tak Leung from comment #153)
> Which one is that? The new plymouth rpm packages from Ray Strode or the kde
> mod from David Wilkins?
> 
> The new plymouth packages seem to fix the problem for me, so gnome users
> should give it a go, at least.

I think not all have the same bug , but the bugs have in common: X doesn't start .
I think also fixes startup for me , don't saw the bug a long time .

Comment 157 David Wilkins 2013-11-02 23:01:01 UTC
So Xorg creates Xorg.1.log, Xorg.2.log etc. when an X session is started remotely, does it?  Which presumably explains why my /var/log is filled with such files:

[dwilkins@mta106032 BootProblem]$ ls -l /var/log/Xorg*
-rw-r--r--. 1 root root 27646 Nov  2 18:29 /var/log/Xorg.0.log
-rw-r--r--. 1 root root 25169 Nov  2 17:39 /var/log/Xorg.0.log.old
-rw-r--r--. 1 root root 25292 Sep 26 18:06 /var/log/Xorg.9.log

I have been copying Xorg.0.log and Xorg.0.log.old files out of /var/log with cp -p for several days, and renaming them with date and time incorporated in the filename.

I have logged in remotely and run X applications, and have restarted kde, and have therefore allegedly created Xorg.1.log, Xorg2.log files in /var/log?  But, for some mysterious reason, they have not shown up when I have listed the contents of /var/log?

And I shouldn't believe the evidence of my own eyes when I see that the monotonic times of my Xorg.0.log begin at 260.184 when I logged in remotely from another computer with ssh -Y, and uploaded that file here, and when I I have an Xorg.0.log starting at 163.666 when I restarted kdm after booting into a computer with no kdm greeter:

   https://bugzilla.redhat.com/attachment.cgi?id=817972
   https://bugzilla.redhat.com/attachment.cgi?id=817945

But somehow, before I restart kdm on a computer that has only virtual terminals, I always find an Xorg.0.log file that starts my earlier and shows a complete X start-up, but of course such files don't get overwritten when I restart kdm?

No doubt, IF I HAD a computer that WAS ALREADY running an X server when I ran an X session remotely, or restarted a display manager, then NO DOUBT THE COMPUTER WOULD CREATE Xorg.1.log, and WOULD NOT OVERWRITE Xorg.0.log.  But for some mysterious reason, the computer I am currently working does not happen to do all the things and create all the files that it is allegedly guaranteed to do and create!  Could this be related to the fact that I am working on a computer that has a bug affecting X and kdm?

Apologies for shouting.

Comment 158 Hin-Tak Leung 2013-11-02 23:50:53 UTC
(In reply to David Wilkins from comment #157)
<a lot of driffle about Xorg.*.log deleted>

Discussing the basis of X server concept is quite outside the scope here; as is quite a few other topics. I have repeated that a few times already.

Look up "display number" and "screen number" in an X server context. Even "man X"  or ("man 7 X") talks about that. It is a 30-year-old concept. The N in Xorg.N.log is the display number.

Comment 159 Germano Massullo 2013-11-03 09:53:29 UTC
(In reply to Hin-Tak Leung from comment #153)
> (In reply to Germano Massullo from comment #152)
> > My vote on proposed patch is negative, because X, KDM don't start. I still
> > have to restart kdm service to have a KDM login session
> 
> Which one is that? The new plymouth rpm packages from Ray Strode or the kde
> mod from David Wilkins?
> 
> The new plymouth packages seem to fix the problem for me, so gnome users
> should give it a go, at least.

plymouth-0.8.9-0.2013.03.26.1.fc19.x86_64
plymouth-core-libs-0.8.9-0.2013.03.26.1.fc19.x86_64
plymouth-scripts-0.8.9-0.2013.03.26.1.fc19.x86_64
plymouth-system-theme-0.8.9-0.2013.03.26.1.fc19.x86_64
plymouth-plugin-two-step-0.8.9-0.2013.03.26.1.fc19.x86_64
plymouth-theme-charge-0.8.9-0.2013.03.26.1.fc19.x86_64
plymouth-graphics-libs-0.8.9-0.2013.03.26.1.fc19.x86_64
plymouth-plugin-label-0.8.9-0.2013.03.26.1.fc19.x86_64

from updates-testing repo

Comment 160 Hin-Tak Leung 2013-11-03 21:33:28 UTC
(In reply to Germano Massullo from comment #159)
...
> plymouth-0.8.9-0.2013.03.26.1.fc19.x86_64
...
> from updates-testing repo

Ok, so we have two KDM users who don't benefit from plymouth-0.8.9-0.2013.03.26.1.fc19.x86_64, and one GDM user who does.

Other KDM/GDM users please test and add to the numbers, and new people who joined should state which DM they are using. Perhaps with the new plymouth only KDM users still suffer.


(In reply to Sergio Monteiro Basto from comment #156)
...
> I think also fixes startup for me , don't saw the bug a long time .

Can you quatify 'a long time'? the new plymouth has only been around for about 4 days.

Comment 161 David Wilkins 2013-11-04 03:19:16 UTC
I had resolved not to comment any further here, but I had a normal boot followed by an abnormal boot, and I have saved Xorg log files, and these clarify matters a bit.

[I was going to write a whole lot more, but I have analysed timings in log files, and should correct one inaccuracy in Comment 154 referred to in Comment 155.]

I had presumed that log files Xorg.0.log and Xorg.0.log.old were created in the same boot because their timestamps were so close.  Having analysed these log files in more detail, I now realise that Xorg.0.log.old was written out at the end of the previous boot, and when I reboot, the end time of the old X session and the start time of the new X session are inevitably close to one another.  Also I certainly hadn't taken on board that, though sometimes boots may seem to take a long time to complete on occasion, they are completed in a minute or two.  I had taken it for granted that if a timestamp on file B is identical to that of file A, or agrees with that on file A to within a minute or two, then the files were last modified in the same boot.  And I now am aware that the final records in an Xorg.0.log file represent the unloading of kernel modules to terminate an X session.

However the particular boot in which the file Xorg.0.log.old is written out is not particularly important.

I stand over my observation that, when a computer with the kdm display manager does not have a running X session at the end of the boot process, and therefore can only be accessed remotely or started through a virtual terminal, then restarting a kdm.session or running an X session remotely will lead to the overwriting of the Xorg.0.log of the aborted X session started at boot.  I stand over this, because that is what happens, and I have confirmed the overwriting this evening.

   ---

Remarks in Comment 155 about the 'mouse issue' misunderstand completely the point I was trying to make about this issue.  Having found the true significance in the gap in the Xorg log files, I had realised that it had nothing whatsoever to do with the mouse, and the mention of the mouse in Comment 154 were intended to clarify that remarks on the mouse in Comment 151.  (In retrospect, maybe the meaning of the term 'red herring' may not be clear to all contributors to bugzillas, and that it would have been wiser to use some other language to express my meaning.)  Any further remarks on the mouse in Comment 155 may be correct or useful, but are of no relevance to discussion of Bug 967521.  And at that point I had already (I thought) made this clear.  That was the intention of Comment 154, issue 7.

===

Having realised that I made a mistake in a remark that I had made in Comment 154, this comment is less fraught than it otherwise would have been.  Nevertheless, although it was suggested that I should file a bug against kdm, and I had said that I planned to do so, I am not sure now that I can go through with this.  My experience over the past few days seems to make clear that a bugzilla is not always a welcoming environment for those who might honestly wish to make a contribution to resolving bugs.  Therefore I may well decide that I cannot take on the responsibility for creating a new bug.

To clarify another point, there is no 'kde mod' from me.  I presume this is language is intended to refer to some modified package that I am presumed to have created.  There is no modified version of any Fedora package produced by me for distribution for any purpose.  There never was, and never will be.  And nothing that I ever wrote on this bugzilla should suggest otherwise to anybody.  So the reference to such a thing, attributed to me, in previous comments is totally inexplicable and without any foundation in fact or intention.

Comment 162 Hin-Tak Leung 2013-11-04 07:04:55 UTC
(In reply to David Wilkins from comment #161)
<more rubbish not worth repeating>

You still fail to read. The level of code details you tried to engage in is more appropriate in a developer mailing list, such as the kde's or the xorg. And the sort of questions you are asking is more appropriate in the newbie community forums, or just read the man pages, about X display numbers and X-forwarding in ssh. Detailed discussion of code in bugzilla is only appropriate when it ends with a proposed patch, intended to be tested by yourself right away, or by others.

If that's not your intention, that's just more rubbish.

"More rubbish" is "more rubbish". "More rubbish" is *not* "information". "More rubbish" is less information.

Comment 163 Hin-Tak Leung 2013-11-04 08:21:01 UTC
(In reply to David Wilkins from comment ...)
> [    0.706188] xxxxxxxxx.xxx.tcd.ie systemd[1]: Starting System
                               ^^^^^^

FWIW, Akademy (the annual KDE users/developers' conference) was held in Trinity College Dublin a few years ago, and I have the T-shirt to prove it. AFAIK tcd.ie has a fairly strong local KDE user/developer community - strong enough and large enough to host such a event. There are probably local KDE users who might have the patience and be more willing to teach you. Some of your questions and curiosities could even be discussed face-to-face, in a more appropriate manner, than in redhat bugzilla.



5 people reported success without identifying what dm, and one KDM failure:
https://admin.fedoraproject.org/updates/plymouth-0.8.9-0.2013.03.26.1.fc19
https://admin.fedoraproject.org/updates/plymouth-0.8.9-3.2013.08.14.fc20
("Anonymous Tester - 2013-10-31 19:13:41" is me)

And an extra KDM failure here.

So the current total is 1 GDM success plus 4 unspecified successes, and 2 KDM failures.

Would be interesting to hear about GDM failures or KDM successes, if any.

Comment 164 David Wilkins 2013-11-04 11:29:54 UTC
Normal boot, resulting in kdm login page and greeter, started 22:45:39, ended circa 23:42:

===
[dwilkins@mta106032 boots-2013-11-03]$ fgrep -H Xorg Xorg*
Xorg.0.log.dated-2013-11-03-2247.txt:[    51.051] (==) Log file: "/var/log/Xorg.0.log", Time: Sun Nov  3 22:46:30 2013
Xorg.0.log.old.dated-2013-11-03-1900.txt:[  3017.859] (==) Log file: "/var/log/Xorg.0.log", Time: Sat Nov  2 18:28:30 2013
===

(Records from later boot deleted.)  Thus Xorg.0.log is from the current boot, and Xorg.0.log.old is from the previous boot, and moreover timestamps on the files are from the current boot and the previous boot respectively.

   ---

Next abnormal boot, not resulting in kdm greeter, saved by logging into a virtual terminal, before restarting kdm:

===
[dwilkins@mta106032 boots-2013-11-03]$ fgrep -H Xorg Xorg*
Xorg.0.log.dated-2013-11-03-2344.txt:[    52.731] (==) Log file: "/var/log/Xorg.0.log", Time: Sun Nov  3 23:43:58 2013
Xorg.0.log.old.dated-2013-11-03-2342.txt:[    51.051] (==) Log file: "/var/log/Xorg.0.log", Time: Sun Nov  3 22:46:30 2013
===

(Records from earlier boot deleted.)  Thus, again Xorg.0.log is from the current boot and starts at 52.731, and Xorg.0.log.old is from the previous boot, and moreover timestamps on the file are from the current boot and the previous boot respectively.

   ---

Finally Xorg log files as found on the computer following the kdm restart:

===
[dwilkins@mta106032 midnightboot]$ fgrep -H Xorg Xorg.*
Xorg.0.log.dated-2013-11-04-0007.txt:[  1379.031] (==) Log file: "/var/log/Xorg.0.log", Time: Mon Nov  4 00:06:05 2013
Xorg.0.log.old.dated-2013-11-03-2344.txt:[    52.731] (==) Log file: "/var/log/Xorg.0.log", Time: Sun Nov  3 23:43:58 2013
===

Thus Xorg.0.log is from the current boot, but starts at 1379.031 when kdm was restarted.  And Xorg.0.log.old is from the same boot, not the previous boot, and is the continuation of the record Xorg.0.log that existed before the kdm restart.

   ---

So (consistent with the main point and purpose of Comment 154, section 2A), if one wishes to save a record of the Xorg log during the boot, one must either save Xorg.0.log from before the kdm restart, or else save Xorg.0.log.old from after the kdm restart.

Moreover these records prove that, although an X server process was not running after the abnormal boot, nevertheless an X server process was started on the console display in the course of the boot.  It seems to me that this is significant information for the purpose of ascertaining which, for kdm people at least, some boots bring up the greeter, whereas other boots do not.

And this is one of the bugs of Bug #967521, not yet some separate and distinct bug: boots fail to arrive at a graphical boot screen, there is a warning message relating to plymouth-quit-wait.service as in Comment #4, and moving /var/log/journal to /var/log/journal.org for a couple of boots resulted in a normal boot which remained in graphical mode throughout.

Comment 165 Germano Massullo 2013-11-04 11:46:14 UTC
DAVID WILKINS, I quote again for you Hin-Tak Leung's message, so maybe you will honor us by reading it instead of showing other  logs and pointless wall of text that do not help.

(In reply to Hin-Tak Leung from comment #162)
> (In reply to David Wilkins from comment #161)
> <more rubbish not worth repeating>
> 
> You still fail to read. The level of code details you tried to engage in is
> more appropriate in a developer mailing list, such as the kde's or the xorg.
> And the sort of questions you are asking is more appropriate in the newbie
> community forums, or just read the man pages, about X display numbers and
> X-forwarding in ssh. Detailed discussion of code in bugzilla is only
> appropriate when it ends with a proposed patch, intended to be tested by
> yourself right away, or by others.
> 
> If that's not your intention, that's just more rubbish.
> 
> "More rubbish" is "more rubbish". "More rubbish" is *not* "information".
> "More rubbish" is less information.

Comment 166 Germano Massullo 2013-11-04 11:47:53 UTC
DAVID WILKINS, I quote again for you Hin-Tak Leung's message, so maybe you will honor us by reading it instead of showing other  logs and pointless wall of text that do not help.

(In reply to Hin-Tak Leung from comment #162)
> (In reply to David Wilkins from comment #161)
> <more rubbish not worth repeating>
> 
> You still fail to read. The level of code details you tried to engage in is
> more appropriate in a developer mailing list, such as the kde's or the xorg.
> And the sort of questions you are asking is more appropriate in the newbie
> community forums, or just read the man pages, about X display numbers and
> X-forwarding in ssh. Detailed discussion of code in bugzilla is only
> appropriate when it ends with a proposed patch, intended to be tested by
> yourself right away, or by others.
> 
> If that's not your intention, that's just more rubbish.
> 
> "More rubbish" is "more rubbish". "More rubbish" is *not* "information".
> "More rubbish" is less information.

Comment 167 Rex Dieter 2013-11-04 13:05:30 UTC
Frankly, I appreciate David's constructive attempt to help resolve this issue... and that's more than anyone's unkind critique of those attempts, e.g. by labeling it as rubbish.  Please make an effort to be constructive, ontopic, and abide by http://fedoraproject.org/code-of-conduct .  Thank you.

Comment 168 Vladislav Grigoryev 2013-11-04 17:11:47 UTC
(In reply to Hin-Tak Leung from comment #163)
> Would be interesting to hear about GDM failures or KDM successes, if any.

F19 + Gnome on Laptop with Intel video:
> # lspci | grep VGA
> 00:02.0 VGA compatible controller: Intel Corporation Mobile 945GSE Express Integrated Graphics Controller (rev 03)

Before update:
1) plymouth-quit-wait.service is "failed" on every boot.
2) In ~30% cases boot stops on screen with Fedora logo.
3) Can get to GDM screen by "systemctl restart gdm.service" over SSH.

Done plymouth update from testing repo:
> # yum --enablerepo updates-testing update plymouth
> # rpm -q plymouth 
> plymouth-0.8.9-0.2013.03.26.1.fc19.i686
Rebooted.

After update:
1) plymouth-quit-wait.service isn't "failed" anymore.
2) No changes.
3) No changes.

So I see no relationship between state of plymouth-quit-wait.service and boot to GDM screen.

Comment 169 Brent R Brian 2013-11-04 17:35:11 UTC
As of today

* aspire 5734Z-4836 - hands off from plymouth -> GDM fine no fails
* white box AMD X2 - plymouth wait fails, hand off to GDM fails on occasion

Both are on x86_64 FC19, both are up to date on "updates".

When GDM hand-off starts to fail on white box, it requires one of these two:

  1) CTRL-ALT_F2, delete /var/log/journal (will work for days)
  2) systemctl method, (required every boot)

Comment 170 Brent R Brian 2013-11-04 17:35:57 UTC
BTW, Aspire is INTEL T4500.

Comment 171 Hin-Tak Leung 2013-11-05 05:32:16 UTC
I tried switching login manager from gdm to kdm . BTW, both gdm/kdm offer logging into the "other camp". While using gdm to login to kde is not cool, it could be an interrim solution.

It is just modifying the symbolic link
"/etc/systemd/system/display-manager.service" from pointing to "/usr/lib/systemd/system/gdm.service" , to /usr/lib/systemd/system/kdm.service .
(and revert to go back).

First time worked, then failed twice, at which point I noted /usr/lib/systemd/system/kdm.service is the only systemd file with two
"Conflicts=" lines, so I merged them; no improvement and failed on the 4th boot.

Then I inserted "-debug 0x7F" to kdm.service to read

    "ExecStart=/usr/bin/kdm -debug 0x7F vt1"

5th was success, 6th failed, 7th was a success. So adding the -debug switch to try to look in more detail seems to decrease the chance of actually seeing a
problem. (from "3 out of 4" to "1 out of 3").

success are accompanied by "Is Plymouth still running? no",
while failures are with "plymouth is NOT running" in /var/log/message.

Anyway, I don't want to spend any more time on this since I don't even use kde normally. The debug bit field are explained in "kdm --help", and I suggest other to mod their kdm.service to add the debug switch with some setting to get more information (and please *do not* post such logs uninvited) - possibly with code-modification to kdm to experiment with. Adding "-debug 0x7F" do show more details between how kdm communicate with greeter and the other sub-processes, etc, ("journalctl _COMM=kdm ...") so maybe that alone can help shed some lights.

After switch back to gdm, I have 3 successes in a roll, so there are definitely some kdm-specific issues here.

FWIW, the quality of David Wilkin's posts were just not good. One can only tell him to keep it short so many times for a whole week (since comment 112, 116, ...) before one loses patience.

Comment 172 Brent R Brian 2013-11-06 11:59:37 UTC
Plymouth changes in update-test have been applied to older machine.

No FAIL on plymouth wait during boot ... but ... the boot sequence may have been changed a bit (first boot after applying the update-test stuff).

Will keep checking

Comment 173 Mohammed Arafa 2013-11-07 20:54:03 UTC
i was just hit by this bug. not the first time ... the actual first time, X started up on its own and i didnt know the reason.

this time, X didnt want to start up at all. so when i foudn this bug and it said delete /var/log/journal/* .. i very excitedly and happily deleted the horrible file. not only did it stop X from starting but it was 845mb in size.

deleting the !@#$!@$ directory worked. 

a hearty thank you all for the work around. 

as a solution can a log rotate script be conjured? or something ?

Comment 174 Rex Dieter 2013-11-08 13:32:24 UTC
for folks still seeing kdm failures, I suspect you're hitting kdmrc setting:
# How long to wait for a local X-server to come up.
ServerTimeout=30

I suppose we can up that default value, if the journal slows things down enough to reliably hit that.

Comment 175 Fedora Update System 2013-11-08 16:51:18 UTC
kde-settings-19-25.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kde-settings-19-25.fc19

Comment 176 Rex Dieter 2013-11-08 17:20:32 UTC
Sorry, the correct link is
https://admin.fedoraproject.org/updates/kde-settings-19-26.fc19

that implemented the increased ServerTimeout=60 , ServerAttempts=2 in /etc/kde/kdm/kdmrc

Comment 177 Germano Massullo 2013-11-08 17:54:53 UTC
(In reply to Rex Dieter from comment #176)
> Sorry, the correct link is
> https://admin.fedoraproject.org/updates/kde-settings-19-26.fc19
> 
> that implemented the increased ServerTimeout=60 , ServerAttempts=2 in
> /etc/kde/kdm/kdmrc

This fixed the bug on my computer

Comment 178 Germano Massullo 2013-11-08 18:14:03 UTC
So Plymouth had nothing to do with this trouble, right?

Comment 179 Rex Dieter 2013-11-08 18:19:34 UTC
not exactly right.  journal slowing things down seems to be causing bad side effects, where various services including plymouth,kdm, etc... fail due to sometimes too-short startup timeouts (for example, plymouth-quit timeout was originally 20 seconds, kdm ServerTimeout was originally 30 seconds).

Comment 180 Michal Schmidt 2013-11-08 18:22:29 UTC
We have a theory for the journald slowdown. Lennart is working on it.

Comment 181 Jean-Paul Lambrechts 2013-11-09 07:58:11 UTC
I tried this fix on two laptop computers running Fedora 19: IBM/Lenovo X31 and X61S and it did not solve the problem. Looking into the journal with the "journalctl  -b" command, the Plymouth "wait quit" error that we saw before is gone but the X server reports this error:
kdm[xxx]: X server startup timeout, terminating.
I have also two desktop computers running Fedora 19 and they both exhibit this failure to start the KDE desktop. So, my four computers - 2 laptops and 2 desktops - behave the same: the boot sequence gets stuck at the point where the KDE desktop should start and *never* goes beyond it. It is intermittent, though. The workaround (F2 console: systemctl restart kdm.service) works on all of them.
The same four computers did not have this issue with Fedora 18.

Comment 182 Germano Massullo 2013-11-09 08:20:45 UTC
(In reply to Jean-Paul Lambrechts from comment #181)
> I tried this fix 
Which one? The following one?
(In reply to Rex Dieter from comment #176)
> Sorry, the correct link is
> https://admin.fedoraproject.org/updates/kde-settings-19-26.fc19
> 
> that implemented the increased ServerTimeout=60 , ServerAttempts=2 in
> /etc/kde/kdm/kdmrc

Did you leave a karma vote about the patch, in Fedora admin website?

Comment 183 Hin-Tak Leung 2013-11-09 20:14:15 UTC
(In reply to Rex Dieter from comment #176)
> Sorry, the correct link is
> https://admin.fedoraproject.org/updates/kde-settings-19-26.fc19
...

"Anonymous Tester - 2013-11-09 11:28:33" is me, just in case somebody is counting. Switched to kdm again as I described in Comment 171, and booting to kdm now works smoothly - at least 5 times in a roll. Am back to gdm now.

I think both fixes beg the questions: what was the reason for a short timeout setting? It see better to err on the lenient side for older hardware, than to expect state-of-the-art hardware and punish poorer people with lesser hardware in
limbo.

The short-timeout causes a stuck vt; if the short-timeout (for whatever underlying reason) results in a functional plain text-login, that short-timeout would seem more reasonable, but that's not the case. Both setting decisions say: "buy the latest hardware, or you should not be using fedora".

Comment 184 Fedora Update System 2013-11-10 07:21:40 UTC
plymouth-0.8.9-3.2013.08.14.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 185 Fedora Update System 2013-11-11 02:31:22 UTC
plymouth-0.8.9-0.2013.03.26.1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 186 Fedora Update System 2013-11-11 02:35:59 UTC
kde-settings-19-26.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 187 Jean-Paul Lambrechts 2013-11-11 02:48:17 UTC
The fix I was referring to in my previous comment, consists to use the new revision
of Plymouth packages available on updates-testing repository as reported in recent
commentaries of this bug.
Please look at the yum list commands below.

Here are the test results on four computers A, B, C and D
Systems are up to the latest released revision of Fedora.

Computer A: Laptop IBM/Lenovo X31 *********** Fails ***************

[jplx@jplx-X31 ~]$ inxi                                                         
 CPU~Single core Intel Pentium M (-UP-) clocked at 1300.000 Mhz 
Kernel~3.11.7-200.fc19.i686 i686 Up~22 min Mem~354.8/2016.7MB HDD~96.0GB(10.9% 
used) Procs~173 Client~Shell inxi~1.8.20                                         
[jplx@jplx-X31 ~]$ yum list plymouth
Loaded plugins: langpacks, refresh-packagekit
Installed Packages
plymouth.i686                                   0.8.9-0.2013.03.26.1.fc19       
 
                            @updates-testing 
[jplx@jplx-X31 ~]$ journalctl -b | grep kdm
Nov 09 22:24:09 jplx-X31 kdm[477]: plymouth is running
Nov 09 22:24:10 jplx-X31 kdm[477]: plymouth is active on VT 1, reusing for :0
Nov 09 22:24:10 jplx-X31 kdm[477]: plymouth should quit after server startup
Nov 09 22:24:10 jplx-X31 kdm[477]: X server startup timeout, terminating
Nov 09 22:24:12 jplx-X31 kdm[477]: X server for display :0 cannot be started, session disabled
Nov 09 22:24:12 jplx-X31 kdm[477]: Quitting Plymouth without transition
Nov 09 22:24:13 jplx-X31 kdm[477]: plymouth is NOT running

# System KDE was stuck here. Going to F2 console to invoke 'Systemctl restart kdm.service'
# Following 4 lines pasted from the journal
...
Nov 09 22:25:41 jplx-X31 login[1258]: ROOT LOGIN ON tty2
Nov 09 22:26:04 jplx-X31 systemd[1]: Stopping The KDE login manager
Nov 09 22:26:04 jplx-X31 kdm[1367]: plymouth is NOT running
Nov 09 22:26:05 jplx-X31 kdm[1374]: :0[1374]: pam_unix(kdm-np:session): session opened for user jplx by (uid=0)

# KDE starts and is now OK.

Computer B: Laptop IBM/Lenovo X60S ********* Fails ***************

[jplx@denali ~]$ inxi
CPU~Dual core Intel CPU L2400 (-MCP-) clocked at 1667.000 Mhz 
Kernel~3.11.7-200.fc19.i686 i686 Up~2 min Mem~658.0/2008.1MB HDD~516.1GB(21.4% 
used) Procs~276 Client~Shell inxi~1.8.20  
[jplx@denali ~]$ yum list plymouth
Loaded plugins: langpacks, refresh-packagekit
Installed Packages
plymouth.i686                              0.8.9-0.2013.03.26.1.fc19             
                  @updates-testing
[jplx@denali ~]$ journalctl -b | grep kdm
Nov 10 17:18:30 denali kdm[511]: plymouth is running
Nov 10 17:18:30 denali kdm[511]: plymouth is active on VT 1, reusing for :0
Nov 10 17:18:30 denali kdm[511]: plymouth should quit after server startup
Nov 10 17:18:30 denali kdm[511]: X server startup timeout, terminating
Nov 10 17:18:31 denali kdm[511]: X server for display :0 cannot be started,  session disabled
Nov 10 17:18:31 denali kdm[511]: Quitting Plymouth without transition
Nov 10 17:18:32 denali kdm[511]: plymouth is NOT running
...
# restarted kdm.service
...
Nov 10 17:19:13 denali kdm[2027]: plymouth is NOT running
Nov 10 17:19:13 denali kdm[2042]: :0[2042]: pam_unix(kdm-np:session): session opened for user jplx by (uid=0)
[jplx@denali ~]$

Computer C: Desktop ASUS Extreme Striker ********* Fails or Passes **************

# Cold start: it passes.
[jplx@jplx-lnx ~]$ inxi
CPU~Quad core Intel Core2 Quad CPU Q6600 (-MCP-) clocked at 2399.816 Mhz 
Kernel~3.11.7-200.fc19.x86_64 x86_64 Up~1 min Mem~714.8/3953.7MB 
HDD~580.1GB(10.9% used) Procs~212 Client~Shell inxi~1.8.20  
[jplx@jplx-lnx ~]$ yum list plymouth
Loaded plugins: langpacks, refresh-packagekit
Installed Packages
plymouth.x86_64                  0.8.9-0.2013.03.26.1.fc19                                              
@updates-testing
[jplx@jplx-lnx ~]$ journalctl -b | grep kdm
Nov 10 08:05:09 jplx-lnx kdm[600]: plymouth is running
Nov 10 08:05:09 jplx-lnx kdm[600]: plymouth is active on VT 1, reusing for :0
Nov 10 08:05:10 jplx-lnx kdm[600]: plymouth should quit after server startup
Nov 10 08:05:12 jplx-lnx kdm[600]: Quitting Plymouth with transition
Nov 10 08:05:12 jplx-lnx kdm[600]: Is Plymouth still running? no
Nov 10 08:05:15 jplx-lnx kdm[991]: :0[991]: pam_unix(kdm-np:session): session 
opened for user jplx by (uid=0)
[jplx@jplx-lnx ~]$

# Another cold start: it passes again.
[jplx@jplx-lnx ~]$ inxi
CPU~Quad core Intel Core2 Quad CPU Q6600 (-MCP-) clocked at 2399.816 Mhz 
Kernel~3.11.7-200.fc19.x86_64 x86_64 Up~1 min Mem~714.8/3953.7MB 
HDD~580.1GB(10.9% used) Procs~212 Client~Shell inxi~1.8.20  
[jplx@jplx-lnx ~]$ yum list plymouth
Loaded plugins: langpacks, refresh-packagekit
Installed Packages
plymouth.x86_64                  0.8.9-0.2013.03.26.1.fc19                                              
@updates-testing
[jplx@jplx-lnx ~]$ journalctl -b | grep kdm
Nov 10 16:53:20 jplx-lnx kdm[604]: plymouth is running
Nov 10 16:53:20 jplx-lnx kdm[604]: plymouth is active on VT 1, reusing for :0
Nov 10 16:53:21 jplx-lnx kdm[604]: plymouth should quit after server startup
Nov 10 16:53:23 jplx-lnx kdm[604]: Quitting Plymouth with transition
Nov 10 16:53:24 jplx-lnx kdm[604]: Is Plymouth still running? no
Nov 10 16:53:24 jplx-lnx kdm[1247]: :0[1247]: pam_unix(kdm-np:session): session 
opened for user jplx by (uid=0)
[jplx@jplx-lnx ~]$

# Now, after a warm start, it is going to fail.
[jplx@jplx-lnx ~]$ journalctl -b | grep kdm
Nov 10 16:57:01 jplx-lnx kdm[599]: plymouth is running
Nov 10 16:57:01 jplx-lnx kdm[599]: plymouth is active on VT 1, reusing for :0
Nov 10 16:57:02 jplx-lnx kdm[599]: plymouth should quit after server startup
Nov 10 16:57:02 jplx-lnx kdm[599]: X server startup timeout, terminating
Nov 10 16:57:06 jplx-lnx kdm[599]: X server for display :0 cannot be started, session disabled
Nov 10 16:57:06 jplx-lnx kdm[599]: Quitting Plymouth without transition
Nov 10 16:57:06 jplx-lnx kdm[599]: plymouth is NOT running

# After going to the F2 consoles, logging in as root, invoking 'init 1', and 
# hitting CTRL-D to continue:
# ('init 1' followed by 'CTRL-D' appears to be a cleaner work-around than 'systemctl restart kdm.service'
# because it restarts more services. So, there is less chance of "bad leftover")
Nov 10 16:58:10 jplx-lnx kdm[1661]: plymouth is running
Nov 10 16:58:10 jplx-lnx kdm[1661]: plymouth is active on VT 1, reusing for :0
Nov 10 16:58:10 jplx-lnx kdm[1661]: plymouth should quit after server startup
Nov 10 16:58:11 jplx-lnx kdm[1661]: Quitting Plymouth with transition
Nov 10 16:58:11 jplx-lnx kdm[1661]: Is Plymouth still running? no
Nov 10 16:58:11 jplx-lnx kdm[1892]: :0[1892]: pam_unix(kdm-np:session): session 
opened for user jplx by (uid=0)

Computer D: Desktop ASROCK G31M-GS R2.0 ********** Passes **************

[jplx@jplx-linux ~]$ inxi
CPU~Quad core Intel Core2 Quad CPU Q8400 (-MCP-) clocked at 2670.341 Mhz 
Kernel~3.11.7-200.fc19.x86_64 x86_64 Up~1:35 Mem~891.4/3945.4MB 
HDD~1062.3GB(10.6% used) Procs~239 Client~Shell inxi~1.8.20
[jplx@jplx-linux ~]$ yum list plymouth
Loaded plugins: langpacks, refresh-packagekit
Installed Packages
plymouth.x86_64                 0.8.9-0.2013.03.26.1.fc19                 
@updates-testing
[jplx@jplx-linux ~]$ journalctl -b | grep kdm
Nov 09 22:26:45 jplx-linux kdm[604]: plymouth is running
Nov 09 22:26:45 jplx-linux kdm[604]: plymouth is active on VT 1, reusing for :0
Nov 09 22:26:46 jplx-linux kdm[604]: plymouth should quit after server startup
Nov 09 22:26:46 jplx-linux kdm[604]: Quitting Plymouth with transition
Nov 09 22:26:47 jplx-linux kdm[604]: Is Plymouth still running? no
Nov 09 22:26:48 jplx-linux kdm[973]: :0[973]: pam_unix(kdm-np:session): session 
opened for user jplx by (uid=0)
# Passed

# Another test: it passes.
[jplx@jplx-linux ~]$ inxi
CPU~Quad core Intel Core2 Quad CPU Q8400 (-MCP-) clocked at 2670.290 Mhz 
Kernel~3.11.7-200.fc19.x86_64 x86_64 Up~3 min Mem~840.4/3945.4MB 
HDD~1062.3GB(10.6% used) Procs~248 Client~Shell inxi~1.8.20  
[jplx@jplx-linux ~]$ yum list plymouth
Loaded plugins: langpacks, refresh-packagekit
Installed Packages
plymouth.x86_64                 0.8.9-0.2013.03.26.1.fc19                 
@updates-testing
[jplx@jplx-linux ~]$ journalctl -b | grep kdm
Nov 10 17:31:23 jplx-linux kdm[611]: plymouth is running
Nov 10 17:31:23 jplx-linux kdm[611]: plymouth is active on VT 1, reusing for :0
Nov 10 17:31:23 jplx-linux kdm[611]: plymouth should quit after server startup
Nov 10 17:31:25 jplx-linux kdm[611]: Quitting Plymouth with transition
Nov 10 17:31:25 jplx-linux kdm[611]: Is Plymouth still running? no
Nov 10 17:31:25 jplx-linux kdm[947]: :0[947]: pam_unix(kdm-np:session): session opened for user jplx by (uid=0)
[jplx@jplx-linux ~]$

# A warn restart passed as well.


Tentatitve conclusions:
1 - the "plymouth.quit.wait.service error is gone.
2 - KDE still still fails to start normally on some computers and some restarts
3 - kdm messages from the journal are different from whether it fails or passes:
    Passing case: 
                      plymouth should quit after server startup
                      Quitting Plymouth with transition
                      Is Plymouth still running? no
    Failing case:
                      plymouth should quit after server startup
                      X server startup timeout, terminating
                      X server for display :0 cannot be started, session disabled
                      Quitting Plymouth without transition
                      plymouth is NOT running
    Plymouth appears to be involved somehow in the kdm non-start failure because the message in the journal is different.

Comment 188 Fahad Alduraibi 2013-11-11 14:20:35 UTC
I was happy to see the updates packages, however, after i reboot to test (warm reboot) i got similar if not same results as the comment above "by Jean-Paul Lambrechts".

No plymouth failed messages this time, and it even reported that KDE login manager was started but I didn't see the login window, and I had to log in to one of the consoles and kill KDM (which I used to do before the new patch) and that got it to start.

[root@myComp ~]# journalctl -b | grep kdm
Nov 11 08:58:08 myComp kdm[590]: plymouth is running
Nov 11 08:58:09 myComp kdm[590]: plymouth is active on VT 1, reusing for :0
Nov 11 08:58:10 myComp kdm[590]: plymouth should quit after server startup
Nov 11 08:58:10 myComp kdm[590]: X server startup timeout, terminating
Nov 11 08:58:10 myComp kdm[590]: X server for display :0 cannot be started, session disabled
Nov 11 08:58:10 myComp kdm[590]: Quitting Plymouth without transition
Nov 11 08:58:10 myComp kdm[590]: plymouth is NOT running

Comment 189 Rex Dieter 2013-11-11 14:35:48 UTC
For those of you still seeing kdm startup failures, and error:
kdm: X server startup timeout, terminating

It means X took longer to start than the value of ServerTimeout= key in your /etc/kde/kdm/kdmrc

The prior default was 30 seconds (and ServerAttempts=1), but the latest kde-settings update increased these to 
ServerAttempts=2
ServerTimeout=60
(unless you had edited kdmrc previously, then your changes will got be overwritten).

My point in mentioning all this, is you may need to increase the values of ServerTimeout (and possibly ServerAttempts) manually.


Alternatively, you can explore options to speed to your boot process, including:
* trim systemd journal size, either by manually editing /etc/systemd/journal.conf (or deleting old journal, rm -rf /var/log/journal/* )
* wait for subsequent systemd updates that address journal performance degredations (see comment #180 and bug #1006386 )

Comment 190 Hin-Tak Leung 2013-11-11 15:11:43 UTC
(In reply to Rex Dieter from comment #189)
...
> * wait for subsequent systemd updates that address journal performance
> degredations (see comment #180 and bug #1006386 )

Looking at bug #1006386 in the present context, it also implies that much of the journal log is meaningless - and rubbish to post/keep. *Assumes* that it still logs things in cronological order (which may or may not be true), the time stamps just don't seem to reflect actual time as events happen, especially regarding kdm. The timestamps of all the posted logs don't show intervals more than 20s, but 60s+ setting is needed.

With gdm, it was similar - the logs shows 4-6s in my case, but the old setting of 20s was insufficient.

So I suggest you file another systemd bug: the time stamps of the journal log entries are meaningless. (One might also doubt whether the log keeps thing in cronological order at all. If even that is doubtful, looking at the logs at all for many purposes is a waste of time).

Comment 191 Kevin Kofler 2013-11-11 15:52:04 UTC
Comment on attachment 817386 [details]
Proposed patch to the kde-workspace-4.11.1-kdm-logind-multiseat.patch

@David Wilkins: Sorry, you misunderstood what a "const char *" parameter to a function means (and none of the followups corrected that particular misconception). It only means the string must not change WITHIN that function. It does not mean the string must be constant throughout the whole program's execution.

Comment 192 Jean-Paul Lambrechts 2013-11-11 19:44:25 UTC
Following up on comment #190 from Rex Dieter.
I made the suggested optional change on my four computers listed in my comment #187:
change the Storage=... line in journald.conf to be "Storage=volatile"
I also deleted the old journal with the "rm -rf ..."command to be clean. I verified that the journal is not being saved within the /var/log/journal directory anymore but is in memory instead.
My four computers have now been booting up multiple times without failure. Thank you for your help!

Comment 193 David Wilkins 2013-11-12 09:45:37 UTC
Following up on Comment 189 from Rex Dieter.

I was on the point of making a final post to the effect that, in the kdm case at least, the problem must lie in the server timeout, since looking more carefully at the code, nothing signicant appeared to be happening between the time the X server is started and the timeout reset, and the time that it is checked.  The current flurry of comments has made my intended comment redundant.

However consider this.  The server timeout is reset in function startServerOnce() of kdm/backend/server.c immediately after the X process is forked:

   serverTimeout = d->serverTimeout + now;

If this reset the server timeout to 30 seconds after the current time (or 60 seconds after the current time), then surely the X server would be safe and the subsequent timeout check would not lead to its being terminated within the next second or two (which is what happens).

I had concluded that, for some reason d->serverTimeout was not being set to a value of 30 seconds, or anything like 30 seconds.

But the other possibility, only just occurred to me, is that the value of 'now' is too out-of-date.

   ---

If you look at the main loop of kdm, in kdm/backend/dm.c, you will see that function updateNow() is called to set the value of 'now'.  Then there is stuff (#ifdef XDMCP) that calls some function called anyListenSockets().  Then there is the stuff to terminate plymouth.  And then the X server is started.

If the function updateNow() were called after plymouth had shut down, or at some other relevant time, so that the value of 'now' after the X server had started was a more accurate reflection of the then current time, then the server timeout reset about would have the presumably intended effect.

---

I am leaving this comment here on the offchance that it might be relevant.  I wish to disengage totally from this matter.

Comment 194 Fahad Alduraibi 2013-11-12 23:06:43 UTC
This is a follow up to my comment# 188.

I was doing some changes to the system (but I don't think they are related to this issue) and after few reboots the problem was gone and now KDM always starts with no issues and that was with a server timeout of 30 (since I have changed some setting in the Login Screen kdmrc was not replaced by rpm) so I applied the new changes in kdmrc as Rex suggested just in case.

Thank you all for solving this problem since it was annoying and also embarrassing.

Comment 195 Sam Varshavchik 2013-11-16 15:09:05 UTC
The plymouth update does not fix this bug for me.

After updating to 0.8.9-0.2013.03.26.1.fc19.x86_64, the next reboot failed. And every reboot after that.

For me, what sort of worked was someone else's suggestion of adding Before=graphical.target to plymouth-quit.service

This trick, apparently, is no longer effective.

After booting wedges, logging into the machine over the network still shows this:

[root@monster ~]# systemctl list-jobs
JOB UNIT                                 TYPE  STATE  
 90 graphical.target                     start waiting
 91 multi-user.target                    start waiting
168 systemd-update-utmp-runlevel.service start waiting
221 plymouth-quit-wait.service           start running
230 systemd-readahead-done.timer         start waiting

Running /usr/bin/plymouth --quit gets rid of all the jobs, but the system console is still frozen, and gdm does not come up.

I don't really have the willpower anymore to deal with this bug, and search for other workarounds, so to be able to boot, I had to get rid of plymouth.

The sad thing is that it doesn't really look like a plymouth bug. I never had any problems with plymouth, on this particular server, until Fedora became infested with systemd. Unfortunately, "yum remove systemd" is not an option.

Comment 196 Gilboa Davara 2013-11-19 19:30:20 UTC
Sadly enough I can confirm Sam's findings on multiple machine w/ 0.8.9-0.2013.03.26.1.fc19.x86_64.
On a fast workstation, running init 3 from KDE konsole (which kills X) followed by a quick init 5 from a text console will leave the machine hanging 9/10 attempts.

If anyone else can confirm the above, we should reopen this bug.

- Gilboa

Comment 197 Gilboa Davara 2013-11-19 19:41:48 UTC
I should add that only way, in my case to get kdm running (as above, plymouth --quit has no effect) is to init 1, and init 5 back into kdm.


- Gilboa

Comment 198 Rex Dieter 2013-11-19 20:01:10 UTC
Gilboa, comment #196 does not seem relevant to *this* bug to me (and now that we're approaching 200 comments, chances of this being productive anymore is slim).

If you think there's any remaining bug(s), I would urge a new one be filed.

Comment 199 Germano Massullo 2013-11-19 22:16:43 UTC
I experienced a few times this bug, but a reboot fixed the problem

Comment 200 Mohammed Arafa 2013-11-20 04:05:53 UTC
how do i reopen this bug?

i rebooted my machine and found X wasnt working. so i rebooted another 2 times before deleting the /var/log/journal/* directory and then rebooted.

it worked!
[root@desktop ~]# du -sh /var/log/journal/
74M     /var/log/journal/
[root@desktop ~]# du -sh /var/log/journal/4ca6a6d926bda3c7244424cfcecb8f59/*
8.2M    /var/log/journal/4ca6a6d926bda3c7244424cfcecb8f59/system~
55M     /var/log/journal/4ca6a6d926bda3c7244424cfcecb8f59/system.journal
3.8M    /var/log/journal/4ca6a6d926bda3c7244424cfcecb8f59/user-500.journal
3.6M    /var/log/journal/4ca6a6d926bda3c7244424cfcecb8f59/user-501.journal
3.6M    /var/log/journal/4ca6a6d926bda3c7244424cfcecb8f59/user-505.journal
[root@desktop ~]# rm -rf /var/log/journal/4ca6a6d926bda3c7244424cfcecb8f59/; reboot 
Connection to desktop closed by remote host.
Connection to desktop closed.


looks like that directory needs some logrotate love. when is lennart gonna come up with a fix?

also just to put the issue to rest:
rpm -qi kde-settings
Name        : kde-settings
Version     : 19
Release     : 26.fc19
Architecture: noarch
Install Date: Mon 11 Nov 2013 03:04:04 AM EST
Group       : Unspecified
Size        : 81991
License     : MIT
Signature   : RSA/SHA256, Fri 08 Nov 2013 01:50:14 PM EST, Key ID 07477e65fb4b18e6
Source RPM  : kde-settings-19-26.fc19.src.rpm
Build Date  : Fri 08 Nov 2013 10:09:50 AM EST
Build Host  : buildvm-26.phx2.fedoraproject.org
Relocations : (not relocatable)
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : http://fedorahosted.org/kde-settings
Summary     : Config files for kde
Description :
Config files for kde.

Comment 201 Fedora Update System 2013-11-20 04:54:54 UTC
systemd-208-6.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/systemd-208-6.fc20

Comment 202 Germano Massullo 2013-11-20 08:47:13 UTC
(In reply to Mohammed Arafa from comment #200)
> how do i reopen this bug?
You have to click on "clone this bug" button

Comment 203 Kayvan Sylvan 2014-03-16 05:28:22 UTC
I still saw this problem with Fedora 20:

[ksylvan@ksylvan-t420 ~]$ rpm -qi plymouth
Name        : plymouth
Version     : 0.8.9
Release     : 3.2013.08.14.fc20
Architecture: x86_64
Install Date: Tue 14 Jan 2014 04:30:16 PM PST

The same workaround fixed it (mv /var/log/journal /var/log/journal.org)

Comment 204 M Frost 2014-03-16 15:52:17 UTC
See bug 1073128 which reports what I believe is the same problem.

Comment 205 Paul Wouters 2016-02-22 00:21:20 UTC
I still see this problem on a new fedora 23 install.

Moving the /var/log/journal aside caused a bunch more services to start.
plymouth had to be removed from the system or else the boot would hang.
the rescue mode also stared plymouth and so also failed unless provided with additional arguments "rw 1"

However, gdm is still not showing the greeter screen

Comment 206 Jan Kurik 2018-05-31 09:10:26 UTC
This bug is currently reported against a Fedora version which is already unsuported.
I am changing the version to '27', the latest supported release.

Please check whether this bug is still an issue on the '27' release.
If you find this bug not being applicable on this release, please close it.

Comment 207 Ben Cotton 2018-11-27 14:51:30 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30  Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora  'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 208 Ben Cotton 2018-11-30 22:19:46 UTC
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.