Bug 1588296 - cockpit.socket fails/timeout, lengthens startup
Summary: cockpit.socket fails/timeout, lengthens startup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: cockpit
Version: 28
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
Assignee: Allison Karlitskaya
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-06-07 04:20 UTC by Chris Murphy
Modified: 2019-05-03 06:01 UTC (History)
6 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-05-03 06:01:42 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
journal (159.70 KB, text/plain)
2018-06-07 04:20 UTC, Chris Murphy
no flags Details
journal, systemd debug (1.33 MB, text/plain)
2018-06-07 21:02 UTC, Chris Murphy
no flags Details

Description Chris Murphy 2018-06-07 04:20:55 UTC
Created attachment 1448595 [details]
journal

Description of problem:

Long startups I think due to cockpit.socket waiting for a network connection but it appears to hold up the entire boot process for reasons I can't figure out.

Version-Release number of selected component (if applicable):
cockpit-169-1.fc28.x86_64

How reproducible:
Every boot

Steps to Reproduce:
1. Boot
2.
3.

Actual results:

[   25.109787] f28s.local systemd[1]: Reached target Timers.
[  115.159746] f28s.local systemd[1]: cockpit.socket: Starting timed out. Stopping.
[  115.826318] f28s.local systemd[1]: cockpit.socket: Failed with result 'timeout'.
[  115.827486] f28s.local systemd[1]: Failed to listen on Cockpit Web Service Socket.


Expected results:

It shouldn't hold up boot even if the network isn't up yet.


Additional info:

Comment 1 Chris Murphy 2018-06-07 04:26:41 UTC
OK so it's still failed even after the network is up, and while I can ssh to the host, it's not possible to connect to cockpit at ip:9090. So near as I can tell something broken but the cockpit messages to the journal don't have much information to go on.

Comment 2 Chris Murphy 2018-06-07 04:35:13 UTC
Looking at the the history in the journal, the last time this was working was 2.Jun with cockpit-168-1.fc28.x86_64.rpm and there are no cockpit.socket errors going back farther. The first failure happens right after reboot following updating on 3.Jun, which includes cockpit-169.

Seems like a regression, but no complaints in bodhi for this update.

Comment 3 Martin Pitt 2018-06-07 08:46:46 UTC
The main plausible thing that could hang are the new ExecStartPost= commands for updating motd. Can you please give me the output of

   sudo systemctl status cockpit.socket
   ls -l /run/cockpit/motd

? Thanks!

Comment 4 Chris Murphy 2018-06-07 15:35:14 UTC
[chris@f28s ~]$ sudo systemctl status cockpit.socket
● cockpit.socket - Cockpit Web Service Socket
   Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; enabled; vendor preset: enabled)
   Active: failed (Result: timeout) since Wed 2018-06-06 22:09:58 MDT; 11h ago
     Docs: man:cockpit-ws(8)
   Listen: [::]:9090 (Stream)
  Process: 641 ExecStopPost=/bin/ln -snf /usr/share/cockpit/motd/inactive.motd /run/cockpit/motd (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4519)
   Memory: 1.0M
   CGroup: /system.slice/cockpit.socket

Jun 06 22:08:27 f28s.local systemd[1]: Starting Cockpit Web Service Socket.
Jun 06 22:09:57 f28s.local systemd[1]: cockpit.socket: Starting timed out. Stopping.
Jun 06 22:09:58 f28s.local systemd[1]: cockpit.socket: Failed with result 'timeout'.
Jun 06 22:09:58 f28s.local systemd[1]: Failed to listen on Cockpit Web Service Socket.
[chris@f28s ~]$ ls -l /run/cockpit/motd
lrwxrwxrwx. 1 root root 37 Jun  6 22:09 /run/cockpit/motd -> /usr/share/cockpit/motd/inactive.motd
[chris@f28s ~]$

Comment 5 Martin Pitt 2018-06-07 15:51:09 UTC
Thanks. Unfortunately that doesn't tell me that much yet. Can you please try this:

   sudo systemctl stop cockpit.socket
   sudo systemctl reset-failed cockpit.socket
   sudo systemctl start cockpit.socket

Does that finish immediately, or hang? If it hangs, please run this command in a different terminal (during the 90s hang of the previous command) and copy&paste the output:

   sudo systemctl status cockpit.socket

The most likely thing is that update-motd hangs during early boot. But let's see if it also hangs for you in a running system:

   sh -ex /usr/share/cockpit/motd/update-motd

Does that finish quickly? It will fail in the end with a "Permission denied", unless you run it as root, but that's expected. If it hangs, what does it hang on? (should be the "ip" or "hostname" command).

Comment 6 Chris Murphy 2018-06-07 18:51:52 UTC
Finishes immediately and now it's reported as running.

[chris@f28s ~]$ sudo systemctl status cockpit.socket
● cockpit.socket - Cockpit Web Service Socket
   Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; enabled; vendor preset: enabled)
   Active: active (listening) since Thu 2018-06-07 12:49:45 MDT; 4s ago
     Docs: man:cockpit-ws(8)
   Listen: [::]:9090 (Stream)
  Process: 641 ExecStopPost=/bin/ln -snf /usr/share/cockpit/motd/inactive.motd /run/cockpit/motd (code=exited, status=0/SUCCESS)
  Process: 2282 ExecStartPost=/bin/ln -snf active.motd /run/cockpit/motd (code=exited, status=0/SUCCESS)
  Process: 2267 ExecStartPost=/usr/share/cockpit/motd/update-motd (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4519)
   Memory: 796.0K
   CGroup: /system.slice/cockpit.socket

Jun 07 12:49:44 f28s.local systemd[1]: Starting Cockpit Web Service Socket.
Jun 07 12:49:45 f28s.local systemd[1]: Listening on Cockpit Web Service Socket.
[chris@f28s ~]$ sh -ex /usr/share/cockpit/motd/update-motd
++ systemctl show --property Listen cockpit.socket
++ sed -E '$!d;$s/.*[^0-9]([0-9]+).*/\1/;'
+ port=9090
+ port=9090
++ hostname -f
+ hostname=f28s.local
++ ip -o route get 255.0
++ sed -e 's/.*src \([^ ]*\) .*/\1/'
+ ip=192.168.1.115
+ protocol=https
+ hostname_url=https://f28s.local:9090/
+ ip_url=' or https://192.168.1.115:9090/'
+ printf 'Web console: %s%s\n\n' https://f28s.local:9090/ ' or https://192.168.1.115:9090/'
/usr/share/cockpit/motd/update-motd: line 24: /run/cockpit/active.motd: Permission denied
[chris@f28s ~]$

Comment 7 Martin Pitt 2018-06-07 19:12:56 UTC
OK, so this only happens during early boot. Unfortunately systemctl status doesn't show the ExecStart= commands, as the unit is already stopped again.

Perhaps like this: Can you please reboot, ensure it hangs, and give me the output of

    systemctl show cockpit.socket | grep Exec

If that still isn't useful, can you please do

    sudo journalctl -b > /tmp/journal.txt

and attach that here?

Comment 8 Chris Murphy 2018-06-07 19:30:49 UTC
Journal is already attached for one of the failed boots but it doesn't reveal why it failed.

Comment 9 Chris Murphy 2018-06-07 19:34:11 UTC
OK rebooted and I get this after it fails and finally comes back up:

[chris@f28s ~]$ systemctl show cockpit.socket | grep Exec
ExecStartPost={ path=/usr/share/cockpit/motd/update-motd ; argv[]=/usr/share/cockpit/motd/update-motd ; ignore_errors=yes ; start_time=[Thu 2018-06-07 13:29:48 MDT] ; stop_time=[n/a] ; pid=639 ; code=(null) ; status=0/0 }
ExecStartPost={ path=/bin/ln ; argv[]=/bin/ln -snf active.motd /run/cockpit/motd ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStopPost={ path=/bin/ln ; argv[]=/bin/ln -snf /usr/share/cockpit/motd/inactive.motd /run/cockpit/motd ; ignore_errors=yes ; start_time=[Thu 2018-06-07 13:31:18 MDT] ; stop_time=[Thu 2018-06-07 13:31:18 MDT] ; pid=644 ; code=exited ; status=0 }
MemoryDenyWriteExecute=no
[chris@f28s ~]$ 

Lemme know if booting with systemd.log_level=debug will be revealing; often it's too verbose.

Comment 10 Chris Murphy 2018-06-07 21:02:19 UTC
Created attachment 1448874 [details]
journal, systemd debug

Comment 11 Stef Walter 2018-06-08 05:59:39 UTC
Martin it seems that the cockpit.socket ExecStart= commands are run in the wrong place/time during boot, is that correct?

Previously cockpit.socket would be started before networking was enabled. This is expected and desired behavior for socket activated sockets. However by running the ExecStart= motd and issue commands at that point, they do not have the necessary host/ip information to do their task.

Do you think it makes sense to move those commands to be triggered by something else on the system? Perhaps another unit?

Comment 12 Martin Pitt 2018-06-08 06:20:30 UTC
Yes, we already have another unit for that; cockpit-motd.service is Wanted by cockpit.socket, for exactly that case: it runs after network-online.target and updates motd with the correct host name/IPs. It just also gets called in cockpit.socket for the case that network-online.target never comes up or is disabled.

We could simplify this by not calling update-motd in cockpit.socket, only in cockpit-motd.service. Then motd would not show anything about Cockpit between booting and the network coming up, as /run/cockpit/active.motd would not exist yet. But if that causes trouble during boot which we cannot figure out, we might have to live with that.

FWIW, we don't see that boot trouble in our integration tests, or on Allison's or my machine, so I cannot reproduce this.

Indeed the journal looks fine:

[   11.004303] f28s.local systemd[1]: cockpit.socket: About to execute: /usr/share/cockpit/motd/update-motd
[   11.007765] f28s.local systemd[647]: cockpit.socket: Executing: /usr/share/cockpit/motd/update-motd
[   11.030164] f28s.local update-motd[647]: Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1/unit/cockpit_2esocket interface=org.freedesktop.DBus.Properties member=GetAll cookie=1 reply_cookie=0 signature=s error-name=n/a error-message=n/a[   11.036307] f28s.local update-motd[647]: Got message type=method_return sender=org.freedesktop.systemd1 destination=n/a path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=a{sv} error-name=n/a error-message=n/a
[   11.038288] f28s.local update-motd[647]: Bus n/a: changing state RUNNING → CLOSED

So there is no significant runtime of update-motd until after the "systemctl show" command - it's not waiting for D-Bus coming up or anything (it just uses systemd's internal bus).

The next relevant message is just 

[  101.250242] f28s.local systemd[1]: cockpit.socket: Starting timed out. Stopping.

Which means that update-motd didn't terminate in 90s.

I'd still like to understand what is going wrong here, before we blindly rip this out again.

Chris, can you  please add a "set -x" to /usr/share/cockpit/motd/update-motd (second line), reboot, and attach the journal again? Thanks!

Comment 13 Martin Pitt 2018-06-08 06:21:43 UTC
@Allison: Possible fallback is to just statically echo "Web Console: https://localhost:9090" into active.motd until the network is up? Then at least we show something useful.

Comment 14 Allison Karlitskaya 2018-06-08 08:10:59 UTC
Looking at strace, it seems that "hostname -f" does a DNS lookup.  That's my prime suspect.

Comment 15 Allison Karlitskaya 2018-06-08 11:07:44 UTC
Can you please test this fix:

  https://github.com/cockpit-project/cockpit/pull/9346

?

Comment 16 Chris Murphy 2018-06-08 17:32:40 UTC
set -x

[   19.241866] f28s.local systemd[1]: Listening on D-Bus System Message Bus Socket.
[   19.243270] f28s.local update-motd[657]: ++ systemctl show --property Listen cockpit.socket
[   19.244398] f28s.local update-motd[657]: ++ sed -E '$!d;$s/.*[^0-9]([0-9]+).*/\1/;'
[   19.270643] f28s.local update-motd[657]: + port=9090
[   19.270929] f28s.local update-motd[657]: + port=9090
[   19.272558] f28s.local update-motd[657]: ++ hostname -f
[   23.469100] f28s.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  109.232402] f28s.local systemd[1]: cockpit.socket: Starting timed out. Stopping.
[  109.538049] f28s.local systemd[1]: cockpit.socket: Failed with result 'timeout'.
[  110.494424] f28s.local audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=smartd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  110.494907] f28s.local systemd[1]: Failed to listen on Cockpit Web Service Socket.

Comment 17 Chris Murphy 2018-06-08 17:42:39 UTC
(In reply to Allison Karlitskaya from comment #15)
> Can you please test this fix:
> 
>   https://github.com/cockpit-project/cockpit/pull/9346
> 
> ?

That appears to fix the problem.

Comment 18 Martin Pitt 2018-06-10 20:38:35 UTC
Thanks for confirming!

Comment 19 Martin Pitt 2018-06-14 08:56:22 UTC
Fix landed here: https://bodhi.fedoraproject.org/updates/cockpit-170-1.fc28

Testing appreciated!

Comment 20 Ben Cotton 2019-05-02 21:35:12 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Martin Pitt 2019-05-03 06:01:42 UTC
This bodhi update landed long ago.


Note You need to log in before you can comment on or make changes to this bug.