Created attachment 1448595 [details] journal Description of problem: Long startups I think due to cockpit.socket waiting for a network connection but it appears to hold up the entire boot process for reasons I can't figure out. Version-Release number of selected component (if applicable): cockpit-169-1.fc28.x86_64 How reproducible: Every boot Steps to Reproduce: 1. Boot 2. 3. Actual results: [ 25.109787] f28s.local systemd[1]: Reached target Timers. [ 115.159746] f28s.local systemd[1]: cockpit.socket: Starting timed out. Stopping. [ 115.826318] f28s.local systemd[1]: cockpit.socket: Failed with result 'timeout'. [ 115.827486] f28s.local systemd[1]: Failed to listen on Cockpit Web Service Socket. Expected results: It shouldn't hold up boot even if the network isn't up yet. Additional info:
OK so it's still failed even after the network is up, and while I can ssh to the host, it's not possible to connect to cockpit at ip:9090. So near as I can tell something broken but the cockpit messages to the journal don't have much information to go on.
Looking at the the history in the journal, the last time this was working was 2.Jun with cockpit-168-1.fc28.x86_64.rpm and there are no cockpit.socket errors going back farther. The first failure happens right after reboot following updating on 3.Jun, which includes cockpit-169. Seems like a regression, but no complaints in bodhi for this update.
The main plausible thing that could hang are the new ExecStartPost= commands for updating motd. Can you please give me the output of sudo systemctl status cockpit.socket ls -l /run/cockpit/motd ? Thanks!
[chris@f28s ~]$ sudo systemctl status cockpit.socket ● cockpit.socket - Cockpit Web Service Socket Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; enabled; vendor preset: enabled) Active: failed (Result: timeout) since Wed 2018-06-06 22:09:58 MDT; 11h ago Docs: man:cockpit-ws(8) Listen: [::]:9090 (Stream) Process: 641 ExecStopPost=/bin/ln -snf /usr/share/cockpit/motd/inactive.motd /run/cockpit/motd (code=exited, status=0/SUCCESS) Tasks: 0 (limit: 4519) Memory: 1.0M CGroup: /system.slice/cockpit.socket Jun 06 22:08:27 f28s.local systemd[1]: Starting Cockpit Web Service Socket. Jun 06 22:09:57 f28s.local systemd[1]: cockpit.socket: Starting timed out. Stopping. Jun 06 22:09:58 f28s.local systemd[1]: cockpit.socket: Failed with result 'timeout'. Jun 06 22:09:58 f28s.local systemd[1]: Failed to listen on Cockpit Web Service Socket. [chris@f28s ~]$ ls -l /run/cockpit/motd lrwxrwxrwx. 1 root root 37 Jun 6 22:09 /run/cockpit/motd -> /usr/share/cockpit/motd/inactive.motd [chris@f28s ~]$
Thanks. Unfortunately that doesn't tell me that much yet. Can you please try this: sudo systemctl stop cockpit.socket sudo systemctl reset-failed cockpit.socket sudo systemctl start cockpit.socket Does that finish immediately, or hang? If it hangs, please run this command in a different terminal (during the 90s hang of the previous command) and copy&paste the output: sudo systemctl status cockpit.socket The most likely thing is that update-motd hangs during early boot. But let's see if it also hangs for you in a running system: sh -ex /usr/share/cockpit/motd/update-motd Does that finish quickly? It will fail in the end with a "Permission denied", unless you run it as root, but that's expected. If it hangs, what does it hang on? (should be the "ip" or "hostname" command).
Finishes immediately and now it's reported as running. [chris@f28s ~]$ sudo systemctl status cockpit.socket ● cockpit.socket - Cockpit Web Service Socket Loaded: loaded (/usr/lib/systemd/system/cockpit.socket; enabled; vendor preset: enabled) Active: active (listening) since Thu 2018-06-07 12:49:45 MDT; 4s ago Docs: man:cockpit-ws(8) Listen: [::]:9090 (Stream) Process: 641 ExecStopPost=/bin/ln -snf /usr/share/cockpit/motd/inactive.motd /run/cockpit/motd (code=exited, status=0/SUCCESS) Process: 2282 ExecStartPost=/bin/ln -snf active.motd /run/cockpit/motd (code=exited, status=0/SUCCESS) Process: 2267 ExecStartPost=/usr/share/cockpit/motd/update-motd (code=exited, status=0/SUCCESS) Tasks: 0 (limit: 4519) Memory: 796.0K CGroup: /system.slice/cockpit.socket Jun 07 12:49:44 f28s.local systemd[1]: Starting Cockpit Web Service Socket. Jun 07 12:49:45 f28s.local systemd[1]: Listening on Cockpit Web Service Socket. [chris@f28s ~]$ sh -ex /usr/share/cockpit/motd/update-motd ++ systemctl show --property Listen cockpit.socket ++ sed -E '$!d;$s/.*[^0-9]([0-9]+).*/\1/;' + port=9090 + port=9090 ++ hostname -f + hostname=f28s.local ++ ip -o route get 255.0 ++ sed -e 's/.*src \([^ ]*\) .*/\1/' + ip=192.168.1.115 + protocol=https + hostname_url=https://f28s.local:9090/ + ip_url=' or https://192.168.1.115:9090/' + printf 'Web console: %s%s\n\n' https://f28s.local:9090/ ' or https://192.168.1.115:9090/' /usr/share/cockpit/motd/update-motd: line 24: /run/cockpit/active.motd: Permission denied [chris@f28s ~]$
OK, so this only happens during early boot. Unfortunately systemctl status doesn't show the ExecStart= commands, as the unit is already stopped again. Perhaps like this: Can you please reboot, ensure it hangs, and give me the output of systemctl show cockpit.socket | grep Exec If that still isn't useful, can you please do sudo journalctl -b > /tmp/journal.txt and attach that here?
Journal is already attached for one of the failed boots but it doesn't reveal why it failed.
OK rebooted and I get this after it fails and finally comes back up: [chris@f28s ~]$ systemctl show cockpit.socket | grep Exec ExecStartPost={ path=/usr/share/cockpit/motd/update-motd ; argv[]=/usr/share/cockpit/motd/update-motd ; ignore_errors=yes ; start_time=[Thu 2018-06-07 13:29:48 MDT] ; stop_time=[n/a] ; pid=639 ; code=(null) ; status=0/0 } ExecStartPost={ path=/bin/ln ; argv[]=/bin/ln -snf active.motd /run/cockpit/motd ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 } ExecStopPost={ path=/bin/ln ; argv[]=/bin/ln -snf /usr/share/cockpit/motd/inactive.motd /run/cockpit/motd ; ignore_errors=yes ; start_time=[Thu 2018-06-07 13:31:18 MDT] ; stop_time=[Thu 2018-06-07 13:31:18 MDT] ; pid=644 ; code=exited ; status=0 } MemoryDenyWriteExecute=no [chris@f28s ~]$ Lemme know if booting with systemd.log_level=debug will be revealing; often it's too verbose.
Created attachment 1448874 [details] journal, systemd debug
Martin it seems that the cockpit.socket ExecStart= commands are run in the wrong place/time during boot, is that correct? Previously cockpit.socket would be started before networking was enabled. This is expected and desired behavior for socket activated sockets. However by running the ExecStart= motd and issue commands at that point, they do not have the necessary host/ip information to do their task. Do you think it makes sense to move those commands to be triggered by something else on the system? Perhaps another unit?
Yes, we already have another unit for that; cockpit-motd.service is Wanted by cockpit.socket, for exactly that case: it runs after network-online.target and updates motd with the correct host name/IPs. It just also gets called in cockpit.socket for the case that network-online.target never comes up or is disabled. We could simplify this by not calling update-motd in cockpit.socket, only in cockpit-motd.service. Then motd would not show anything about Cockpit between booting and the network coming up, as /run/cockpit/active.motd would not exist yet. But if that causes trouble during boot which we cannot figure out, we might have to live with that. FWIW, we don't see that boot trouble in our integration tests, or on Allison's or my machine, so I cannot reproduce this. Indeed the journal looks fine: [ 11.004303] f28s.local systemd[1]: cockpit.socket: About to execute: /usr/share/cockpit/motd/update-motd [ 11.007765] f28s.local systemd[647]: cockpit.socket: Executing: /usr/share/cockpit/motd/update-motd [ 11.030164] f28s.local update-motd[647]: Sent message type=method_call sender=n/a destination=org.freedesktop.systemd1 path=/org/freedesktop/systemd1/unit/cockpit_2esocket interface=org.freedesktop.DBus.Properties member=GetAll cookie=1 reply_cookie=0 signature=s error-name=n/a error-message=n/a[ 11.036307] f28s.local update-motd[647]: Got message type=method_return sender=org.freedesktop.systemd1 destination=n/a path=n/a interface=n/a member=n/a cookie=1 reply_cookie=1 signature=a{sv} error-name=n/a error-message=n/a [ 11.038288] f28s.local update-motd[647]: Bus n/a: changing state RUNNING → CLOSED So there is no significant runtime of update-motd until after the "systemctl show" command - it's not waiting for D-Bus coming up or anything (it just uses systemd's internal bus). The next relevant message is just [ 101.250242] f28s.local systemd[1]: cockpit.socket: Starting timed out. Stopping. Which means that update-motd didn't terminate in 90s. I'd still like to understand what is going wrong here, before we blindly rip this out again. Chris, can you please add a "set -x" to /usr/share/cockpit/motd/update-motd (second line), reboot, and attach the journal again? Thanks!
@Allison: Possible fallback is to just statically echo "Web Console: https://localhost:9090" into active.motd until the network is up? Then at least we show something useful.
Looking at strace, it seems that "hostname -f" does a DNS lookup. That's my prime suspect.
Can you please test this fix: https://github.com/cockpit-project/cockpit/pull/9346 ?
set -x [ 19.241866] f28s.local systemd[1]: Listening on D-Bus System Message Bus Socket. [ 19.243270] f28s.local update-motd[657]: ++ systemctl show --property Listen cockpit.socket [ 19.244398] f28s.local update-motd[657]: ++ sed -E '$!d;$s/.*[^0-9]([0-9]+).*/\1/;' [ 19.270643] f28s.local update-motd[657]: + port=9090 [ 19.270929] f28s.local update-motd[657]: + port=9090 [ 19.272558] f28s.local update-motd[657]: ++ hostname -f [ 23.469100] f28s.local audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 109.232402] f28s.local systemd[1]: cockpit.socket: Starting timed out. Stopping. [ 109.538049] f28s.local systemd[1]: cockpit.socket: Failed with result 'timeout'. [ 110.494424] f28s.local audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=smartd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 110.494907] f28s.local systemd[1]: Failed to listen on Cockpit Web Service Socket.
(In reply to Allison Karlitskaya from comment #15) > Can you please test this fix: > > https://github.com/cockpit-project/cockpit/pull/9346 > > ? That appears to fix the problem.
Thanks for confirming!
Fix landed here: https://bodhi.fedoraproject.org/updates/cockpit-170-1.fc28 Testing appreciated!
This message is a reminder that Fedora 28 is nearing its end of life. On 2019-May-28 Fedora will stop maintaining and issuing updates for Fedora 28. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '28'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 28 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
This bodhi update landed long ago.