Red Hat Bugzilla – Bug 1291172
systemctl restart/start sshd shows no error if start fails
Last modified: 2016-11-16 14:26:24 EST
Description of problem:
systemctl restart|start sshd shows no error message if the restart/start fails.

[root@xevws029 ~]# systemctl start sshd
[root@xevws029 ~]#
[root@xevws029 ~]# systemctl status sshd
● sshd.service - OpenSSH server daemon
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2015-12-11 10:13:31 CET; 6s ago
     Docs: man:sshd(8)
           man:sshd_config(5)
  Process: 38753 ExecStart=/usr/sbin/sshd -D $OPTIONS (code=exited, status=255)
 Main PID: 38753 (code=exited, status=255)
   CGroup: /system.slice/sshd.service
           ├─38552 sshd: root@pts/0
           ├─38556 -bash
           └─38754 systemctl status sshd

Dec 11 10:13:31 xevws029.xeop.de systemd[1]: Unit sshd.service entered failed state.
Dec 11 10:13:31 xevws029.xeop.de systemd[1]: sshd.service failed.

The start failed because of an invalid sshd_config, but systemctl should report when a start has failed.

Version-Release number of selected component (if applicable):
Latest RHEL7.2 RPM.

How reproducible:
Put an invalid configuration directive into sshd_config and restart the server. No error message is printed on stdout. Check the server status and see that the server is actually not running.

Additional info:
I think the problem is the "Type=simple" parameter, which is used when the type is not explicitly defined. I think "Type=forking" would be a better match here.
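Until systemctl itself reports the failure, a caller can check the unit state shortly after starting it. The following is only a sketch of that workaround: start_and_verify is a hypothetical helper name, and the two-second settle time is an arbitrary assumption, not a value taken from this report.

```shell
#!/usr/bin/env bash
# Sketch: start a unit, then confirm it is still active a moment later.
# start_and_verify is a hypothetical helper, not part of systemd.
start_and_verify() {
    local unit=$1
    local settle=${2:-2}   # seconds to wait; arbitrary assumption

    # With Type=simple this can succeed even if the daemon exits right away.
    systemctl start "$unit" || return 1

    sleep "$settle"

    # is-active exits 0 only when the unit is in the "active" state.
    if ! systemctl is-active --quiet "$unit"; then
        echo "$unit did not stay active; see 'systemctl status $unit'" >&2
        return 1
    fi
}
```

Calling `start_and_verify sshd` would then return non-zero in the situation shown above, where the unit sits in "activating (auto-restart)".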
(In reply to Thorsten Scherf from comment #0)
> I think the problem is the "Type=simple" parameter which is used when the
> type is not explicitly defined. I think "Type=forking" would be a better
> match here.

It probably doesn't matter whether a service is forking or simple in this case. 'Type=simple' is used with '/usr/sbin/sshd -D' because it's simpler for systemd to track a main sshd process which doesn't do fork() & exec().

But we could change sshd.service to run '/usr/sbin/sshd -t' before the sshd daemon:

# cat /etc/systemd/system/sshd.service.d/execstartpre.conf
[Service]
ExecStartPre=/usr/sbin/sshd -t $OPTIONS

# echo "NonSense 1" >> /etc/ssh/sshd_config
# systemctl restart sshd
Job for sshd.service failed. See 'systemctl status sshd.service' and 'journalctl -xn' for details.
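The same check can also be run by hand before a restart, without touching the unit file. This is a minimal sketch, not a proposed fix: safe_restart_sshd and the SSHD_BIN override are hypothetical names, and unlike the drop-in above it does not pass $OPTIONS from /etc/sysconfig/sshd.

```shell
#!/usr/bin/env bash
# Sketch: validate sshd_config first, restart only if it parses.
# safe_restart_sshd and SSHD_BIN are hypothetical names.
safe_restart_sshd() {
    local sshd_bin=${SSHD_BIN:-/usr/sbin/sshd}

    # sshd -t is test mode: it checks the configuration and keys, then exits.
    if ! "$sshd_bin" -t; then
        echo "sshd configuration test failed; not restarting" >&2
        return 1
    fi

    systemctl restart sshd
}
```

With the "NonSense 1" line from the example in place, `sshd -t` exits non-zero and the running daemon is left untouched.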
(In reply to Petr Lautrbach from comment #2)
> It probably doesn't matter if a service is forking or simple in this case.
> 'Type=simple' is used with '/usr/sbin/sshd -D' as it's simpler for systemd
> to track a main sshd process which doesn't do fork() & exec().

I don't know, but setting Type=forking helps systemctl report errors (unlike Type=simple). However, it does not match the daemon behavior described in the manual pages, and changing the invocation is probably not something we would like to do.

> But we could change sshd.service to run '/usr/sbin/sshd -t' before the sshd
> daemon

This is actually a good idea. ExecStartPre makes the service fail hard if the problem is only in the config, but it checks the config twice on every start.

I was also thinking about differentiating the exit status for a wrong configuration and letting the service fail hard. If I set RestartPreventExitStatus=255, "systemctl start sshd" still does not return any failure, but the service ends up in the failed state (instead of activating as before). To me it looks like a systemd problem. I think some insight from the systemd maintainers would be useful.
With Type=simple you always have the problem that the basic startup (forking a new process, setting up the execution environment) succeeds but the start of the *actual* daemon fails for some reason. The failure happens "down the road", after systemd has transitioned the service to the active-running state and the systemctl command has already returned without error to the user at the command line.

As for whether there is some bug in systemd, as comment #3 suggests: sure, that might be the case. Please come up with a simple reproducer that exhibits the problem. Frankly, I have a hard time understanding what the actual problem you see is and what behavior you expect.

At any rate, I think the best solution for sshd would be actual integration with systemd, i.e. making sshd a Type=notify service. See man 3 sd_notify for details. The patch for this should be very small, just a couple of lines of code, and it can easily be maintained downstream in case upstream doesn't care for it.
Michal, thanks for the insight into the possibilities. The notify type should work for sure. I can try to implement a patch for openssh just to give it a try.

As for the current state of systemd, I tested the reproducers once more on current RHEL7.2 and here are the results:

The forking variant, which works for me, can be put together simply by modifying the [Service] section of the sshd.service file:

Type=forking
PIDFile=/var/run/sshd.pid
ExecStart=/usr/sbin/sshd $OPTIONS

Start/restart works just fine if the config is OK. If not, I get a relevant error:

# systemctl start sshd
Job for sshd.service failed because the control process exited with error code. See "systemctl status sshd.service" and "journalctl -xe" for details.

> However, failure happens "down the road" after systemd transitioned service to active-running state and systemctl command already returned no error to the user at the command line.

I don't think this is the case. As mentioned in the original description, systemd knows about the exit status code (code=exited, status=255), but does not report it as a failure (the unit is left in the activating state for auto-restart). I thought the auto-restart would be the root of the problem, but getting rid of it (in the original configuration) didn't help either:

#Restart=on-failure
#RestartSec=42s

# systemctl daemon-reload
# systemctl restart sshd
# systemctl status sshd
[..]
   Active: failed (Result: exit-code) since Thu 2015-12-17 10:53:11 CET; 1s ago

The service is now reported as failed, but the error is not printed during start/restart (but we need that restart).
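Since systemd records the result even when systemctl prints nothing, a script can read it back from the unit properties. The sketch below assumes the Result property of service units as shown by `systemctl show`; unit_result and last_run_failed are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: read the recorded result of a service's last run.
# unit_result and last_run_failed are hypothetical helpers.
unit_result() {
    local unit=$1 line
    # Prints a single KEY=VALUE line, e.g. "Result=exit-code".
    line=$(systemctl show "$unit" --property=Result) || return 1
    printf '%s\n' "${line#Result=}"
}

last_run_failed() {
    local result
    result=$(unit_result "$1") || return 2
    # Anything other than "success" is treated as a failed run.
    [ "$result" != "success" ]
}
```

For the states quoted in this report, "activating (auto-restart)" and "failed" both carry Result: exit-code, so `last_run_failed sshd` would be expected to return 0 in either case.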
See also https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778913 and https://lists.debian.org/debian-ssh/2015/12/msg00072.html
My tests confirm that changing the unit file to 'forking' and no longer adding the '-D' option works as expected.

[Unit]
Description=OpenSSH server daemon
Documentation=man:sshd(8) man:sshd_config(5)
After=network.target sshd-keygen.service
Wants=sshd-keygen.service

[Service]
Type=forking
EnvironmentFile=/etc/sysconfig/sshd
ExecStart=/usr/sbin/sshd $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s

[Install]
WantedBy=multi-user.target
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2588.html