Bug 1398360 - systemctl restart/start sshd shows no error if start fails
Summary: systemctl restart/start sshd shows no error if start fails
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openssh
Version: 7.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jakub Jelen
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-24 14:49 UTC by Mario Trangoni
Modified: 2024-11-17 23:45 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1291172
Environment:
Last Closed: 2017-03-28 13:08:59 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Debian BTS 778913 0 None None None 2016-11-24 14:49:16 UTC

Description Mario Trangoni 2016-11-24 14:49:17 UTC
Description of problem:
Because of the 'RestartSec=42s' systemd setting, sshd.service always stays in the 'activating (auto-restart)' state after a failed start instead of the expected 'failed'.

How reproducible:
Put invalid configuration directive into sshd_config, restart the server.

Actual results:
sshd.service shows 
   Active: activating (auto-restart) (Result: exit-code)

Expected results:
sshd.service shows
   Active: failed (Result: exit-code)
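
The difference between the two states above can be checked mechanically. The following Python sketch is only an illustration: it assumes the KEY=VALUE text printed by `systemctl show -p ActiveState,SubState,Result sshd.service`, and the helper names `parse_show` and `start_failed_silently` are made up for this example.

```python
def parse_show(output):
    """Turn the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def start_failed_silently(props):
    """True when the unit is waiting for an auto-restart after a failed start."""
    return (
        props.get("ActiveState") == "activating"
        and props.get("SubState") == "auto-restart"
        and props.get("Result") not in (None, "success")
    )


# Sample text standing in for real `systemctl show` output (an assumption).
sample = "ActiveState=activating\nSubState=auto-restart\nResult=exit-code\n"
print(start_failed_silently(parse_show(sample)))
```

A monitoring script could run this after `systemctl start sshd` to catch the case where the command returns without an error although the daemon never came up.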

Additional info:

I think RestartPreventExitStatus=255 would be an alternative, or the Debian sd_notify implementation: https://lists.debian.org/debian-ssh/2015/05/msg00017.html


+++ This bug was initially created as a clone of Bug #1291172 +++

Description of problem:
systemctl restart|start sshd shows no error message if the restart/start fails.


[root@xevws029 ~]# systemctl start sshd
[root@xevws029 ~]# 
[root@xevws029 ~]# systemctl status sshd
● sshd.service - OpenSSH server daemon
   Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2015-12-11 10:13:31 CET; 6s ago
     Docs: man:sshd(8)
           man:sshd_config(5)
  Process: 38753 ExecStart=/usr/sbin/sshd -D $OPTIONS (code=exited, status=255)
 Main PID: 38753 (code=exited, status=255)
   CGroup: /system.slice/sshd.service
           ├─38552 sshd: root@pts/0
           ├─38556 -bash
           └─38754 systemctl status sshd

Dec 11 10:13:31 xevws029.xeop.de systemd[1]: Unit sshd.service entered failed state.
Dec 11 10:13:31 xevws029.xeop.de systemd[1]: sshd.service failed.

Start failed because of a wrong sshd_config. But systemctl should show if a start has failed.


Version-Release number of selected component (if applicable):
Latest RHEL7.2 RPM. 

How reproducible:
Put invalid configuration directive into sshd_config, restart the server. No error message is printed on stdout. Check server status and see that the server is actually not running.


Additional info:

I think the problem is the "Type=simple" parameter which is used when the type is not explicitly defined. I think "Type=forking" would be a better match here.

--- Additional comment from Petr Lautrbach on 2015-12-14 06:33:36 EST ---

(In reply to Thorsten Scherf from comment #0)
> I think the problem is the "Type=simple" parameter which is used when the
> type is not explicitly defined. I think "Type=forking" would be a better
> match here.

It probably doesn't matter if a service is forking or simple in this case. 'Type=simple' is used with '/usr/sbin/sshd -D' as it's simpler for systemd to track a main sshd process which doesn't do fork() & exec().

But we could change sshd.service to run '/usr/sbin/sshd -t' before the sshd daemon:


# cat /etc/systemd/system/sshd.service.d/execstartpre.conf
[Service]
ExecStartPre=/usr/sbin/sshd -t $OPTIONS

# echo "NonSense 1" >> /etc/ssh/sshd_config

# systemctl restart sshd                                  
Job for sshd.service failed. See 'systemctl status sshd.service' and 'journalctl -xn' for details.
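
The idea behind the ExecStartPre drop-in above is a preflight check: validate the configuration first and start the daemon only if the check succeeds. Below is a minimal Python sketch of that pattern. The function names are hypothetical, and a child Python process that exits with status 255 stands in for `/usr/sbin/sshd -t` rejecting a broken config, so the example runs without sshd installed.

```python
import subprocess
import sys


def config_is_valid(check_cmd):
    """Run a configuration check and report whether it exited with status 0."""
    result = subprocess.run(check_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode == 0


def start_with_preflight(check_cmd, start):
    """Call start() only if the check passes, mirroring ExecStartPre."""
    if not config_is_valid(check_cmd):
        return False
    start()
    return True


# Stand-in for a failing `sshd -t` (assumption): a child exiting with status 255.
broken = [sys.executable, "-c", "import sys; sys.exit(255)"]
started = []
print(start_with_preflight(broken, lambda: started.append("sshd")))
```

As in the unit-file variant, the cost of this design is that the configuration is parsed twice on every successful start.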

--- Additional comment from Jakub Jelen on 2015-12-14 10:19:54 EST ---

(In reply to Petr Lautrbach from comment #2)
> It probably doesn't matter if a service is forking or simple in this case.
> 'Type=simple' is used with '/usr/sbin/sshd -D' as it's simpler for systemd
> to track a main sshd process which doesn't do fork() & exec().

I don't know, but setting Type=forking helps systemctl to report errors (unlike the simple one). But it does not suit the description of the daemon behavior from manual pages and changing invocation is probably not a thing we would like to do.

> But we could change sshd.service to run '/usr/sbin/sshd -t' before the sshd
> daemon

This is actually a good idea. The ExecStartPre makes the service fail hard if the only problem is in the config, but it checks the config twice with every start.

I was also thinking about the possibility of using a distinct exit status for a wrong configuration and letting the service fail hard. If I set

  RestartPreventExitStatus=255

"systemctl start sshd" is still not returning any failure, but the service is in failed state (instead of activating as before).
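
The effect of RestartPreventExitStatus described here can be pictured with a small Python model. This is a deliberately simplified sketch, not systemd's actual logic: it only covers `Restart=no` and `Restart=on-failure` with plain exit statuses and ignores signals, timeouts and the watchdog. The function name `state_after_exit` is hypothetical.

```python
def state_after_exit(restart, exit_status, prevent_statuses=()):
    """State the unit is left in after its main process exits (simplified)."""
    if exit_status == 0:
        # A clean exit is not a failure, so on-failure does not restart it.
        return "inactive"
    if restart == "on-failure" and exit_status not in prevent_statuses:
        # systemd schedules another start after RestartSec.
        return "activating (auto-restart)"
    return "failed"


print(state_after_exit("on-failure", 255))
print(state_after_exit("on-failure", 255, prevent_statuses=(255,)))
```

The model only describes the resulting unit state. It says nothing about what `systemctl start` prints, which is the part that still does not report a failure.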

It looks to me like a systemd problem. I think some insight from the systemd maintainers would be useful.

--- Additional comment from Michal Sekletar on 2015-12-16 04:17:03 EST ---

With Type=simple you always have a problem that basic startup (forking new process, setting up execution environment) succeeds but start of an *actual* daemon fails for some reason. However, failure happens "down the road" after systemd transitioned service to active-running state and systemctl command already returned no error to the user at the command line.

As for whether there is some bug in systemd as comment #3 suggests...Sure it might be the case. Please come up with simple reproducer which exhibits the problem. Frankly, I have a hard time understanding what is an actual problem you see and what is the behavior you expect.

At any rate, I think the best solution for sshd would be to have an actual integration with systemd, i.e. make sshd Type=notify service. See man 3 sd_notify for details. Patch for this should be very small, just couple lines of code and it can be easily maintained downstream in case upstream doesn't care for this.
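
To show how small the notify integration is, here is a sketch of the sd_notify protocol in Python rather than in sshd's C code. It is an illustration under assumptions, not the proposed patch: a `Type=notify` service finds a datagram socket path in the `NOTIFY_SOCKET` environment variable and sends `READY=1` to it once startup has really succeeded. The `env` parameter exists only to make the function easy to exercise.

```python
import os
import socket


def sd_notify(state, env=None):
    """Send a state string such as "READY=1" to the service manager.

    Returns False when NOTIFY_SOCKET is unset (the process was not started
    as a Type=notify service), True once the datagram has been sent.
    """
    env = os.environ if env is None else env
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    if path.startswith("@"):
        address = b"\0" + os.fsencode(path[1:])
    else:
        address = os.fsencode(path)
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), address)
    return True
```

A daemon would call `sd_notify("READY=1")` only after its configuration has been parsed and its listening sockets are bound, so a broken config makes the start job fail instead of being reported as running.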

--- Additional comment from Jakub Jelen on 2015-12-17 05:10:12 EST ---

Michal, thanks for the insight into the possibilities. The notify type should work for sure. I can try to implement a patch for openssh just to give it a try. As for the current state of systemd, I tested the reproducers once more on current RHEL7.2 and here are the results:

The forking variant, which works for me, can be put together simply by modifying the [Service] section of the sshd.service file:

Type=forking
PIDFile=/var/run/sshd.pid
ExecStart=/usr/sbin/sshd $OPTIONS

Issuing start/restart works just fine if the config is OK. If not, I get a relevant error:

# systemctl start sshd
Job for sshd.service failed because the control process exited with error code. See "systemctl status sshd.service" and "journalctl -xe" for details.



> However, failure happens "down the road" after systemd transitioned service to active-running state and systemctl command already returned no error to the user at the command line.

I don't think this is the case. As mentioned in the original description, systemd knows about the exit status code (code=exited, status=255), but does not report it as a failure (left in activating state for auto-restart). I thought the auto-restart will be the root of problems, but getting rid of it didn't help either (from the original configuration):

#Restart=on-failure
#RestartSec=42s

# systemctl daemon-reload 
# systemctl restart sshd
# systemctl status sshd
[..]
   Active: failed (Result: exit-code) since Thu 2015-12-17 10:53:11 CET; 1s ago

Service is now reported as failed, but the error is not printed during start/restart (but we need that restart).

--- Additional comment from Alec Leamas on 2016-01-11 08:49:03 EST ---

See also https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=778913 and 
https://lists.debian.org/debian-ssh/2015/12/msg00072.html

--- Additional comment from Tim Speetjens on 2016-02-25 06:37:11 EST ---

My tests confirm that changing the unit file to 'Type=forking' and no longer passing the '-D' option works as expected.


[Unit]
Description=OpenSSH server daemon
Documentation=man:sshd(8) man:sshd_config(5)
After=network.target sshd-keygen.service
Wants=sshd-keygen.service

[Service]
Type=forking
EnvironmentFile=/etc/sysconfig/sshd
ExecStart=/usr/sbin/sshd $OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=42s

[Install]
WantedBy=multi-user.target

--- Additional comment from errata-xmlrpc on 2016-11-03 16:18:45 EDT ---

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2588.html

Comment 2 Jakub Jelen 2017-03-08 14:20:48 UTC
This would be a problem. If I read the original bug correctly, this is the same behavior as it was in RHEL7 so far (activating status even after failure).

We can set RestartPreventExitStatus=255 but all the errors in SSH return 255 (except some command-line options) so this would basically prevent the below expected behavior with "slow networks":


In the case that a computer gets IP from DHCP server later than sshd is started (#1352214) sshd simply failed and was restarted in 42 seconds, which was enough to make the service accessible again.


We could, on the other hand, change the sshd dependency to network-online.target, but I have not found an explanation of what exactly it means and do not know what other dependencies this target would pull in during boot.

Bottom line: I am quite opposed to this change, unless it would cause significant problems.

Comment 3 Jakub Jelen 2017-03-22 16:59:41 UTC
Just tested the behavior of fresh install of RHEL7.2, which behaves the same way as the proposed solution without RestartPreventExitStatus=255 (except it does not report the errors in case the configuration is broken).

This is intended behavior especially for the "slow networks" as described in the previous comment.

The difference from the Debian bug is that Debian does not have the 42-second restart delay. Debian sshd would:
 * throttle processor
 * after several tries hits limit and fails permanently -- hard fail
(none of that will happen in RHEL)
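
Why the 42-second delay avoids the hard fail can be illustrated with a rough calculation. The sketch below rests on assumptions that are not stated in this report and should be checked against the systemd version in use: a start rate limit of 5 starts per 10 seconds and a restart delay of about 100 ms when RestartSec is not set. The function name `hits_start_limit` is hypothetical.

```python
def hits_start_limit(restart_sec, burst=5, interval_sec=10.0):
    """True if a service that fails immediately would exceed the start rate limit.

    Simplified: each failed start is assumed to take no time, so consecutive
    starts are restart_sec apart, and the limit trips when more than `burst`
    starts fall inside one interval.
    """
    # Start number burst+1 happens burst * restart_sec after the first one.
    return burst * restart_sec < interval_sec


print(hits_start_limit(0.1))   # short delay between restarts (assumed default)
print(hits_start_limit(42.0))  # RestartSec=42s from the RHEL unit
```

Under these assumptions a unit with a short restart delay burns through its allowed starts almost at once and ends up failed, while a unit with RestartSec=42s never gets close to the limit and keeps retrying.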

Let me know if there is something to clarify, but otherwise I will close this bug as WONTFIX. It would be nice if systemd reported exit code 3 from "systemctl status", but that is nothing that ever worked before.

