Bug 756503

Summary: Restarting sshd kills active connections
Product: [Fedora] Fedora
Reporter: Ben Webb <ben>
Component: openssh
Assignee: Jan F. Chadima <jchadima>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 16
CC: jchadima, mattias.ellert, mgrepl, michal, plautrba, tmraz
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-11-28 04:39:24 EST

Description Ben Webb 2011-11-23 14:25:42 EST
Description of problem:
If sshd is restarted with 'sudo systemctl restart sshd.service', not only the main sshd process is killed but also all of its children. This forcibly logs out anybody currently connected via ssh. Also, if sshd is being upgraded by yum over an ssh connection, cleanup of the old openssh-server package fails, because the package script tries to restart sshd (and thus kills the session, including yum). The old package must then be removed manually with 'rpm -e --noscripts'.

This seems to be a problem with the systemd unit for sshd introduced in F16; restarts work OK on F15 or F14 systems via the old init scripts.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. ssh myserver
2. myserver$ sudo systemctl restart sshd.service
Actual results:
myserver$ sudo systemctl restart sshd.service
Connection to myserver closed by remote host.
Connection to myserver closed.

Expected results:
Main sshd process is restarted but active sessions are unaffected.

Additional info:
The main sshd process is at least successfully restarted, so we can log back in. But the cleanup is a nuisance.

The problem seems to be that in F16, everything (including connected ssh sessions) ends up in the sshd.service cgroup:

myserver$ systemctl status sshd.service
sshd.service - OpenSSH server daemon
	  Loaded: loaded (/lib/systemd/system/sshd.service; enabled)
	  Active: active (running) since Mon, 21 Nov 2011 07:02:32 -0800; 2 days ago
	Main PID: 12450 (sshd)
	  CGroup: name=systemd:/system/sshd.service
		  ├ 11258 sshd: ben [priv]
		  ├ 11261 sshd: ben@pts/0
		  ├ 11262 -bash
		  ├ 11284 systemctl status sshd.service
		  └ 12450 /usr/sbin/sshd -D

Whereas on an F15 machine only the main sshd process is in there:
f15server$ systemctl status sshd.service
sshd.service - LSB: Start up the OpenSSH server daemon
	  Loaded: loaded (/etc/rc.d/init.d/sshd)
	  Active: active (running) since Wed, 23 Nov 2011 11:01:40 -0800; 19min ago
	 Process: 10394 ExecStop=/etc/rc.d/init.d/sshd stop (code=exited, status=0/SUCCESS)
	 Process: 10405 ExecStart=/etc/rc.d/init.d/sshd start (code=exited, status=0/SUCCESS)
	Main PID: 10412 (sshd)
	  CGroup: name=systemd:/system/sshd.service
		  └ 10412 /usr/sbin/sshd
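
The difference between the two listings can also be checked from inside a login session. The following is only a rough sketch: the function name in_sshd_cgroup is made up for illustration, and it assumes the kernel exposes cgroup membership in /proc/<pid>/cgroup as "ID:controllers:path" lines.

```shell
# Report whether a process sits in the sshd.service cgroup.
# Argument: path to a cgroup membership file (default: /proc/self/cgroup).
# in_sshd_cgroup is a hypothetical helper name, not part of systemd.
in_sshd_cgroup() {
    cgroup_file="${1:-/proc/self/cgroup}"
    # Each line has the form "hierarchy-ID:controller-list:cgroup-path";
    # match any path whose component is exactly "sshd.service".
    grep -Eq '^[0-9]+:[^:]*:.*/sshd\.service(/|$)' "$cgroup_file"
}
```

Run without arguments in an ssh session, a zero exit status would suggest the session still lives in the sshd.service cgroup and may be killed by a restart.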
Comment 1 Tomas Mraz 2011-11-23 15:06:20 EST
This happens only when there is no pam_systemd in the /etc/pam.d/password-auth. What's in your /etc/pam.d/password-auth?
Comment 2 Ben Webb 2011-11-23 15:32:43 EST
(In reply to comment #1)
> This happens only when there is no pam_systemd in the /etc/pam.d/password-auth.

Ah, that's it, thanks. Our configuration files were inherited from pre-systemd days. With pam_systemd added in, sshd restarts work successfully now.
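
For anyone else with inherited configuration files, a quick check along these lines may help. It is a sketch only: has_pam_systemd is a made-up helper name, and the pattern assumes the usual "type control module" layout of PAM files, with an optional leading '-' on the type.

```shell
# Check whether a PAM file has an active session line for pam_systemd.
# Argument: PAM file to inspect (default: /etc/pam.d/password-auth).
# has_pam_systemd is a hypothetical helper name.
has_pam_systemd() {
    pam_file="${1:-/etc/pam.d/password-auth}"
    # Accept both "session" and "-session"; lines starting with '#'
    # are comments and do not match because of the anchor.
    grep -Eq '^[[:space:]]*-?session[[:space:]].*pam_systemd\.so' "$pam_file"
}
```

A non-zero exit status on /etc/pam.d/password-auth would point to the situation described in comment #1.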
Comment 3 Michal Jaegermann 2011-11-27 13:57:53 EST
(In reply to comment #1)
> This happens only when there is no pam_systemd in the /etc/pam.d/password-auth.
> What's in your /etc/pam.d/password-auth?

Apparently this is only part of the story. On a system I just switched from F14 to F16 I do have '-session optional pam_systemd.so' in /etc/pam.d/password-auth. Still, '/bin/systemctl try-restart sshd.service' immediately drops all connections.

Moreover this left me with the following after the last updates:

Nov 27 11:02:08 Updated: glibc-common-2.14.90-19.x86_64
Nov 27 11:02:13 Updated: glibc-2.14.90-19.x86_64
Nov 27 11:02:13 Updated: openssh-5.8p2-22.fc16.x86_64
Nov 27 11:02:15 Updated: glibc-headers-2.14.90-19.x86_64
Nov 27 11:02:16 Updated: glibc-devel-2.14.90-19.x86_64
Nov 27 11:02:17 Updated: openssh-server-5.8p2-22.fc16.x86_64
Nov 27 11:02:17 Updated: openssh-clients-5.8p2-22.fc16.x86_64

and no transaction cleanup, so all of these are now duplicates, with the strange exception of glibc-devel.

As an added attraction, an attempt to run yum-complete-transaction to clean up that mess ended with:

Transaction size changed - this means we are not doing the
same transaction as we were before. Aborting and disabling
this transaction.

Very nice, indeed!

It does not matter whether in /etc/pam.d/password-auth I have

-session optional pam_systemd.so

or

session optional pam_systemd.so

The effects of "try-restart" are exactly the same. BTW, I tried to find out in the PAM documentation what "-session" may mean, as opposed to "session", and I am still in the dark.

Curiously enough, my rawhide installation, continuously updated for a very long time and with a similar password-auth, is NOT killing ssh connections on this "try-restart".
Comment 4 Michal Jaegermann 2011-11-27 14:41:25 EST
Hm, on rawhide openssh now happens to be openssh-server-5.9p1-13.fc17, while the one updated on F16 is openssh-5.8p2-22.fc16. OTOH, I do not remember this problem on rawhide for a long, long time.

To make it even more annoying, it is also impossible to run 'package-cleanup --cleandupes' on a remote machine, as this not only drops connections but also abandons the transaction, so 'rpm -e --noscripts ...' is required.

I do not see any real differences between password-auth from rawhide (no problems with sshd restarts) and F16.
Comment 5 Tomas Mraz 2011-11-28 02:29:30 EST
Michal, do I understand it right that you have password-auth in /etc/pam.d/sshd and pam_systemd in /etc/pam.d/password-auth, and that 'systemctl try-restart sshd.service' still drops your ssh connection? That would be a bug in systemd or pam_systemd then.
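
The first half of that question could be verified with something like the sketch below. The helper name pam_session_includes is hypothetical, and the check assumes the stack is pulled in through a session "include" or "substack" line.

```shell
# Check whether a PAM service file pulls in another stack for session handling.
# Arguments: service file, name of the included stack.
# pam_session_includes is a hypothetical helper name.
pam_session_includes() {
    service_file="$1"
    stack_name="$2"
    # Match e.g. "session include password-auth", with optional leading '-'.
    grep -Eq "^[[:space:]]*-?session[[:space:]]+(include|substack)[[:space:]]+${stack_name}([[:space:]]|\$)" "$service_file"
}
```

Calling it as 'pam_session_includes /etc/pam.d/sshd password-auth' should return zero if the sshd session stack references password-auth.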
Comment 6 Tomas Mraz 2011-11-28 02:31:28 EST
Also the '-' before the pam entry means that if the module is missing on the system the pam library will not report it in the syslog. It is documented in the pam.conf(5) manpage.
Comment 7 Tomas Mraz 2011-11-28 04:39:24 EST
Let's track this in bug 757545, as the original reporter of this bug did not have pam_systemd in the configuration.
Comment 8 Michal Jaegermann 2011-11-28 13:57:39 EST
(In reply to comment #7)
> Let's track this in bug 757545

As this was closed, I will put my replies to comment #5 in the comments to bug 757545.