Bug 756503 - Restarting sshd kills active connections
Summary: Restarting sshd kills active connections
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: openssh
Version: 16
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Jan F. Chadima
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-11-23 19:25 UTC by Ben Webb
Modified: 2011-11-28 18:57 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-11-28 09:39:24 UTC
Type: ---
Embargoed:



Description Ben Webb 2011-11-23 19:25:42 UTC
Description of problem:
If sshd is restarted with 'sudo systemctl restart sshd.service', not only is the main sshd process killed, but so are all of its children. This forcibly logs out anybody currently connected via ssh. Also, if sshd is being upgraded by yum over an ssh connection, cleanup of the old openssh-server package fails, because the package script tries to restart sshd (and thus kills the session, including yum). The old package must then be removed manually with 'rpm -e --noscripts'.

This seems to be a problem with the systemd unit for sshd introduced in F16; restarts work OK on F15 or F14 systems via the old init scripts.

Version-Release number of selected component (if applicable):
openssh-server-5.8p2-21.fc16.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. ssh myserver
2. myserver$ sudo systemctl restart sshd.service
  
Actual results:
myserver$ sudo systemctl restart sshd.service
Connection to myserver closed by remote host.
Connection to myserver closed.

Expected results:
Main sshd process is restarted but active sessions are unaffected.

Additional info:
The main sshd process is at least successfully restarted, so we can log back in. But the cleanup is a nuisance.

The problem seems to be that in F16, everything (including connected ssh sessions) ends up in the sshd.service cgroup:

myserver$ systemctl status sshd.service
sshd.service - OpenSSH server daemon
	  Loaded: loaded (/lib/systemd/system/sshd.service; enabled)
	  Active: active (running) since Mon, 21 Nov 2011 07:02:32 -0800; 2 days ago
	Main PID: 12450 (sshd)
	  CGroup: name=systemd:/system/sshd.service
		  ├ 11258 sshd: ben [priv]
		  ├ 11261 sshd: ben@pts/0
		  ├ 11262 -bash
		  ├ 11284 systemctl status sshd.service
		  └ 12450 /usr/sbin/sshd -D


Whereas on an F15 machine only the main sshd process is in there:
f15server$ systemctl status sshd.service
sshd.service - LSB: Start up the OpenSSH server daemon
	  Loaded: loaded (/etc/rc.d/init.d/sshd)
	  Active: active (running) since Wed, 23 Nov 2011 11:01:40 -0800; 19min ago
	 Process: 10394 ExecStop=/etc/rc.d/init.d/sshd stop (code=exited, status=0/SUCCESS)
	 Process: 10405 ExecStart=/etc/rc.d/init.d/sshd start (code=exited, status=0/SUCCESS)
	Main PID: 10412 (sshd)
	  CGroup: name=systemd:/system/sshd.service
		  └ 10412 /usr/sbin/sshd
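
A quick way to tell which of the two cases a given login session is in, as a minimal sketch only: look at the session's own cgroup membership in /proc/<pid>/cgroup. The helper name in_sshd_cgroup is hypothetical, and the pattern assumes the file contains a line of the form "N:name=systemd:/system/sshd.service", matching the CGroup line in the status output above; other systemd versions may lay this out differently.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given cgroup file (default: the one
# belonging to the current shell) has a name=systemd entry whose path ends
# in sshd.service.
in_sshd_cgroup() {
    cgroup_file="${1:-/proc/$$/cgroup}"
    # Assumed line format: "N:name=systemd:/system/sshd.service"
    grep -q 'name=systemd:.*/sshd\.service$' "$cgroup_file" 2>/dev/null
}

if in_sshd_cgroup; then
    echo "this session is inside the sshd.service cgroup"
else
    echo "this session is outside the sshd.service cgroup"
fi
```

Run from an ssh login, the first message would correspond to the F16 behaviour described above, where restarting the service also takes down the session.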

Comment 1 Tomas Mraz 2011-11-23 20:06:20 UTC
This happens only when there is no pam_systemd in the /etc/pam.d/password-auth. What's in your /etc/pam.d/password-auth?

Comment 2 Ben Webb 2011-11-23 20:32:43 UTC
(In reply to comment #1)
> This happens only when there is no pam_systemd in the /etc/pam.d/password-auth.

Ah, that's it, thanks. Our configuration files were inherited from pre-systemd days. With pam_systemd added in, sshd restarts work successfully now.
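
For anyone else with configuration files inherited from pre-systemd days, a rough sketch of the check: grep the PAM file for an active session line that names pam_systemd.so. The helper name has_pam_systemd is made up for illustration, and the default path is simply the file named in comment #1.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the PAM file has an uncommented session
# line naming pam_systemd.so, with or without a leading '-'.
has_pam_systemd() {
    pam_file="${1:-/etc/pam.d/password-auth}"
    grep -Eq '^[[:space:]]*-?session[[:space:]].*pam_systemd\.so' "$pam_file" 2>/dev/null
}

if has_pam_systemd; then
    echo "pam_systemd is configured"
else
    echo "pam_systemd is missing"
fi
```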

Comment 3 Michal Jaegermann 2011-11-27 18:57:53 UTC
(In reply to comment #1)
> This happens only when there is no pam_systemd in the /etc/pam.d/password-auth.
> What's in your /etc/pam.d/password-auth?

Apparently this is only part of the story. On a system I just switched from F14 to F16 I do have '-session optional pam_systemd.so' in /etc/pam.d/password-auth. Still, '/bin/systemctl try-restart sshd.service' immediately drops all connections.

Moreover this left me with the following after the last updates:

Nov 27 11:02:08 Updated: glibc-common-2.14.90-19.x86_64
Nov 27 11:02:13 Updated: glibc-2.14.90-19.x86_64
Nov 27 11:02:13 Updated: openssh-5.8p2-22.fc16.x86_64
Nov 27 11:02:15 Updated: glibc-headers-2.14.90-19.x86_64
Nov 27 11:02:16 Updated: glibc-devel-2.14.90-19.x86_64
Nov 27 11:02:17 Updated: openssh-server-5.8p2-22.fc16.x86_64
Nov 27 11:02:17 Updated: openssh-clients-5.8p2-22.fc16.x86_64

and no transaction cleanup, so all of these are now duplicates, with the strange exception of glibc-devel.

As an added attraction, an attempt to run yum-complete-transaction to clean up that mess ended with:

Transaction size changed - this means we are not doing the
same transaction as we were before. Aborting and disabling
this transaction.

Very nice, indeed!

It does not matter if in /etc/pam.d/password-auth I have 

-session optional pam_systemd.so

or

session optional pam_systemd.so

The effects of "try-restart" are exactly the same. BTW, I tried to find out
in the pam documentation what "-session" may mean, as opposed to "session", and I am still in the dark.

Curiously enough, my rawhide installation, continuously updated for a very long time and with a similar password-auth, is NOT killing ssh connections on this "try-restart".

Comment 4 Michal Jaegermann 2011-11-27 19:41:25 UTC
Hm, on rawhide openssh now happens to be openssh-server-5.9p1-13.fc17, while the one updated on F16 is openssh-5.8p2-22.fc16. OTOH, I do not remember seeing this problem on rawhide for a long, long time.

To make it even more annoying, it is also impossible to run
'package-cleanup --cleandupes' on a remote machine, as this not only drops connections but also abandons the transaction, so 'rpm -e --noscripts ...' is required.

I do not see any real differences between password-auth from rawhide
(no problems with sshd restarts) and F16.

Comment 5 Tomas Mraz 2011-11-28 07:29:30 UTC
Michal, do I understand it right that you have password-auth included in /etc/pam.d/sshd and pam_systemd in /etc/pam.d/password-auth, and that 'systemctl try-restart sshd.service' still drops your ssh connection? That would be a bug in systemd or pam_systemd then.
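
To answer the first half of that question on a given machine, something like the following could list which PAM service files /etc/pam.d/sshd pulls in. This is only a sketch: the helper name pam_includes is hypothetical, and it assumes the file uses the usual "type control target" layout with 'include' or 'substack' in the control field.

```shell
#!/bin/sh
# Hypothetical helper: print the PAM service files that the given file pulls
# in through 'include' or 'substack' control fields, one name per line.
pam_includes() {
    pam_file="${1:-/etc/pam.d/sshd}"
    [ -r "$pam_file" ] || return 1
    # Fields are: type, control, target. Comment lines are skipped.
    awk '$1 !~ /^#/ && ($2 == "include" || $2 == "substack") { print $3 }' "$pam_file" | sort -u
}

pam_includes || echo "cannot read the sshd PAM file"
```

If password-auth shows up in the output, the remaining question is whether that file has a pam_systemd session line.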

Comment 6 Tomas Mraz 2011-11-28 07:31:28 UTC
Also, the '-' before the pam entry means that if the module is missing on the system, the pam library will not report it in the syslog. It is documented in the pam.conf(5) manpage.

Comment 7 Tomas Mraz 2011-11-28 09:39:24 UTC
Let's track this in bug 757545, as the original reporter of this bug did not have pam_systemd in the configuration.

Comment 8 Michal Jaegermann 2011-11-28 18:57:39 UTC
(In reply to comment #7)
> Let's track this in bug 757545

As this was closed, I will put my replies to comment #5 in the comments on bug 757545.

