1047614 – [GSS 7.0 Disc] Powering off remote node doesn't close ssh session

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	systemd
Sub Component:
Version:	7.0
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Michal Sekletar
QA Contact:	Leos Pol
Docs Contact:
URL:
Whiteboard:
Duplicates (3):	1032109 1039806 1078906
Depends On:
Blocks:	RHEL7CCC 860099 1018952 1050219 1203710 1289485 1313485

Reported:	2014-01-01 04:39 UTC by Madison Kelly
Modified:	2019-11-14 06:23 UTC
CC List:	36 users
Fixed In Version:	systemd-208-9.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-07-28 08:58:56 UTC
Target Upstream Version:
Embargoed:

Attachments
journal.log (170.56 KB, text/plain) 2014-02-13 14:49 UTC, Petr Lautrbach
journal-NM.log (176.84 KB, text/plain) 2014-02-13 14:55 UTC, Petr Lautrbach

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	2490261	0	None	None	None	2016-08-09 16:37:34 UTC

Description Madison Kelly 2014-01-01 04:39:25 UTC

Description of problem:

Simple one. If you ssh into a RHEL 7 beta server and power it off with 'poweroff', the ssh session hangs instead of closing.


Version-Release number of selected component (if applicable):

openssh-6.4p1-1.el7.x86_64


How reproducible:

Seems to be 100% (based on minimal installs on KVM VMs)


Steps to Reproduce:
1. Install RHEL 7 minimal
2. SSH into RHEL 7 machine
3. Type 'poweroff'.

Actual results:

terminal hangs until ~.<enter> is pressed.


Expected results:

ssh session closes


Additional info:

Comment 2 Petr Lautrbach 2014-01-02 08:49:21 UTC

This is a problem in systemd's logic: systemd doesn't stop user sessions before it shuts down the network. It's already reported for Fedora 20 - https://bugzilla.redhat.com/show_bug.cgi?id=1023788

Comment 3 Harald Reindl 2014-01-15 02:01:26 UTC

how come such regressions compared to systemd-204 of F19 make it into a systemd release and stay there for many weeks?

even with ssh root@host "systemctl reboot; exit" you have a frozen VT 

have fun if you are on a physical machine with no X11
and VT1-VT6 are connected to F20/RHEL7 machines you
would like to reboot

Comment 5 Michal Sekletar 2014-01-17 19:59:34 UTC

(In reply to Petr Lautrbach from comment #2)
> This is the problem in systemd logic. systemd doesn't stop user sessions
> before it shuts down a network. It's already reported for Fedora 20 -
> https://bugzilla.redhat.com/show_bug.cgi?id=1023788

Hmmm, I am not convinced this is the problem. AFAICT, in my tests I made sure that the order in which services are stopped is correct. But the following message popped up in the journal:

Jan 17 20:29:36 localhost systemd[1]: Failed to destroy cgroup /user.slice/user-0.slice: Device or resource busy

As I see it, systemd should successfully destroy the cgroup corresponding to the slice/session (and the processes in it), or try harder later if that is not possible on the first try.

Petr, can you please attach the journal log from your machine? Make sure systemd is running with the log level set to debug (kill -56 1). Please use a persistent journal (mkdir -p /var/log/journal && systemctl restart systemd-journald), because I want everything, not just what rsyslog is able to dig out of the journal files.

Thanks!

Comment 7 Paul Wouters 2014-02-06 20:28:44 UTC

I'm seeing this issue too when running libreswan testcases using f20/rhel7 VMs. It is causing many false positives, so it would be _really_ nice to get this fixed.

Comment 8 Harald Reindl 2014-02-06 20:31:30 UTC

nice?

that should have been a release blocker for F20 given that i reported this *months* before GA at https://bugzilla.redhat.com/show_bug.cgi?id=1023788

here you have a hit-list of systemd-troubles in F20/RHEL7
which are the biggest regressions since Fedora 15 

https://bugzilla.redhat.com/show_bug.cgi?id=1023820
https://bugzilla.redhat.com/show_bug.cgi?id=1010572
https://bugzilla.redhat.com/show_bug.cgi?id=1057811
https://bugzilla.redhat.com/show_bug.cgi?id=1057618
https://bugzilla.redhat.com/show_bug.cgi?id=1023788#c

Comment 9 Steve Grubb 2014-02-12 16:54:48 UTC

Just for the record, this problem is also messing up the audit trail. I can't see user sessions getting terminated, so they look like a crash.

I also think there are PAM modules that allocate things like namespaces, mounts, devices, etc. Not being able to properly close out PAM means those resources never get released back to the OS. So this bug is kind of important to have fixed.

Comment 10 Petr Lautrbach 2014-02-13 14:49:51 UTC

Created attachment 862821
journal.log

You need to disable NetworkManager.service and enable network.service.

[root@rhel-7-devel ~]# kill -56 1
[root@rhel-7-devel ~]# date
Thu Feb 13 15:43:52 CET 2014
[root@rhel-7-devel ~]# reboot
Write failed: Broken pipe
$ ssh root@rhel-7-devel 
root@rhel-7-devel's password: 
Last login: Thu Feb 13 15:43:24 2014 from master.virt
[root@rhel-7-devel ~]# journalctl -l --since="15:43:52" > journal.log

Comment 11 Petr Lautrbach 2014-02-13 14:55:50 UTC

Created attachment 862834
journal-NM.log

using NetworkManager.service it seems to work:

[root@rhel-7-devel ~]# systemctl disable network.service
network.service is not a native service, redirecting to /sbin/chkconfig.
Executing /sbin/chkconfig network off
[root@rhel-7-devel ~]# systemctl enable NetworkManager.service
ln -s '/usr/lib/systemd/system/NetworkManager.service' '/etc/systemd/system/dbus-org.freedesktop.NetworkManager.service'
ln -s '/usr/lib/systemd/system/NetworkManager.service' '/etc/systemd/system/multi-user.target.wants/NetworkManager.service'
ln -s '/usr/lib/systemd/system/NetworkManager-dispatcher.service' '/etc/systemd/system/dbus-org.freedesktop.nm-dispatcher.service'
[root@rhel-7-devel ~]# reboot
Write failed: Broken pipe
$ ssh root@rhel-7-devel 
root@rhel-7-devel's password: 
Last login: Thu Feb 13 15:45:29 2014 from master.virt
[root@rhel-7-devel ~]# kill -56 1
[root@rhel-7-devel ~]# date
Thu Feb 13 15:51:51 CET 2014
[root@rhel-7-devel ~]# reboot

Broadcast message from root@rhel-7-devel on pts/0 (Thu 2014-02-13 15:51:54 CET):

The system is going down for reboot NOW!

[root@rhel-7-devel ~]# Connection to rhel-7-devel closed by remote host.
Connection to rhel-7-devel closed.

$ ssh root@rhel-7-devel 
root@rhel-7-devel's password: 
X11 forwarding request failed on channel 0
Last login: Thu Feb 13 15:51:35 2014 from master.virt

[root@rhel-7-devel ~]# journalctl -l --since="15:51:51" > journal-NM.log

Comment 12 Harald Reindl 2014-02-13 15:16:48 UTC

systemd-upstream claims this to be fixed somewhere and sometime
asked yesterday on the systemd-list and only got an arrogant
reply that only active systemd developers are allowed to criticize

-------- Original Message --------
Subject: Re: [systemd-devel] https://bugzilla.redhat.com/show_bug.cgi?id=1047614
Date: Wed, 12 Feb 2014 21:19:02 +0100
From: Lennart Poettering <lennart>
Organization: Red Hat, Inc.
To: Reindl Harald <h.reindl>
Cc: Mailing-List systemd <systemd-devel.org>

On Wed, 12.02.14 20:05, Reindl Harald (h.reindl) wrote:

> https://bugzilla.redhat.com/show_bug.cgi?id=1047614
> 
> Product: 	Red Hat Enterprise Linux 7
> Component: 	systemd (Show other bugs)
> Version: 	7.0
> Hardware: 	Unspecified Unspecified
> Priority 	urgent Severity high
> 
> first reported more than 3 months ago
> https://bugzilla.redhat.com/show_bug.cgi?id=1023788
> 
> maybe systemd-upstream should consider slowing down development
> and spending more energy on quality and stability

Well, firstly, it's hardly your business how we spend our time.

Secondly, this bug is fixed upstream.

Thirdly, patches count more than complaining.

Comment 16 Lukáš Nykrýn 2014-02-25 16:13:46 UTC

*** Bug 1032109 has been marked as a duplicate of this bug. ***

Comment 17 Lukáš Nykrýn 2014-02-25 16:51:26 UTC

*** Bug 1039806 has been marked as a duplicate of this bug. ***

Comment 19 Harald Reindl 2014-03-02 15:49:36 UTC

systemd-210 in Fedora Rawhide fixes this problem and some other nasty things - hopefully a switch to version 210 is considered for RHEL7 as well as F20, instead of trying to backport cherry-picked fixes

Comment 20 Michal Sekletar 2014-03-05 07:01:43 UTC

A rebase to 210 is not planned for RHEL7 or Fedora 20. A backport of the required fixes is underway; however, a ton of changes were introduced in the 209 release cycle, so backporting is hard. Anyway, this will be fixed soon.

Comment 22 Leos Pol 2014-03-17 14:14:09 UTC

[root@rhel7 ~]# rpm -q systemd
systemd-208-9.el7.x86_64
[root@rhel7 ~]# poweroff
Connection to rhel7.virt closed by remote host.
Connection to rhel7.virt closed.

Comment 23 Lukáš Nykrýn 2014-03-20 15:56:18 UTC

*** Bug 1078906 has been marked as a duplicate of this bug. ***

Comment 24 Ludek Smid 2014-06-13 11:46:33 UTC

This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Comment 29 Michal Sekletar 2016-01-25 12:21:16 UTC

Possible workaround:

add a drop-in configuration file /etc/systemd/system/systemd-user-sessions.service.d/after-network.conf

with the following content,

[Unit]
After=network.target

and reload systemd,

systemctl daemon-reload
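
A minimal sketch of the steps above as a shell function, in case it helps when trying the workaround. The function name install_after_network_dropin and its optional staging-root argument are made up for illustration; the file path and contents are the ones given above. Ordering dependencies are applied in reverse at shutdown, so After=network.target is meant to get systemd-user-sessions.service stopped before network.target.

```shell
#!/bin/sh
# Sketch only: write the after-network.conf drop-in described above.
# $1 is a hypothetical staging root; leave it empty to target the live system.
install_after_network_dropin() {
    root="${1:-}"
    dir="$root/etc/systemd/system/systemd-user-sessions.service.d"

    mkdir -p "$dir" || return 1
    printf '[Unit]\nAfter=network.target\n' > "$dir/after-network.conf" || return 1

    # Reload the running manager only when the live system was modified.
    if [ -z "$root" ]; then
        systemctl daemon-reload
    fi
}
```

Called as root with no argument, it writes to the live /etc and reloads systemd; afterwards, systemctl show -p After systemd-user-sessions.service should list network.target. Called with a directory argument, it only writes under that directory, which allows inspecting the result first.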

Comment 30 Sean Mullen 2016-02-17 14:15:04 UTC

I'm on RHEL 7.2 using network.service (NetworkManager.service is stopped / disabled) because we're 100% static IP, no wifi.  NetworkManager keeps modifying crap so we just disabled it and enabled network.service.

I've tried Michal's workaround, no luck.

systemd-219-19.el7.x86_64 is installed.

I'm still having this issue. Any ideas how to fix this?

Comment 31 Freddy Wissing 2016-02-17 14:46:45 UTC

I hesitate to call it a workaround, but at least it avoids having to kill the session if you're rebooting a server through a jump box.  

# (nohup sleep 10;reboot) &

# logout

Comment 32 Lukáš Nykrýn 2016-04-11 11:21:50 UTC

I was unable to reproduce this issue, could you try to get a shutdown log for the issue with the new version of systemd?

https://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1

Comment 33 Michal Sekletar 2016-05-05 13:27:57 UTC

I've built some test packages which contain a related fix. Feel free to try them out.

http://people.redhat.com/~msekleta/systemd-219-20.el7.0.bz1047614/

Comment 34 Susant Sahani 2016-05-24 07:58:52 UTC

I believe this bug should be fixed by this upstream commit.

Could you please try it out?

https://github.com/systemd/systemd/issues/2390
https://github.com/systemd/systemd/commit/8c856804780681e135d98ca94d08afe247557770

Please add network.target to the After= directive.
----------------------------------------------------
# cat system/systemd-user-sessions.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Permit User Sessions
Documentation=man:systemd-user-sessions.service(8)
After=remote-fs.target nss-user-lookup.target network.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/lib/systemd/systemd-user-sessions start
ExecStop=/usr/lib/systemd/systemd-user-sessions stop
--------------------------------------------------

Comment 39 Michal Sekletar 2016-05-26 16:01:56 UTC

The hang on the client side happens because processes under the PAM session created by ssh are not put into a .scope systemd unit, i.e. systemd is not aware of those processes (loginctl knows about 0 sessions). Because of that, this group of processes is not scheduled to stop at shutdown and stays running until the final killing spree, and even then we first send SIGTERM and then SIGKILL. Hence, if the network connection is still up at that point, there should be a chance to close the ssh connection correctly. From the shutdown log I see that the customer is using network initscripts, and the ifdown script will put the interface down if NetworkManager is not used.

The weird thing is that the PAM config actually looks OK, because pam_systemd is listed in password-auth and that is then included by the sshd PAM config, so processes should really be inside .scope units.

Can you verify that pam_systemd module is present on the system? To do that you can run rpm -qV systemd-libs.
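
Besides checking that the module file exists, it may help to confirm that a PAM config file actually has an active session line for it. The following is a rough sketch under stated assumptions: the function name has_pam_systemd is hypothetical, it only looks at the single file it is given (it does not follow include or substack directives), and it treats any non-commented session or -session line mentioning pam_systemd.so as a match.

```shell
#!/bin/sh
# Sketch only: succeed if the given PAM config file contains a non-commented
# "session" (or "-session") line that references pam_systemd.so.
# Included files are NOT followed; check each included file separately.
has_pam_systemd() {
    file="$1"
    grep -Eq '^[[:space:]]*-?session[[:space:]].*pam_systemd\.so' "$file"
}
```

For example, has_pam_systemd /etc/pam.d/password-auth && echo present. As a runtime cross-check, loginctl list-sessions run from inside the ssh session should list that session when pam_systemd is active.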

Comment 48 Michal Sekletar 2016-07-28 08:58:56 UTC

I've analysed the sos_report and confirmed my suspicion about an incorrect PAM configuration.

I think there are two ways to resolve this issue,

1) either fix the PAM config to include the pam_systemd.so module, so that all user processes are registered in their respective scope units and those are scheduled to shut down before network connections are terminated by the ifdown scripts,

or

2) use NetworkManager instead of initscripts. NM doesn't put interfaces down when it is stopped, so the ssh session can be terminated gracefully.

Either way, there is nothing to fix in systemd or related components. Closing as CURRENTRELEASE. Feel free to reopen in case I've missed something.

Comment 49 Helmut K. C. Tessarek 2017-05-10 22:12:10 UTC

This bug still exists in Redhat 7.3, btw.

$ reboot

PolicyKit daemon disconnected from the bus.
We are no longer a registered authentication agent.
<after about 60 seconds>
packet_write_wait: Connection to xx.xx.xx.xx: Broken pipe

Comment 50 Michal Sekletar 2017-05-11 07:01:58 UTC

(In reply to Helmut Tessarek from comment #49)

> This bug still exists in Redhat 7.3, btw.

This issue should be reproducible only when you don't use pam_systemd and you use legacy network initscripts instead of NetworkManager.

In case you see the issue but your system is not set up in the way I described above, then please file a new bug report.
