Bug 1031158

Summary: swapoff during reboot/shutdown results in oom
Product: Fedora
Reporter: Dave Jones <davej>
Component: systemd
Assignee: systemd-maint
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 25
CC: extras-qa, johannbg, lnykryn, masanari.iida, msekleta, pfrields, plautrba, steveh1966, systemd-maint, thierry.laurion, vpavlin, zbyszek
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2017-12-12 10:23:39 UTC
Type: Bug

Description Dave Jones 2013-11-15 18:19:38 UTC
Doing a swapoff at reboot time can be fatal.
If there's insufficient memory to page everything back in, the swapoff process can get killed, and then systemd craps itself and decides hanging the box is a better option than rebooting.


[61035.317344] Out of memory: Kill process 27030 (swapoff) score -1369716241 or sacrifice child
[61035.318256] Killed process 27030 (swapoff) total-vm:123380kB, anon-rss:164kB, file-rss:592kB
[61035.406573] systemd[1]: Unit dev-sda2.swap entered failed state.

Why exactly are we doing a swapoff anyway? It's not like a filesystem where we have to maintain a coherent state across reboots.

Comment 1 Zbigniew Jędrzejewski-Szmek 2013-11-17 21:47:31 UTC
There are cases where swapoff is necessary — when the backing device must be destroyed. For example, if the swap device is an LVM LV, and the LVM is on top of a RAID array... I doubt that we can come up with logic to distinguish those special cases where that's not needed without hardcoding stuff.

Anyway, swaps are ordered before sysinit.target, and should be destroyed only after all services are gone. What is using the memory in your case?
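For context, systemd stops units in the reverse of their startup ordering: a unit with After=foo starts after foo and is stopped before foo. A sketch of a drop-in (hypothetical path and service name; the swap unit name is taken from the log above) that orders a service after the swap unit, so the service is stopped before the swap is deactivated:

```ini
# /etc/systemd/system/myservice.service.d/order-swap.conf  (hypothetical)
# After= at startup implies "stopped before" at shutdown, so the service
# exits (and releases its swapped memory) before swapoff runs.
[Unit]
After=dev-sda2.swap
```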

Comment 2 Steve 2016-04-12 12:53:10 UTC
Please reopen:

CentOS 7 KVM

reboot process hangs when these processes are running:

MariaDB
Clamd
Spamd

It appears systemd is not shutting down all processes before the swap is destroyed.

Comment 3 Steve 2016-04-12 13:24:35 UTC
On this reboot, spamd was the last service I manually stopped before shutdown -r was issued:

Apr 12 08:44:46 18-98-60-69 systemd: Stopped Spamassassin daemon.
** shutdown -r issued
Apr 12 08:45:15 18-98-60-69 systemd: Deactivating swap /dev/sdb1...

The log shows that the swap is the FIRST thing systemd deactivates.

Comment 4 Steve 2016-04-12 13:33:39 UTC
This reboot shows systemd not shutting down all processes:

The only service that gets stopped is MariaDB, and only after the swap is destroyed.

clamd, exim, and spamd are never stopped individually. Maybe they are stopped as part of "Stopping Multi-User System", but again, that is after the swap is destroyed.

All of my CentOS 7 VPSs have low physical memory (512MB) and 2-3GB of swap. Everything runs fine with the virtual memory, but this reboot issue is affecting them.

Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping Session 16 of user root.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopped Dump dmesg to /var/log/dmesg.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping Dump dmesg to /var/log/dmesg...
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopped target Timers.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping Timers.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Deactivating swap /dev/sdb1...
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopped Daily Cleanup of Temporary Directories.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping Daily Cleanup of Temporary Directories.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping Authorization Manager...
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopped target Multi-User System.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping Multi-User System.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping MariaDB database server...
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopped target Login Prompts.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping Login Prompts.
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping Command Scheduler...
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping D-Bus System Message Bus...
Apr 12 07:58:03 MIA-VPS-VM01013 systemd: Stopping Avahi mDNS/DNS-SD Stack...
Apr 12 07:58:04 MIA-VPS-VM01013 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="444" x-info="http://www.rsyslog.com"] exiting on signal 15.

Comment 5 Steve 2016-04-12 13:49:35 UTC
Seems this might be a systemd-WIDE issue.

Here is an Ubuntu 15.10 systemd VPS with the same problem. The last comment there is from another CentOS user with the same issue, who linked back to this bug in a post on Feb 21, 2016, after it was closed as "CLOSED INSUFFICIENT_DATA". His logs also show swap deactivating very early in the shutdown sequence.

https://askubuntu.com/questions/732929/server-hangs-on-reboot-when-clam-is-running

Comment 6 Zbigniew Jędrzejewski-Szmek 2016-04-14 12:15:01 UTC
*** Bug 1327177 has been marked as a duplicate of this bug. ***

Comment 7 Zbigniew Jędrzejewski-Szmek 2016-04-14 12:16:56 UTC
From #1327177:

> Description of problem:
> systemd disables swap before stopping all processes that are using swap
> memory during shutdown resulting in a 30 min suspended state after rsyslogd
> is stopped.
> 
> Version-Release number of selected component (if applicable):
> CentOS 7.2
> 
> How reproducible:
> CentOS 7.2 KVM VPS with 512MB memory and 3GB swap
> 
> running mysql
> clamd
> exim
> spamd
> 
> Steps to Reproduce:
> 1. reboot server
> 
> Actual results:
> almost exactly 30 minutes elapse after rsyslogd is stopped, the last
> process logged to the message bus, before the system restarts
> 
> Expected results:
> system should reboot/restart immediately after rsyslogd stops

Yeah, we should look into this. See also https://github.com/systemd/systemd/issues/2930.

Comment 8 masanari iida 2016-05-17 11:52:50 UTC
systemd-219-19.el7.x86_64 on RHEL7.

In my case: 7.6GB total RAM (fully used) and 2GB of swap in use.
The OS was shut down while applications were running.
systemd started stopping all applications and services simultaneously,
and ran "swapoff" at the same moment.
The end result: it failed to shut down the system.

My understanding is that swapoff should happen _after_ all the
applications and OS-related processes are terminated.
But according to the log, it happened just after the shutdown command
was executed.

Log: 
May 11 11:00:25 daemon.info: systemd: Started Delayed Shutdown Service.
May 11 11:00:25 daemon.info: systemd: Starting Delayed Shutdown Service...
May 11 11:00:25 daemon.info: systemd-shutdownd: Shutting down at Wed 2016-05-11 11:00:25 JST (poweroff)...
May 11 11:00:25 daemon.info: systemd-shutdownd: Creating /run/nologin, blocking further logins...
May 11 11:00:25 daemon.info: systemd: Stopping Session 1 of user osemerg2.
May 11 11:00:25 daemon.info: systemd: Stopping Authorization Manager...
May 11 11:00:25 daemon.info: systemd: Deactivating swap /dev/mapper/vg_system-lv_swap...
May 11 11:00:25 daemon.info: systemd: Stopped target Timers.

(snip)

May 11 11:00:25 daemon.info: systemd: Stopped Authorization Manager.
May 11 11:00:25 daemon.notice: systemd: dev-mapper-vg_system\x2dlv_swap.swap swap process exited, code=exited status=255
May 11 11:00:25 daemon.info: systemd: Deactivated swap /dev/mapper/vg_system-lv_swap.
May 11 11:00:25 daemon.notice: systemd: Unit dev-mapper-vg_system\x2dlv_swap.swap entered failed state.

As you can see from the log, all events happened at the same time (11:00:25).

Comment 9 masanari iida 2016-07-07 03:00:53 UTC
I checked the systemd upstream site and found that this symptom has been
fixed by the following changes.

https://github.com/systemd/systemd/pull/1997
make sure all swap units are ordered before the swap target

https://github.com/systemd/systemd/issues/1902
Swap units are deactivated too early
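The ordering rule those fixes rely on can be sketched as follows: systemd stops units in reverse dependency order, so once every swap unit is ordered before swap.target (and services are, transitively, ordered after it), swap is deactivated only after the services have stopped. A minimal Python sketch of this reverse-order shutdown, with hypothetical unit names (this is an illustration, not systemd's actual scheduler):

```python
# Edges mean "must start after": unit -> set of units it is After=.
after = {
    "dev-sda2.swap": set(),                 # swap unit, ordered before swap.target
    "swap.target": {"dev-sda2.swap"},
    "sysinit.target": {"swap.target"},
    "mariadb.service": {"sysinit.target"},  # ordinary service
}

def startup_order(after):
    """Topological sort: dependencies first, dependents later."""
    order, seen = [], set()
    def visit(unit):
        if unit in seen:
            return
        seen.add(unit)
        for dep in sorted(after.get(unit, ())):
            visit(dep)
        order.append(unit)
    for unit in sorted(after):
        visit(unit)
    return order

boot = startup_order(after)
shutdown = list(reversed(boot))
# With the PR 1997 ordering in place, the swap unit is deactivated last:
assert shutdown[-1] == "dev-sda2.swap"
```

Without the Before=swap.target edge, the swap unit has no ordering relation to the services, and systemd is free to deactivate it first — which is exactly the behavior in the logs above.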

As far as Fedora 24 goes (which uses systemd-229-8),
I have confirmed that the systemd-229.tar.gz in the src.rpm includes the fix.
I didn't check the minimum version required to fix this symptom,
as I plan to move to F24 soon anyway.

FYI, the fix is not in systemd-219-19.el7_2.11 (as of RHEL 7.2).
I hope RH will backport the patch or rebase systemd to the latest version.

Regards,
Masanari

Comment 10 Jan Kurik 2016-07-26 05:06:21 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 25 development cycle.
Changing version to '25'.

Comment 11 thierry.laurion 2017-03-30 18:48:34 UTC
Hi,

This bug is still present in the version of systemd shipped with CentOS 7.2.
Would it be possible to get the upstream fix backported?

Comment 12 thierry.laurion 2017-03-31 17:50:36 UTC
The workaround I applied is something like this:

grep swap /var/log/dmesg |grep "dead -> active"
[    1.995413] systemd[1]: dev-dm\x2d1.swap changed dead -> active
[    1.995495] systemd[1]: dev-cl-swap.swap changed dead -> active
[    1.995550] systemd[1]: dev-disk-by\x2did-dm\x2dname\x2dcl\x2dswap.swap changed dead -> active
[    1.995616] systemd[1]: dev-disk-by\x2did-dm\x2duuid\x2dLVM\x2dXOAK7DHxMdmQCrNdwWE3Pt836Q9pHYSGyrO9ycCGeIYavzbamVWNKMaVUMLf1NWZ.swap changed dead -> active
[    1.995678] systemd[1]: dev-disk-by\x2duuid-6509e6e1\x2daf2d\x2d4d23\x2d9ebd\x2da9aa8801e658.swap changed dead -> active


For each affected system, one would need to create /etc/systemd/system/swap.target with the following content, derived from the preceding output:
[Unit]
Description=Swap
Documentation=man:systemd.special(7)
After=dev-disk-by\x2duuid-6509e6e1\x2daf2d\x2d4d23\x2d9ebd\x2da9aa8801e658.swap dev-dm1.swap dev-disk-by\x2did-dm\x2duuid\x2dLVM\x2dXOAK7DHxMdmQCrNdwWE3Pt836Q9pHYSGyrO9ycCGeIYavzbamVWNKMaVUMLf1NWZ.swap dev-disk-by\x2did-dm\x2dname\x2dcl\x2dswap.swap dev-cl-swap.swap dev-dm\x2d1.swap

Otherwise, the system attempts to swapoff each alias before stopping the rest of the system.
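The list of aliases above can be extracted mechanically from the boot log. A sketch, assuming log lines in the dmesg format shown above (the helper function name is mine):

```shell
# extract_swap_units: read systemd boot log lines on stdin and print the
# name of every swap unit that changed "dead -> active", one per line.
extract_swap_units() {
    grep '\.swap changed dead -> active' \
        | sed -e 's/.*]: //' -e 's/ changed.*//'
}

# Example usage: build the unit list for the swap.target After= line:
#   grep swap /var/log/dmesg | extract_swap_units | tr '\n' ' '
```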

Comment 13 masanari iida 2017-04-03 02:19:23 UTC
It seemed RH has been working on bz#1379268 and bz#1298355 to back port
upstream patch to fix the issue.

Bug 1379268 - backport upstream commit 681c8d8 to make sure all swap units are ordered before the swap target

Bug 1298355 - kickstart stuck at "Reached Target Shutdown" stage when removing media before shutdown completes

Comment 14 thierry.laurion 2017-04-13 15:42:51 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1379268 is reported as being a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1298355, which it is not.

https://bugzilla.redhat.com/show_bug.cgi?id=1379268 should be reopened.

Comment 15 Fedora End Of Life 2017-11-16 18:59:50 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 16 Fedora End Of Life 2017-12-12 10:23:39 UTC
Fedora 25 changed to end-of-life (EOL) status on 2017-12-12. Fedora 25 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.