Bug 1437114

Summary:	systemd stops servicing with "Freezing execution" message upon memory exhaustion
Product:	Red Hat Enterprise Linux 7	Reporter:	Renaud Métrich <rmetrich>
Component:	systemd	Assignee:	Lukáš Nykrýn <lnykryn>
Status:	CLOSED ERRATA	QA Contact:	Frantisek Sumsal <fsumsal>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	7.3	CC:	bugproxy, chorn, fkrska, fsumsal, hannsj_uhl, jokot3, jsynacek, kwalker, lnykryn, mlinden, mmatsuya, msekleta, myllynen, nkresic, rzaleski, systemd-maint-list
Target Milestone:	rc	Keywords:	ZStream
Target Release:	7.5
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	systemd-219-44.el7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1546658 1624756 1703945		Environment:
Last Closed:	2018-04-10 11:19:30 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1399177, 1420851, 1465901, 1466365, 1476742, 1496263, 1546658, 1624756, 1703945

Description Renaud Métrich 2017-03-29 14:11:03 UTC

Description of problem:

When memory is exhausted on the system and systemd tries to fork to start or stop a service, systemd fails and enters "freeze" mode:

systemd: Failed to fork: Cannot allocate memory
systemd: Assertion 'pid >= 1' failed at src/core/unit.c:1997, function unit_watch_pid(). Aborting.
systemd: Caught <ABRT>, cannot fork for core dump: Cannot allocate memory
systemd: Freezing execution.

Once in "freeze" mode, systemd no longer provides any service, and the system can only be rebooted using "systemctl reboot -ff":

# reboot
Error getting authority: Error initializing authority: Error calling StartServiceByName for org.freedesktop.PolicyKit1: Timeout was reached (g-io error-quark, 24)
Could not watch jobs: Connection timed out
Failed to open /dev/initctl: No such device or address
Failed to talk to init daemon.

systemd doesn't listen on any socket anymore:

# ls -l /proc/1/fd
total 0
lrwx------. 1 root root 64 Mar 29 16:00 0 -> /dev/null
lrwx------. 1 root root 64 Mar 29 16:00 1 -> /dev/null
lrwx------. 1 root root 64 Mar 29 16:00 2 -> /dev/null

And of course, services cannot be restarted.
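
The descriptor listing above suggests a quick check for this state. What follows is only a sketch under assumptions: the helper names count_fds and looks_frozen are hypothetical, and the threshold of three descriptors is a heuristic taken from this report, not an official indicator. Reading /proc/1/fd normally requires root.

```shell
#!/bin/sh
# Count the open file descriptors of a process via /proc (Linux only).
# Prints 0 if the directory cannot be read (e.g. insufficient privileges
# or no such process).
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Heuristic from this report: a frozen PID 1 showed only descriptors
# 0, 1 and 2. The threshold of 3 is an assumption, not an official indicator.
looks_frozen() {
    n=$(count_fds "$1")
    [ "$n" -gt 0 ] && [ "$n" -le 3 ]
}

if looks_frozen 1; then
    echo "PID 1 has only stdio descriptors open; systemd may be frozen"
else
    echo "PID 1 has $(count_fds 1) descriptors open (0 means /proc/1/fd was unreadable)"
fi
```

A healthy systemd normally holds many more descriptors (sockets, timers, cgroup and D-Bus handles), so a count of three or fewer is a strong hint, but not proof.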

Version-Release number of selected component (if applicable):

219-30.el7_3.7

How reproducible:

Always

Steps to Reproduce:
1. Start a VM with swap in it
2. Log onto the console
3. From an ssh terminal, create a tmpfs filesystem and a file in it to exhaust memory:

# mount -t tmpfs -o size=20G tmpfs /mnt
# dd if=/dev/zero of=/mnt/file bs=1M
4. Wait a few seconds

If it doesn't reproduce immediately, restarting a service in a loop may help:

# while :; do systemctl restart iptables.service; sleep 5; done

Logging onto the console helps because it is likely that oom-killer will kill getty and systemd will try to restart it, causing systemd to enter freeze mode.

Comment 2 Jan Synacek 2017-04-03 08:41:41 UTC

I'm not sure how systemd should behave when the system is continuously memory-thrashed by a misbehaving process. The "dd" command from the reproducer does exactly that and causes the kernel to kill everything, continuously.

I suggest running the misbehaving process in its own service, with a unit file that uses the MemoryLimit= directive. See systemd.resource-control(5) for more information.
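
A minimal sketch of that suggestion, under assumptions: the helper write_limited_unit, the unit name memhog.service and the 1G limit are all hypothetical, and the unit is written to a scratch directory rather than /etc/systemd/system so that nothing is installed.

```shell
#!/bin/sh
# Hypothetical helper: write a unit file that caps the memory of a command.
#   $1 = path of the unit file to create
#   $2 = value for MemoryLimit= (e.g. 1G)
#   remaining arguments = command line for ExecStart=
write_limited_unit() {
    unit_path=$1
    limit=$2
    shift 2
    cat > "$unit_path" <<EOF
[Unit]
Description=Memory-capped job (illustrative)

[Service]
MemoryLimit=$limit
ExecStart=$*
EOF
}

# Example: wrap the reproducer's dd command; the scratch directory is used
# only for illustration.
unit_dir=$(mktemp -d)
write_limited_unit "$unit_dir/memhog.service" 1G /usr/bin/dd if=/dev/zero of=/mnt/file bs=1M
cat "$unit_dir/memhog.service"
```

To use it for real, the file would be copied to /etc/systemd/system, followed by "systemctl daemon-reload" and "systemctl start memhog.service". This limits one known offender only; it does not address the freeze itself.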

Comment 3 Marko Myllynen 2017-04-04 14:23:57 UTC

(In reply to Jan Synacek from comment #2)
> I'm not sure how systemd should behave when the system is continuously memory
> thrashed by a misbehaving process. If you run the "dd" command from the
> reproducer, it does exactly that and causes the kernel to kill everything,
> continuously.
> 
> I suggest running the misbehaving process in its own service file which will
> make use of the MemoryLimit= directive. See systemd.resource-control(5) for
> more information.

This is a fair approach with respect to the reproducer or a hopelessly out-of-control application.

However, bugs happen, and in theory any component running on a system may cause an OOM situation. If, for whatever reason, the system is momentarily out of memory and someone or something happens to restart a service at that moment, systemd is rendered unusable even if the offending process is terminated right afterwards. Requiring a reboot to recover from such a temporary hiccup seems unreasonable for an enterprise operating system.

In fact, this could even be seen as a variant of a DoS attack.

Thanks.

Comment 4 Marko Myllynen 2017-04-04 14:26:05 UTC

(In reply to Marko Myllynen from comment #3)
> 
> In fact, this could be even seen as a variant of DoS attack.

That is, of course, only if this can happen with user services controlled by regular users.

Comment 11 Lukáš Nykrýn 2017-05-05 11:45:13 UTC

https://github.com/lnykryn/systemd-rhel/pull/119

Comment 16 Lukáš Nykrýn 2017-09-07 08:22:36 UTC

fix merged to upstream staging branch -> https://github.com/lnykryn/systemd-rhel/pull/119 -> post

Comment 20 Jan Synacek 2018-02-14 09:42:19 UTC

*** Bug 1496263 has been marked as a duplicate of this bug. ***

Comment 21 IBM Bug Proxy 2018-02-14 09:45:39 UTC

------- Comment From ruddk.com 2017-09-27 13:43 EDT-------
(In reply to comment #7)
> Well, even if we fix that particular bug, I think this is a design problem
> that basically can't be fixed. In the old world of sysvinit the service
> managers were generally stateless. So they were able to recover from this
> state because there was nothing that needed a recovery.
> But with stateful managers like systemd the situation is different. If a
> state of a service changes and the service manager can't allocate memory to
> record that change, then we can't guarantee that the system will work as
> expected even in the case that memory is freed again.

I think that Comment 3 in RH Bug 1437114 (and LTC Bug 159344 Comment 6) does a pretty good job of covering the main concern.

The goal of the fix is to make sure that the primary systemd process gets accurate information back in the event of a subsequent fork failing.

------- Comment From harihare.com 2018-02-14 03:40 EDT-------
The issue still reproduces on Pegas 1.1 Snapshot 2 too.

[root@localhost ~]# dmesg
Segmentation fault
[root@localhost ~]# dmesg
-bash: fork: Cannot allocate memory
[root@localhost ~]# dmesg
-bash: fork: Cannot allocate memory
[root@localhost ~]# dmesg
-bash: fork: Cannot allocate memory

------- Comment From harihare.com 2018-02-14 03:40 EDT-------
Steps to Reproduce:

If it does not reproduce immediately, restart a service in a loop:

Comment 32 errata-xmlrpc 2018-04-10 11:19:30 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0711

Comment 35 IBM Bug Proxy 2018-10-22 21:40:33 UTC

------- Comment From chavez.com 2018-10-22 17:36 EDT-------
*** Bug 159283 has been marked as a duplicate of this bug. ***

Comment 36 Jan Synacek 2018-10-23 07:51:17 UTC

(In reply to IBM Bug Proxy from comment #35)
> ------- Comment From chavez.com 2018-10-22 17:36 EDT-------
> *** Bug 159283 has been marked as a duplicate of this bug. ***

I'm not sure where this came from, but that's certainly a mistake.