Bug 1437114 - systemd stops servicing with "Freezing execution" message upon memory exhaustion
Summary: systemd stops servicing with "Freezing execution" message upon memory exhaustion
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 7.5
Assignee: Lukáš Nykrýn
QA Contact: Frantisek Sumsal
URL:
Whiteboard:
Duplicates: 1496263 (view as bug list)
Depends On:
Blocks: 1399177 1420851 1465901 1466365 1476742 1496263 1546658 1624756 1703945
 
Reported: 2017-03-29 14:11 UTC by Renaud Métrich
Modified: 2023-10-06 17:36 UTC
CC List: 16 users

Fixed In Version: systemd-219-44.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1546658 1624756 1703945 (view as bug list)
Environment:
Last Closed: 2018-04-10 11:19:30 UTC
Target Upstream Version:
Embargoed:




Links:
IBM Linux Technology Center 159344 (last updated 2019-08-05 01:32:14 UTC)
Red Hat Knowledge Base (Solution) 3096191 (last updated 2017-06-28 18:10:19 UTC)
Red Hat Product Errata RHBA-2018:0711 (last updated 2018-04-10 11:20:54 UTC)

Description Renaud Métrich 2017-03-29 14:11:03 UTC
Description of problem:

When memory is exhausted on the system and systemd tries to fork to start or stop a service, systemd fails and enters "freeze" mode:

    systemd: Failed to fork: Cannot allocate memory
    systemd: Assertion 'pid >= 1' failed at src/core/unit.c:1997, function unit_watch_pid(). Aborting.
    systemd: Caught <ABRT>, cannot fork for core dump: Cannot allocate memory
    systemd: Freezing execution.
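
For context on the messages above: fork() fails with ENOMEM, and the assertion failure suggests the value from the failed fork() reached unit_watch_pid(), whose precondition is "pid >= 1". The following is only a minimal, hypothetical C sketch of that failure pattern; the helper names and structure are illustrative and are not the actual systemd source:

    #include <assert.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical stand-in for unit_watch_pid(): it requires a valid child PID. */
    static void watch_pid(pid_t pid) {
        assert(pid >= 1);            /* corresponds to the "pid >= 1" assertion in the log */
        /* ... record the PID so the manager can track the service ... */
    }

    /* Illustrative spawn path: passing the result of a failed fork() straight
     * into watch_pid() turns a recoverable ENOMEM into an abort. */
    static int spawn_service(void) {
        pid_t pid = fork();
        if (pid < 0) {
            int r = -errno;          /* e.g. -ENOMEM */
            fprintf(stderr, "Failed to fork: %s\n", strerror(-r));
            watch_pid(pid);          /* pid == -1 trips the assertion and aborts */
            return r;                /* never reached once the assertion fires */
        }
        if (pid == 0)
            _exit(0);                /* child placeholder */
        watch_pid(pid);              /* parent: valid PID, fine */
        return 0;
    }

    int main(void) {
        return spawn_service() < 0 ? 1 : 0;
    }

Because the aborting process is PID 1, it cannot simply dump core and be restarted: as the log shows, it catches the ABRT, cannot fork for the core dump either, and freezes execution instead.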


Once in "freeze" mode, systemd doesn't provide service anymore and system can only be rebooted using "systemctl reboot -ff":

    # reboot
    Error getting authority: Error initializing authority: Error calling StartServiceByName for org.freedesktop.PolicyKit1: Timeout was reached (g-io error-quark, 24)
    Could not watch jobs: Connection timed out
    Failed to open /dev/initctl: No such device or address
    Failed to talk to init daemon.

systemd doesn't listen on any socket anymore:

    # ls -l /proc/1/fd
    total 0
    lrwx------. 1 root root 64 Mar 29 16:00 0 -> /dev/null
    lrwx------. 1 root root 64 Mar 29 16:00 1 -> /dev/null
    lrwx------. 1 root root 64 Mar 29 16:00 2 -> /dev/null

And of course, services cannot be restarted.


Version-Release number of selected component (if applicable):

219-30.el7_3.7

How reproducible:

Always

Steps to Reproduce:
1. Start a VM with swap in it
2. Log onto the console
3. From a ssh terminal, create a tmpfs filesystem and a file in it to exhaust memory

    # mount -t tmpfs -o size=20G tmpfs /mnt
    # dd if=/dev/zero of=/mnt/file bs=1M
4. Wait for some seconds

If it doesn't reproduce immediately, restarting a service in a loop may help:

    # while :; do systemctl restart iptables.service; sleep 5; done

Logging onto the console helps because the OOM killer is likely to kill getty, and systemd will try to restart it, causing systemd to enter freeze mode.

Comment 2 Jan Synacek 2017-04-03 08:41:41 UTC
I'm not sure how systemd should behave when the system is continuously memory-thrashed by a misbehaving process. If you run the "dd" command from the reproducer, it does exactly that and causes the kernel to kill everything, continuously.

I suggest running the misbehaving process in its own service file which will make use of the MemoryLimit= directive. See systemd.resource-control(5) for more information.

Comment 3 Marko Myllynen 2017-04-04 14:23:57 UTC
(In reply to Jan Synacek from comment #2)
> I'm not sure how systemd should behave when the system is continuously
> memory-thrashed by a misbehaving process. If you run the "dd" command from the
> reproducer, it does exactly that and causes the kernel to kill everything,
> continuously.
> 
> I suggest running the misbehaving process in its own service file which will
> make use of the MemoryLimit= directive. See systemd.resource-control(5) for
> more information.

This is a fair approach with respect to the reproducer, or with a hopelessly out-of-control application.

However, bugs happen, and in theory any component running on a system may cause an OOM situation. This means that if, for whatever reason, the system is momentarily out of memory and someone or something happens to restart a service at that moment, then systemd is rendered unusable even if the offending process is terminated right after the fact. Requiring a reboot to recover from such a temporary hiccup would seem to be an unreasonable requirement for an enterprise operating system.

In fact, this could even be seen as a variant of a DoS attack.

Thanks.

Comment 4 Marko Myllynen 2017-04-04 14:26:05 UTC
(In reply to Marko Myllynen from comment #3)
> 
> In fact, this could be even seen as a variant of DoS attack.

This, of course, only applies if this happens with regular user-controlled user services.

Comment 11 Lukáš Nykrýn 2017-05-05 11:45:13 UTC
https://github.com/lnykryn/systemd-rhel/pull/119

Comment 16 Lukáš Nykrýn 2017-09-07 08:22:36 UTC
fix merged to upstream staging branch -> https://github.com/lnykryn/systemd-rhel/pull/119 -> post

Comment 20 Jan Synacek 2018-02-14 09:42:19 UTC
*** Bug 1496263 has been marked as a duplicate of this bug. ***

Comment 21 IBM Bug Proxy 2018-02-14 09:45:39 UTC
------- Comment From ruddk.com 2017-09-27 13:43 EDT-------
(In reply to comment #7)
> Well, even if we fix that particular bug, I think this is a design problem
> that basically can't be fixed. In the old world of sysvinit, the service
> managers were generally stateless, so they were able to recover from such
> states because there was nothing that needed recovery.
> But with stateful managers like systemd the situation is different. If the
> state of a service changes and the service manager can't allocate memory to
> record that change, then we can't guarantee that the system will work as
> expected even in the case that memory is freed again.

I think that Comment 3 in RH Bug 1437114 (and LTC Bug 159344 Comment 6) does a pretty good job of covering the main concern.

The goal of the fix is to make sure that the primary systemd process gets accurate information back in the event of a subsequent fork failing.
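
As a rough, hedged illustration of that goal (the names below are hypothetical and this is not the code from the merged pull request referenced in comment 11; it only sketches the idea of validating the fork() result and reporting the error instead of asserting):

    #include <errno.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Hypothetical checked variant: reject an invalid PID with an error code
     * instead of asserting, so the manager can mark the unit failed and retry
     * later rather than freezing entirely. */
    static int watch_pid_checked(pid_t pid) {
        if (pid < 1)
            return -EINVAL;
        /* ... record the PID ... */
        return 0;
    }

    static int spawn_and_watch(void) {
        pid_t pid = fork();
        if (pid < 0)
            return -errno;           /* e.g. -ENOMEM: report the failure upward */
        if (pid == 0)
            _exit(0);                /* child placeholder */
        return watch_pid_checked(pid);
    }

    int main(void) {
        return spawn_and_watch() < 0 ? 1 : 0;
    }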

------- Comment From harihare.com 2018-02-14 03:40 EDT-------
The issue is still reproduced on Pegas 1.1 Snapshot 2 too.

[root@localhost ~]# dmesg
Segmentation fault
[root@localhost ~]# dmesg
-bash: fork: Cannot allocate memory
[root@localhost ~]# dmesg
-bash: fork: Cannot allocate memory
[root@localhost ~]# dmesg
-bash: fork: Cannot allocate memory

------- Comment From harihare.com 2018-02-14 03:40 EDT-------
Steps to Reproduce.

If it does not reproduce immediately, restart a service in a loop:

Comment 32 errata-xmlrpc 2018-04-10 11:19:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0711

Comment 35 IBM Bug Proxy 2018-10-22 21:40:33 UTC
------- Comment From chavez.com 2018-10-22 17:36 EDT-------
*** Bug 159283 has been marked as a duplicate of this bug. ***

Comment 36 Jan Synacek 2018-10-23 07:51:17 UTC
(In reply to IBM Bug Proxy from comment #35)
> ------- Comment From chavez.com 2018-10-22 17:36 EDT-------
> *** Bug 159283 has been marked as a duplicate of this bug. ***

I'm not sure where this came from, but that's certainly a mistake.

