Bug 1710302 - systemd respawns forever services triggered by a timer if fork() fails when executing ExecStart command
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: David Tardon
QA Contact: Frantisek Sumsal
Blocks: 1729230
 
Reported: 2019-05-15 09:58 UTC by Renaud Métrich
Modified: 2020-02-04 15:31 UTC
CC: 5 users

Fixed In Version: systemd-219-69.el7
Clone Of:
Clones: 1729230 (view as bug list)


Attachments
Memory eater (377 bytes, text/x-csrc), 2019-05-15 10:00 UTC, Renaud Métrich


Links
Github systemd issue 5969 (closed): "High cpu usage when timer attempts to start a service whose ConditionACPower is not met" (last updated 2020-02-27 14:03:36 UTC)

Description Renaud Métrich 2019-05-15 09:58:14 UTC
Description of problem:

When a service is triggered by a timer based on its last activation time (e.g. "OnUnitActiveSec", as used in systemd-tmpfiles-clean.timer) and fork() fails for some reason while executing the ExecStart command, systemd respawns the service in an endless loop, which floods the journal and eventually brings down the system.




Version-Release number of selected component (if applicable):

systemd-219-62.el7.7


How reproducible:

Always


Steps to Reproduce:

1. Create a dummy service with timer spawning regularly

# cat /etc/systemd/system/forkfail.timer
[Timer]
OnActiveSec=30s
OnUnitActiveSec=30s

# cat /etc/systemd/system/forkfail.service
[Service]
Type=oneshot
ExecStart=/bin/bash -c "echo forkfail"


2. Start the timer and wait for forkfail.service to be spawned at least once (a prior successful run is required for the failure to occur later)

# systemctl start forkfail.timer

3. Follow next instructions based on fork() failure

Reproducer "EAGAIN":

1. Spawn tons of PIDs until "-bash: fork: retry: Resource temporarily unavailable" is seen

# while sleep 3600 & do :; done

2. Wait for forkfail.service to execute again

# journalctl --follow

--> starts spinning and flooding the journal

May 15 11:56:49 vm-killer7 systemd[1]: Failed to start Dummy oneshot service.
May 15 11:56:49 vm-killer7 systemd[1]: forkfail.service failed.
May 15 11:56:49 vm-killer7 systemd[1]: start request repeated too quickly for forkfail.service
May 15 11:56:49 vm-killer7 systemd[1]: Failed to start Dummy oneshot service.
May 15 11:56:49 vm-killer7 systemd[1]: forkfail.service failed.
May 15 11:56:49 vm-killer7 systemd[1]: start request repeated too quickly for forkfail.service
...


Reproducer "ENOMEM" (less reliable due to oom killer):

1. Disable swap and spawn memory eaters (source attached) until "Cannot allocate memory" is seen

# while ./memuse 1024 & do :; done

2. Wait for forkfail.service to execute again

# journalctl --follow

--> starts spinning and flooding the journal


Additional info:

This *doesn't* happen on RHEL 8 for some reason, even though the code looks very similar.

Comment 2 Renaud Métrich 2019-05-15 10:00:13 UTC
Created attachment 1568931 [details]
Memory eater

gcc -o memuse memuse.c

Comment 3 Renaud Métrich 2019-05-15 10:07:22 UTC
After digging and instrumenting systemd, I found out the following:

The issue happens only on fork() failing (with ENOMEM or EAGAIN).

Relevant code:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1740 void unit_notify(Unit *u, UnitActiveState os, UnitActiveState ns, bool reload_success) {
1741         Manager *m;
1742         bool unexpected;
[...]
1756         /* Update timestamps for state changes */
1757         if (m->n_reloading <= 0) {
1758                 dual_timestamp ts;
1759 
1760                 dual_timestamp_get(&ts);
1761 
1762                 if (UNIT_IS_INACTIVE_OR_FAILED(os) && !UNIT_IS_INACTIVE_OR_FAILED(ns))
1763                         u->inactive_exit_timestamp = ts;
1764                 else if (!UNIT_IS_INACTIVE_OR_FAILED(os) && UNIT_IS_INACTIVE_OR_FAILED(ns))
1765                         u->inactive_enter_timestamp = ts;
1766 
1767                 if (!UNIT_IS_ACTIVE_OR_RELOADING(os) && UNIT_IS_ACTIVE_OR_RELOADING(ns))
1768                         u->active_enter_timestamp = ts;
1769                 else if (UNIT_IS_ACTIVE_OR_RELOADING(os) && !UNIT_IS_ACTIVE_OR_RELOADING(ns))
1770                         u->active_exit_timestamp = ts;
1771         }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

In this case, the condition updating the inactive_exit_timestamp service property (line 1762) is *not* met, because the old state is "os=UNIT_INACTIVE" and the new state is "ns=UNIT_FAILED", and both states count as inactive-or-failed.
Whereas if the error happens after forking, the old state is "os=UNIT_FAILED" and the new state is "ns=UNIT_ACTIVATING", so the condition is met.


Timer code:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
341 static void timer_enter_waiting(Timer *t, bool initial) {
[...]
363         t->next_elapse_monotonic_or_boottime = t->next_elapse_realtime = 0;
[...]
365         LIST_FOREACH(value, v, t->values) {
[...]
392                         switch (v->base) {
[...]
410                         case TIMER_UNIT_ACTIVE:
411 
412                                 base = trigger->inactive_exit_timestamp.monotonic;
413 
414                                 if (base <= 0)
415                                         base = t->last_trigger.monotonic;
416 
417                                 if (base <= 0)
418                                         continue;
419 
420                                 break;
[...]
436                         }
[...]
441                         v->next_elapse = base + v->value;
[...]
449                         if (!found_monotonic)
450                                 t->next_elapse_monotonic_or_boottime = v->next_elapse;
451                         else
452                                 t->next_elapse_monotonic_or_boottime = MIN(t->next_elapse_monotonic_or_boottime, v->next_elapse);
453 
454                         found_monotonic = true;
[...]
464         if (found_monotonic) {
465                 char buf[FORMAT_TIMESPAN_MAX];
466 
467                 add_random(t, &t->next_elapse_monotonic_or_boottime);
468 
469                 log_unit_debug(UNIT(t)->id, "%s: Monotonic timer elapses in %s.",
470                                UNIT(t)->id,
471                                format_timespan(buf, sizeof(buf), t->next_elapse_monotonic_or_boottime     > ts_monotonic ? t->next_elapse_monotonic_or_boottime - ts_monotonic : 0, 0));
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

If "inactive_exit_timestamp.monotonic" is never updated (our case), "next_elapse" ends up in the past, so "next_elapse_monotonic_or_boottime" expires immediately in a loop (the "Monotonic timer elapses in 0" debug message).


unit_notify() is ultimately reached from the following code, through the service_enter_dead() call on line 1413:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1365 static void service_enter_signal(Service *s, ServiceState state, ServiceResult f) {
[...]
1375         r = unit_kill_context(
1376                         UNIT(s),
1377                         &s->kill_context,
1378                         (state != SERVICE_STOP_SIGTERM && state != SERVICE_FINAL_SIGTERM && state != SERVICE_STOP_SIGABRT) ?
1379                         KILL_KILL : (state == SERVICE_STOP_SIGABRT ? KILL_ABORT : KILL_TERMINATE),
1380                         s->main_pid,
1381                         s->control_pid,
1382                         s->main_pid_alien);
1383 
1384         if (r < 0)
1385                 goto fail;
[...]
1406 fail:
1407         log_unit_warning_errno(UNIT(s)->id, r, "%s failed to kill processes: %m", UNIT(s)->id);
1408 
1409         if (state == SERVICE_STOP_SIGTERM || state == SERVICE_STOP_SIGKILL ||
1410             state == SERVICE_STOP_SIGABRT)
1411                 service_enter_stop_post(s, SERVICE_FAILURE_RESOURCES);
1412         else
1413                 service_enter_dead(s, SERVICE_FAILURE_RESOURCES, true);
1414 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

We reach the "fail" label, probably because unit_kill_context() returns an error since no process has been created yet.
But I'm not sure, because I cannot see the log from line 1407, nor additional logs I added (e.g. on line 1385).
For some reason, setting a breakpoint also fails (due to compiler optimizations?).

Comment 6 David Tardon 2019-05-16 14:04:36 UTC
pull request: https://github.com/systemd-rhel/rhel-7/pull/4

Comment 7 Lukáš Nykrýn 2019-07-11 13:30:23 UTC
fix merged to github master branch -> https://github.com/systemd-rhel/rhel-7/pull/4 -> post

