Bug 1710302 - systemd respawns forever services triggered by a timer if fork() fails when executing ExecStart command
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.6
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: David Tardon
QA Contact: Frantisek Sumsal
Blocks: 1729230
 
Reported: 2019-05-15 09:58 UTC by Renaud Métrich
Modified: 2020-02-04 15:31 UTC
CC: 5 users

Fixed In Version: systemd-219-69.el7
Clone Of:
Clones: 1729230 (view as bug list)


Attachments
Memory eater (377 bytes, text/x-csrc), 2019-05-15 10:00 UTC, Renaud Métrich


Links
Github systemd issue 5969 (closed): "High cpu usage when timer attempts to start a service whose ConditionACPower is not met" (last updated 2020-02-27 14:03:36 UTC)

Description Renaud Métrich 2019-05-15 09:58:14 UTC
Description of problem:

When a service is triggered by a timer based on its last activation time (e.g. "OnUnitActiveSec", as used in systemd-tmpfiles-clean.timer) and fork() fails for some reason while executing the ExecStart command, systemd respawns the service in an endless loop, which floods the journal and eventually brings down the system.




Version-Release number of selected component (if applicable):

systemd-219-62.el7.7


How reproducible:

Always


Steps to Reproduce:

1. Create a dummy service with timer spawning regularly

# cat /etc/systemd/system/forkfail.timer
[Timer]
OnActiveSec=30s
OnUnitActiveSec=30s

# cat /etc/systemd/system/forkfail.service
[Service]
Type=oneshot
ExecStart=/bin/bash -c "echo forkfail"


2. Start the timer and wait for forkfail.service to be spawned at least once (a prior successful run is required for the failure to occur later)

# systemctl start forkfail.timer

3. Follow next instructions based on fork() failure

Reproducer "EAGAIN":

1. Spawn tons of PIDs until "-bash: fork: retry: Resource temporarily unavailable" is seen

# while sleep 3600 & do :; done

2. Wait for forkfail.service to execute again

# journalctl --follow

--> starts spinning and flooding the journal

May 15 11:56:49 vm-killer7 systemd[1]: Failed to start Dummy oneshot service.
May 15 11:56:49 vm-killer7 systemd[1]: forkfail.service failed.
May 15 11:56:49 vm-killer7 systemd[1]: start request repeated too quickly for forkfail.service
May 15 11:56:49 vm-killer7 systemd[1]: Failed to start Dummy oneshot service.
May 15 11:56:49 vm-killer7 systemd[1]: forkfail.service failed.
May 15 11:56:49 vm-killer7 systemd[1]: start request repeated too quickly for forkfail.service
...


Reproducer "ENOMEM" (less reliable due to oom killer):

1. Disable swap and spawn memory eaters (source attached) until "Cannot allocate memory" is seen

# while ./memuse 1024 & do :; done

2. Wait for forkfail.service to execute again

# journalctl --follow

--> starts spinning and flooding the journal


Additional info:

This *doesn't* happen on RHEL 8 for some reason, even though the code looks very similar.

Comment 2 Renaud Métrich 2019-05-15 10:00:13 UTC
Created attachment 1568931 [details]
Memory eater

gcc -o memuse memuse.c

Comment 3 Renaud Métrich 2019-05-15 10:07:22 UTC
After digging and instrumenting systemd, I found out the following:

The issue happens only on fork() failing (with ENOMEM or EAGAIN).

Relevant code:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1740 void unit_notify(Unit *u, UnitActiveState os, UnitActiveState ns, bool reload_success) {
1741         Manager *m;
1742         bool unexpected;
[...]
1756         /* Update timestamps for state changes */
1757         if (m->n_reloading <= 0) {
1758                 dual_timestamp ts;
1759 
1760                 dual_timestamp_get(&ts);
1761 
1762                 if (UNIT_IS_INACTIVE_OR_FAILED(os) && !UNIT_IS_INACTIVE_OR_FAILED(ns))
1763                         u->inactive_exit_timestamp = ts;
1764                 else if (!UNIT_IS_INACTIVE_OR_FAILED(os) && UNIT_IS_INACTIVE_OR_FAILED(ns))
1765                         u->inactive_enter_timestamp = ts;
1766 
1767                 if (!UNIT_IS_ACTIVE_OR_RELOADING(os) && UNIT_IS_ACTIVE_OR_RELOADING(ns))
1768                         u->active_enter_timestamp = ts;
1769                 else if (UNIT_IS_ACTIVE_OR_RELOADING(os) && !UNIT_IS_ACTIVE_OR_RELOADING(ns))
1770                         u->active_exit_timestamp = ts;
1771         }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

In this case, the condition updating the inactive_exit_timestamp service property (line 1762) is *not* met, because the old state is "os=UNIT_INACTIVE" and the new state is "ns=UNIT_FAILED", and both states count as inactive-or-failed.
Whereas if the error happens after forking, the old state is "os=UNIT_FAILED" and the new state is "ns=UNIT_ACTIVATING", so the condition is met.


Timer code:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
341 static void timer_enter_waiting(Timer *t, bool initial) {
[...]
363         t->next_elapse_monotonic_or_boottime = t->next_elapse_realtime = 0;
[...]
365         LIST_FOREACH(value, v, t->values) {
[...]
392                         switch (v->base) {
[...]
410                         case TIMER_UNIT_ACTIVE:
411 
412                                 base = trigger->inactive_exit_timestamp.monotonic;
413 
414                                 if (base <= 0)
415                                         base = t->last_trigger.monotonic;
416 
417                                 if (base <= 0)
418                                         continue;
419 
420                                 break;
[...]
436                         }
[...]
441                         v->next_elapse = base + v->value;
[...]
449                         if (!found_monotonic)
450                                 t->next_elapse_monotonic_or_boottime = v->next_elapse;
451                         else
452                                 t->next_elapse_monotonic_or_boottime = MIN(t->next_elapse_monotonic_or_boottime, v->next_elapse);
453 
454                         found_monotonic = true;
[...]
464         if (found_monotonic) {
465                 char buf[FORMAT_TIMESPAN_MAX];
466 
467                 add_random(t, &t->next_elapse_monotonic_or_boottime);
468 
469                 log_unit_debug(UNIT(t)->id, "%s: Monotonic timer elapses in %s.",
470                                UNIT(t)->id,
471                                format_timespan(buf, sizeof(buf), t->next_elapse_monotonic_or_boottime     > ts_monotonic ? t->next_elapse_monotonic_or_boottime - ts_monotonic : 0, 0));
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

If "inactive_exit_timestamp.monotonic" is never updated (our case), "next_elapse" ends up in the past, so "next_elapse_monotonic_or_boottime" expires immediately in a loop (the "Monotonic timer elapses in 0" debug message).


unit_notify() is ultimately reached from the following code, through the service_enter_dead() call on line 1413:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1365 static void service_enter_signal(Service *s, ServiceState state, ServiceResult f) {
[...]
1375         r = unit_kill_context(
1376                         UNIT(s),
1377                         &s->kill_context,
1378                         (state != SERVICE_STOP_SIGTERM && state != SERVICE_FINAL_SIGTERM && state != SERVICE_STOP_SIGABRT) ?
1379                         KILL_KILL : (state == SERVICE_STOP_SIGABRT ? KILL_ABORT : KILL_TERMINATE),
1380                         s->main_pid,
1381                         s->control_pid,
1382                         s->main_pid_alien);
1383 
1384         if (r < 0)
1385                 goto fail;
[...]
1406 fail:
1407         log_unit_warning_errno(UNIT(s)->id, r, "%s failed to kill processes: %m", UNIT(s)->id);
1408 
1409         if (state == SERVICE_STOP_SIGTERM || state == SERVICE_STOP_SIGKILL ||
1410             state == SERVICE_STOP_SIGABRT)
1411                 service_enter_stop_post(s, SERVICE_FAILURE_RESOURCES);
1412         else
1413                 service_enter_dead(s, SERVICE_FAILURE_RESOURCES, true);
1414 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

We reach the "fail" label, probably because unit_kill_context() returns an error since no process has been created yet.
But I'm not sure, because I cannot see the log from line 1407, nor additional logs I added (e.g. on line 1385).
For some reason, setting a breakpoint also fails (due to compiler optimizations?).

Comment 6 David Tardon 2019-05-16 14:04:36 UTC
pull request: https://github.com/systemd-rhel/rhel-7/pull/4

Comment 7 Lukáš Nykrýn 2019-07-11 13:30:23 UTC
fix merged to github master branch -> https://github.com/systemd-rhel/rhel-7/pull/4 -> post

