Bug 1530556 - systemd fails to kill process in d-state and runs additional process in systemctl restart
Summary: systemd fails to kill process in d-state and runs additional process in systemctl restart
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: systemd-maint
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-01-03 10:40 UTC by Yaniv Bronhaim
Modified: 2020-02-14 18:20 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-14 18:20:01 UTC
Target Upstream Version:
Embargoed:


Attachments

Description Yaniv Bronhaim 2018-01-03 10:40:02 UTC
Daemons managed by systemd usually assume that systemd ensures only one instance of the process is running. However, if the process is in D state when it is restarted, systemd seems to ignore the kill failure and tries to start the service a second time (actually, infinitely many times, given Restart=always).

Restarting the vdsmd service while the vdsmd process is in D state, accessing a non-responsive NFS server, makes systemd fail to kill vdsmd and start it again.

You can find example flow here:
https://bugzilla.redhat.com/show_bug.cgi?id=1518676#c25

The relevant log is also attached:
https://bugzilla.redhat.com/attachment.cgi?id=1361489

Version: systemd 231


Steps to Reproduce:
1. Create a service with the attached script (single)

#!/usr/bin/python

import io
import mmap
import os
import time

# O_DIRECT requires aligned I/O; an anonymous mmap gives a
# page-aligned 512-byte buffer suitable for direct writes.
fd = os.open("/tmp/mnt/test", os.O_RDWR | os.O_CREAT | os.O_DIRECT)
buf = mmap.mmap(-1, 512)

check = 0

while True:
    buf[:] = b"\0" * len(buf)
    buf.seek(0)
    buf.write(b"pid: %d\ncheck: %d\n" % (os.getpid(), check))

    # Each direct write to the NFS-backed file blocks in the kernel
    # (D state) once the server stops responding.
    with io.FileIO(fd, "w", closefd=False) as f:
        f.seek(0)
        f.write(buf)
        os.fsync(f)

    time.sleep(10)
    check += 1

2. Install the service with the attached unit file (single.service)

[Unit]
Description=single

[Service]
Type=simple
ExecStart=/root/single
Restart=always
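As an aside for anyone experimenting with this reproducer: the stop/kill timeouts involved are tunable per unit. A hypothetical variant of the unit above that shortens the test cycle (the option names are standard systemd ones; the values are arbitrary choices for this sketch):

```ini
[Unit]
Description=single

[Service]
Type=simple
ExecStart=/root/single
Restart=always
# Shorten the default 90 s stop timeout so the SIGTERM -> SIGKILL ->
# "still around after final SIGKILL" sequence is reached faster.
TimeoutStopSec=15
# Sending SIGKILL after the stop timeout is the default; shown explicitly.
SendSIGKILL=yes
```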


3. Mount nfs on /tmp/mnt

# mkdir /tmp/mnt

# mount myserver:/path /tmp/mnt


4. Start the service

# systemctl start single

# systemctl status single
● single.service - single
   Loaded: loaded (/root/single.service; linked; vendor preset: disabled)
   Active: active (running) since Sun 2017-12-31 20:15:52 IST; 2s ago
 Main PID: 18110 (single)
   CGroup: /system.slice/single.service
           └─18110 /usr/bin/python /root/single

5. Block access to nfs server:

# iptables -A OUTPUT -p tcp -d myserver --dport 2049 -j DROP


6. Wait 10 seconds until the single service moves to D state:

# ps auxf
...
root     18110  0.0  0.1 131808  4816 ?        Ds   20:15   0:00 /usr/bin/python /root/single
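
The "D" in the ps STAT column is uninterruptible sleep. For scripting such checks, the state can also be read from /proc; a small sketch (it inspects this very process, which will normally report "R" for running):

```python
# Read the one-character process state (R, S, D, Z, ...) from
# /proc/<pid>/stat. Inspecting our own pid purely for illustration.
import os

def proc_state(pid):
    with open("/proc/%d/stat" % pid) as f:
        stat = f.read()
    # The comm field may itself contain spaces or parentheses, so split
    # on the *last* ')' and take the first field after it: the state.
    return stat.rsplit(")", 1)[1].split()[0]

print(proc_state(os.getpid()))
```

A service blocked on the dead NFS mount would report "D" here.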


7. Restart the service

# systemctl restart single


8. In another shell, run journalctl -f

Dec 31 20:17:04 voodoo6.tlv.redhat.com systemd[1]: Stopping single...
Dec 31 20:18:34 voodoo6.tlv.redhat.com systemd[1]: single.service stop-sigterm timed out. Killing.
Dec 31 20:20:01 voodoo6.tlv.redhat.com systemd[1]: Started Session 15 of user root.
Dec 31 20:20:01 voodoo6.tlv.redhat.com systemd[1]: Starting Session 15 of user root.
Dec 31 20:20:01 voodoo6.tlv.redhat.com CROND[18165]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Dec 31 20:20:02 voodoo6.tlv.redhat.com kernel: nfs: server dumbo.tlv.redhat.com not responding, timed out
Dec 31 20:20:04 voodoo6.tlv.redhat.com kernel: nfs: server dumbo.tlv.redhat.com not responding, still trying
Dec 31 20:20:04 voodoo6.tlv.redhat.com systemd[1]: single.service still around after SIGKILL. Ignoring.
Dec 31 20:21:34 voodoo6.tlv.redhat.com systemd[1]: single.service stop-final-sigterm timed out. Killing.
Dec 31 20:23:04 voodoo6.tlv.redhat.com systemd[1]: single.service still around after final SIGKILL. Entering failed mode.
Dec 31 20:23:04 voodoo6.tlv.redhat.com systemd[1]: Unit single.service entered failed state.
Dec 31 20:23:04 voodoo6.tlv.redhat.com systemd[1]: single.service failed.
Dec 31 20:23:04 voodoo6.tlv.redhat.com systemd[1]: Started single.
Dec 31 20:23:04 voodoo6.tlv.redhat.com systemd[1]: Starting single...


9. Check ps output again

# ps auxf
...
root     18110  0.0  0.0      0     0 ?        Ds   20:15   0:00 [single]
root     18183  0.0  0.1 131804  4820 ?        Ds   20:23   0:00 /usr/bin/python /root/single

# systemctl status single
● single.service - single
   Loaded: loaded (/root/single.service; linked; vendor preset: disabled)
   Active: active (running) since Sun 2017-12-31 20:23:04 IST; 9min ago
 Main PID: 18183 (single)
   CGroup: /system.slice/single.service
           └─18183 /usr/bin/python /root/single


Actual result:
Two instances running at the same time.

Expected result:
Only a single instance is started.
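
The duplicate can also be confirmed mechanically by scanning /proc for matching command lines (a sketch; note that the half-dead old instance may show an empty cmdline, as the `[single]` entry in step 9 suggests, so its state is better checked via /proc/<pid>/stat):

```python
# Count live processes whose /proc/<pid>/cmdline contains a given
# substring, e.g. count_procs("/root/single") after step 9.
import os

def count_procs(needle):
    n = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/%s/cmdline" % pid, "rb") as f:
                cmdline = f.read().replace(b"\0", b" ")
        except (IOError, OSError):
            continue  # process exited while we were scanning
        if needle.encode() in cmdline:
            n += 1
    return n
```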

Comment 2 Lukáš Nykrýn 2018-01-03 13:24:45 UTC
Back in my sysadmin days I think I dealt with similar situations; a process in such a state cannot be killed from userspace, so, to be honest, I have no idea what the correct approach here should be.

Comment 3 Yaniv Bronhaim 2018-01-06 18:47:05 UTC
Hi Lukas, 
Indeed, the process stays in this state until the connection returns. This bugzilla is about not starting a new instance of the daemon on systemctl restart when the old process of the same daemon is hung or has failed to terminate.

Comment 4 Dan Kenigsberg 2018-02-11 12:38:38 UTC
(In reply to Lukáš Nykrýn from comment #2)
> Back in my sysadmin days, I think I have dealt with similar situations and a
> process in such state cannot be killed from userspace, so, to be honest, I
> have no idea what should be a correct approach here.

When unsure, it is advisable to take the safe option of treating failure to stop the old service as an unrecoverable error. That is basically what the kernel does - if it cannot cleanly stop a process, it does not; it keeps it in D state.

Depending on the specific service, running it twice can lead to horrible data corruption. Some services may be harmless, so they could ask to be restarted even while in D state.

In any case, this is not relevant for rhel-7.5.0.
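
The "safe option" above can be approximated from userspace; a minimal sketch (the pidfile path, and wiring it in via something like ExecStartPre=, are assumptions of this sketch, not anything vdsmd or systemd ships):

```python
# Sketch: before starting, check whether a previously recorded pid is
# still alive and, if so, refuse to start rather than launch a duplicate.
import os

def old_instance_alive(pidfile):
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        return False          # no pidfile or unreadable: nothing to block on
    try:
        os.kill(pid, 0)       # signal 0 checks existence without killing
    except OSError:
        return False          # stale pidfile, pid is gone
    return True
```

Run as an ExecStartPre= guard, a non-zero exit when old_instance_alive() is true would make systemctl start fail instead of producing the second instance seen in step 9.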

Comment 5 Kyle Walker 2019-06-20 19:25:04 UTC
Based on the feedback in comment 2 and the further discussion, I am inclined to close this as WONTFIX. Has the team encountered any further instances of this behaviour? There have been quite a few changes in the kernel that allow processes to avoid sitting in TASK_UNINTERRUPTIBLE. Instead, many now sit in TASK_WAKEKILL, which means that a SIGKILL will be acted upon.

I believe that a good deal of the kernel codepaths that could leave a process completely unresponsive to SIGKILL have been dealt with at this point. Does the team see any incidents or common use-cases where this assumption is incorrect?

Comment 6 Dan Kenigsberg 2019-06-20 19:59:03 UTC
I do not know how prevalent this bug is. Maybe Martin knows.

What I do know is that as long as the possibility exists, there will be a customer somewhere losing data because of it. If systemd promises to enforce no more than a single instance of a running daemon, it should do just that.

Comment 7 Kyle Walker 2020-02-14 18:20:01 UTC
After monitoring the discussion for an extended period of time, I am unable to find any common instances where this type of failure is encountered, at least not on more modern kernel versions, with the ever-increasing prevalence of the TASK_WAKEKILL state in the kernel.

In addition, when Red Hat shipped 7.7 on Aug 6, 2019, Red Hat Enterprise Linux 7 entered the Maintenance Support 1 Phase.

    https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_1_Phase

That means only "Critical and Important Security errata advisories (RHSAs) and Urgent Priority Bug Fix errata advisories (RHBAs) may be released". This BZ does not appear to meet the Maintenance Support 1 Phase criteria, so it is being closed WONTFIX. If this is critical for your environment, please open a case in the Red Hat Customer Portal, https://access.redhat.com, provide a thorough business justification, and ask that the BZ be re-opened for consideration in the next minor release.

