Bug 2212314
Summary: | systemd deadlocks waiting for its child forever when receiving SIGQUIT signal | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> |
Component: | systemd | Assignee: | Michal Sekletar <msekleta> |
Status: | CLOSED MIGRATED | QA Contact: | Frantisek Sumsal <fsumsal> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.8 | CC: | dtardon, systemd-maint-list |
Target Milestone: | rc | Keywords: | MigratedToJIRA, Reproducer |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2023-09-21 15:15:04 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Renaud Métrich
2023-06-05 09:35:33 UTC
It's very possible that scenarios other than SIGQUIT can cause the same deadlock. Looking at the systemd code, I see this in particular:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
void sigkill_wait(pid_t pid) {
        assert(pid > 1);

        if (kill(pid, SIGKILL) > 0)
                (void) wait_for_terminate(pid, NULL);
}

void sigterm_wait(pid_t pid) {
        assert(pid > 1);

        if (kill_and_sigcont(pid, SIGTERM) > 0)
                (void) wait_for_terminate(pid, NULL);
}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

sigterm_wait() is called from the sd-bus code, in bus_kill_exec():

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
static void bus_kill_exec(sd_bus *bus) {
        if (pid_is_valid(bus->busexec_pid) > 0) {
                sigterm_wait(bus->busexec_pid);
                bus->busexec_pid = 0;
        }
}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

I don't know how to reproduce this code path, but I guess that if PID 1 sends a TERM to that other process, which in turn starts dumping core (because it didn't like receiving the signal), then the same deadlock will happen.

(In reply to Renaud Métrich from comment #1)
> I don't know how to reproduce this code path, but I guess that if PID 1
> sends a TERM to that other process, which in turn starts dumping core
> (because it didn't like receiving the signal), then the same deadlock will
> happen.

This code path cannot be triggered in PID 1 (or anywhere else in the systemd codebase). It requires direct use of the sd-bus API, like this:

    #include <systemd/sd-bus.h>

    int main(int argc, char *argv[]) {
            sd_bus *bus = NULL;

            if (argc < 2)
                    return 1;
            sd_bus_new(&bus);
            sd_bus_set_exec(bus, argv[1], argv + 1); // Sets bus->exec_path
            sd_bus_start(bus); // Starts the command in bus->exec_path and sets bus->busexec_pid accordingly
            sd_bus_close(bus); // Calls bus_kill_exec()
            sd_bus_unref(bus);
            return 0;
    }

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.
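The kill-then-wait pattern in sigterm_wait() quoted in comment #1 can be sketched in plain POSIX C to show why a caller can hang: the wait has no timeout. This is a minimal sketch under stated assumptions, not systemd's implementation. The names sketch_sigterm_wait() and spawn_idle_child() are hypothetical, and waitpid() stands in for systemd's wait_for_terminate().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, loosely modelled on sigterm_wait(): send SIGTERM,
 * then SIGCONT (so a stopped child can act on SIGTERM), then block in
 * waitpid() until the child has been reaped. There is no timeout: if the
 * child never terminates, this function never returns.
 * Returns the raw wait status, or -1 on error (errno is set). */
static int sketch_sigterm_wait(pid_t pid) {
        int status;

        assert(pid > 1);

        /* POSIX kill() returns 0 on success and -1 on failure. */
        if (kill(pid, SIGTERM) < 0)
                return -1;
        (void) kill(pid, SIGCONT);

        /* Only EINTR is retried; the wait itself is unbounded. On Linux, a
         * child that is still writing a core dump is not reaped until the
         * dump has finished. */
        while (waitpid(pid, &status, 0) < 0) {
                if (errno != EINTR)
                        return -1;
        }

        return status;
}

/* Hypothetical demo helper: fork a child that idles until a signal
 * terminates it. Returns the child's pid, or -1 if fork() fails. */
static pid_t spawn_idle_child(void) {
        pid_t pid = fork();

        if (pid == 0) {
                sigset_t set;

                /* Restore default, unblocked SIGTERM handling in the child,
                 * in case the parent ignored or blocked it. */
                (void) signal(SIGTERM, SIG_DFL);
                sigemptyset(&set);
                sigaddset(&set, SIGTERM);
                (void) sigprocmask(SIG_UNBLOCK, &set, NULL);

                for (;;)
                        pause();
        }

        return pid;
}
```

After checking that spawn_idle_child() returned a valid pid, passing that pid to sketch_sigterm_wait() returns a status for which WIFSIGNALED() is true and WTERMSIG() equals SIGTERM. If the child instead ignored SIGTERM, or took arbitrarily long to exit, the same call would keep blocking in waitpid(), which is the kind of indefinite wait this bug is about.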
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.