Bug 2117087
| Summary: | systemctl is-system-running sometimes returns exit code 15 in 1minutetip | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Jaroslav Škarvada <jskarvad> |
| Component: | tuned | Assignee: | Jaroslav Škarvada <jskarvad> |
| Status: | CLOSED MIGRATED | QA Contact: | Robin Hack <rhack> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 9.1 | CC: | dtardon, jeder, jskarvad, systemd-maint-list |
| Target Milestone: | rc | Keywords: | MigratedToJIRA, Reopened |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-09-21 21:07:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Jaroslav Škarvada
2022-08-09 23:22:33 UTC
(In reply to Jaroslav Škarvada from comment #0)
> Steps to Reproduce:
> 1. from systemd service which is shutting down, execute 'systemctl
> is-system-running'

Do you mean "service which is being stopped"? Or does it only happen if the service is being stopped _at shutdown_?

> Actual results:
> Exit code 15

That's weird. If I understand the code correctly `systemctl is-system-running` cannot return anything except 0 and 1...

(In reply to David Tardon from comment #1)
> > Actual results:
> > Exit code 15
>
> That's weird. If I understand the code correctly `systemctl
> is-system-running` cannot return anything except 0 and 1...

Maybe this is not an exit code but a signal number, i.e., the systemctl process has been terminated by SIGTERM (by systemd, as it cleans up tuned.service's cgroup). That would make sense and it would also explain why it only happens sometimes.

(In reply to David Tardon from comment #1)
> (In reply to Jaroslav Škarvada from comment #0)
> > Steps to Reproduce:
> > 1. from systemd service which is shutting down, execute 'systemctl
> > is-system-running'
>
> Do you mean "service which is being stopped"? Or does it only happen if the
> service is being stopped _at shutdown_?
>

Yes, the service which is being stopped, I haven't tried it on machine shutdown.

(In reply to David Tardon from comment #2)
> (In reply to David Tardon from comment #1)
> > > Actual results:
> > > Exit code 15
> >
> > That's weird. If I understand the code correctly `systemctl
> > is-system-running` cannot return anything except 0 and 1...
>
> Maybe this is not an exit code but a signal number, i.e., the systemctl
> process has been terminated by SIGTERM (by systemd, as it cleans up
> tuned.service's cgroup). That would make sense and it would also explain why
> it only happens sometimes.

Yes, you are right. Sorry for the noise. From the python doc for the popen returncode [1]:
> A negative value -N indicates that the child was terminated by signal N (POSIX only).
It receives -15 literally, which means SIGTERM. I will update the TuneD code to cope with it.

The remaining question is why is the TuneD service receiving the second SIGTERM? I.e. the first SIGTERM triggers the shutdown code on the line 181 of daemon.py. It's quite quick, no more than one or two seconds. But why it receives another SIGTERM while in the 'popen/exec' call trying to run 'systemctl is-system-running'? I wasn't able to reproduce this on RHEL-8.7, but on RHEL-9.1 it happens quite often.

[1] https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode

(In reply to Jaroslav Škarvada from comment #4)
> The remaining question is why is the TuneD service receiving the second
> SIGTERM?

That SIGTERM is received not by the TuneD service, but by the forked systemctl. systemd sends the signal to _all_ processes in the cgroup, even to newly forked ones.

> I wasn't able to reproduce this on RHEL-8.7,
> but on RHEL-9.1 it happens quite often.

I guess it's a difference between cgroups v1 and v2.

(In reply to David Tardon from comment #5)
> (In reply to Jaroslav Škarvada from comment #4)
> > The remaining question is why is the TuneD service receiving the second
> > SIGTERM?
>
> That SIGTERM is received not by the TuneD service, but by the forked
> systemctl. systemd sends the signal to _all_ processes in the cgroup, even
> to newly forked ones.
>
> > I wasn't able to reproduce this on RHEL-8.7,
> > but on RHEL-9.1 it happens quite often.
>
> I guess it's a difference between cgroups v1 and v2.

Thanks for info, it now makes sense.

Maybe one more question, how to reliably detect whether the TuneD service is shut down by systemctl or by system shutdown/reboot? In the latter case we need to do full rollback in TuneD. I guess that in this case just calling the 'systemctl' command again from the TuneD wouldn't help.

I also don't understand why it's reproducible sometimes.

We need to handle it somehow, thus reopening and moving to TuneD.
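
The negative `returncode` discussed above can be reproduced outside TuneD. The following is a minimal sketch, not the TuneD code: the helper names `describe_returncode` and `run_and_terminate` are hypothetical, and a sleeping Python child stands in for `systemctl is-system-running`. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Translate a Popen.returncode into a human-readable description."""
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not covered by the enum (e.g. a realtime signal).
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with status %d" % returncode


def run_and_terminate():
    """Start a sleeping child, send it SIGTERM, and return its returncode."""
    # Stand-in for the real command; any long-running child would do.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    child.terminate()  # sends SIGTERM on POSIX
    child.wait(timeout=10)
    return child.returncode


if __name__ == "__main__":
    print(describe_returncode(run_and_terminate()))
```

A caller that only compares `returncode` against expected exit statuses would misread `-15` as an unknown exit code, so checking the sign first is one way to tell "the command failed" apart from "the command was killed".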
(In reply to Jaroslav Škarvada from comment #7)
> Maybe one more question, how to reliably detect whether the TuneD service is
> shut down by systemctl or by system shutdown/reboot?

The way you already do it. If the system is shutting down, `systemctl is-system-running` returns "stopping" (this is also available via D-Bus as org.freedesktop.systemd1.Manager#SystemState property on /org/freedesktop/systemd1).

> I also don't understand why it's reproducible sometimes.

Well, there's a race there between systemd reacting to a new process in the cgroup and the process finishing. I can think of several ways to avoid the issue:

1. Use D-Bus API instead of calling systemctl.
2. Implement some stop signalling mechanism and use it in ExecStop=. Note that this must be synchronous, i.e., the ExecStop= command must wait till the daemon finishes stopping, as the killing of the cgroup processes will start just after the ExecStop= command finishes. (This method is discouraged in general, but it might be a suitable choice here.)
3. If the rollback can be done independently of the tuned daemon (i.e., the states/changes are somehow serialized on disk), do it from a separate script in ExecStopPost=. (This does have the additional advantage that the rollback will be done even if tuned crashes or is killed.)
4. Use KillMode=process. (This is not recommended, but possible, if you're absolutely sure that tuned doesn't leave any stray processes around. In any case, it can be used as a temp. workaround.)

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated.
Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
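
For reference, the first option suggested in the thread (use the D-Bus API instead of calling systemctl) might look roughly like the sketch below. It is only an illustration under stated assumptions: it assumes the third-party dbus-python package is installed, the function names `read_system_state` and `is_system_stopping` are hypothetical, and it is not the change that was made in TuneD. The bus name, object path, and `SystemState` property are the ones mentioned in the thread.

```python
SYSTEMD_BUS_NAME = "org.freedesktop.systemd1"
SYSTEMD_OBJECT_PATH = "/org/freedesktop/systemd1"
MANAGER_IFACE = "org.freedesktop.systemd1.Manager"
PROPERTIES_IFACE = "org.freedesktop.DBus.Properties"


def read_system_state():
    """Return systemd's SystemState string, e.g. "running" or "stopping"."""
    # Imported lazily: dbus-python is a third-party package (assumed installed).
    import dbus

    bus = dbus.SystemBus()
    manager = bus.get_object(SYSTEMD_BUS_NAME, SYSTEMD_OBJECT_PATH)
    props = dbus.Interface(manager, PROPERTIES_IFACE)
    # The query runs inside the calling process, so no child process is forked.
    return str(props.Get(MANAGER_IFACE, "SystemState"))


def is_system_stopping(state):
    """True when the state string reports that the system is shutting down."""
    return state == "stopping"


if __name__ == "__main__":
    print(is_system_stopping(read_system_state()))
```

Because the property is read in-process, there is no forked `systemctl` in the service's cgroup for systemd to send SIGTERM to, which is the race described in the thread.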