Bug 2117087

Summary: systemctl is-system-running sometimes returns exit code 15 in 1minutetip
Product: Red Hat Enterprise Linux 9
Reporter: Jaroslav Škarvada <jskarvad>
Component: tuned
Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED MIGRATED
QA Contact: Robin Hack <rhack>
Severity: unspecified
Priority: unspecified
Version: 9.1
CC: dtardon, jeder, jskarvad, systemd-maint-list
Target Milestone: rc
Keywords: MigratedToJIRA, Reopened
Flags: pm-rhel: mirror+
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-09-21 21:07:10 UTC
Type: Bug

Description Jaroslav Škarvada 2022-08-09 23:22:33 UTC
Description of problem:
We are running 'systemctl is-system-running' in TuneD on TuneD shutdown:
https://github.com/redhat-performance/tuned/blob/master/tuned/daemon/daemon.py#L181

When running in 1minutetip, the 'systemctl is-system-running' command sometimes returns exit code 15 and no output. It's not reproducible on RHEL-8.7.

Version-Release number of selected component (if applicable):
systemd-250-7.el9.x86_64

How reproducible:
Sometimes

Steps to Reproduce:
1. From a systemd service which is shutting down, execute 'systemctl is-system-running'
2. Check the exit code

Actual results:
Exit code 15

Expected results:
Exit code 0

Additional info:
It looks like a race condition.

Comment 1 David Tardon 2022-08-16 08:57:39 UTC
(In reply to Jaroslav Škarvada from comment #0)
> Steps to Reproduce:
> 1. from systemd service which is shutting down, execute 'systemctl
> is-system-running'

Do you mean "service which is being stopped"? Or does it only happen if the service is being stopped _at shutdown_?

> Actual results:
> Exit code 15

That's weird. If I understand the code correctly `systemctl is-system-running` cannot return anything except 0 and 1...

Comment 2 David Tardon 2022-08-16 09:12:25 UTC
(In reply to David Tardon from comment #1)
> > Actual results:
> > Exit code 15
> 
> That's weird. If I understand the code correctly `systemctl
> is-system-running` cannot return anything except 0 and 1...

Maybe this is not an exit code but a signal number, i.e., the systemctl process has been terminated by SIGTERM (by systemd, as it cleans up tuned.service's cgroup). That would make sense and it would also explain why it only happens sometimes.

Comment 3 Jaroslav Škarvada 2022-08-17 07:23:19 UTC
(In reply to David Tardon from comment #1)
> (In reply to Jaroslav Škarvada from comment #0)
> > Steps to Reproduce:
> > 1. from systemd service which is shutting down, execute 'systemctl
> > is-system-running'
> 
> Do you mean "service which is being stopped"? Or does it only happen if the
> service is being stopped _at shutdown_?
> 
Yes, the service which is being stopped. I haven't tried it on machine shutdown.

Comment 4 Jaroslav Škarvada 2022-08-17 07:30:57 UTC
(In reply to David Tardon from comment #2)
> (In reply to David Tardon from comment #1)
> > > Actual results:
> > > Exit code 15
> > 
> > That's weird. If I understand the code correctly `systemctl
> > is-system-running` cannot return anything except 0 and 1...
> 
> Maybe this is not an exit code but a signal number, i.e., the systemctl
> process has been terminated by SIGTERM (by systemd, as it cleans up
> tuned.service's cgroup). That would make sense and it would also explain why
> it only happens sometimes.

Yes, you are right. Sorry for the noise. From the Python documentation for Popen.returncode [1]:
> A negative value -N indicates that the child was terminated by signal N (POSIX only).

It receives -15 literally, which means SIGTERM. I will update the TuneD code to cope with it.
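
For illustration, the returncode convention quoted above can be sketched as follows. This is a minimal example and not the TuneD code: the helper name describe_returncode is made up here, and a sleeping Python child stands in for 'systemctl is-system-running', with the parent sending SIGTERM to mimic systemd killing the cgroup.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Translate a Popen.returncode value into a readable description."""
    if returncode < 0:
        # A negative value -N means the child was terminated by signal N (POSIX only).
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code %d" % returncode


# Stand-in child process; in TuneD this would be 'systemctl is-system-running'.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
# Simulate systemd sending SIGTERM to a process in the service's cgroup.
child.send_signal(signal.SIGTERM)
child.wait()
print(child.returncode, describe_returncode(child.returncode))
```

A caller can then treat a negative returncode as "no answer" rather than as a real exit status of the command.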

The remaining question is why the TuneD service receives a second SIGTERM. The first SIGTERM triggers the shutdown code on line 181 of daemon.py. That code is quick, taking no more than one or two seconds. But why does it receive another SIGTERM while in the 'popen/exec' call that runs 'systemctl is-system-running'? I wasn't able to reproduce this on RHEL-8.7, but on RHEL-9.1 it happens quite often.

[1] https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode

Comment 5 David Tardon 2022-08-17 12:11:49 UTC
(In reply to Jaroslav Škarvada from comment #4)
> The remaining question is why is the TuneD service receiving the second
> SIGTERM?

That SIGTERM is received not by the TuneD service, but by the forked systemctl. systemd sends the signal to _all_ processes in the cgroup, even to newly forked ones.

> I wasn't able to reproduce this on RHEL-8.7,
> but on RHEL-9.1 it happens quite often.

I guess it's a difference between cgroups v1 and v2.

Comment 6 Jaroslav Škarvada 2022-08-18 11:25:39 UTC
(In reply to David Tardon from comment #5)
> (In reply to Jaroslav Škarvada from comment #4)
> > The remaining question is why is the TuneD service receiving the second
> > SIGTERM?
> 
> That SIGTERM is received not by the TuneD service, but by the forked
> systemctl. systemd sends the signal to _all_ processes in the cgroup, even
> to newly forked ones.
> 
> > I wasn't able to reproduce this on RHEL-8.7,
> > but on RHEL-9.1 it happens quite often.
> 
> I guess it's a difference between cgroups v1 and v2.

Thanks for the info, it makes sense now.

Comment 7 Jaroslav Škarvada 2022-08-18 11:28:15 UTC
One more question: how can we reliably detect whether the TuneD service is being shut down by systemctl or by a system shutdown/reboot? In the latter case we need to do a full rollback in TuneD. I guess that in this case just calling the 'systemctl' command again from TuneD wouldn't help.

I also don't understand why it's reproducible only sometimes.

Comment 8 Jaroslav Škarvada 2022-08-18 11:29:48 UTC
We need to handle it somehow, thus reopening and moving to TuneD.

Comment 10 David Tardon 2022-08-23 07:50:35 UTC
(In reply to Jaroslav Škarvada from comment #7)
> Maybe one more question, how to reliably detect whether the TuneD service is
> shut down by systemctl or by system shutdown/reboot? 

The way you already do it. If the system is shutting down, `systemctl is-system-running` returns "stopping" (this is also available via D-Bus as org.freedesktop.systemd1.Manager#SystemState property on /org/freedesktop/systemd1).
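
Reading that property without forking a process could look roughly like the sketch below. It assumes the third-party dbus-python package is installed; the helper names get_system_state and is_system_stopping are made up for this example and are not part of TuneD or systemd.

```python
def get_system_state():
    """Read systemd's SystemState property over the system bus.

    Assumes the third-party 'dbus-python' package; it is imported lazily so
    the rest of this module still loads on systems without it.
    """
    import dbus

    bus = dbus.SystemBus()
    manager = bus.get_object("org.freedesktop.systemd1",
                             "/org/freedesktop/systemd1")
    props = dbus.Interface(manager, "org.freedesktop.DBus.Properties")
    # Same value that 'systemctl is-system-running' prints.
    return str(props.Get("org.freedesktop.systemd1.Manager", "SystemState"))


def is_system_stopping(state):
    """Return True if the given SystemState string means the system is shutting down."""
    return state == "stopping"
```

A stop handler would then call is_system_stopping(get_system_state()) in place of the systemctl subprocess, so there is no child process for systemd to signal.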

> I also don't understand why it's reproducible sometimes.

Well, there's a race there between systemd reacting to a new process in the cgroup and the process finishing.

I can think of several ways to avoid the issue:
1. Use D-Bus API instead of calling systemctl.
2. Implement some stop signalling mechanism and use it in ExecStop=. Note that this must be synchronous, i.e., the ExecStop= command must wait till the daemon finishes stopping, as the killing of the cgroup processes will start just after the ExecStop= command finishes. (This method is discouraged in general, but it might be a suitable choice here.)
3. If the rollback can be done independently of the tuned daemon (i.e., the states/changes are somehow serialized on disk), do it from a separate script in ExecStopPost=. (This does have the additional advantage that the rollback will be done even if tuned crashes or is killed.)
4. Use KillMode=process. (This is not recommended, but possible, if you're absolutely sure that tuned doesn't leave any stray processes around. In any case, it can be used as a temporary workaround.)
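
To make option 3 more concrete, here is a rough sketch of serializing the original values to disk so that a separate script can restore them. Everything in it is hypothetical: the function names, the JSON layout, and the file paths are assumptions for illustration and do not reflect how TuneD actually stores its state.

```python
import json
import os
import tempfile


def save_original_values(state_file, originals):
    """Atomically persist a {path: original_value} mapping (hypothetical format)."""
    tmp_path = state_file + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(originals, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename last, so a reader never sees a half-written state file.
    os.replace(tmp_path, state_file)


def rollback(state_file):
    """Restore every recorded value; return how many were restored."""
    if not os.path.exists(state_file):
        return 0  # nothing was applied, or rollback already happened
    with open(state_file) as f:
        originals = json.load(f)
    for path, value in originals.items():
        with open(path, "w") as f:
            f.write(value)
    os.remove(state_file)
    return len(originals)


# Demo: a temporary file stands in for a real tunable.
with tempfile.TemporaryDirectory() as workdir:
    tunable = os.path.join(workdir, "tunable")
    state_file = os.path.join(workdir, "state.json")
    with open(tunable, "w") as f:
        f.write("60")                        # original value
    save_original_values(state_file, {tunable: "60"})
    with open(tunable, "w") as f:
        f.write("10")                        # value applied by the daemon
    restored = rollback(state_file)          # what an ExecStopPost= script would run
    with open(tunable) as f:
        final_value = f.read()
```

Because rollback() only needs the state file, it could run from a standalone script after the daemon has exited, including after a crash.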

Comment 15 RHEL Program Management 2023-09-21 20:52:11 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 16 RHEL Program Management 2023-09-21 21:07:10 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.
