1897085 – spausedd lacks capability to move to root cgroup [RHEL 8]

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	corosync
Sub Component:
Version:	8.3
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	8.4
Assignee:	Jan Friesse
QA Contact:	cluster-qe@redhat.com
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1897087

Reported:	2020-11-12 09:43 UTC by Reid Wahl
Modified:	2021-05-18 15:26 UTC
CC List:	4 users
Fixed In Version:	corosync-3.1.0-3.el8
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1897087
Environment:
Last Closed:	2021-05-18 15:26:09 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	5574661	0	None	None	None	2020-11-13 05:13:12 UTC

Description Reid Wahl 2020-11-12 09:43:54 UTC

Description of problem:

If CPU accounting is enabled, then spausedd is not allowed to run with SCHED_RR priority without additional cgroup configuration. corosync works around this issue by moving itself to the root cgroup by default if it's not already in the root cgroup path. (See BZ 1476214 and corosync_move_to_root_cgroup().)

spausedd lacks such capability. So if CPU accounting is enabled, spausedd fails to obtain SCHED_RR priority unless a user has explicitly added cgroup configs to allow RT scheduling for spausedd.

spausedd should have the same "move to root cgroup" behavior as corosync.

This has been added upstream in the following commit:
  - https://github.com/jfriesse/spausedd/commit/21836a25
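
The "move to root cgroup" behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not the upstream C patch. It assumes a cgroup v1 layout in which each line of /proc/<pid>/cgroup has the form hierarchy-ID:controller-list:path, and in which the root cpu cgroup's task list is /sys/fs/cgroup/cpu/tasks. The function names and the default path are assumptions made for illustration only.

```python
# Assumed cgroup v1 location of the root cpu cgroup's task list.
ROOT_CPU_TASKS = "/sys/fs/cgroup/cpu/tasks"


def cpu_cgroup_path(proc_cgroup_text):
    """Return the cgroup path of the cpu controller, or None if not listed.

    Expects the contents of /proc/<pid>/cgroup, where each cgroup v1 line
    looks like "hierarchy-ID:controller-list:path".
    """
    for line in proc_cgroup_text.splitlines():
        fields = line.split(":", 2)
        if len(fields) != 3:
            continue
        if "cpu" in fields[1].split(","):
            return fields[2]
    return None


def needs_move_to_root(proc_cgroup_text):
    """True when the cpu controller is listed and its path is not "/"."""
    path = cpu_cgroup_path(proc_cgroup_text)
    return path is not None and path != "/"


def move_to_root_cgroup(pid, tasks_file=ROOT_CPU_TASKS):
    """Write the PID to the root cpu cgroup's tasks file (requires root)."""
    with open(tasks_file, "w") as f:
        f.write(f"{pid}\n")
```

A daemon would typically read /proc/self/cgroup at startup, call needs_move_to_root() on its contents, and call move_to_root_cgroup() with its own PID before requesting SCHED_RR.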

-----

Version-Release number of selected component (if applicable):

spausedd-3.0.3-4.el8

-----

How reproducible:

Always

-----

Steps to Reproduce:
1. Enable CPU accounting. One way to do this is to set DefaultCPUAccounting=yes in /etc/systemd/system.conf. More details in https://access.redhat.com/articles/3696121.
2. Reboot.
3. Start spausedd.service.

-----

Actual results:

If BZ 1896309 has not been fixed, spausedd crashes with SIGSEGV.

If BZ 1896309 has been fixed, spausedd starts up and logs the error "Operation not permitted (1): Can't set SCHED_RR".
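
For illustration, the failing call and the resulting message can be approximated in Python as shown below. This is a sketch under assumptions, not spausedd's code: it assumes the daemon requests the maximum SCHED_RR priority, and the set_scheduler parameter and the denied() stub are hypothetical helpers added so the failure path can be exercised without changing the scheduling of a real process.

```python
import errno
import os


def try_set_sched_rr(set_scheduler=os.sched_setscheduler):
    """Request SCHED_RR at maximum priority for the calling process.

    Returns None on success, or an error string on failure. The
    set_scheduler parameter exists only so the call can be stubbed out.
    """
    priority = os.sched_get_priority_max(os.SCHED_RR)
    try:
        set_scheduler(0, os.SCHED_RR, os.sched_param(priority))
    except OSError as exc:
        return f"{os.strerror(exc.errno)} ({exc.errno}): Can't set SCHED_RR"
    return None


def denied(pid, policy, param):
    """Hypothetical stub that fails with EPERM, as a refused RT request does."""
    raise PermissionError(errno.EPERM, os.strerror(errno.EPERM))


# Exercise only the failure path; the real scheduler call is never made here.
print(try_set_sched_rr(set_scheduler=denied))
```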

-----

Expected results:

spausedd starts normally and obtains SCHED_RR priority.

Comment 1 Jan Friesse 2020-11-12 10:36:48 UTC

Patch: https://github.com/jfriesse/spausedd/commit/21836a25

Reproducer is in the first comment.

Comment 4 Simon Foucek 2020-12-04 12:40:29 UTC

Before fix (corosync and spausedd are both version 3.1.0-1; BZ 1896309 isn't fixed in this build, so spausedd crashes with SIGSEGV instead of logging "Can't set SCHED_RR"):
>[root@virt-038 ~]# rpm -q spausedd
>spausedd-3.1.0-1.el8.x86_64
>[root@virt-038 ~]# rpm -q corosync
>corosync-3.1.0-1.el8.x86_64
>[root@virt-038 ~]# cat /etc/systemd/system.conf | grep DefaultCPUAccounting
>#DefaultCPUAccounting=no
>[root@virt-038 ~]# sed -i 's/#DefaultCPUAccounting=no/DefaultCPUAccounting=yes/g' /etc/systemd/system.conf
>[root@virt-038 ~]# cat /etc/systemd/system.conf | grep DefaultCPUAccounting
>DefaultCPUAccounting=yes
>[root@virt-038 ~]# reboot
>Connection to virt-038.cluster-qe.lab.eng.brq.redhat.com closed by remote host.
>Connection to virt-038.cluster-qe.lab.eng.brq.redhat.com closed.
>[root@virt-038 ~]# systemctl start spausedd.service
>[root@virt-038 ~]# systemctl status spausedd.service
>● spausedd.service - Scheduler Pause Detection Daemon
>   Loaded: loaded (/usr/lib/systemd/system/spausedd.service; disabled; vendor preset: disabled)
>   Active: failed (Result: core-dump) since Fri 2020-12-04 13:01:18 CET; 1min 37s ago
>     Docs: man:spausedd
>  Process: 2473 ExecStart=/usr/bin/spausedd -D (code=exited, status=0/SUCCESS)
> Main PID: 2489 (code=dumped, signal=SEGV)
>      CPU: 42ms
>
>Dec 04 13:01:17 virt-038.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Starting Scheduler Pause Detection Daemon...
>Dec 04 13:01:17 virt-038.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Started Scheduler Pause Detection Daemon.
>Dec 04 13:01:18 virt-038.cluster-qe.lab.eng.brq.redhat.com systemd[1]: spausedd.service: Main process exited, code=dumped, status=11/SEGV
>Dec 04 13:01:18 virt-038.cluster-qe.lab.eng.brq.redhat.com systemd[1]: spausedd.service: Failed with result 'core-dump'.
>Dec 04 13:01:18 virt-038.cluster-qe.lab.eng.brq.redhat.com systemd[1]: spausedd.service: Consumed 42ms CPU time
>[root@virt-038 ~]# spausedd
>Dec 04 13:05:14 spausedd: Segmentation fault (core dumped)

Result: After DefaultCPUAccounting is set to yes and the system is rebooted, spausedd.service fails with SIGSEGV.

After fix:

>[root@virt-489 ~]# rpm -q spausedd
>spausedd-3.1.0-3.el8.x86_64
>[root@virt-489 ~]# rpm -q corosync
>corosync-3.1.0-3.el8.x86_64
>[root@virt-489 ~]# cat /etc/systemd/system.conf | grep DefaultCPUAccounting 
>#DefaultCPUAccounting=no
>[root@virt-489 ~]# sed -i 's/#DefaultCPUAccounting=no/DefaultCPUAccounting=yes/g'  /etc/systemd/system.conf 
>[root@virt-489 ~]# cat /etc/systemd/system.conf | grep DefaultCPUAccounting
>DefaultCPUAccounting=yes
>[root@virt-489 ~]# reboot
>Connection to virt-489.cluster-qe.lab.eng.brq.redhat.com closed by remote host.
>Connection to virt-489.cluster-qe.lab.eng.brq.redhat.com closed.
>[root@virt-489 ~]# systemctl start spausedd.service
>[root@virt-489 ~]# systemctl status spausedd.service
>● spausedd.service - Scheduler Pause Detection Daemon
>   Loaded: loaded (/usr/lib/systemd/system/spausedd.service; disabled; vendor preset: disabled)
>   Active: active (running) since Fri 2020-12-04 12:49:45 CET; 13min ago
>     Docs: man:spausedd
>  Process: 2312 ExecStart=/usr/bin/spausedd -D (code=exited, status=0/SUCCESS)
> Main PID: 2313 (spausedd)
>    Tasks: 1 (limit: 25573)
>   Memory: 1.7M
>      CPU: 8ms
>   CGroup: /system.slice/spausedd.service
>           └─2313 /usr/bin/spausedd -D
>
>Dec 04 12:49:45 virt-489.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Starting Scheduler Pause Detection Daemon...
>Dec 04 12:49:45 virt-489.cluster-qe.lab.eng.brq.redhat.com systemd[1]: Started Scheduler Pause Detection Daemon.
>Dec 04 12:49:45 virt-489.cluster-qe.lab.eng.brq.redhat.com spausedd[2313]: Running main poll loop with maximum timeout 200 and steal threshold 10%
>[root@virt-489 ~]# spausedd
>Dec 04 12:50:03 spausedd: Running main poll loop with maximum timeout 200 and steal threshold 10%
>[root@virt-489 ~]# chrt -p $(pidof spausedd)
>pid 2313's current scheduling policy: SCHED_RR
>pid 2313's current scheduling priority: 99

Result: After DefaultCPUAccounting is set to yes and the system is rebooted, spausedd.service starts normally and obtains SCHED_RR priority.
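
The chrt -p check used in the verification above can also be done programmatically. The following is a minimal Python sketch of the same query, using the scheduler functions in the standard os module on Linux. The scheduling_info() name and the POLICY_NAMES table are illustrative assumptions, not part of spausedd or corosync.

```python
import os

# Policy constants exposed by the os module on Linux.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
}


def scheduling_info(pid=0):
    """Return (policy name, priority) for a PID; 0 means the caller itself."""
    # Mask out the reset-on-fork flag, which may be OR'ed into the policy.
    policy = os.sched_getscheduler(pid) & ~os.SCHED_RESET_ON_FORK
    priority = os.sched_getparam(pid).sched_priority
    return POLICY_NAMES.get(policy, f"unknown ({policy})"), priority
```

Called with the PID of a running spausedd, this should report the same policy and priority that chrt -p prints for that process.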

Comment 6 errata-xmlrpc 2021-05-18 15:26:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (corosync bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1780
