Bug 1980362

Summary: Disable CONFIG_RT_GROUP_SCHED from default kernel config or allow configuration of RT budget for cgroup v2
Product: Red Hat Enterprise Linux 9 Reporter: Jan Friesse <jfriesse>
Component: kernel Assignee: Phil Auld <pauld>
kernel sub component: Control Groups QA Contact: Chao Ye <cye>
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: arozansk, bhu, cfeist, cye, jlelli, kcarcia, kwenning, llong, longman, pauld
Version: 9.0   
Target Milestone: beta   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-08-09 17:16:06 UTC Type: Bug

Description Jan Friesse 2021-07-08 13:23:54 UTC
Description of problem:
The RHEL kernel has the CONFIG_RT_GROUP_SCHED option enabled, making it impossible for a userspace application to gain RT priority (by calling sched_setscheduler(0, SCHED_RR, ...)) until the process is moved to the root cgroup. Sadly, this causes the problem described in bug 1962768.

The solution recommended by the systemd developers is to assign an RT budget to the cgroup, but sadly this looks impossible with cgroup v2 (https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#cpu).

All in all, because it is not possible to assign an RT budget, I think one solution may be to simply disable CONFIG_RT_GROUP_SCHED (as in Fedora, for example). A better solution would be to allow configuration of the RT budget.

Version-Release number of selected component (if applicable):
5.13.0-0.rc7.51.el9

How reproducible:
100%

Steps to Reproduce:
1. Ensure /sys/fs/cgroup/cgroup.subtree_control contains "cpu", for example by having a service with the CPUQuota option
2. Run an application which asks for RT priority
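
The two steps above could be sketched as the following check. This is only a minimal sketch, not the application from this report: the helper names cpu_controller_enabled and try_realtime are hypothetical, priority 1 is an arbitrary choice, and it assumes Linux with Python's os.sched_setscheduler available.

```python
import os

# Path taken from step 1 of the reproducer.
SUBTREE_CONTROL = "/sys/fs/cgroup/cgroup.subtree_control"


def cpu_controller_enabled(subtree_control_text):
    """Return True if "cpu" is one of the space-separated controllers.

    Splitting first avoids a false match on "cpuset".
    """
    return "cpu" in subtree_control_text.split()


def try_realtime(priority=1):
    """Ask for SCHED_RR for the calling process (hypothetical helper).

    Returns None on success, or the OSError raised by the call.
    The priority value is an arbitrary assumption for illustration.
    """
    try:
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
    except OSError as err:
        return err
    return None
```

Passing the contents of SUBTREE_CONTROL to cpu_controller_enabled covers step 1, and calling try_realtime() from a service that is not in the root cgroup covers step 2. An unprivileged process is refused RT priority anyway, so the result is only meaningful when the caller would otherwise be allowed to use SCHED_RR.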

Actual results:
The application cannot gain RT priority, or, when it moves itself to the root cgroup, logging doesn't get the correct tag (as described in bug 1962768).

Expected results:
Option 1: CONFIG_RT_GROUP_SCHED is disabled so the application can gain RT priority
Option 2: It is possible to configure an RT budget with cgroup v2

Comment 2 Waiman Long 2021-07-08 15:00:23 UTC
(In reply to Jan Friesse from comment #0)
> Description of problem:
> The RHEL kernel has the CONFIG_RT_GROUP_SCHED option enabled, making it
> impossible for a userspace application to gain RT priority (by calling
> sched_setscheduler(0, SCHED_RR, ...)) until the process is moved to the root
> cgroup. Sadly, this causes the problem described in bug 1962768.

Yes, this is a known issue and we are considering disabling CONFIG_RT_GROUP_SCHED for RHEL9, i.e. option 1.

-Longman

Comment 3 Phil Auld 2021-07-08 15:23:18 UTC
Fwiw, I've put in an MR against ark to disable this.

https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1177

It could use more acks :)

Comment 4 Waiman Long 2021-07-08 15:26:38 UTC
(In reply to Phil Auld from comment #3)
> Fwiw, I've put in an MR against ark to disable this.
> 
> https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1177
> 
> It could use more acks :)

I am aware you have an outstanding kernel-ark MR. I have just ack'ed it.

-Longman

Comment 5 Waiman Long 2021-07-08 15:27:24 UTC
(In reply to Waiman Long from comment #4)
> (In reply to Phil Auld from comment #3)
> > Fwiw, I've put in an MR against ark to disable this.
> > 
> > https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1177
> > 
> > It could use more acks :)
> 
> I am aware you have an outstanding kernel-ark MR. I have just ack'ed it.

Maybe you can update the MR description to add this BZ.

-Longman

Comment 6 Phil Auld 2021-07-08 15:44:30 UTC
ARK changes don't usually get RHEL BZs associated with them do they? I can certainly add it to the MR comment though if you want.

Comment 7 Waiman Long 2021-07-08 15:47:19 UTC
(In reply to Phil Auld from comment #6)
> ARK changes don't usually get RHEL BZs associated with them do they? I can
> certainly add it to the MR comment though if you want.

I am OK if you don't want to add it. It is up to you.

Comment 8 Phil Auld 2021-07-08 15:58:51 UTC
Done. I found some other ark patch emails that referenced bugzilla.redhat.com bugs so it seems okay to me.

Comment 9 Klaus Wenninger 2021-07-09 06:36:00 UTC
Being in charge of a userspace daemon (SBD = Storage Based Death) that needs RT scheduling, I ran into similar issues with CONFIG_RT_GROUP_SCHED some time back, when I tried to extrapolate from the f31 kernel what the RHEL9 kernel might look like.

Adding my comment here as I had tried to identify the situation where extra actions had to be taken because CONFIG_RT_GROUP_SCHED is set.
The only way I found was checking for "rt_rq[...]:/" in /proc/sched_debug (given it is a kernel with /proc/config.gz disabled).
Grepping in debug statistics is of course super ugly.
Reference to my code playing with the issue: https://github.com/wenningerk/sbd/commit/763e3a0c2ca87c76dddf598ba3355811300ffa59
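
The detection described above could look roughly like the sketch below. It is not the code from the linked commit: the function names and the regular expression are assumptions built only on the "rt_rq[...]:/" pattern mentioned here, reading "[...]" as a CPU number.

```python
import re

# Matches lines such as "rt_rq[0]:/", i.e. an RT runqueue header followed by
# a cgroup path. Reading "[...]" as a CPU index is an assumption.
RT_RQ_GROUP = re.compile(r"^rt_rq\[\d+\]:/", re.MULTILINE)


def rt_group_sched_detected(sched_debug_text):
    """Return True if the text contains an "rt_rq[N]:/" header line."""
    return RT_RQ_GROUP.search(sched_debug_text) is not None


def check_running_kernel(path="/proc/sched_debug"):
    """Apply the check to a sched_debug file; None if it cannot be read."""
    try:
        with open(path) as f:
            return rt_group_sched_detected(f.read())
    except OSError:
        return None
```

Returning None when the file cannot be read keeps "not detected" separate from "could not check", which matters for a daemon that wants to report a doomed kernel config rather than silently assume the best.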

In the end it of course turned out that there were no real options to properly deal with the issue (as already mentioned above).
Still, proper detection and reporting of a doomed kernel config would be desirable (even if it might not trigger in the final RHEL9 once CONFIG_RT_GROUP_SCHED is disabled).

Comment 10 Phil Auld 2021-08-09 17:16:06 UTC

*** This bug has been marked as a duplicate of bug 1971867 ***