Bug 1956453
| Field | Value |
|---|---|
| Summary | systemd-run sets incorrect value to /sys/fs/cgroup/*/cgroup.subtree_control with cgroupv2 on RHEL9 |
| Product | Red Hat Enterprise Linux 9 |
| Component | systemd |
| Version | 9.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED NOTABUG |
| Severity | unspecified |
| Priority | unspecified |
| Target Milestone | beta |
| Type | Bug |
| Reporter | Troy Wilson <trwilson> |
| Assignee | systemd-maint |
| QA Contact | Frantisek Sumsal <fsumsal> |
| CC | dtardon, llong, systemd-maint-list |
| Last Closed | 2021-05-04 16:32:11 UTC |

Description
Troy Wilson
2021-05-03 17:51:53 UTC
(In reply to Troy Wilson from comment #0)
> Description of problem:
> When using systemd-run on RHEL9 with cgroupv2, systemd will put the value
> "cpu" into the /sys/fs/cgroup/*/cgroup.subtree_control files, which has the
> effect of destroying any existing v2 cgroup that is configured. I don't
> know if systemd-run should or should not be putting anything in
> /sys/fs/cgroup/*/cgroup.subtree_control,

systemd-run just forwards the command (together with supplied options) to systemd, which starts it as a transient unit.

> but if it does, the keyword for
> cgroupv2 is "cpuset", not "cpu".

https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/cgroup-v2.rst#cpu

> Steps to Reproduce:
> 1. Configure a v2 cgroup that includes cpusets

How did you configure it?

>> but if it does, the keyword for
>> cgroupv2 is "cpuset", not "cpu".
>
> https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/cgroup-v2.rst#cpu

I stated that poorly and was thinking only in terms of cgroups, sorry. I used the cpuset controller (https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/cgroup-v2.rst#cpuset) to configure a cgroup for user-1000.slice and then started a workload in that slice using systemd-run. It looks like systemd adds the cpu controller to cgroup.subtree_control if I specify a CPUQuota... but it also seems to remove the cpuset controller, which I wouldn't expect.

>> Steps to Reproduce:
>> 1. Configure a v2 cgroup that includes cpusets
>
> How did you configure it?

My system has 40 CPUs, I assign 4-39 to user.slice and then 5-39 to user-1000.slice

echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control
echo 4-39 > /sys/fs/cgroup/user.slice/cpuset.cpus
echo "root" > /sys/fs/cgroup/user.slice/cpuset.cpus.partition
echo "+cpuset" > /sys/fs/cgroup/user.slice/cgroup.subtree_control
echo 5-39 > /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus
echo root > /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition

Here is an example of what I am seeing.

[root@fedora ~]# ./setup.sh
echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control
echo 4-39 > /sys/fs/cgroup/user.slice/cpuset.cpus
echo "root" > /sys/fs/cgroup/user.slice/cpuset.cpus.partition
echo "+cpuset" > /sys/fs/cgroup/user.slice/cgroup.subtree_control
echo 5-39 > /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus
echo root > /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition
[root@fedora ~]# ./state.sh
cat /sys/fs/cgroup/cgroup.subtree_control
cpuset memory pids
cat /sys/fs/cgroup/system.slice/cpuset.cpus

cat /sys/fs/cgroup/user.slice/cpuset.cpus
4-39
cat /sys/fs/cgroup/user.slice/cpuset.cpus.partition
root
cat /sys/fs/cgroup/user.slice/cgroup.subtree_control
cpuset memory pids
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus
5-39
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition
root
[root@fedora ~]# systemd-run --slice=user-1000.slice --property=CPUQuota=3500% sleep 10
Running as unit: run-r27a8c0ed364f4345b34eccc4e4871bee.service
[root@fedora ~]# ./state.sh
cat /sys/fs/cgroup/cgroup.subtree_control
cpu memory pids <--- if I specify a CPUQuota, "cpu" replaces "cpuset" which destroys the cgroup (this is collected during the 10 seconds while the sleep runs)
cat /sys/fs/cgroup/system.slice/cpuset.cpus
cat: /sys/fs/cgroup/system.slice/cpuset.cpus: No such file or directory
cat /sys/fs/cgroup/user.slice/cpuset.cpus
cat: /sys/fs/cgroup/user.slice/cpuset.cpus: No such file or directory
cat /sys/fs/cgroup/user.slice/cpuset.cpus.partition
cat: /sys/fs/cgroup/user.slice/cpuset.cpus.partition: No such file or directory
cat /sys/fs/cgroup/user.slice/cgroup.subtree_control
cpu memory pids
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus
cat: /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus: No such file or directory
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition
cat: /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition: No such file or directory
[root@fedora ~]# ./state.sh
cat /sys/fs/cgroup/cgroup.subtree_control
memory pids <--- after the sleep has completed, both the "cpu" and "cpuset" keywords are gone
cat /sys/fs/cgroup/system.slice/cpuset.cpus
cat: /sys/fs/cgroup/system.slice/cpuset.cpus: No such file or directory
cat /sys/fs/cgroup/user.slice/cpuset.cpus
cat: /sys/fs/cgroup/user.slice/cpuset.cpus: No such file or directory
cat /sys/fs/cgroup/user.slice/cpuset.cpus.partition
cat: /sys/fs/cgroup/user.slice/cpuset.cpus.partition: No such file or directory
cat /sys/fs/cgroup/user.slice/cgroup.subtree_control
memory pids
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus
cat: /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus: No such file or directory
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition
cat: /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition: No such file or directory
[root@fedora ~]#
[root@fedora ~]#
[root@fedora ~]# ./setup.sh
echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control
echo 4-39 > /sys/fs/cgroup/user.slice/cpuset.cpus
echo "root" > /sys/fs/cgroup/user.slice/cpuset.cpus.partition
echo "+cpuset" > /sys/fs/cgroup/user.slice/cgroup.subtree_control
echo 5-39 > /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus
echo root > /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition
[root@fedora ~]# ./state.sh
cat /sys/fs/cgroup/cgroup.subtree_control
cpuset memory pids
cat /sys/fs/cgroup/system.slice/cpuset.cpus

cat /sys/fs/cgroup/user.slice/cpuset.cpus
4-39
cat /sys/fs/cgroup/user.slice/cpuset.cpus.partition
root
cat /sys/fs/cgroup/user.slice/cgroup.subtree_control
cpuset memory pids
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus
5-39
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition
root
[root@fedora ~]# systemd-run --slice=user-1000.slice sleep 10
Running as unit: run-rd3cd4df89d4d41dd9784be1e4956a844.service
[root@fedora ~]# ./state.sh
cat /sys/fs/cgroup/cgroup.subtree_control
cpuset memory pids <--- if I invoke systemd-run without CPUQuota, the configured cgroup stays intact
cat /sys/fs/cgroup/system.slice/cpuset.cpus

cat /sys/fs/cgroup/user.slice/cpuset.cpus
4-39
cat /sys/fs/cgroup/user.slice/cpuset.cpus.partition
root
cat /sys/fs/cgroup/user.slice/cgroup.subtree_control
cpuset memory pids
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus
5-39
cat /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition
root
[root@fedora ~]#

(In reply to Troy Wilson from comment #2)
> >> Steps to Reproduce:
> >> 1. Configure a v2 cgroup that includes cpusets
> >
> > How did you configure it?
>
> My system has 40 CPUs, I assign 4-39 to user.slice and then 5-39 to
> user-1000.slice
>
> echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control
> echo 4-39 > /sys/fs/cgroup/user.slice/cpuset.cpus
> echo "root" > /sys/fs/cgroup/user.slice/cpuset.cpus.partition
> echo "+cpuset" > /sys/fs/cgroup/user.slice/cgroup.subtree_control
> echo 5-39 > /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus
> echo root > /sys/fs/cgroup/user.slice/user-1000.slice/cpuset.cpus.partition

Manual modification of cgroups owned by systemd is not supported. If you want to manage your own subhierarchy, use delegation (https://systemd.io/CGROUP_DELEGATION/).

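For reference, a sketch of how the same CPU restrictions might be expressed through systemd instead of by writing under /sys/fs/cgroup. This was not proposed in this bug. It assumes the AllowedCPUs= property described in systemd.resource-control(5) is available in the installed systemd, and it does not set cpuset.cpus.partition, so it is not a full equivalent of the setup in comment #2. The helper names run and plan are made up for the example, and the commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: let systemd own the cpuset settings so that it can keep them
# when it enables the cpu controller for CPUQuota=.
# Slice names and CPU ranges are the ones from comment #2.

# run (hypothetical helper): print the command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# plan (hypothetical helper): the three steps of the experiment.
plan() {
    # --runtime keeps the change until reboot; no persistent drop-in is written.
    # user-1000.slice only exists while that user has a session.
    run systemctl set-property --runtime user.slice AllowedCPUs=4-39
    run systemctl set-property --runtime user-1000.slice AllowedCPUs=5-39
    # Start the workload with a quota, as in the original reproducer.
    run systemd-run --slice=user-1000.slice --property=CPUQuota=3500% sleep 10
}

plan
```

Running the script as-is only lists the commands; RUN=1 ./script.sh as root would execute them. Whether the cpuset restriction then survives the CPUQuota run would still need to be checked with state.sh on the affected machine.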
(In reply to David Tardon from comment #1)
> (In reply to Troy Wilson from comment #0)
> > Description of problem:
> > When using systemd-run on RHEL9 with cgroupv2, systemd will put the value
> > "cpu" into the /sys/fs/cgroup/*/cgroup.subtree_control files, which has the
> > effect of destroying any existing v2 cgroup that is configured. I don't
> > know if systemd-run should or should not be putting anything in
> > /sys/fs/cgroup/*/cgroup.subtree_control,
>
> systemd-run just forwards the command (together with supplied options) to
> systemd, which starts it as a transient unit.
>
> > but if it does, the keyword for
> > cgroupv2 is "cpuset", not "cpu".
>
> https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/cgroup-v2.rst#cpu
>
> > Steps to Reproduce:
> > 1. Configure a v2 cgroup that includes cpusets
>
> How did you configure it?

In cgroup v2, a controller has to be enabled level by level. Echoing "+cpuset" to a cgroup's cgroup.subtree_control, for example, enables the cpuset controller for that cgroup's children, but not for its grandchildren. Each child has to enable it in its own cgroup.subtree_control before the grandchildren can use cpuset.

Generally speaking, you can enable all the controllers except one in all the cgroups. The exception is the cpu controller, because a nested cpu controller hierarchy causes some performance degradation. So care must be taken when enabling the cpu controller. There are engineers upstream trying to fix this problem, but it will probably take a while.

-Longman
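
The level-by-level rule above can be checked with a small read-only walk down the hierarchy. This is a minimal sketch, not a systemd or kernel tool: the function name check_controller and its argument order are made up for the example, and it only reads cgroup.subtree_control files.

```shell
#!/bin/sh
# check_controller ROOT RELPATH CONTROLLER   (hypothetical helper)
# For ROOT and every ancestor of ROOT/RELPATH, report whether CONTROLLER is
# listed in that level's cgroup.subtree_control. A cgroup can only use a
# controller if every level above it has enabled it for its children.
# Returns 0 if all levels enable it, 1 at the first level that does not.
check_controller() {
    dir=$1
    rest=$2
    ctrl=$3
    while [ -n "$rest" ]; do
        f=$dir/cgroup.subtree_control
        # Split the space-separated list into lines and match the exact name,
        # so that "cpu" does not match "cpuset".
        if [ -r "$f" ] && tr ' ' '\n' < "$f" | grep -qx -- "$ctrl"; then
            echo "$dir: $ctrl enabled for children"
        else
            echo "$dir: $ctrl NOT enabled for children"
            return 1
        fi
        # Descend one path component.
        dir=$dir/${rest%%/*}
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest= ;;
        esac
    done
    return 0
}

# Example (not executed here):
#   check_controller /sys/fs/cgroup user.slice/user-1000.slice cpuset
```

On the system from comment #2, the example call would be expected to stop at /sys/fs/cgroup once "cpuset" has disappeared from the root cgroup.subtree_control.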