Bug 2210307
| Summary: | RFE: Better visibility of currently configured cgroup limits | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Michal Sekletar <msekleta> |
| Component: | systemd | Assignee: | Michal Sekletar <msekleta> |
| Status: | NEW --- | QA Contact: | Frantisek Sumsal <fsumsal> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 9.2 | CC: | alexander.hass, bfinger, fkrska, martin.tegtmeier, michael.trapp, systemd-maint-list |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Michal Sekletar
2023-05-26 14:08:57 UTC
It is already possible to get the effective maximum memory limit for a service, i.e. a limit that takes into account the cgroup settings of parent units and the current memory consumption of sibling cgroups. The limit is exposed as the MemoryAvailable= unit property and is also displayed in systemctl status output.
# systemd-run --unit c.service --property Slice=a-b.slice sleep infinity
# systemd-cgls /sys/fs/cgroup/a.slice/
Directory /sys/fs/cgroup/a.slice/:
└─a-b.slice (#4645)
→ user.invocation_id: 5c4e8aa08b604b959881d4dfb0efa90e
→ trusted.invocation_id: 5c4e8aa08b604b959881d4dfb0efa90e
└─c.service (#4679)
→ user.invocation_id: 5049c3e6f19d44a9aaf4015c990e9846
→ trusted.invocation_id: 5049c3e6f19d44a9aaf4015c990e9846
└─4382 /usr/bin/sleep infinity
# systemctl set-property c.service MemoryMax=500M
# systemctl set-property a-b.slice MemoryMax=300M
# systemctl set-property a.slice MemoryMax=100M
# systemctl status c.service
● c.service - /usr/bin/sleep infinity
Loaded: loaded (/run/systemd/transient/c.service; transient)
Transient: yes
Drop-In: /run/systemd/transient/c.service.d
└─50-MemoryMax.conf
Active: active (running) since Wed 2023-07-05 08:00:31 EDT; 1min 12s ago
Main PID: 4382 (sleep)
Tasks: 1 (limit: 11116)
Memory: 200.0K (max: 500.0M available: 99.7M)
CPU: 2ms
CGroup: /a.slice/a-b.slice/c.service
└─4382 /usr/bin/sleep infinity
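After executing the commands above, the most memory c.service can still allocate is 99.7M, not the 500M set on the unit itself: the tightest limit in its hierarchy is the 100M MemoryMax on a.slice, and some of that memory is already consumed by the sleep process.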
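The arithmetic behind the "available" figure can be sketched as follows. This is only an illustration of the idea, not systemd's actual implementation: it assumes a cgroup v2 tree, the function names are hypothetical, and it takes the smallest headroom (memory.max minus memory.current) over the unit and all of its ancestors.

```python
import os


def read_cgroup_value(path):
    """Return the integer stored in a cgroup file, or None for 'max' or a missing file."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except OSError:
        return None
    return None if text == "max" else int(text)


def memory_available(cgroup_root, cgroup_path):
    """Smallest (memory.max - memory.current) over cgroup_path and its ancestors.

    Returns None when no level of the hierarchy sets a memory limit.
    """
    parts = [p for p in cgroup_path.strip("/").split("/") if p]
    best = None
    while parts:
        directory = os.path.join(cgroup_root, *parts)
        limit = read_cgroup_value(os.path.join(directory, "memory.max"))
        current = read_cgroup_value(os.path.join(directory, "memory.current"))
        if limit is not None and current is not None:
            headroom = max(limit - current, 0)
            best = headroom if best is None else min(best, headroom)
        parts.pop()  # move one level up towards the root
    return best
```

On the system above this would be called as memory_available("/sys/fs/cgroup", "/a.slice/a-b.slice/c.service"); the 100M limit on a.slice dominates because it leaves the least headroom.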
Hi Michal,
You are right about cgroups v2 however in RHEL 8 with cgroups v1 this is broken.
I’m using Red Hat Enterprise Linux release 8.7 (Ootpa) and created a “limited.slice” in systemd which limits TasksMax to 5. Then I created “someprocs.service” which is running in the limited.slice. The someprocs.service unit file doesn’t contain any limits at all.
If I query the service properties for someprocs:
[root@ip-172-31-33-24 ~]# systemctl status someprocs
● someprocs.service - Some processes as daemon
Loaded: loaded (/etc/systemd/system/someprocs.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2023-07-18 07:54:48 UTC; 8s ago
Process: 5303 ExecStart=/bin/bash /home/ec2-user/test/startprocs (code=exited, status=0/SUCCESS)
Tasks: 3 (limit: 23049)
Memory: 668.0K
CGroup: /limited.slice/someprocs.service
├─5305 /home/ec2-user/test/someproc 10
├─5307 /home/ec2-user/test/someproc 20
└─5309 /home/ec2-user/test/someproc 30
It shows Tasks: 3 / Limit: 23049. The parent cgroup is displayed correctly however the tasks limit is wrong and doesn’t factor in the more stringent limit of the parent cgroup.
Looking at the limited.slice we can see the effective limit
[root@ip-172-31-33-24 ~]# systemctl status limited.slice
● limited.slice - Slice with limited resources
Loaded: loaded (/etc/systemd/system/limited.slice; static; vendor preset: disabled)
Active: active since Tue 2023-07-18 07:49:22 UTC; 4min 5s ago
Tasks: 3 (limit: 5)
Memory: 728.0K
CPU: 19ms
CGroup: /limited.slice
└─someprocs.service
├─5305 /home/ec2-user/test/someproc 10
├─5307 /home/ec2-user/test/someproc 20
└─5309 /home/ec2-user/test/someproc 30
Tasks: 3 / Limit: 5 !!
And this limit is enforced for someprocs - as soon as I try to fork more than 5 processes I receive a “device busy” error. But the limit is invisible to my processes. Running getrlimit(RLIMIT_NPROC) in someproc returns “14406” which is just as wrong as “23049” derived from systemd properties...
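As a workaround sketch for the task limit, a process could walk the pids controller hierarchy itself and take the smallest pids.max it finds. This is a hedged illustration, not an existing systemd or libc interface: effective_pids_max is a hypothetical helper, and the mount point used in the usage note (/sys/fs/cgroup/pids for cgroups v1) is an assumption about the system layout.

```python
import os


def effective_pids_max(controller_root, cgroup_path):
    """Return the smallest pids.max among cgroup_path and its ancestors.

    Returns None if every level is unlimited ('max') or has no pids.max file.
    """
    parts = [p for p in cgroup_path.strip("/").split("/") if p]
    effective = None
    while parts:
        limit_file = os.path.join(controller_root, *parts, "pids.max")
        try:
            with open(limit_file) as f:
                text = f.read().strip()
        except OSError:
            text = "max"  # treat a missing file as "no limit at this level"
        if text != "max":
            value = int(text)
            effective = value if effective is None else min(effective, value)
        parts.pop()  # continue with the parent cgroup
    return effective
```

For the setup described above, effective_pids_max("/sys/fs/cgroup/pids", "/limited.slice/someprocs.service") would be expected to return the limit of the slice rather than the limit of the service, assuming the pids controller is mounted at that path.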
As far as I can tell there are two use cases:
1. I’m a developer and want to determine effective limits for my application at runtime. This is especially tricky for memory (e.g. getrlimit and systemd properties report infinity/unlimited but malloc() + access result in OOM-kill)
2. I’m in support and ask my customer to run a support script collecting system information to determine if my application was running as intended or if the environment was somehow limited
I’m not at all implying that we recommend using limits - ideally our applications run unlimited... but you never know which hardening guides, policies and third party security tools were implemented in customer setups.
So we’d like to have a generic interface independent of the underlying technology (cgroup_v1, cgroup_v2, whatever) to query effective limits and obviously the results should be accurate.
Thanks,
-Martin
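Until such an interface exists, a support script would have to find out for itself which hierarchy a controller lives in. The sketch below shows one way to do that by parsing the contents of /proc/self/cgroup, whose lines have the form hierarchy-ID:controller-list:cgroup-path. The function name and the ("v1"/"v2", path) return convention are hypothetical choices made for this example.

```python
def cgroup_path_for(proc_cgroup_text, controller):
    """Locate the cgroup path of `controller` from /proc/<pid>/cgroup contents.

    Returns ("v1", path) if the controller is attached to a v1 hierarchy,
    ("v2", path) if only the unified hierarchy entry ("0::<path>") is present,
    or None if neither can be found.
    """
    unified = None
    for line in proc_cgroup_text.splitlines():
        fields = line.split(":", 2)
        if len(fields) != 3:
            continue  # skip malformed or empty lines
        hierarchy_id, controllers, path = fields
        if hierarchy_id == "0" and controllers == "":
            unified = ("v2", path)  # the cgroup v2 entry has an empty controller list
        elif controller in controllers.split(","):
            return ("v1", path)
    return unified
```

A caller would read the text with open("/proc/self/cgroup").read() and then combine the returned path with the matching mount point to read the limit files at each level of the hierarchy.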