
Bug 2210307

Summary: RFE: Better visibility of currently configured cgroup limits
Product: Red Hat Enterprise Linux 9
Reporter: Michal Sekletar <msekleta>
Component: systemd
Assignee: Michal Sekletar <msekleta>
Status: CLOSED MIGRATED
QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 9.2
CC: alexander.hass, bfinger, fkrska, martin.tegtmeier, michael.trapp, systemd-maint-list
Target Milestone: rc
Keywords: FutureFeature, MigratedToJIRA
Target Release: ---
Flags: pm-rhel: mirror+
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-09-25 23:05:57 UTC
Type: Story
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Michal Sekletar 2023-05-26 14:08:57 UTC
Description of problem:
Currently, it is not easy to display the effective resource limit that a given service (cgroup) is not allowed to exceed. For example, the cgroup memory controller is hierarchical: memory allocation is also constrained by the limits set on every level above the current one, so looking at a single cgroup is not enough.

Version-Release number of selected component (if applicable):
systemd-252-15.el9


How reproducible:
deterministic

Steps to Reproduce:
1. Log in as root
2. systemctl set-property user-0.slice MemoryLimit=500M

Actual results:
$ cat /sys/fs/cgroup/user.slice/memory.max 
max
$ cat /sys/fs/cgroup/user.slice/user-0.slice/memory.max 
524288000
$ cat /sys/fs/cgroup/user.slice/user-0.slice/session-43.scope/memory.max 
max

$ systemd-cgls
...
├─user.slice (#1236)
│ → user.invocation_id: 52637e2cc24f4c659a09749680820dc5
│ → trusted.invocation_id: 52637e2cc24f4c659a09749680820dc5
│ └─user-0.slice (#42278)
│   → user.invocation_id: 0edf11261bac4d85975271fbd55f7714
│   → trusted.invocation_id: 0edf11261bac4d85975271fbd55f7714
│   ├─session-43.scope (#42574)
│   │ ├─794118 sshd: root [priv]
│   │ ├─794121 sshd: root@pts/0
│   │ ├─794122 -bash
│   │ ├─794472 systemd-cgls
│   │ └─794473 less


Expected results:
A user might not be aware that the actual maximum memory limit is set to 500M on the cgroup level above their session scope, and might assume that they can allocate more than that. We should introduce a new switch in systemd-cgls that displays the actual limits for each level.
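
A per-level view like this amounts to walking from a cgroup up to the root of the hierarchy and keeping the smallest memory.max seen on the way. The following is a minimal Python sketch of that walk, assuming a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup. The names read_limit and effective_memory_max are hypothetical helpers for illustration, not part of systemd or systemd-cgls.

```python
from pathlib import Path
from typing import Optional


def read_limit(path: Path) -> Optional[int]:
    """Return the limit stored in a cgroup file, or None for "max" or a missing file."""
    try:
        text = path.read_text().strip()
    except OSError:
        return None  # e.g. the root cgroup has no memory.max file
    return None if text == "max" else int(text)


def effective_memory_max(cgroup: str, root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Smallest memory.max among the cgroup and all of its ancestors.

    `cgroup` is a path relative to the (assumed) cgroup v2 mount point `root`.
    Returns None when no level sets a limit.
    """
    root_path = Path(root)
    current = root_path / cgroup.strip("/")
    limits = []
    while True:
        limit = read_limit(current / "memory.max")
        if limit is not None:
            limits.append(limit)
        # Stop at the hierarchy root (or at "/" as a safety net).
        if current == root_path or current == current.parent:
            break
        current = current.parent
    return min(limits) if limits else None
```

With the memory.max values shown under "Actual results", calling effective_memory_max("user.slice/user-0.slice/session-43.scope") would report the 524288000 bytes set on user-0.slice, even though the session scope itself reads "max".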


Additional info:

Comment 4 Michal Sekletar 2023-07-05 12:06:31 UTC
Actually, it is already possible to get the maximum memory limit for a service in a way that respects the cgroup settings of parent units and also the current memory consumption of sibling cgroups. The limit is exposed as the MemoryAvailable= unit property and is also displayed in the systemctl status output.
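
For scripts, the same property can be read through systemctl show. A minimal Python sketch follows. The helper names parse_memory_available and query_memory_available are hypothetical, and treating any non-numeric value (for example "infinity") as "no limit" is an assumption about the output format rather than documented behaviour.

```python
import subprocess
from typing import Optional


def parse_memory_available(show_output: str) -> Optional[int]:
    """Extract the byte count from a `MemoryAvailable=<bytes>` line.

    Returns None when the property is absent or its value is not a plain
    number (assumption: unlimited units report something like "infinity").
    """
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key == "MemoryAvailable":
            return int(value) if value.isdigit() else None
    return None


def query_memory_available(unit: str) -> Optional[int]:
    """Ask systemd for the MemoryAvailable= property of `unit`."""
    result = subprocess.run(
        ["systemctl", "show", "--property=MemoryAvailable", unit],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_available(result.stdout)
```

Calling query_memory_available("c.service") would then return the value in bytes, or None if systemd reports no effective limit.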

# systemd-run --unit c.service --property Slice=a-b.slice sleep infinity

# systemd-cgls /sys/fs/cgroup/a.slice/
Directory /sys/fs/cgroup/a.slice/:
└─a-b.slice (#4645)
  → user.invocation_id: 5c4e8aa08b604b959881d4dfb0efa90e
  → trusted.invocation_id: 5c4e8aa08b604b959881d4dfb0efa90e
  └─c.service (#4679)
    → user.invocation_id: 5049c3e6f19d44a9aaf4015c990e9846
    → trusted.invocation_id: 5049c3e6f19d44a9aaf4015c990e9846
    └─4382 /usr/bin/sleep infinity

# systemctl set-property c.service MemoryMax=500M
# systemctl set-property a-b.slice MemoryMax=300M
# systemctl set-property a.slice MemoryMax=100M

# systemctl status c.service
● c.service - /usr/bin/sleep infinity
     Loaded: loaded (/run/systemd/transient/c.service; transient)
  Transient: yes
    Drop-In: /run/systemd/transient/c.service.d
             └─50-MemoryMax.conf
     Active: active (running) since Wed 2023-07-05 08:00:31 EDT; 1min 12s ago
   Main PID: 4382 (sleep)
      Tasks: 1 (limit: 11116)
     Memory: 200.0K (max: 500.0M available: 99.7M)
        CPU: 2ms
     CGroup: /a.slice/a-b.slice/c.service
             └─4382 /usr/bin/sleep infinity

After executing the commands above, the amount of memory that c.service can still allocate is 99.7M: the tightest limit in the hierarchy is the 100M set on a.slice, and some memory is already consumed by the sleep process.
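
The arithmetic behind such a figure can be approximated directly from cgroupfs: at every level, subtract memory.current from memory.max and keep the smallest remainder. The Python sketch below illustrates that idea under the assumption of a cgroup v2 hierarchy at /sys/fs/cgroup. It is not systemd's implementation, and available_from_cgroupfs is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def read_value(path: Path) -> Optional[int]:
    """Read an integer cgroup attribute; None for "max" or a missing file."""
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    return None if text == "max" else int(text)


def available_from_cgroupfs(cgroup: str, root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Smallest (memory.max - memory.current) over the cgroup and its ancestors.

    Levels without a limit are skipped. Returns None if no level is limited.
    """
    root_path = Path(root)
    current = root_path / cgroup.strip("/")
    headroom = None
    # The root cgroup carries no memory.max, so the walk stops before it.
    while current != root_path and current != current.parent:
        limit = read_value(current / "memory.max")
        usage = read_value(current / "memory.current")
        if limit is not None and usage is not None:
            level = max(limit - usage, 0)
            headroom = level if headroom is None else min(headroom, level)
        current = current.parent
    return headroom
```

For the transcript above, the walk would visit c.service, a-b.slice and a.slice in turn, and the 100M limit on a.slice would determine the result.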

Comment 5 Martin Tegtmeier 2023-07-20 08:42:48 UTC
Hi Michal,

You are right about cgroups v2; however, in RHEL 8 with cgroups v1 this is broken.

I’m using Red Hat Enterprise Linux release 8.7 (Ootpa) and created a “limited.slice” in systemd which limits TasksMax to 5.

Then I created “someprocs.service” which is running in the limited.slice. The someprocs.service unit file doesn’t contain any limits at all.
 
If I query the service properties for someprocs:
 
[root@ip-172-31-33-24 ~]# systemctl status someprocs
● someprocs.service - Some processes as daemon
   Loaded: loaded (/etc/systemd/system/someprocs.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2023-07-18 07:54:48 UTC; 8s ago
  Process: 5303 ExecStart=/bin/bash /home/ec2-user/test/startprocs (code=exited, status=0/SUCCESS)
    Tasks: 3 (limit: 23049)
   Memory: 668.0K
   CGroup: /limited.slice/someprocs.service
           ├─5305 /home/ec2-user/test/someproc 10
           ├─5307 /home/ec2-user/test/someproc 20
           └─5309 /home/ec2-user/test/someproc 30
 
It shows Tasks: 3 / Limit: 23049. The parent cgroup is displayed correctly; however, the tasks limit is wrong because it doesn’t factor in the more stringent limit of the parent cgroup.

Looking at limited.slice, we can see the effective limit:
 
[root@ip-172-31-33-24 ~]# systemctl status limited.slice
● limited.slice - Slice with limited resources
   Loaded: loaded (/etc/systemd/system/limited.slice; static; vendor preset: disabled)
   Active: active since Tue 2023-07-18 07:49:22 UTC; 4min 5s ago
    Tasks: 3 (limit: 5)
   Memory: 728.0K
      CPU: 19ms
   CGroup: /limited.slice
           └─someprocs.service
             ├─5305 /home/ec2-user/test/someproc 10
             ├─5307 /home/ec2-user/test/someproc 20
             └─5309 /home/ec2-user/test/someproc 30
 
Tasks: 3 / Limit: 5 !!
And this limit is enforced for someprocs - as soon as I try to fork more than 5 processes I receive a “device busy” error. But the limit is invisible to my processes. Running getrlimit(RLIMIT_NPROC) in someproc returns “14406” which is just as wrong as “23049” derived from systemd properties... 

As far as I can tell, there are two use cases:
1. I’m a developer and want to determine the effective limits for my application at runtime. This is especially tricky for memory (e.g. getrlimit and systemd properties report infinity/unlimited, but malloc() followed by access results in an OOM kill).
2. I’m in support and ask my customer to run a support script that collects system information, to determine whether my application was running as intended or whether the environment was somehow limited.

I’m not at all implying that we recommend using limits - ideally our applications run unlimited... but you never know which hardening guides, policies and third party security tools were implemented in customer setups.

So we’d like to have a generic interface, independent of the underlying technology (cgroup_v1, cgroup_v2, whatever), to query effective limits, and obviously the results should be accurate.
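
Until such an interface exists, an application or support script can approximate it by parsing /proc/<pid>/cgroup and walking up the matching hierarchy. The Python sketch below does this for one controller attribute such as pids.max. It assumes the usual systemd mount layout (cgroup v1 controllers under /sys/fs/cgroup/<controller>, cgroup v2 mounted directly at /sys/fs/cgroup) and does not handle hybrid setups. locate_cgroup and effective_limit are hypothetical names.

```python
from pathlib import Path
from typing import Optional, Tuple


def locate_cgroup(proc_cgroup: str, controller: str,
                  root: str = "/sys/fs/cgroup") -> Optional[Tuple[Path, Path]]:
    """Return (cgroup directory, hierarchy top) for `controller`.

    `proc_cgroup` is the content of /proc/<pid>/cgroup. A cgroup v1 line looks
    like "5:pids:/limited.slice/someprocs.service", the cgroup v2 line like
    "0::/limited.slice/someprocs.service".
    """
    unified = None
    for line in proc_cgroup.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        hierarchy_id, controllers, path = parts
        if controller in controllers.split(","):
            top = Path(root) / controller  # assumed v1 mount point
            return top / path.lstrip("/"), top
        if hierarchy_id == "0" and controllers == "":
            top = Path(root)  # assumed v2 mount point
            unified = (top / path.lstrip("/"), top)
    return unified


def effective_limit(proc_cgroup: str, controller: str, filename: str,
                    root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Smallest value of `filename` from the process's cgroup up to the top."""
    located = locate_cgroup(proc_cgroup, controller, root)
    if located is None:
        return None
    current, top = located
    best = None
    while True:
        try:
            text = (current / filename).read_text().strip()
        except OSError:
            text = "max"  # a missing file is treated as "no limit here"
        if text != "max":
            value = int(text)
            best = value if best is None else min(best, value)
        if current == top or current == current.parent:
            break
        current = current.parent
    return best
```

A process could call effective_limit(Path("/proc/self/cgroup").read_text(), "pids", "pids.max") at startup. For the someprocs.service layout described above, the walk would reach limited.slice and pick up its limit of 5 rather than the value set on the service itself.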

Thanks,
   -Martin

Comment 6 RHEL Program Management 2023-09-25 23:02:07 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 7 RHEL Program Management 2023-09-25 23:05:57 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.
