Bug 1880452 - [cgroup v2] --memory-reservation not setting memory.high in container
Summary: [cgroup v2] --memory-reservation not setting memory.high in container
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: podman
Version: 32
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Giuseppe Scrivano
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-18 15:22 UTC by Severin Gehwolf
Modified: 2020-09-28 11:14 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-28 10:32:56 UTC
Type: Bug
Embargoed:



Description Severin Gehwolf 2020-09-18 15:22:48 UTC
Description of problem:
Setting --memory-reservation on the podman command line appears to have no effect on the memory.high cgroup v2 interface file.

Version-Release number of selected component (if applicable):
# rpm -q podman crun
podman-2.0.6-1.fc32.x86_64
crun-0.14.1-4.fc32.x86_64

How reproducible:
100%


Steps to Reproduce:
# podman run -ti --rm --memory-reservation=500m fedora:32 /bin/bash
[root@7b17acccb2d1 /]# cat /proc/self/cgroup 
0::/
[root@7b17acccb2d1 /]# cat /sys/fs/cgroup/memory.high 
max


Actual results: max


Expected results: 524288000
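(A quick sanity check on the expected value: podman parses the "500m" suffix as mebibytes, so the expected byte count works out as follows.)

```shell
# 500m = 500 MiB = 500 * 1024 * 1024 bytes
echo $((500 * 1024 * 1024))
# → 524288000
```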


Additional info:
This last worked on F31 with cgroup v2:

# rpm -q podman crun
podman-1.5.1-2.17.dev.gitce64c14.fc31.x86_64
crun-0.8-1.fc31.x86_64

# podman run -ti --rm --memory-reservation=500m fedora:31 /bin/bash
[root@27937c73e0a4 /]# cat /proc/self/cgroup 
0::/machine.slice/libpod-27937c73e0a4044f708ca0a57a33e7dd5fdb9b80aba572fa3cd7f250caef4e04.scope
[root@27937c73e0a4 /]# cat /sys/fs/cgroup//machine.slice/libpod-27937c73e0a4044f708ca0a57a33e7dd5fdb9b80aba572fa3cd7f250caef4e04.scope/memory.high 
524288000

Comment 1 Severin Gehwolf 2020-09-24 10:09:21 UTC
Anyone? Is this expected behaviour?

Comment 2 Daniel Walsh 2020-09-25 18:33:03 UTC
Nope, looks like a bug to me.

Comment 3 Daniel Walsh 2020-09-25 18:40:18 UTC
Maybe this is something that is only supported in cgroup v1.

Here is the test for it.

	It("podman run memory-reservation test", func() {
		if podmanTest.Host.Distribution == "ubuntu" {
			Skip("Unable to perform test on Ubuntu distributions due to memory management")
		}

		cgroupsv2, err := cgroups.IsCgroup2UnifiedMode()
		Expect(err).To(BeNil())

		var session *PodmanSessionIntegration

		if cgroupsv2 {
			session = podmanTest.Podman([]string{"run", "--memory-reservation=40m", ALPINE, "sh", "-c", "cat /sys/fs/cgroup/$(sed -e 's|0::||' < /proc/self/cgroup)/memory.high"})
		} else {
			session = podmanTest.Podman([]string{"run", "--memory-reservation=40m", ALPINE, "cat", "/sys/fs/cgroup/memory/memory.soft_limit_in_bytes"})
		}

		session.WaitWithDefaultTimeout()
		Expect(session.ExitCode()).To(Equal(0))
		if cgroupsv2 {
			Expect(session.OutputToString()).To(Equal("max"))
		} else {
			Expect(session.OutputToString()).To(Equal("41943040"))
		}
	})


If this has no effect on v2, we should probably print a warning if the user sets it, and document this behaviour in the man page.

Giuseppe and Valentin, do you know for sure?

Comment 4 Giuseppe Scrivano 2020-09-28 07:52:33 UTC
--memory-reservation under cgroup v2 is mapped to `memory.low`.

Can you check the value of that file?

Comment 5 Severin Gehwolf 2020-09-28 08:28:47 UTC
(In reply to Giuseppe Scrivano from comment #4)
> --memory-reservation under cgroup v2 is mapped to `memory.low`.

Hmm, this changed from F31 cgroup v2 behaviour then. See "Additional info" in comment 0. Is there some documentation about it somewhere? We are using those files in OpenJDK's container detection code and would like to have some references explaining what changed and why. Thanks!
 
> Can you check the value of that file?

Here you go. It's what you said:

$ podman run -ti --rm --memory-reservation=500m fedora:32 /bin/bash
[root@10037ca757b6 /]# cat /proc/self/cgroup
0::/
[root@10037ca757b6 /]# cat /sys/fs/cgroup/memory.high 
max
[root@10037ca757b6 /]# cat /sys/fs/cgroup/memory.low 
524288000

The question as to why this changed and whether it's going to be stable moving forward remains. Thanks, all!

Comment 6 Giuseppe Scrivano 2020-09-28 08:42:41 UTC
The conversion from cgroup v1 to cgroup v2 happens in the OCI runtime.  For crun, the conversions are documented here: https://github.com/containers/crun/blob/master/crun.1.md#cgroup-v2

memory.high is not meant to be a reservation but a limit, so the previous conversion was wrong and it was fixed in crun 0.11.
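(For reference, the memory-related part of that v1-to-v2 translation can be sketched in shell arithmetic. The values below are illustrative, not taken from this bug, and the variable names are made up; see crun.1.md for the authoritative table.)

```shell
# Sketch of crun's cgroup v1 -> v2 memory mapping, per crun.1.md.
reservation=$((500 * 1024 * 1024))   # v1 memory.soft_limit_in_bytes
limit=$((1024 * 1024 * 1024))        # v1 memory.limit_in_bytes
memsw=$((1536 * 1024 * 1024))        # v1 memory.memsw.limit_in_bytes

# The v1 soft limit (reservation) maps to memory.low in crun >= 0.11;
# crun < 0.11 wrongly wrote memory.high, which is a limit, not a reservation.
echo "memory.low      = $reservation"
echo "memory.max      = $limit"
# v1 memsw counts memory + swap together; v2 memory.swap.max is swap only:
echo "memory.swap.max = $((memsw - limit))"
```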

Comment 7 Severin Gehwolf 2020-09-28 10:30:43 UTC
(In reply to Giuseppe Scrivano from comment #6)
> The conversion from cgroup v1 to cgroup v2 happens in the OCI runtime.  For
> crun, the conversions are documented here:
> https://github.com/containers/crun/blob/master/crun.1.md#cgroup-v2
> 
> memory.high is not meant to be a reservation but a limit, so the previous
> conversion was wrong and it was fixed in crun 0.11.

OK, thanks!

Should 'memory.swap_max' be 'memory.swap.max' instead in that table? As per:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory

"""
memory.swap.max

    A read-write single value file which exists on non-root cgroups. The default is “max”.

    Swap usage hard limit. If a cgroup’s swap usage reaches this limit, anonymous memory of the cgroup will not be swapped out.
"""

Comment 8 Severin Gehwolf 2020-09-28 10:32:56 UTC
Closing as per comment 5 and comment 6. This got fixed in crun 0.11 to use memory.low instead. F31 had crun 0.8 which still contained the bug.

Comment 9 Giuseppe Scrivano 2020-09-28 11:14:47 UTC
(In reply to Severin Gehwolf from comment #7)
> (In reply to Giuseppe Scrivano from comment #6)
> > The conversion from cgroup v1 to cgroup v2 happens in the OCI runtime.  For
> > crun, the conversions are documented here:
> > https://github.com/containers/crun/blob/master/crun.1.md#cgroup-v2
> > 
> > memory.high is not meant to be a reservation but a limit, so the previous
> > conversion was wrong and it was fixed in crun 0.11.
> 
> OK, thanks!
> 
> Should 'memory.swap_max' be 'memory.swap.max' instead in that table? As per:
> https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory
> 
> """
> memory.swap.max
> 
>     A read-write single value file which exists on non-root cgroups. The
> default is “max”.
> 
>     Swap usage hard limit. If a cgroup’s swap usage reaches this limit,
> anonymous memory of the cgroup will not be swapped out.
> """

Yes, thanks for noticing it. I'll address it.

