Bug 1880452

Summary: [cgroup v2] --memory-reservation not setting memory.high in container
Product: Fedora
Component: podman
Version: 32
Status: CLOSED NOTABUG
Reporter: Severin Gehwolf <sgehwolf>
Assignee: Giuseppe Scrivano <gscrivan>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: bbaude, debarshir, dwalsh, gscrivan, jnovy, lsm5, mheon, rh.container.bot, santiago
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-09-28 10:32:56 UTC
Type: Bug

Description Severin Gehwolf 2020-09-18 15:22:48 UTC
Description of problem:
Setting --memory-reservation via the podman command line doesn't seem to have any effect on the memory.high cgroup v2 interface file.

Version-Release number of selected component (if applicable):
# rpm -q podman crun
podman-2.0.6-1.fc32.x86_64
crun-0.14.1-4.fc32.x86_64

How reproducible:
100%


Steps to Reproduce:
# podman run -ti --rm --memory-reservation=500m fedora:32 /bin/bash
[root@7b17acccb2d1 /]# cat /proc/self/cgroup 
0::/
[root@7b17acccb2d1 /]# cat /sys/fs/cgroup/memory.high 
max


Actual results: max


Expected results: 524288000


Additional info:
This last worked on F31 with cgroup v2:

# rpm -q podman crun
podman-1.5.1-2.17.dev.gitce64c14.fc31.x86_64
crun-0.8-1.fc31.x86_64

# podman run -ti --rm --memory-reservation=500m fedora:31 /bin/bash
[root@27937c73e0a4 /]# cat /proc/self/cgroup 
0::/machine.slice/libpod-27937c73e0a4044f708ca0a57a33e7dd5fdb9b80aba572fa3cd7f250caef4e04.scope
[root@27937c73e0a4 /]# cat /sys/fs/cgroup//machine.slice/libpod-27937c73e0a4044f708ca0a57a33e7dd5fdb9b80aba572fa3cd7f250caef4e04.scope/memory.high 
524288000
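
For reference: the "0::/" seen on F32 presumably means the container runs in its own cgroup namespace, so the interface files are read directly under /sys/fs/cgroup, whereas on F31 the full machine.slice path was visible. A path-agnostic way to read the same file in either case (just a sketch, stripping the "0::" prefix from /proc/self/cgroup rather than hard-coding the path) would be:

# podman run --rm --memory-reservation=500m fedora:32 sh -c 'cat /sys/fs/cgroup/$(sed -e "s|0::||" < /proc/self/cgroup)/memory.high'

Either way, on F32 this still prints "max".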

Comment 1 Severin Gehwolf 2020-09-24 10:09:21 UTC
Anyone? Is this expected behaviour?

Comment 2 Daniel Walsh 2020-09-25 18:33:03 UTC
Nope, looks like a bug to me.

Comment 3 Daniel Walsh 2020-09-25 18:40:18 UTC
Maybe this is something that is only supported in cgroup v1.

Here is the test for it.

	It("podman run memory-reservation test", func() {
		if podmanTest.Host.Distribution == "ubuntu" {
			Skip("Unable to perform test on Ubuntu distributions due to memory management")
		}

		cgroupsv2, err := cgroups.IsCgroup2UnifiedMode()
		Expect(err).To(BeNil())

		var session *PodmanSessionIntegration

		if cgroupsv2 {
			session = podmanTest.Podman([]string{"run", "--memory-reservation=40m", ALPINE, "sh", "-c", "cat /sys/fs/cgroup/$(sed -e 's|0::||' < /proc/self/cgroup)/memory.high"})
		} else {
			session = podmanTest.Podman([]string{"run", "--memory-reservation=40m", ALPINE, "cat", "/sys/fs/cgroup/memory/memory.soft_limit_in_bytes"})
		}

		session.WaitWithDefaultTimeout()
		Expect(session.ExitCode()).To(Equal(0))
		if cgroupsv2 {
			Expect(session.OutputToString()).To(Equal("max"))
		} else {
			Expect(session.OutputToString()).To(Equal("41943040"))
		}
	})


If this has no effect on v2, we should probably print a warning if the user sets it and document this behaviour in the man page.

Giuseppe and Valentin, do you know for sure?

Comment 4 Giuseppe Scrivano 2020-09-28 07:52:33 UTC
--memory-reservation under cgroup v2 is mapped to `memory.low`.

Can you check the value of that file?

Comment 5 Severin Gehwolf 2020-09-28 08:28:47 UTC
(In reply to Giuseppe Scrivano from comment #4)
> --memory-reservation under cgroup v2 is mapped to `memory.low`.

Hmm, this changed from cgroup v2 on F31 then. See "Additional Info" in comment 0. Is there some documentation about it somewhere? We are using those files in OpenJDK's container detection code and would like a reference explaining what changed and why. Thanks!
 
> Can you check the value of that file?

Here you go. It's what you said:

$ podman run -ti --rm --memory-reservation=500m fedora:32 /bin/bash
[root@10037ca757b6 /]# cat /proc/self/cgroup
0::/
[root@10037ca757b6 /]# cat /sys/fs/cgroup/memory.high 
max
[root@10037ca757b6 /]# cat /sys/fs/cgroup/memory.low 
524288000

The question as to why this changed and whether it's going to be stable moving forward remains. Thanks, all!

Comment 6 Giuseppe Scrivano 2020-09-28 08:42:41 UTC
The conversion from cgroup v1 to cgroup v2 happens in the OCI runtime.  For crun, the conversions are documented here: https://github.com/containers/crun/blob/master/crun.1.md#cgroup-v2

memory.high is not meant to be a reservation but a limit, so the previous conversion was wrong and it was fixed in crun 0.11.
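
Concretely, the mapping looks like this (a minimal sketch, using the 500m value from this report; the v1 file is the memory soft limit, the v2 value is the one shown in comment 5):

On a cgroup v1 host:
# podman run --rm --memory-reservation=500m fedora:32 cat /sys/fs/cgroup/memory/memory.soft_limit_in_bytes
524288000

On a cgroup v2 host with crun >= 0.11:
# podman run --rm --memory-reservation=500m fedora:32 cat /sys/fs/cgroup/memory.low
524288000

memory.high stays at "max" in the v2 case, since it is a throttling limit rather than a reservation.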

Comment 7 Severin Gehwolf 2020-09-28 10:30:43 UTC
(In reply to Giuseppe Scrivano from comment #6)
> The conversion from cgroup v1 to cgroup v2 happens in the OCI runtime.  For
> crun, the conversions are documented here:
> https://github.com/containers/crun/blob/master/crun.1.md#cgroup-v2
> 
> memory.high is not meant to be a reservation but a limit, so the previous
> conversion was wrong and it was fixed in crun 0.11.

OK, thanks!

Should 'memory.swap_max' be 'memory.swap.max' instead in that table? As per:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory

"""
memory.swap.max

    A read-write single value file which exists on non-root cgroups. The default is “max”.

    Swap usage hard limit. If a cgroup’s swap usage reaches this limit, anonymous memory of the cgroup will not be swapped out.
"""

Comment 8 Severin Gehwolf 2020-09-28 10:32:56 UTC
Closing as per comment 5 and comment 6. This was fixed in crun 0.11, which maps --memory-reservation to memory.low instead; F31 had crun 0.8, which still contained the old (incorrect) mapping to memory.high.

Comment 9 Giuseppe Scrivano 2020-09-28 11:14:47 UTC
(In reply to Severin Gehwolf from comment #7)
> (In reply to Giuseppe Scrivano from comment #6)
> > The conversion from cgroup v1 to cgroup v2 happens in the OCI runtime.  For
> > crun, the conversions are documented here:
> > https://github.com/containers/crun/blob/master/crun.1.md#cgroup-v2
> > 
> > memory.high is not meant to be a reservation but a limit, so the previous
> > conversion was wrong and it was fixed in crun 0.11.
> 
> OK, thanks!
> 
> Should 'memory.swap_max' be 'memory.swap.max' instead in that table? As per:
> https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory
> 
> """
> memory.swap.max
> 
>     A read-write single value file which exists on non-root cgroups. The
> default is “max”.
> 
>     Swap usage hard limit. If a cgroup’s swap usage reaches this limit,
> anonymous memory of the cgroup will not be swapped out.
> """

Yes, thanks for noticing it. I'll address it.