Bug 1949438 - Builds are failing with cgroupsv2 enabled
Summary: Builds are failing with cgroupsv2 enabled
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Build
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Gabe Montero
QA Contact: wewang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-14 09:53 UTC by Vadim Rutkovsky
Modified: 2021-06-03 17:49 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-03 17:49:53 UTC
Target Upstream Version:
Embargoed:




Links
System: GitHub
ID: openshift/builder pull 246
Status: open
Summary: Bug 1949438: check both cgroupv1 and cgroupv2 files for quota
Last Updated: 2021-05-27 16:40:18 UTC

Description Vadim Rutkovsky 2021-04-14 09:53:04 UTC
Description of problem:
Any Build would fail with
```
failed to retrieve cgroup limits: cannot determine cgroup limits: open /sys/fs/cgroup/memory/memory.limit_in_bytes: no such file or directory
```
when kubelets have cgroupsv2 support enabled.
See test results at https://github.com/openshift/okd-machine-os/pull/88.

Epic link: https://issues.redhat.com/browse/OCPNODE-404

Comment 1 Adam Kaplan 2021-04-14 12:19:40 UTC
@Vadim I have a few questions regarding cgroups v2:

1. Is cgroups v2 targeted for 4.8 or 4.9?
2. Will cgroups v2 be the only supported version of cgroups for OpenShift, or will cluster admins be able to choose/change the cgroup version at install time?

Comment 2 Vadim Rutkovsky 2021-04-14 13:21:31 UTC
(In reply to Adam Kaplan from comment #1)
> @Vadim I have a few questions regarding cgroups v2:
> 
> 1. Is cgroups v2 targeted for 4.8 or 4.9?

IIUC it's a goal for 4.9, or maybe even 4.10. Mrunal might know more.

> 2. Will cgroups v2 be the only supported version of cgroups for OpenShift,
> or will cluster admins be able to choose/change the cgroup version at
> install time?

cgroupsv2 is optional; most likely it's not the default in 4.9.

Comment 3 Adam Kaplan 2021-04-15 12:29:49 UTC
Marking this not a blocker, moving to the Containers team.

My understanding is that we need an updated version of runc added to openshift/builder for this to work. We are currently using the RHEL 8 version of runc, which may not support cgroups v2.

Comment 4 Tom Sweeney 2021-04-15 20:37:53 UTC
Giuseppe, can you take a look at this please?

Comment 5 Giuseppe Scrivano 2021-04-15 21:34:41 UTC
this error must be fixed in: https://github.com/openshift/builder/blob/master/pkg/build/builder/util_linux.go#L16

My suggestion is to attempt reading /sys/fs/cgroup/memory.max when readInt64("/sys/fs/cgroup/memory/memory.limit_in_bytes") fails with ENOENT, paying attention to handle the string "max" for no limits.
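
For illustration, here is a minimal Go sketch of that fallback, assuming plain file reads (the helper name readMemoryLimit and the constants are illustrative; this is not the builder's actual code in util_linux.go):

```
package main

import (
	"fmt"
	"math"
	"os"
	"strconv"
	"strings"
)

const (
	cgroupV1MemLimit = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
	cgroupV2MemLimit = "/sys/fs/cgroup/memory.max"
)

// readMemoryLimit reads the memory limit (in bytes) of the current cgroup,
// trying the cgroup v1 file first and falling back to the cgroup v2 file
// when the v1 file does not exist.
func readMemoryLimit() (int64, error) {
	data, err := os.ReadFile(cgroupV1MemLimit)
	if os.IsNotExist(err) {
		// No v1 memory controller file: assume a unified (cgroup v2) hierarchy.
		data, err = os.ReadFile(cgroupV2MemLimit)
	}
	if err != nil {
		return 0, err
	}
	s := strings.TrimSpace(string(data))
	if s == "max" {
		// cgroup v2 writes the literal string "max" when no limit is set.
		return math.MaxInt64, nil
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	limit, err := readMemoryLimit()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine cgroup limits:", err)
		os.Exit(1)
	}
	fmt.Println("memory limit (bytes):", limit)
}
```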

Comment 9 wewang 2021-06-01 08:01:48 UTC
Hi Vadim and Gabe, could you help check whether my verification steps are sufficient? Thanks.

Steps:
 1. Create a MachineConfig object to enable cgroupv2 for the workers
```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 05-worker-kernelarg-cgroupv2
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - systemd.unified_cgroup_hierarchy=0

```
 2. Verify that the kernel arguments were added to the hosts
$ oc debug node/ci-ln-bvqm7g2-f76d1-hcwzl-worker-d-5nhln
sh-4.4# cat /host/proc/cmdline 
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-6410d15a4729a68864917936f86ae1b2ee5cf9bd8867ad400687cb8349929e7b/vmlinuz-4.18.0-305.el8.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.1/rhcos/6410d15a4729a68864917936f86ae1b2ee5cf9bd8867ad400687cb8349929e7b/0 ignition.platform.id=gcp root=UUID=d71f572e-3495-4828-b736-770d10ce03f3 rw rootflags=prjquota systemd.unified_cgroup_hierarchy=0

3. Start a build and check that it completes
$ oc new-app openshift/ruby~https://github.com/openshift/ruby-hello-world

Comment 10 Gabe Montero 2021-06-01 18:39:28 UTC
Actually Wen, I would suggest that, at a minimum, your MachineConfig objects should look more like the template samples in https://github.com/openshift/installer/pull/4648/files

I believe you'll see there that the setting systemd.unified_cgroup_hierarchy=1 is "how" they achieved Vadim's suggestion of "removing" instances of "systemd.unified_cgroup_hierarchy=0"

Pending any clarification from Vadim, and given that Ryan Phillips from the node team authored that POC installer PR and is a co-author of the EP
for cgroups v2 support, https://github.com/openshift/enhancements/pull/652/files, you might want to reach out to him directly.

Among other things, that EP says that for testing you should try his installer PR with cluster bot.

Conceivably, you could launch a cluster from cluster bot with both his PR and my openshift/builder PR.

But certainly run that "test both PRs with cluster bot" idea by Ryan in case there is a more up-to-date suggested approach from him.

Comment 11 wewang 2021-06-02 08:35:05 UTC
Hi Gabe, PR 4648 needs a rebase now, so I cannot use it to launch a cluster. For now I just used the builder PR to install a cluster and enabled cgroupv2 on the workers and masters manually, but I still hit the error below.
The file should be /sys/fs/cgroup/memory.max on the worker host, right? I cannot find it.

Steps:
1. Launched a cluster with the builder PR
2. Created MachineConfig objects to enable cgroupsv2 for masters and workers:
```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-openshift-cgroupsv2-master-kargs
spec:
  kernelArguments:
    - systemd.unified_cgroup_hierarchy=1
    - cgroup_no_v1="all"
    - psi=1 
```

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-openshift-cgroupsv2-worker-kargs
spec:
  kernelArguments:
    - systemd.unified_cgroup_hierarchy=1
    - cgroup_no_v1="all"
    - psi=1 
```
3. Check the kernel command line and cgroup layout on a worker node
$ oc debug node/ci-ln-f65t1vk-f76d1-6kmf7-worker-c-vcb4h
sh-4.4# cat /host/proc/cmdline 
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-b232649c3951d8c9399c855f00d8722f89a69a3a92a398dced90a9f984509131/vmlinuz-4.18.0-305.3.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.1/rhcos/b232649c3951d8c9399c855f00d8722f89a69a3a92a398dced90a9f984509131/0 ignition.platform.id=gcp root=UUID=f09346cb-d456-48c3-8eac-4fd7c235d89d rw rootflags=prjquota systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=1

sh-4.4# ls /sys/fs/cgroup/
cgroup.controllers  cgroup.max.descendants  cgroup.stat		    cgroup.threads  cpuset.cpus.effective  init.scope	io.stat		memory.pressure  system.slice
cgroup.max.depth    cgroup.procs	    cgroup.subtree_control  cpu.pressure    cpuset.mems.effective  io.pressure	kubepods.slice	memory.stat	 user.slice

4. Create a build
$ oc logs -f build/ruby-hello-world-1
Cloning "https://github.com/openshift/ruby-hello-world" ...
	Commit:	a98345e28783279b55bd5627ecaec80a687efa5d (Merge pull request #129 from pvalena/refresh)
	Author:	Ben Parees <bparees.github.com>
	Date:	Thu May 6 17:50:47 2021 -0400
Caching blobs under "/var/cache/blobs".
error: failed to retrieve cgroup limits: cannot determine cgroup limits: open /sys/fs/cgroup/memory.max: no such file or directory

Comment 12 Gabe Montero 2021-06-02 13:21:20 UTC
Yeah, if "/sys/fs/cgroup/memory.max" does not exist, my understanding would be that the nodes in question are not in "proper" cgroupv2 mode.

I don't have any expertise or knowledge beyond what I see in the EP I referenced last time.  Again, I would suggest, if you have not already,
reaching out to Ryan and the node team to review your steps and help.  Or maybe you can get him to rebase that WIP PR so you can install from
that.

Comment 13 Kir Kolyshkin 2021-06-03 01:36:06 UTC
> The file should be /sys/fs/cgroup/memory.max on the worker host, right? But I cannot find it.

There is no "memory.max" in the top-level cgroup v2 hierarchy, as the top-level cgroup does not have a memory limit.

I noticed that while reviewing https://github.com/openshift/builder/pull/246, but then I thought: this is done
inside a container, not on the host, and in a container we do have that file.

Now, the proper check for whether the host is in cgroup v2 "unified hierarchy" mode,
also suitable for use from scripts, is something like this:

$ stat -f -c %T /sys/fs/cgroup
cgroup2fs

The result should be "cgroup2fs". Old versions of the stat utility might not know about this fs type,
so the alternative is:

[kir@kir-rhat runc]$ stat -f -c %t /sys/fs/cgroup
63677270

The output should be 63677270, which is a magic number for cgroup2fs.
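
For reference, the same check as a small Go sketch; it assumes the golang.org/x/sys/unix package, whose CGROUP2_SUPER_MAGIC constant corresponds to the 0x63677270 magic above:

```
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Linux-only: compare the filesystem magic of /sys/fs/cgroup with the
	// cgroup2 magic to tell whether the host runs the unified hierarchy.
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	if st.Type == unix.CGROUP2_SUPER_MAGIC {
		fmt.Println("cgroup v2 (unified hierarchy)")
	} else {
		fmt.Println("cgroup v1 or hybrid hierarchy")
	}
}
```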

Hope that helps.

Comment 14 wewang 2021-06-03 12:51:31 UTC
Thanks Kir, I can see the result according to your comments:
[wewang@wangwen work]$ oc debug node/ip-10-0-201-169.us-east-2.compute.internal
Creating debug namespace/openshift-debug-node-8nzrq ...
Starting pod/ip-10-0-201-169us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.201.169
If you don't see a command prompt, try pressing enter.
sh-4.4# stat -f -c %t /sys/fs/cgroup
63677270
sh-4.4# stat -f -c %T /sys/fs/cgroup
cgroup2fs
sh-4.4# exit

[wewang@wangwen work]$ oc logs -f build/ruby-hello-world-2
Cloning "https://github.com/openshift/ruby-hello-world" ...
	Commit:	a98345e28783279b55bd5627ecaec80a687efa5d (Merge pull request #129 from pvalena/refresh)
	Author:	Ben Parees <bparees.github.com>
	Date:	Thu May 6 17:50:47 2021 -0400
Caching blobs under "/var/cache/blobs".
error: failed to retrieve cgroup limits: cannot determine cgroup limits: open /sys/fs/cgroup/memory.max: no such file or directory

Comment 15 Ryan Phillips 2021-06-03 13:08:10 UTC
We don't support cgroupsv2 within OpenShift CI yet. This bug should be marked closed.

Comment 16 Vadim Rutkovsky 2021-06-03 15:23:02 UTC
Reopening, as cgroupsv2 support is tech preview in 4.9. It's not a blocker for 4.8, but we still need this ticket to merge fixes.

Comment 17 Gabe Montero 2021-06-03 15:53:30 UTC
I think a Jira issue on the Build API team's board would be a more appropriate tracking mechanism, linked to https://issues.redhat.com/browse/OCPNODE-404.

I've cc'ed Adam. Adam, is tracking cgroupv2 support in 4.9 via Jira instead of Bugzilla the preferred method here, or are you OK leaving it as a Bugzilla bug?

Comment 18 Adam Kaplan 2021-06-03 17:49:53 UTC
I've added this to our JIRA board: https://issues.redhat.com/browse/BUILD-278

Closing this as deferred.

