Bug 1972209 - Under load, container failed to be created due to missing cgroup scope
Summary: Under load, container failed to be created due to missing cgroup scope
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: runc
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: ---
Assignee: Jindrich Novy
QA Contact: Alex Jia
URL:
Whiteboard:
Duplicates: 2019335 (view as bug list)
Depends On:
Blocks: 1982460 1990406 2000570 2019335 2021325
 
Reported: 2021-06-15 12:34 UTC by Damien Ciabrini
Modified: 2022-02-17 02:55 UTC
CC: 22 users

Fixed In Version: runc-1.0.0-72.rc92.el8_4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1990406 2000570 2019335 2021325 (view as bug list)
Environment:
Last Closed: 2021-11-09 17:38:22 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 49287 0 None None None 2021-07-13 11:34:02 UTC
Red Hat Product Errata RHSA-2021:4154 0 None None None 2021-11-09 17:39:24 UTC

Internal Links: 1981773

Description Damien Ciabrini 2021-06-15 12:34:47 UTC
Description of problem:

Context: creating podman containers in a VM environment under moderate load. The hypervisor is hosting 10 VMs in total, slightly overcommitting CPU, not short on RAM, and with a decent IO workload.


When we reboot the VM, around 40 podman containers get restarted on startup, and 5 new containers are created and started in parallel.

Sometimes, the creation of those new containers fails [1], in what seems to be a race between podman, runc, and/or systemd. The podman run command errors out with exit code 127 and the following error message:

    Error: OCI runtime error: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit libpod-b634df465802f29636c6ff5e2e23d04b4392da4142577f83bd4c7143adca7c31.scope not found.

This seems to be runc complaining that the cgroup it is supposed to configure does not exist.
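
For what it's worth, a quick way to check whether the transient scope unit systemd was supposed to create actually exists on the host at the time of the failure (just a sketch, reusing the container ID from the error above):

    # Does the transient scope unit exist at all?
    systemctl status 'libpod-b634df465802f29636c6ff5e2e23d04b4392da4142577f83bd4c7143adca7c31.scope'

    # List every libpod scope currently known to systemd
    systemctl list-units --all 'libpod-*.scope'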

The failure seems to happen randomly on any of the 5 containers that we are creating. The creation command looks like the following:

    podman run -d --name=galera-bundle-podman-0 -e PCMK_stderr=1 --net=host -e PCMK_remote_port=3123 -v /var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json:ro -v /var/lib/config-data/puppet-generated/mysql/:/var/lib/kolla/config_files/src:ro -v /etc/hosts:/etc/hosts:ro -v /etc/localtime:/etc/localtime:ro -v /var/lib/mysql:/var/lib/mysql:rw -v /var/log/mariadb:/var/log/mariadb:rw -v /var/log/containers/mysql:/var/log/mysql:rw -v /dev/log:/dev/log:rw -v /etc/pacemaker/authkey:/etc/pacemaker/authkey -v /var/log/pacemaker/bundles/galera-bundle-0:/var/log --user=root --log-driver=k8s-file --log-opt path=/var/log/containers/stdouts/galera-bundle.log -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest /bin/bash /usr/local/bin/kolla_start

In our env, this seems to happen when many containers are being created/started concurrently. So far, we don't see this error on all our VMs, but one scenario triggers that race pretty consistently (I'd say >50%). This scenario is probably the most load-heavy for our hypervisor, but I don't have hard evidence to back that up yet.

Also worth noting: this happens in our OpenStack testing, but looking at [2], it seems that the very same error was also witnessed in Ceph testing some time ago, likewise under a heavy IO workload.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1967128
[2] https://tracker.ceph.com/issues/41037

Version-Release number of selected component (if applicable):
podman-3.0.1-6.module+el8.4.0+10614+dd38312c.x86_64
runc-1.0.0-70.rc92.module+el8.4.0+10614+dd38312c.x86_64
systemd-239-45.el8.x86_64


How reproducible:
Fairly high (>50% under load)

Steps to Reproduce:
1. Restart a few dozen existing containers concurrently
2. Create a couple of new containers at the same time (a rough sketch of this kind of parallel churn is shown below)
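
The following is only a minimal sketch of the pattern, not our exact tooling; the container names, loop counts, and ubi8 image are placeholders:

    #!/bin/bash
    # Restart a few dozen pre-existing containers concurrently
    for name in $(podman ps -a --format '{{.Names}}' | head -40); do
        podman restart "$name" &
    done

    # While those restarts are in flight, create a handful of new containers in parallel
    for i in $(seq 1 5); do
        podman run -d --name "race-test-$i" registry.access.redhat.com/ubi8 sleep infinity &
    done
    wait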

Actual results:
Sometimes podman cannot run the newly created container because runc fails to configure the specified cgroup.

Expected results:
podman run should always succeed.

Additional info:

Comment 1 Matthew Heon 2021-06-15 13:18:08 UTC
Any chance you can try with `crun` instead of `runc` on a fresh system? Podman isn't responsible for creating that cgroup, so I suspect this is a race somewhere in runc, and testing with crun will reveal that.
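
For anyone wanting to try the same thing, here is a sketch of two ways to switch the runtime (the container name and image below are placeholders, not taken from this report):

    # Per-invocation: point podman at crun explicitly
    podman --runtime /usr/bin/crun run -d --name crun-test registry.access.redhat.com/ubi8 sleep infinity

    # Or make crun the default for all containers in /etc/containers/containers.conf:
    # [engine]
    # runtime = "crun"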

Comment 2 Damien Ciabrini 2021-06-15 13:42:06 UTC
(In reply to Matthew Heon from comment #1)
> Any chance you can try with `crun` instead of `runc` on a fresh system?
> Podman isn't responsible for creating that cgroup, so I suspect this is a
> race somewhere in runc, and testing with crun will reveal that.

I will run a couple of tests with crun and report if I see any occurrence of it. Unfortunately each test takes about 2.5 to 3 hours, so it might take some time to report back.

Meanwhile, I couldn't spot from the source who is responsible for creating the cgroup, but the error message is reported by runc, so that would tend to validate your initial suspicion.

Comment 3 Damien Ciabrini 2021-06-18 17:54:07 UTC
After some config changes on the node under test, all the containers have been recreated to use crun instead of runc. That equates to 47 containers on the host, among which 8 are re-created after each reboot.

I did 100 reboots with this new setup, under the same load as originally reported, and I couldn't replicate my issue when podman targets the crun runtime.

Comment 4 Jindrich Novy 2021-06-21 09:34:12 UTC
Reassigning to runc, as comment #3 proves it is a race in runc, as Matt mentioned in comment #1.

Comment 5 Tom Sweeney 2021-06-21 13:26:05 UTC
Kir,  can you take a look at this, please?

Dan or Mrunal, if someone else should take a look, please let me know.

Comment 6 Sebastian Wagner 2021-07-13 11:34:02 UTC
FYI, this issue also affects Ceph: https://tracker.ceph.com/issues/49287. This *might* also affect RHCS 5, but I haven't seen this race downstream yet.

Comment 7 Yaniv Kaul 2021-07-21 11:35:55 UTC
(In reply to Tom Sweeney from comment #5)
> Kir,  can you take a look at this, please?
> 
> Dan or Mrunal, if someone else should take a look, please let me know.

I'd appreciate it if you could provide an update, as this impacts both RHOSP 16.2 and Ceph (potentially RHCS 5.0), both of which are due to be released soon.

Comment 8 Daniel Walsh 2021-07-21 16:50:32 UTC
I would guess we would ask you to test with the latest runc 1.0.1, which was recently released. Of course, maybe transitioning to crun is the best idea.

Comment 9 Kir Kolyshkin 2021-07-22 16:17:06 UTC
This is indeed a race in runc, fixed by https://github.com/opencontainers/runc/pull/2614, which is part of runc v1.0.0-rc93. So any recent runc should be fine (though 1.0.1 is recommended).

I can't check at the moment which runc is available via RHEL 8 container-tools, but I hope it's recent.
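
To confirm whether a given host already carries the fix, checking the packaged build and the version runc reports should be enough (a sketch):

    # The fix landed upstream in v1.0.0-rc93, so anything at or past that should be fine
    rpm -q runc
    runc --version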

Comment 10 Daniel Walsh 2021-07-22 16:31:50 UTC
OK, let's just say that this is fixed in runc 1.0.1.

Comment 11 Tom Sweeney 2021-07-22 18:15:58 UTC
Jindrich, I think this one is in your purview; please reroute if not. Setting to POST for any further BZ or packaging needs.

Comment 19 Laurie Friedman 2021-08-05 22:05:31 UTC
@jnovy It is too late to make any changes for 8.4.0.2. The final compose is already done.  But you could make the change in 8.4.0.3 in 6 weeks.

Comment 20 Jindrich Novy 2021-08-06 05:19:40 UTC
Proposed this for zstream in bug 1990406 then. Thanks.

Comment 21 Alex Jia 2021-08-08 12:40:22 UTC
I can't hit this issue on runc-1.0.1-5.module+el8.5.0+12157+04f1d6be
w/ podman-3.3.0-2.module+el8.5.0+12157+04f1d6be.

Comment 22 Sebastian Wagner 2021-08-23 09:23:05 UTC
It seems that CentOS's container-tools 3.0 is also affected by this: https://pulpito.ceph.com/swagner-2021-08-20_11:35:16-rados:cephadm-wip-swagner2-testing-2021-08-18-1238-pacific-distro-basic-smithi/6349346/ Is there a plan to get the fix into container-tools 3.0 as well?
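
A quick way to see which stream a host is on and which runc it currently ships (just a sketch):

    # Which container-tools module stream is enabled on this host?
    dnf module list --enabled container-tools

    # And which runc build does it currently provide?
    rpm -q runc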

Comment 23 Tom Sweeney 2021-08-23 13:21:00 UTC
@jnovy do you know the answer to Sebastian's question: https://bugzilla.redhat.com/show_bug.cgi?id=1972209#c22?  Is it possible to update 3.0, or has the window closed?

Comment 24 Jindrich Novy 2021-08-26 09:19:04 UTC
Sebastian, please file a separate bug for 3.0 stream if you believe a backport is required there too. Thanks.

Comment 25 Damien Ciabrini 2021-09-02 12:11:13 UTC
Hey Jindrich, Tom, and Sebastian,

I just cloned this bz into https://bugzilla.redhat.com/show_bug.cgi?id=2000570 to track the backport of this fix for container-tools 3.0 in RHEL 8.4, as that is what we're consuming in RHOSP 16.2.

Thanks

Comment 26 Sebastian Wagner 2021-11-02 09:39:39 UTC
Sorry for my ignorance here, but we're still seeing this bug multiple times a day in upstream Ceph using CentOS's container-tools:3.0. That's why I cloned this into bug 2019335.

Comment 27 Jindrich Novy 2021-11-08 08:54:52 UTC
*** Bug 2019335 has been marked as a duplicate of this bug. ***

Comment 28 Jindrich Novy 2021-11-08 08:56:45 UTC
Manual cloning will not work; this needs to follow the zstream cloning process.

Laurie, Derrick, can you please z+ this so I can update runc in 3.0-8.4.0?

Comment 29 Sebastian Wagner 2021-11-08 08:58:30 UTC
relates to https://github.com/ceph/ceph/pull/43813

Comment 35 errata-xmlrpc 2021-11-09 17:38:22 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4154

Comment 36 Jindrich Novy 2021-11-10 13:22:53 UTC
The patch mentioned in comment #9 of bug 1972209 is already applied in runc-1.0.0-72.rc92.el8_4, which was already released in 3.0-8.4.0 via https://access.redhat.com/errata/RHBA-2021:4093, so there is no need for cloning/updates in 3.0-8.4.0.
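
For anyone double-checking a 3.0-8.4.0 host, a minimal way to verify the fixed build is installed (a sketch):

    # Expect runc-1.0.0-72.rc92.el8_4 or newer
    rpm -q runc

    # Skim the recent changelog entries for the backported fix
    rpm -q --changelog runc | head -20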

