Bug 1972209
| Summary: | Under load, container failed to be created due to missing cgroup scope | | | |
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Damien Ciabrini <dciabrin> | |
| Component: | runc | Assignee: | Jindrich Novy <jnovy> | |
| Status: | CLOSED ERRATA | QA Contact: | Alex Jia <ajia> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 8.4 | CC: | bbaude, bdobreli, dornelas, dwalsh, ekuris, gfidente, jligon, jnovy, kir, leiwang, lfriedma, lmiccini, lsm5, mheon, michele, mpatel, pthomas, sewagner, snanda, tsweeney, umohnani, ypu | |
| Target Milestone: | beta | Keywords: | ZStream | |
| Target Release: | --- | Flags: | pm-rhel: mirror+ | |
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | runc-1.0.0-72.rc92.el8_4 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1990406 2000570 2019335 2021325 | Environment: | | |
| Last Closed: | 2021-11-09 17:38:22 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1982460, 1990406, 2000570, 2019335, 2021325 | |||
Any chance you can try with `crun` instead of `runc` on a fresh system? Podman isn't responsible for creating that cgroup, so I suspect this is a race somewhere in runc, and testing with crun will reveal that.

(In reply to Matthew Heon from comment #1)
> Any chance you can try with `crun` instead of `runc` on a fresh system?
> Podman isn't responsible for creating that cgroup, so I suspect this is a
> race somewhere in runc, and testing with crun will reveal that.

I will run a couple of tests with crun and report if I see any occurrence of it. Unfortunately, each test takes about 2h30 to 3h, so it might take some time to report back.

Meanwhile, I couldn't spot from the source who is responsible for creating the cgroup, but this error message is reported by runc (the error message comes from it), which would tend to validate your initial suspicion.

After some config changes on the node under test, all the containers have been recreated to use crun instead of runc. That equates to 47 containers on the host, among which 8 are re-created after each reboot.

I did 100 reboots with this new setup, under the same load as originally reported, and I couldn't reproduce the issue when podman targets the crun runtime.

Reassigning to runc, as comment #3 confirms this is a race in runc, as Matt suggested in comment #1.

Kir, can you take a look at this, please?

Dan or Mrunal, if someone else should take a look, please let me know.

FYI, this issue also affects Ceph: https://tracker.ceph.com/issues/49287 . This *might* also affect RHCS 5, but I haven't seen this race downstream yet.

(In reply to Tom Sweeney from comment #5)
> Kir, can you take a look at this, please?
>
> Dan or Mrunal, if someone else should take a look, please let me know.

I'd appreciate an update, as this impacts both RHOSP 16.2 and Ceph (potentially RHCS 5.0), both of which are to be released soon.

I would guess we would ask you to test with the latest runc 1.0.1, which was recently released. Of course, maybe transitioning to crun is the best idea.

This is indeed a race in runc; it was fixed by https://github.com/opencontainers/runc/pull/2614, which is part of runc v1.0.0-rc93. So any recent runc should be fine (1.0.1 is recommended, though). I can't tell at the moment which runc is available via the RHEL 8 container-tools module, but I hope it's recent.

OK, let's just say that this is fixed in runc 1.0.1.

Jindrich, I think this one is in your purview; please reroute if not. Setting to Post for any further BZ or packaging needs.

@jnovy It is too late to make any changes for 8.4.0.2. The final compose is already done. But you could make the change in 8.4.0.3 in 6 weeks.

Proposed this for zstream in bug 1990406 then. Thanks.

I can't hit this issue on runc-1.0.1-5.module+el8.5.0+12157+04f1d6be with podman-3.3.0-2.module+el8.5.0+12157+04f1d6be.

It seems that CentOS's container-tools 3.0 is also affected by this: https://pulpito.ceph.com/swagner-2021-08-20_11:35:16-rados:cephadm-wip-swagner2-testing-2021-08-18-1238-pacific-distro-basic-smithi/6349346/

Is there a plan to get it into container-tools 3.0 as well?

@jnovy do you know the answer to Sebastian's question: https://bugzilla.redhat.com/show_bug.cgi?id=1972209#c22? Is it possible to update 3.0, or has the window closed?

Sebastian, please file a separate bug for the 3.0 stream if you believe a backport is required there too. Thanks.
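As an aside, for anyone who wants to repeat the runtime comparison from comments #1-#3, here is a minimal sketch of running a test container with crun instead of runc. The container name and image are illustrative, not taken from the original deployment.

```bash
# Install crun alongside runc (available from the container-tools module on RHEL 8)
dnf install -y crun

# Start a throwaway container explicitly with crun instead of the default runc
podman run --runtime /usr/bin/crun -d --name runtime-test \
    registry.access.redhat.com/ubi8/ubi sleep infinity

# Confirm which OCI runtime the container was created with
podman inspect --format '{{.OCIRuntime}}' runtime-test
```

To make crun the default for all new containers (the approach described in comment #3), the `runtime` key in the `[engine]` section of containers.conf can be set to `"crun"`. Existing containers keep the runtime they were created with, so they have to be recreated to pick up the change.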
Hey Jindrich, Tom and Sebastien, I just cloned this bz into https://bugzilla.redhat.com/show_bug.cgi?id=2000570 to track the backport of this fix for container-tools 3.0 in RHEL 8.4, as that is what we're consuming in RHOSP 16.2. Thanks.

Sorry for my ignorance here, but we're still seeing this bug multiple times a day in upstream Ceph using CentOS's container-tools:3.0. That's why I cloned this into bug 2019335.

*** Bug 2019335 has been marked as a duplicate of this bug. ***

Manual cloning will not work; this needs to follow the zstream cloning process. Laurie, Derrick, can you please z+ this so I can update runc in 3.0-8.4.0?

Relates to https://github.com/ceph/ceph/pull/43813

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: container-tools:rhel8 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4154

The patch mentioned in comment #9 of bug 1972209 is already applied in runc-1.0.0-72.rc92.el8_4, which was already released in 3.0-8.4.0 via https://access.redhat.com/errata/RHBA-2021:4093, so there is no need for cloning/updates in 3.0-8.4.0.
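As a side note, a quick way to check whether a given host already carries the fixed build mentioned above (assuming an RHEL or CentOS 8 host using the container-tools module):

```bash
# The fix from opencontainers/runc#2614 is carried by runc-1.0.0-72.rc92.el8_4
# (and by upstream runc >= 1.0.0-rc93 / 1.0.1)
rpm -q runc

# Show which container-tools module stream the host is tracking
dnf module list --enabled container-tools
```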
Description of problem:

Context: creating podman containers in a VM environment with moderate load. The hypervisor hosts 10 VMs in total, slightly overcommitting CPU, but not short on RAM, and with a decent IO workload.

When we reboot a VM, on restart we have around 40 podman containers that get restarted, and 5 new containers that are created and started in parallel. Sometimes the creation of those new containers fails [1], in what seems to be a race between podman, runc and/or systemd. The podman run command errors out with exit code 127 and the following error message:

Error: OCI runtime error: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit libpod-b634df465802f29636c6ff5e2e23d04b4392da4142577f83bd4c7143adca7c31.scope not found.

This seems to be runc complaining that the cgroup it is supposed to configure does not exist. The failure seems to happen randomly on any of the 5 containers that we are creating. The creation command looks like the following:

podman run -d --name=galera-bundle-podman-0 -e PCMK_stderr=1 --net=host -e PCMK_remote_port=3123 -v /var/lib/kolla/config_files/mysql.json:/var/lib/kolla/config_files/config.json:ro -v /var/lib/config-data/puppet-generated/mysql/:/var/lib/kolla/config_files/src:ro -v /etc/hosts:/etc/hosts:ro -v /etc/localtime:/etc/localtime:ro -v /var/lib/mysql:/var/lib/mysql:rw -v /var/log/mariadb:/var/log/mariadb:rw -v /var/log/containers/mysql:/var/log/mysql:rw -v /dev/log:/dev/log:rw -v /etc/pacemaker/authkey:/etc/pacemaker/authkey -v /var/log/pacemaker/bundles/galera-bundle-0:/var/log --user=root --log-driver=k8s-file --log-opt path=/var/log/containers/stdouts/galera-bundle.log -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS cluster.common.tag/rhosp16-openstack-mariadb:pcmklatest /bin/bash /usr/local/bin/kolla_start

In our environment, this seems to happen when many containers are being created/started concurrently. So far we don't see this error on all our VMs, but one scenario triggers the race pretty consistently (I'd say >50%). That scenario is probably the most load-heavy for our hypervisor, but I don't have hard evidence to back that up yet.

Also worth noting: this happens in our OpenStack testing, but looking at [2], it seems that the very same error was also witnessed in Ceph testing some time ago, under the identified condition of a heavy IO workload.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1967128
[2] https://tracker.ceph.com/issues/41037

Version-Release number of selected component (if applicable):
podman-3.0.1-6.module+el8.4.0+10614+dd38312c.x86_64
runc-1.0.0-70.rc92.module+el8.4.0+10614+dd38312c.x86_64
systemd-239-45.el8.x86_64

How reproducible:
Fairly high (>50% under load)

Steps to Reproduce:
1. Restart a few dozen existing containers concurrently
2. Create a couple of new containers at the same time

Actual results:
Sometimes podman can't run the created container because runc fails to configure the specified cgroup.

Expected results:
Podman run should always work.

Additional info:
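A rough reproduction sketch following the steps above; the image, container names, and counts are illustrative and not taken from the original deployment:

```bash
#!/bin/bash
# Restart many existing containers while creating a few new ones in parallel,
# mimicking the post-reboot scenario described in this report.

IMAGE=registry.access.redhat.com/ubi8/ubi   # any small image will do

# 1. Restart all existing containers concurrently
for c in $(podman ps -a --format '{{.Names}}'); do
    podman restart "$c" &
done

# 2. Create a handful of brand-new containers at the same time
for i in $(seq 1 5); do
    (
        podman run -d --name "race-test-$i" "$IMAGE" sleep 300 \
            || echo "race-test-$i: podman run failed (exit $?)"
    ) &
done
wait

# A hit shows up as the "Unit libpod-<id>.scope not found" OCI runtime error
# in the output of the failing podman run.
```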