https://testgrid.k8s.io/redhat-openshift-ocp-release-4.6-informing#release-openshift-ocp-installer-e2e-gcp-rt-4.6&grid=old has gone from passing about half the time to failing all the time. The failures are not concentrated in any particular tests; the failing tests appear to fail because pods are either not running or not cleaning up. The latest three runs:

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.6/1310725154694762496
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.6/1310694250597322752
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-rt-4.6/1310630488951492608
`grep -r 'Failed to create pod sandbox' '/home/deads/Downloads/build-log.txt' | wc -l` shows over 2200 hits. The same set of tests on "normal" GCP shows only 45 hits. (https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-gcp-4.6/1310991133542715392 and then expand the flaky monitor output)
A grep for 'Failed to remove cgroup' turns up about 3k hits in the failing realtime runs and zero in the "normal" GCP events.
An example from one pod Sep 26 07:46:33.447 W ns/openshift-marketplace pod/community-operators-2khwx node/ci-op-dhgdhx53-1a9c0-qgt9w-worker-c-m8qkr reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-26T07:46:21Z" level=warning msg="Timed out while waiting for StartTransientUnit(crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope) completion signal from dbus. Continuing..." time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/pids/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/freezer/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/perf_event/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/devices/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir 
/sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/blkio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:25Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/hugetlb/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/hugetlb/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/perf_event/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/devices/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/blkio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir 
/sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/pids/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/freezer/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod28ce6e74_694c_4b77_a953_dc01e182ac76.slice/crio-f6984c5eeb6c6b49ff3f036bdcb9ded317b3d0b2469ebbb35705442a2afd98c2.scope: device or resource busy" time="2020-09-26T07:46:27Z" level=warning msg="signal: killed" time="2020-09-26T07:46:32Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: settin
Preliminary analysis:

The "Failed to remove cgroup" message is new; it was added by https://github.com/opencontainers/runc/pull/2506. Previously those errors were silently ignored, and now they are logged (but still ignored).

Kernel version:

> Sep 28 22:51:47.099667 localhost kernel: Linux version 4.18.0-193.23.1.el8_2.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Sep 17 15:34:16 EDT 2020

The runc version appears to be post-rc92, as the logging changes were merged two weeks after the rc92 release. Most probably HEAD or close to it. Assuming HEAD.

> process_linux.go:422: settin"

As I suspected, the error is coming from the cgroup manager's Set:

```go
if err := p.manager.Set(p.config.Config); err != nil {
	return newSystemErrorWithCause(err, "setting cgroup config for procHooks process")
}
```

It seems that some of the parameters being set are not to the kernel's liking. Unfortunately, since the error message is truncated (who does that?), we don't know which exact setting failed.

Now, the container is half-created but applying the cgroup settings has failed, and on this code path runc tries to remove the cgroups without first killing the process inside, which is why we see those "Failed to remove cgroup" messages (to clarify, it has always been that way; only the messages are new). This looks like a bug in runc, but it is not new. Fortunately, I was looking at exactly this yesterday.

As for the systemd failure ("Timed out while waiting for StartTransientUnit ... completion signal from dbus"), it is _probably_ caused by the same issue -- some invalid cgroup parameter. Will try to reproduce locally now.
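For context on where the flood of "Failed to remove cgroup (will retry)" warnings comes from, here is a minimal sketch (not runc's actual code; the function name, retry count, and path are made up) of the retry-and-warn removal loop: rmdir(2) keeps returning EBUSY as long as any task is still attached to the cgroup, so if the container process was never killed the remover can only retry, warn, and eventually give up.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// removeCgroupPath retries rmdir on a cgroup directory, warning on each
// failure; a rough illustration of the behaviour behind the quoted warnings.
func removeCgroupPath(path string) error {
	const retries = 5
	delay := 10 * time.Millisecond
	for i := 0; i < retries; i++ {
		err := os.Remove(path) // rmdir(2): returns EBUSY while tasks are still attached
		if err == nil || os.IsNotExist(err) {
			return nil
		}
		fmt.Fprintf(os.Stderr, "level=warning msg=\"Failed to remove cgroup (will retry)\" error=%q\n", err)
		time.Sleep(delay)
		delay *= 2
	}
	return fmt.Errorf("failed to remove cgroup %s", path)
}

func main() {
	// Hypothetical path; on a real node this would be the per-container scope directory.
	if err := removeCgroupPath("/sys/fs/cgroup/memory/kubepods.slice/example.scope"); err != nil {
		fmt.Fprintf(os.Stderr, "level=error msg=\"Failed to remove cgroup\" error=%q\n", err)
	}
}
```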
The runc version is https://github.com/opencontainers/runc/commit/190fcf244c3a0df5ab4c1f6b600cf194ccc17a8a (as per https://bugzilla.redhat.com/show_bug.cgi?id=1868057#c10), thanks to Peter for solving the mystery.
I digged up logs that give the full context - 55569:Sep 28 22:39:15.022 W ns/e2e-test-cli-deployment-5dbh5 pod/tag-images-1-hxs6b node/ci-op-xgzym9zi-1a9c0-qntl2-worker-b-2qrq5 reason/Failed Error: container create failed: time="2020-09-28T22:39:09Z" level=warning msg="Timed out while waiting for StartTransientUnit(crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope) completion signal from dbus. Continuing..."\ntime="2020-09-28T22:39:10Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:10Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:10Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/blkio/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:10Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:11Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/perf_event/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:11Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/freezer/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:11Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:11Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/devices/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:11Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:11Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:11Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir 
/sys/fs/cgroup/pids/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:11Z" level=warning msg="Failed to remove cgroup (will retry)" error="rmdir /sys/fs/cgroup/hugetlb/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:13Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:13Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/blkio/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:13Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:13Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:13Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:14Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:14Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/pids/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:14Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/hugetlb/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:14Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/perf_event/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:14Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/freezer/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:14Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/net_cls,net_prio/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: 
device or resource busy"\ntime="2020-09-28T22:39:14Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/devices/kubepods.slice/kubepods-pod6843d0c5_72d1_496c_ad87_6ba2519c7504.slice/crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope: device or resource busy"\ntime="2020-09-28T22:39:14Z" level=warning msg="signal: killed"\ntime="2020-09-28T22:39:14Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit crio-9e1546337c68680dc75187a8a8ffca5a9a41713d164a5e783340fe220503f2cb.scope not found."\n 105653:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 107237:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 108911:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to 
\"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 110244:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 111556:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 112919:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open 
/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 116373:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 119987:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 120984:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 121862:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create 
pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 123550:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 126154:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 131419:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to 
\"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n 136902:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n
OK, found the complete error message:

Sep 28 23:11:32.621520 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj crio[1555]: time="2020-09-28 23:11:32.621445239Z" level=error msg="Container creation error: time=\"2020-09-28T23:11:27Z\" level=warning msg=\"Timed out while waiting for StartTransientUnit(crio-e3d6d7ca102d12e6de418b7f1cf3546fd3a00137b263751ac99ed12c92f9a4fb.scope) completion signal from dbus. Continuing...\"\ntime=\"2020-09-28T23:11:32Z\" level=warning msg=\"signal: killed\"\ntime=\"2020-09-28T23:11:32Z\" level=error msg=\"container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: Unit crio-e3d6d7ca102d12e6de418b7f1cf3546fd3a00137b263751ac99ed12c92f9a4fb.scope not found.\"\n"

So, what happens in runc, in particular in (*initProcess).start:

1. In (*legacyManager).Apply():
1.1. It tries to start a transient unit (IOW, create a .scope).
1.2. It waits for the above to finish, with a 1 second timeout.
1.3. The timeout is too small and it is hit; it prints a warning "Timed out while waiting for StartTransientUnit(...) completion signal from dbus. Continuing..." and proceeds. NOTE that because the timeout is hit, the error is not returned.
1.4. It creates all cgroups using the fs manager (this is a fallback since systemd does not handle all cgroups/parameters).
1.5. It adds the container init pid to those cgroups (successfully).
1.6. No error is returned from Apply().
...
2. (*legacyManager).Set() tries to set systemd unit properties, failing since the unit/scope does not exist. An error is returned.

Here's a relevant part of the log for items 1.1 to 1.3 (a different scope than the above, which doesn't matter).

1. runc run is called, and it creates a systemd job to create a scope:

Sep 28 23:11:29.839591 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: SELinux access check scon=system_u:system_r:container_runtime_t:s0 tcon=system_u:object_r:init_var_run_t:s0 tclass=service perm=start path=/run/systemd/transient/crio-1ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope cmdline=/usr/bin/runc --systemd-cgroup --root=/run/runc create --bundle /var/run/containers/storage/overlay-containers/61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1/userdata --pid-file /var/run/containers/storage/overlay-containers/61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1/userdata/pidfile 61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1: 0
Sep 28 23:11:29.839606 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope: Trying to enqueue job crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope/start/replace
Sep 28 23:11:29.839638 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope: Installed new job crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope/start as 4712
Sep 28 23:11:29.839646 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope: Enqueued job crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope/start as 4712

2. The scope is being created, but it fails because, apparently, the container process is gone:

Sep 28 23:11:31.094480 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope: Failed to set blkio.weight: No such file or directory
Sep 28 23:11:31.103372 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: Couldn't stat device /dev/char/10:200: No such file or directory
...
Sep 28 23:11:31.296963 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope: Couldn't move process 61730 to requested cgroup '/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pode34d6449_60dd_4abf_8f88_ddddd380f3c0.slice/crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope': No such process
Sep 28 23:11:31.296976 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope: Failed to add PIDs to scope's control group: No such process
Sep 28 23:11:31.296982 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope: Failed with result 'resources'.
Sep 28 23:11:31.296995 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: crio-61ba5f1c66bb22db8b45b367a1a7eb2cd6c0552cca34d5b684238b55118b73b1.scope changed dead -> failed
Sep 28 23:11:31.297056 ci-op-8sdv5hzw-05fa4-52pzw-worker-b-9xfrj systemd[1]: Sent message type=signal sender=org.freedesktop.systemd1 destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=22 reply_cookie=0 signature=sa{sv}as error-name=n/a error-message=n/a

Note that it takes more than a second for systemd to reply, thus the timeout in runc is hit. Note also that by the time systemd tries to put a process into a cgroup, it is already gone, as runc was too desperate :)

So, we have three bugs here:

1. The timeout waiting for StartTransientUnit is too small.
2. The "timeout hit" error from StartTransientUnit is not returned.
3. Somehow we fail to kill container process(es) before cleanup. This is a separate issue.
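To make bugs 1 and 2 concrete, here is a hedged sketch of the wait-with-timeout pattern from steps 1.2-1.3 (illustrative names only, not runc's actual API; jobDone stands in for the channel the systemd dbus client delivers the job result on): the one-second timeout is treated as a warning rather than an error, so the caller continues as if the scope existed.

```go
package main

import (
	"log"
	"time"
)

// waitForStartTransientUnit waits for the scope-creation job to complete,
// but only for one second; on timeout it merely warns and returns, so the
// caller cannot tell that the unit was never created.
func waitForStartTransientUnit(unitName string, jobDone <-chan string) {
	select {
	case result := <-jobDone:
		if result != "done" {
			log.Printf("unit %s failed to start: %s", unitName, result)
		}
	case <-time.After(1 * time.Second):
		// Bug 1: the timeout is too small for a loaded (RT) node.
		// Bug 2: no error is returned, so Apply() "succeeds" and Set()
		// later fails with "Unit ... not found".
		log.Printf("Timed out while waiting for StartTransientUnit(%s) completion signal from dbus. Continuing...", unitName)
	}
}

func main() {
	jobDone := make(chan string) // nothing ever arrives: simulates systemd taking >1s to reply
	waitForStartTransientUnit("crio-example.scope", jobDone)
}
```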
> 110244:Sep 28 23:13:20.137 W ns/e2e-cronjob-1566 pod/concurrent-1601334780-bsr5c node/ci-op-xgzym9zi-1a9c0-qntl2-worker-c-5jw89 reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = container create failed: time="2020-09-28T23:13:19Z" level=warning msg="exit status 2"\ntime="2020-09-28T23:13:19Z" level=error msg="container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: failed to write \"c 248:* m\" to \"/sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny\": open /sys/fs/cgroup/devices/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod7e42411b_f680_4383_98ae_666062fbe3e5.slice/crio-009a83318e5230bcd0f71d606b909c5888df07a9eb2d1050de97165310a7e477.scope/devices.deny: no such file or directory"\n

This is also ultimately caused by the fact that we do not wait for systemd to finish creating the unit. It looks like a race between runc and systemd to create/remove the cgroup. Finally, this all happens on the RT kernel presumably because it has some realtime tasks, so all the other (normal) tasks lag (and we hit the 1 second timeout).
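As a contrived illustration of that create/remove race (plain files under a temp directory rather than cgroupfs; all names here are made up): one goroutine plays runc's fs fallback, creating the per-container directory and writing a setting into it, while the other plays systemd tearing the failed scope down. If the removal wins, the write fails with "no such file or directory", just like the devices.deny error quoted above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

func main() {
	dir := filepath.Join(os.TempDir(), "crio-example.scope")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	var wg sync.WaitGroup
	wg.Add(2)

	go func() { // "systemd": the failed scope is being cleaned up
		defer wg.Done()
		os.RemoveAll(dir)
	}()

	go func() { // "runc fs fallback": write a cgroup setting into the scope dir
		defer wg.Done()
		err := os.WriteFile(filepath.Join(dir, "devices.deny"), []byte("c 248:* m\n"), 0o644)
		fmt.Println("write result:", err) // nil or ENOENT, depending on who wins the race
	}()

	wg.Wait()
	os.RemoveAll(dir) // cleanup in case the writer won
}
```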
Fix for the systemd race: https://github.com/opencontainers/runc/pull/2614

The remaining issue now is the "Failed to remove cgroup" messages. Fortunately I have a simple repro; I need some more time to figure out what's going on.
The "Failed to remove cgroup" issue is addressed by https://github.com/opencontainers/runc/pull/2615

Now, can we test both PRs (at the same time or separately, whichever works best for you)? For convenience, I have created a branch that is exactly the same as the one the bug was observed on, plus these two PRs: https://github.com/kolyshkin/runc/tree/bz1883640
I think the systemd cgroup problem has already been reported in bug #1879152, which was closed as a duplicate of bug #1819868. It looks like a fix is also required on the systemd side.
When running on the realtime kernel, we need to be able to reliably start and stop containers.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196