[sig-arch] events should not repeat pathologically

Since the 1.22 rebase, we've seen these events repeating:

event happened 47 times, something is wrong: ns/openshift-cluster-node-tuning-operator pod/tuned-9vncf node/ip-10-0-203-226.ec2.internal - reason/FailedCreatePodContainer unable to ensure pod container exists: failed to create container for [kubepods burstable pod4724bbf2-ae65-4296-87a2-f36f56b7cc03] : Unit kubepods-burstable-pod4724bbf2_ae65_4296_87a2_f36f56b7cc03.slice already exists.

See: https://search.ci.openshift.org/?search=Unit+.*.slice+already+exists&maxAge=48h&context=1&type=bug%2Bjunit&name=4.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

This is tracked upstream in Kubernetes; see https://github.com/kubernetes/kubernetes/issues/104280 for details.
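For anyone triaging these failures, here is a minimal Python sketch that pulls the repeat count and the event reason out of a failure message like the one quoted above. The regular expressions are assumptions based only on the text in this report, and `parse_repeated_event` is a hypothetical helper, not part of the test suite.

```python
import re
from typing import NamedTuple, Optional

# Patterns assumed from the failure text quoted in this report; the real
# test output may differ between releases.
_EVENT_RE = re.compile(
    r"event happened (?P<count>\d+) times, something is wrong: (?P<detail>.*)"
)
_REASON_RE = re.compile(r"reason/(?P<reason>\S+)")


class RepeatedEvent(NamedTuple):
    count: int             # how many times the event was observed
    reason: Optional[str]  # value after "reason/", if present
    detail: str            # everything after "something is wrong: "


def parse_repeated_event(line: str) -> Optional[RepeatedEvent]:
    """Return the parsed event, or None if the line does not match."""
    match = _EVENT_RE.search(line)
    if match is None:
        return None
    detail = match.group("detail").strip()
    reason = _REASON_RE.search(detail)
    return RepeatedEvent(
        count=int(match.group("count")),
        reason=reason.group("reason") if reason else None,
        detail=detail,
    )
```

Grouping the parsed results by `reason` should make it easier to tell the "Unit ... already exists" failures apart from unrelated repeating events.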
Fixed in https://github.com/opencontainers/runc/releases/tag/v1.0.2 by https://github.com/opencontainers/runc/pull/3167
Now we need to vendor it into all Kubernetes releases:
master: https://github.com/kubernetes/kubernetes/pull/104528
1.22: https://github.com/kubernetes/kubernetes/pull/104529
1.21: https://github.com/kubernetes/kubernetes/pull/104530
I think that we should not try to bring it to 1.20 (the runc in there does the freeze, so it's affected, but the number of changes to backport is too high).
*** Bug 1996755 has been marked as a duplicate of this bug. ***
The bug is valid for 4.8 and 4.9. Fixed by:
4.9: https://github.com/openshift/kubernetes/pull/910
4.8: https://github.com/openshift/kubernetes/pull/912
Strictly speaking, we have two bugs here. One is "unit already exists", introduced in runc rc94 and fixed in runc 1.0.0. The other is "cgroup freeze", introduced in runc rc92, mostly fixed in runc 1.0.1, fully fixed in runc 1.0.2 (or so we hope).
> Another is "cgroup freeze", introduced in runc rc92, mostly fixed in runc 1.0.1, fully fixed in runc 1.0.2 (or so we hope).

Pardon me, this was introduced in rc91 (runc commit b810da149), not rc92.
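To make the version ranges from the last two comments easier to apply, here is a rough Python sketch that reports which of the two bugs a given runc version would be exposed to. It assumes the release candidates are tagged as `1.0.0-rcNN`, and it treats "cgroup freeze" as present until 1.0.2 even though 1.0.1 is described as mostly fixed. `version_key`, `affected_by` and `KNOWN_BUGS` are hypothetical names used for illustration only.

```python
import re
from typing import List, Tuple

_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?$")


def version_key(version: str) -> Tuple[int, int, int, int, int]:
    """Build a sortable key; release candidates sort before the final release."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized runc version: {version!r}")
    major, minor, patch, rc = match.groups()
    if rc is None:
        return (int(major), int(minor), int(patch), 1, 0)
    return (int(major), int(minor), int(patch), 0, int(rc))


# (bug name, first affected version, first version with the full fix),
# taken from the comments in this report.
KNOWN_BUGS = [
    ("unit already exists", "1.0.0-rc94", "1.0.0"),
    ("cgroup freeze", "1.0.0-rc91", "1.0.2"),
]


def affected_by(version: str) -> List[str]:
    """List the bugs whose [introduced, fixed) range contains the version."""
    key = version_key(version)
    return [
        name
        for name, introduced, fixed in KNOWN_BUGS
        if version_key(introduced) <= key < version_key(fixed)
    ]
```

For example, `affected_by("1.0.0-rc95")` returns both bugs, while `affected_by("1.0.2")` returns an empty list.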
*** Bug 1996187 has been marked as a duplicate of this bug. ***
This is not about the version of the standalone runc binary being used; it is about the version of runc's libcontainer that the kubelet imports at compile time. That said, I am not sure whether 4.9.0-0.nightly-2021-09-01-193941 includes https://github.com/openshift/kubernetes/pull/910
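To illustrate the point about the compiled-in libcontainer: the version that matters is the one pinned in the go.mod of the source tree the kubelet was built from. Below is a minimal Python sketch that extracts it, assuming the usual go.mod `require` and `replace` syntax. `vendored_runc_version` is a hypothetical helper, and any go.mod text passed to it would have to come from the actual tree.

```python
import re
from typing import Optional

RUNC_MODULE = "github.com/opencontainers/runc"

# A replace directive overrides the required version, so it is checked first.
_REPLACE_RE = re.compile(
    r"^\s*(?:replace\s+)?" + re.escape(RUNC_MODULE)
    + r"(?:\s+v\S+)?\s+=>\s+\S+\s+(v\S+)",
    re.MULTILINE,
)
_REQUIRE_RE = re.compile(
    r"^\s*(?:require\s+)?" + re.escape(RUNC_MODULE) + r"\s+(v\S+)",
    re.MULTILINE,
)


def vendored_runc_version(go_mod_text: str) -> Optional[str]:
    """Return the runc version pinned in go.mod text, or None if absent."""
    for pattern in (_REPLACE_RE, _REQUIRE_RE):
        match = pattern.search(go_mod_text)
        if match is not None:
            return match.group(1)
    return None
```

Running this against the go.mod at the commit a given nightly was built from would show whether the runc bump is included, independent of which runc binary is installed on the node.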
Checked again, and the last error was 5 days ago.
https://search.ci.openshift.org/?search=Unit+.*.slice+already+exists&maxAge=336h&context=1&type=junit&name=4.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
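For repeating this check with a different time window or release, here is a small Python sketch that rebuilds the search link. The parameter names are copied from the URLs in this report; `build_search_url` is a hypothetical helper, and the assumption that the search page accepts percent-encoded values (for example `%2A` for `*`) is mine.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://search.ci.openshift.org/"


def build_search_url(pattern: str, max_age: str = "336h",
                     release: str = "4.9", result_type: str = "junit") -> str:
    """Build a CI search URL using the query parameters seen in this report."""
    params = {
        "search": pattern,        # regular expression to look for
        "maxAge": max_age,        # how far back to search
        "context": "1",
        "type": result_type,
        "name": release,          # job name filter
        "excludeName": "",
        "maxMatches": "5",
        "maxBytes": "20971520",
        "groupBy": "job",
    }
    return SEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("Unit .*.slice already exists", max_age="48h"))
```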
Noticing the "[sig-arch] events should not repeat pathologically" test failure on 4.9 to 4.10 upgrade CI jobs on s390x as of today (2021-09-21):

[sig-arch] events should not repeat pathologically (0s)
1 events happened too frequently
event happened 38 times, something is wrong: ns/openshift-machine-api machine/libvirt-s390x-1-3-708-542pk-worker-0-qntpm - reason/Updated Updated Machine libvirt-s390x-1-3-708-542pk-worker-0-qntpm

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.10-upgrade-from-nightly-4.9-ocp-remote-libvirt-s390x/1440148923523010560
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759