Description of problem:
As per the z-stream fix https://bugzilla.redhat.com/show_bug.cgi?id=1934656#c7, the customer has successfully upgraded the cluster to v4.6.21, but the bare metal workers are still facing the CRI-O issue, which causes pods to get stuck in either the ContainerCreating or the Terminating phase.

Mar 23 20:42:22 [host_44] crio[3612]: time="2021-03-23 20:42:22.278230942Z" level=warning msg="Error reserving ctr name k8s_application_app_name3 for id 55f93365d7453a9f6e72aecc5baf3de1b4424871a9eeed771ff29dd3741c6411: name is reserved"
Mar 23 20:23:39 [host_44] crio[2973]: time="2021-03-23 20:23:39.206376994Z" level=warning msg="Stopping container cdfd6fb1057a7fe40f0b7a67882fa645dfc978470bdecce403c898f5fc11dce6 with stop signal timed out: timeout reached after 30 seconds waiting for container process to exit"

Version-Release number of selected component (if applicable):
v4.6.21

How reproducible:
Always on BareMetal nodes

Actual results:
level=warning msg="Error reserving ctr name

Expected results:
Pods should not be stuck in the ContainerCreating phase.

Additional info:
This issue usually affects only the bare metal workers, which are used more heavily than the other workers.
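To gauge how often a node hits each of the two warnings quoted above, the CRI-O journal can be scanned for them. The following is only a minimal sketch: the regular expressions are derived from the two log lines in this report, and the function names (`classify`, `summarize`) are hypothetical helpers, not part of CRI-O or any OpenShift tooling.

```python
import re
from collections import Counter

# Patterns derived from the two warning messages quoted in this report.
# They are assumptions about the message format, not a CRI-O contract.
NAME_RESERVED = re.compile(
    r"Error reserving ctr name (?P<name>\S+) for id (?P<id>[0-9a-f]+): name is reserved"
)
STOP_TIMEOUT = re.compile(
    r"Stopping container (?P<id>[0-9a-f]+) with stop signal timed out"
)


def classify(line):
    """Return (kind, container_id) for a recognized warning, else None."""
    match = NAME_RESERVED.search(line)
    if match:
        return ("name_reserved", match.group("id"))
    match = STOP_TIMEOUT.search(line)
    if match:
        return ("stop_timeout", match.group("id"))
    return None


def summarize(lines):
    """Count how many lines fall into each recognized warning kind."""
    hits = (classify(line) for line in lines)
    return Counter(kind for kind, _ in (h for h in hits if h is not None))
```

The input lines could come from the node's journal (for example, output of `journalctl -u crio` saved to a file, assuming the unit is named `crio`); `summarize(open(path))` would then give a per-kind count for that node.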
Kir, can you take a look and see if there's anything fishy about runc here?
A similar bug in 4.6: https://bugzilla.redhat.com/show_bug.cgi?id=1934656
This might be a dupe of #1903228 -- alas, I don't have anything to say at this time.
Copying the status update I have provided at https://bugzilla.redhat.com/show_bug.cgi?id=1903228#c35: It would make sense to test if my fix (upstream: https://github.com/opencontainers/runc/pull/2918, 4.6 backport: https://github.com/projectatomic/runc/pull/47) helps or not. I see that both PRs were merged, but I'm not sure if RPMs are available.
RPMs are available now
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-12-122225   True        False         102m    Cluster version is 4.8.0-0.nightly-2021-05-12-122225

sh-4.4# chroot /host
sh-4.4# rpm -qa | grep runc
runc-1.0.0-95.rhaos4.8.gitcd80260.el8.x86_64
An additional fix (https://github.com/projectatomic/runc/pull/52) went into runc-1.0.0-86.rhaos4.6.git23384e2, please re-test with that one.
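When re-testing, it helps to confirm that the node's runc build is at least the one named above. The sketch below parses the package names seen in this thread and compares build numbers. It assumes the name layout `runc-<version>-<build>.rhaos<stream>.git<commit>` (inferred from the two names quoted here) and assumes that a higher build number within the same stream includes the fix. The helpers `parse_runc` and `has_fix` are hypothetical, and this is not a substitute for real RPM version comparison.

```python
import re

# Assumed layout, inferred from the package names quoted in this thread:
#   runc-<version>-<build>.rhaos<stream>.git<commit>[.el8.x86_64]
NVR = re.compile(
    r"^runc-(?P<version>[\d.]+)-(?P<build>\d+)"
    r"\.rhaos(?P<stream>\d+\.\d+)\.git(?P<commit>[0-9a-f]+)"
)


def parse_runc(nvr):
    """Split a runc package name into version, build, stream and commit."""
    match = NVR.match(nvr)
    if match is None:
        raise ValueError(f"unrecognized runc package name: {nvr!r}")
    return {
        "version": match.group("version"),
        "build": int(match.group("build")),
        "stream": match.group("stream"),
        "commit": match.group("commit"),
    }


def has_fix(installed, fixed):
    """Return True/False within one stream, or None if not comparable."""
    a, b = parse_runc(installed), parse_runc(fixed)
    if a["stream"] != b["stream"] or a["version"] != b["version"]:
        return None  # builds from different streams cannot be ordered this way
    return a["build"] >= b["build"]
```

For example, `has_fix("runc-1.0.0-95.rhaos4.8.gitcd80260.el8.x86_64", "runc-1.0.0-86.rhaos4.6.git23384e2")` returns `None`, because the 4.8 nightly package and the 4.6 build belong to different streams and their build numbers are not directly comparable.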
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438