Description of problem:
After upgrading a heavily loaded OpenShift Virtualization cluster with plenty of virtual machines from OCP 4.8 to OCP 4.9, some pods end up in CreateContainerError. One of the affected pods is in openshift-dns; checking the must-gather (event-filter.html), the error is related to CRI-O:

error reserving ctr name k8s_dns_dns-default-2w7lt_openshift-dns_4f337baf-def9-4556-ae12-a40762295d35_2 for id 2f90fa55356c04a13064d9547eac4303ebe6282de84feb6d89acaea581a960a7: name is reserved

Version-Release number of selected component (if applicable):
4.9

How reproducible:
First time we have seen this, on a Red Hat OpenShift Virtualization production cluster after upgrade.

Steps to Reproduce:
1.
2.
3.

Actual results:
Pods with errors blocking the upgrade:

$ oc get pod --all-namespaces | grep Error
openshift-cnv    nmstate-handler-c6rdm   0/1   CreateContainerError   1 (6d23h ago)   7d
openshift-dns    dns-default-2w7lt       1/2   CreateContainerError   2 (6d5h ago)    7d

$ oc get event -n openshift-dns
LAST SEEN   TYPE      REASON   OBJECT                  MESSAGE
4m21s       Warning   Failed   pod/dns-default-2w7lt   Error: context deadline exceeded
48m         Warning   Failed   pod/dns-default-2w7lt   (combined from similar events): Error: Kubelet may be retrying requests that are timing out in CRI-O due to system load: context deadline exceeded: error reserving ctr name k8s_dns_dns-default-2w7lt_openshift-dns_4f337baf-def9-4556-ae12-a40762295d35_2 for id 7802c1462c4f726770eec35502e0da1036252f28cedb4b4914f4831667ed5fb2: name is reserved

$ oc get pod -n openshift-cnv nmstate-handler-c6rdm -o json | jq .status.containerStatuses[0].state
{
  "waiting": {
    "message": "error reserving ctr name k8s_nmstate-handler_nmstate-handler-c6rdm_openshift-cnv_00ff5365-60f7-435e-a049-d46cb2d0dce5_2 for id 5b30c156edfa50cf07533d070d5097d7e31e3949a4bb211fa84ce96163217322: name is reserved",
    "reason": "CreateContainerError"
  }
}

Expected results:
All pods should be in Running state after the upgrade.
Additional info:
It looks related to https://bugzilla.redhat.com/show_bug.cgi?id=1806000. Also, this is a bare-metal installation; there is no virt provider.
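To illustrate the failure mode quoted in the events above: the kubelet's first CreateContainer call times out client-side ("context deadline exceeded") while CRI-O, slowed by system load, has already reserved the container name; the kubelet's retry then arrives with a new container id and fails because the name is still held. The following is a minimal sketch of that race, not CRI-O code; the NameRegistry class and all names in it are hypothetical.

```python
# Hypothetical sketch of CRI-O's container-name reservation behavior.
# Not actual CRI-O code; illustrates why a kubelet retry after a
# client-side timeout hits "name is reserved".

class NameRegistry:
    """Toy stand-in for a runtime's container-name reservation table."""

    def __init__(self):
        self._reserved = {}  # name -> container id holding the reservation

    def reserve(self, name, ctr_id):
        # A name can be held by only one container id at a time.
        if name in self._reserved:
            raise RuntimeError(
                f"error reserving ctr name {name} for id {ctr_id}: "
                "name is reserved"
            )
        self._reserved[name] = ctr_id


registry = NameRegistry()

# Attempt 1: the runtime reserves the name, then stalls under load;
# the kubelet's client-side deadline expires before a response arrives.
registry.reserve("k8s_dns_dns-default_example", "id-1")

# Attempt 2: the kubelet retries with a fresh container id, but the
# name is still reserved by the stalled first attempt.
try:
    registry.reserve("k8s_dns_dns-default_example", "id-2")
except RuntimeError as e:
    print(e)
```

This matches the event text "Kubelet may be retrying requests that are timing out in CRI-O due to system load": the reservation from the timed-out attempt is never released, so every retry fails until the runtime cleans it up.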
Do we have an understanding of what is going on?
Unfortunately not yet; I have been swamped with other tasks.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days