Bug 2023694

Summary: [OCP 4.9] CRI-O failing with: error reserving ctr name at openshift-dns after upgrade from OCP 4.8
Product: OpenShift Container Platform
Reporter: Quique Llorente <ellorent>
Component: Node
Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED NOTABUG
Docs Contact:
Severity: high
Priority: high
CC: aos-bugs, fdeutsch, guchen, kfryklun, mdekan, nchhabra, pehunt, phoracek, ycui
Version: 4.9
Flags: pehunt: needinfo-
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-02-22 12:05:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Quique Llorente 2021-11-16 11:30:15 UTC
Description of problem:

After upgrading a heavily loaded OpenShift Virtualization cluster with plenty of virtual machines from OCP 4.8 to OCP 4.9, some pods end up in CreateContainerError. One of them is the openshift-dns pod. According to the must-gather, the error is related to CRI-O:

event-filter.html:reserving ctr name k8s_dns_dns-default-2w7lt_openshift-dns_4f337baf-def9-4556-ae12-a40762295d35_2 for id 2f90fa55356c04a13064d9547eac4303ebe6282de84feb6d89acaea581a960a7: name is 

Version-Release number of selected component (if applicable): 4.9


How reproducible:
This is the first time we have seen this. It happened on a Red Hat OpenShift Virtualization production cluster after the upgrade.


Steps to Reproduce:
1.
2.
3.

Actual results:
Pods with errors blocking the upgrade:
$ oc get pod --all-namespaces |grep Error
openshift-cnv                                      nmstate-handler-c6rdm                                                0/1       CreateContainerError        1 (6d23h ago)     7d
openshift-dns                                      dns-default-2w7lt                                                    1/2       CreateContainerError        2 (6d5h ago)      7d
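The same check can be run against JSON output instead of grepping the table, which also exposes the per-container message. The following is a minimal sketch, assuming the pod list comes from `oc get pod --all-namespaces -o json`; the function name `waiting_containers` is hypothetical and only the standard `status.containerStatuses[].state.waiting` fields are read.

```python
def waiting_containers(pod_list, reason="CreateContainerError"):
    """Return (namespace, pod, container, message) tuples for every
    container whose state is `waiting` with the given reason.

    `pod_list` is the parsed JSON of a Kubernetes PodList (assumption:
    produced by `oc get pod --all-namespaces -o json`).
    """
    found = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            waiting = status.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") == reason:
                found.append((
                    meta.get("namespace"),
                    meta.get("name"),
                    status.get("name"),
                    waiting.get("message", ""),
                ))
    return found
```

One possible way to use it is to save the JSON to a file, load it with `json.load`, and print one line per returned tuple.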

$ oc get event -n openshift-dns
LAST SEEN   TYPE      REASON    OBJECT                  MESSAGE
4m21s       Warning   Failed    pod/dns-default-2w7lt   Error: context deadline exceeded
48m         Warning   Failed    pod/dns-default-2w7lt   (combined from similar events): Error: Kubelet may be retrying requests that are timing out in CRI-O due to system load: context deadline exceeded: error reserving ctr name k8s_dns_dns-default-2w7lt_openshift-dns_4f337baf-def9-4556-ae12-a40762295d35_2 for id 7802c1462c4f726770eec35502e0da1036252f28cedb4b4914f4831667ed5fb2: name is reserved

$ oc get pod -n openshift-cnv nmstate-handler-c6rdm -o json  |jq .status.containerStatuses[0].state
{
  "waiting": {
    "message": "error reserving ctr name k8s_nmstate-handler_nmstate-handler-c6rdm_openshift-cnv_00ff5365-60f7-435e-a049-d46cb2d0dce5_2 for id 5b30c156edfa50cf07533d070d5097d7e31e3949a4bb211fa84ce96163217322: name is reserved",
    "reason": "CreateContainerError"
  }
}
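For triage it can help to split the reserved name in these messages into its parts. The sketch below assumes the name follows the layout `k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>`, which is inferred from the names shown in this report; `parse_reserved_name` and `ReservedName` are hypothetical helper names, not part of CRI-O or the kubelet.

```python
import re
from typing import NamedTuple, Optional


class ReservedName(NamedTuple):
    container: str
    pod: str
    namespace: str
    pod_uid: str
    attempt: int
    container_id: str


# Matches the fragment "reserving ctr name <name> for id <hex id>".
_PATTERN = re.compile(r"reserving ctr name (k8s_\S+) for id ([0-9a-f]+)")


def parse_reserved_name(message: str) -> Optional[ReservedName]:
    """Extract the name components from a 'reserving ctr name' message.

    Returns None when the message does not contain the fragment or the
    name does not have the assumed six underscore-separated fields.
    """
    match = _PATTERN.search(message)
    if match is None:
        return None
    name, container_id = match.groups()
    parts = name.split("_")
    if len(parts) != 6 or not parts[5].isdigit():
        return None
    _, container, pod, namespace, pod_uid, attempt = parts
    return ReservedName(container, pod, namespace, pod_uid, int(attempt), container_id)
```

Applied to the nmstate-handler message above, this would yield the container `nmstate-handler`, pod `nmstate-handler-c6rdm`, namespace `openshift-cnv`, and attempt `2`. The attempt number matches the restart count shown in the pod listing.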

Expected results:
All pods should be in the Running state after the upgrade.


Additional info:

It looks related to https://bugzilla.redhat.com/show_bug.cgi?id=1806000

Also, this is a bare-metal installation; there is no virt provider.

Comment 9 Quique Llorente 2021-11-18 06:24:05 UTC
@

Comment 14 Fabian Deutsch 2021-12-06 14:25:47 UTC
Do we have an understanding of what is going on?

Comment 15 Peter Hunt 2021-12-08 17:27:52 UTC
Unfortunately not yet; I have been swamped with other tasks.

Comment 21 Red Hat Bugzilla 2023-09-15 01:50:01 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days