Bug 2023694

Summary:	[OCP 4.9] CRI-O failing with: error reserving ctr name at openshift-dns after upgrade from OCP 4.8
Product:	OpenShift Container Platform	Reporter:	Quique Llorente <ellorent>
Component:	Node	Assignee:	Peter Hunt <pehunt>
Node sub component:	CRI-O	QA Contact:	Sunil Choudhary <schoudha>
Status:	CLOSED NOTABUG	Docs Contact:
Severity:	high
Priority:	high	CC:	aos-bugs, fdeutsch, guchen, kfryklun, mdekan, nchhabra, pehunt, phoracek, ycui
Version:	4.9	Flags:	pehunt: needinfo-
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-02-22 12:05:13 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Quique Llorente 2021-11-16 11:30:15 UTC

Description of problem:

After upgrading a heavily loaded OpenShift Virtualization cluster with plenty of virtual machines from OCP 4.8 to OCP 4.9, some pods end up in CreateContainerError. One of them is the openshift-dns pod; checking the must-gather, the error is related to CRI-O:

event-filter.html:reserving ctr name k8s_dns_dns-default-2w7lt_openshift-dns_4f337baf-def9-4556-ae12-a40762295d35_2 for id 2f90fa55356c04a13064d9547eac4303ebe6282de84feb6d89acaea581a960a7: name is 

Version-Release number of selected component (if applicable): 4.9


How reproducible:
This is the first time we have seen this; it happened on a Red Hat OpenShift Virtualization production cluster after the upgrade.


Steps to Reproduce:
1.
2.
3.

Actual results:
Pod with errors blocking upgrade:
$ oc get pod --all-namespaces |grep Error
openshift-cnv                                      nmstate-handler-c6rdm                                                0/1       CreateContainerError        1 (6d23h ago)     7d
openshift-dns                                      dns-default-2w7lt                                                    1/2       CreateContainerError        2 (6d5h ago)      7d

$ oc get event -n openshift-dns
LAST SEEN   TYPE      REASON    OBJECT                  MESSAGE
4m21s       Warning   Failed    pod/dns-default-2w7lt   Error: context deadline exceeded
48m         Warning   Failed    pod/dns-default-2w7lt   (combined from similar events): Error: Kubelet may be retrying requests that are timing out in CRI-O due to system load: context deadline exceeded: error reserving ctr name k8s_dns_dns-default-2w7lt_openshift-dns_4f337baf-def9-4556-ae12-a40762295d35_2 for id 7802c1462c4f726770eec35502e0da1036252f28cedb4b4914f4831667ed5fb2: name is reserved
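The messages quoted in this report show the same container name paired with different ids (2f90fa55... in the must-gather excerpt, 7802c146... in the event above), which would be consistent with the kubelet retrying the create while the name stays reserved. A minimal sketch for pulling the name/id pair out of such messages so they can be grouped by name; the helper name `extract_reserved` is hypothetical and the pattern assumes the message wording shown in this report:

```shell
# Hypothetical helper: read event or status messages on stdin and print
# "<container name> <container id>" for every "name is reserved" error.
# Assumes the message wording seen in this report.
extract_reserved() {
  sed -n 's/.*error reserving ctr name \([^ ]*\) for id \([0-9a-f]*\): name is reserved.*/\1 \2/p'
}

# Example with the message from the event above.
echo 'Error: context deadline exceeded: error reserving ctr name k8s_dns_dns-default-2w7lt_openshift-dns_4f337baf-def9-4556-ae12-a40762295d35_2 for id 7802c1462c4f726770eec35502e0da1036252f28cedb4b4914f4831667ed5fb2: name is reserved' | extract_reserved
```

Lines that do not contain the error are dropped, so the helper can be fed the full event listing.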

$ oc get pod -n openshift-cnv nmstate-handler-c6rdm -o json | jq '.status.containerStatuses[0].state'
{
  "waiting": {
    "message": "error reserving ctr name k8s_nmstate-handler_nmstate-handler-c6rdm_openshift-cnv_00ff5365-60f7-435e-a049-d46cb2d0dce5_2 for id 5b30c156edfa50cf07533d070d5097d7e31e3949a4bb211fa84ce96163217322: name is reserved",
    "reason": "CreateContainerError"
  }
}
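The reserved names in the messages above appear to follow the pattern k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>. A minimal sketch that splits such a name into labeled fields, which may help when matching a reserved name back to a pod; the function name `parse_ctr_name` is hypothetical, and the field layout is inferred from the names in this report rather than taken from CRI-O documentation:

```shell
# Hypothetical helper: split a container name of the assumed form
# k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt> into labeled fields.
# Kubernetes object names cannot contain "_", so it is used as the separator.
parse_ctr_name() {
  IFS=_ read -r prefix container pod namespace uid attempt <<EOF
$1
EOF
  printf 'container=%s pod=%s namespace=%s uid=%s attempt=%s\n' \
    "$container" "$pod" "$namespace" "$uid" "$attempt"
}

# Example with the openshift-dns name from this report.
parse_ctr_name "k8s_dns_dns-default-2w7lt_openshift-dns_4f337baf-def9-4556-ae12-a40762295d35_2"
```

The trailing attempt field matches the restart count shown by oc get pod for both failing pods in this report.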

Expected results:
All pods should be in the Running state after the upgrade.


Additional info:

It looks related to https://bugzilla.redhat.com/show_bug.cgi?id=1806000

Also, this is a bare-metal installation; there is no virt provider.

Comment 9 Quique Llorente 2021-11-18 06:24:05 UTC

Comment 14 Fabian Deutsch 2021-12-06 14:25:47 UTC

Do we have an understanding of what is going on?

Comment 15 Peter Hunt 2021-12-08 17:27:52 UTC

Unfortunately not yet; I have been swamped with other tasks.

Comment 21 Red Hat Bugzilla 2023-09-15 01:50:01 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days