1773905 – [CNV deploy] nmstate pod 2.2.0-8 state is flakey

Summary: [CNV deploy] nmstate pod 2.2.0-8 state is flakey

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	2.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	2.2.0
Assignee:	Quique Llorente
QA Contact:	Yossi Segev
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-11-19 10:00 UTC by Tareq Alayan
Modified:	2023-09-14 05:47 UTC
CC List:	6 users
Fixed In Version:	cluster-network-addons-operator-container-v2.2.0-5
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-01-30 16:27:30 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2020:0307	0	None	None	None	2020-01-30 16:27:42 UTC

Description Tareq Alayan 2019-11-19 10:00:06 UTC

Description of problem:
The nmstate pods are flaky: they keep cycling between the Ready and CrashLoopBackOff states.

{"level":"info","ts":1574152115.700579,"logger":"cmd","msg":"failed to initialize service object for metrics: pods \"nmstate-handler-4bzgg\" is forbidden: User \"system:serviceaccount:openshift-cnv:nmstate-handler\" cannot get resource \"pods\" in API group \"\" in the namespace \"openshift-cnv\""}



Version-Release number of selected component (if applicable):

registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubernetes-nmstate-handler-rhel8:v2.2.0-8
How reproducible:
always 

Steps to Reproduce:
1. Deploy CNV.

Actual results:


Expected results:


Additional info:

Comment 1 Quique Llorente 2019-11-19 10:58:57 UTC

The real issue is the OOM killer being triggered by the resource limits on the nmstate-handler pods.
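One way to check for an OOM kill is to look at the last termination reason that Kubernetes records for each container. The following is a minimal sketch, not the procedure used in this bug: the helper name `filter_oomkilled` is hypothetical, and the pod-name prefix `nmstate-handler-` is taken from the log line in the description.

```shell
# Hypothetical helper: reads "<pod-name><TAB><last-termination-reason>" lines on
# stdin and prints the nmstate-handler pods whose last termination reason
# includes OOMKilled.
filter_oomkilled() {
    awk -F '\t' '$1 ~ /^nmstate-handler-/ && $2 ~ /OOMKilled/ { print $1 }'
}

# Usage (requires a logged-in oc client; shown as a comment so nothing runs here):
#   oc get pods -n openshift-cnv \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#     | filter_oomkilled
```

An empty result means no nmstate-handler container currently reports OOMKilled as its last termination reason.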

The upstream fix is https://github.com/kubevirt/cluster-network-addons-operator/pull/263

This can be fixed after deployment with the following command:

oc patch ds -n openshift-cnv nmstate-handler --patch '{"spec": {"template": {"spec":{ "containers": [{"name": "nmstate-handler", "resources": {"requests": {"cpu": "200m", "memory": "120Mi" }, "limits": {"cpu": "200m", "memory": "120Mi" }}}]}}}}'
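The one-line patch above is hard to read and easy to mistype. As a sketch only, the same JSON can be assembled from two parameters. The helper name `build_nmstate_patch` is hypothetical; the container name and the values `200m` and `120Mi` come from the command above, and the helper applies the same value to both requests and limits, as that command does.

```shell
# Hypothetical helper: prints the patch JSON used in the command above.
# $1 = CPU value, $2 = memory value (each applied to both requests and limits).
build_nmstate_patch() {
    cpu="$1"
    memory="$2"
    printf '{"spec":{"template":{"spec":{"containers":[{"name":"nmstate-handler","resources":{"requests":{"cpu":"%s","memory":"%s"},"limits":{"cpu":"%s","memory":"%s"}}}]}}}}\n' \
        "$cpu" "$memory" "$cpu" "$memory"
}

# Usage (requires a logged-in oc client; shown as a comment so nothing runs here):
#   oc patch ds -n openshift-cnv nmstate-handler --patch "$(build_nmstate_patch 200m 120Mi)"
```

Keeping the values as arguments makes it easier to try a different memory limit if the pods are still killed.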

Comment 2 Dan Kenigsberg 2019-11-20 13:50:17 UTC

Waiting for CPaaS to pick up upstream, chew it, and push it to errata.

Comment 3 Quique Llorente 2019-11-21 15:13:44 UTC

The errata has it, and it is now at QE:
https://errata.devel.redhat.com/errata?search=CNV+2.2.0

Comment 4 Nelly Credi 2019-11-25 08:08:55 UTC

Please add the Fixed In Version.

Comment 5 Quique Llorente 2019-11-25 08:17:57 UTC

Added the CNAO version

Comment 6 Yossi Segev 2019-11-27 13:51:15 UTC

The nmstate-handler pods have been running without a restart for more than 4 hours.
Verified in a cluster with OCP4.3/CNV2.2m with cluster-network-addons-operator:v2.2.0-5.
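A check like this can be scripted by reading the restart counter of each handler pod. The sketch below is an illustration under assumptions, not the verification procedure actually used: the helper name `no_handler_restarts` is hypothetical, and it reads only the first container's restart count.

```shell
# Hypothetical helper: reads "<pod-name><TAB><restart-count>" lines on stdin,
# reports every nmstate-handler pod with a non-zero restart count, and exits
# non-zero if any was found.
no_handler_restarts() {
    awk -F '\t' '
        $1 ~ /^nmstate-handler-/ && $2 + 0 > 0 {
            bad = 1
            print $1 " restarted " $2 " times"
        }
        END { exit bad }
    '
}

# Usage (requires a logged-in oc client; shown as a comment so nothing runs here):
#   oc get pods -n openshift-cnv \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].restartCount}{"\n"}{end}' \
#     | no_handler_restarts
```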

Comment 8 errata-xmlrpc 2020-01-30 16:27:30 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307

Comment 9 Red Hat Bugzilla 2023-09-14 05:47:10 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
