Description of problem: nmstate pods are flakey they keep going from Ready to CrashLoopBackOff state {"level":"info","ts":1574152115.700579,"logger":"cmd","msg":"failed to initialize service object for metrics: pods \"nmstate-handler-4bzgg\" is forbidden: User \"system:serviceaccount:openshift-cnv:nmstate-handler\" cannot get resource \"pods\" in API group \"\" in the namespace \"openshift-cnv\""} Version-Release number of selected component (if applicable): registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubernetes-nmstate-handler-rhel8:v2.2.0-8 How reproducible: always Steps to Reproduce: 1. deploy cnv Actual results: Expected results: Additional info:
Real issue is OOMKiller triggered by resources limit at nmstate-handler pods The upstream fix is this https://github.com/kubevirt/cluster-network-addons-operator/pull/263 This can be fixes after deploy with the following command: oc patch ds -n openshift-cnv nmstate-handler --patch '{"spec": {"template": {"spec":{ "containers": [{"name": "nmstate-handler", "resources": {"requests": {"cpu": "200m", "memory": "120Mi" }, "limits": {"cpu": "200m", "memory": "120Mi" }}}]}}}}'
Waiting for CPaaS to pick up upstream, chew it, and push it to errata.
Errata is has it and it's a QE now https://errata.devel.redhat.com/errata?search=CNV+2.2.0
please add fixed in version
Added the CNAO version
nmstate-handler pods are running without restart for more than 4 hours. Verified in a cluster with OCP4.3/CNV2.2m with cluster-network-addons-operator:v2.2.0-5.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:0307
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days