Description of problem:

Multiple issues were seen on an OCP 4.10 cluster with the nmstate operator pod installed on the POWER platform while running the e2e tests.

Issue 1: The e2e test fails because the nmstate-operator pod is running on a master node with BestEffort QoS.

[root@rdr-sh-4vlan-bastion-0 ~]# oc get pod -A -o wide | grep nmstate-operator
openshift-nmstate   nmstate-operator-678c5c4448-8b748   1/1   Running   0   26h   10.128.0.54   master-2   <none>   <none>

[root@rdr-sh-4vlan-bastion-0 ~]# oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:       BestEffort

Issue 2: This test fails because the managed cluster does not have CPU and memory requests:

[root@rdr-sh-4vlan-bastion-0 ~]# cat failed3.txt | grep nmstate -n
35: apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a cpu request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[cpu]")
36: apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a memory request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[memory]")

Issue 3: This test fails: "Managed cluster should ensure platform components have system-* priority class associated".

1 pods found with invalid priority class (should be openshift-user-critical or begin with system-):
openshift-nmstate/nmstate-operator-678c5c4448-8b748 (currently "")

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Deploy kubernetes-nmstate-operator
2. Check priorityClass and qosClass

Actual results:
qosClass is BestEffort.
There is no priorityClass.

Expected results:
The qosClass of the operator pod has to be at least Burstable.
The priorityClass of the operator pod has to be system-cluster-critical.

Additional info:
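For reference, both expected results boil down to two fields on the operator Deployment. The fragment below is a hypothetical sketch of what a conforming spec would look like; the field names are standard Kubernetes, but the concrete request values are illustrative, not the ones chosen by the actual fix:

```
# Hypothetical fragment of the nmstate-operator Deployment spec.
# Request values (60m / 30Mi) are illustrative only.
spec:
  template:
    spec:
      priorityClassName: system-cluster-critical   # satisfies the system-* rule
      containers:
        - name: nmstate-operator
          resources:
            requests:
              cpu: 60m        # any non-zero request moves the pod out of BestEffort
              memory: 30Mi
```

With requests set but no limits, the pod is classified Burstable, which is what the e2e checks expect.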
The u/s fixes:
- https://github.com/nmstate/kubernetes-nmstate/pull/972
- https://github.com/nmstate/kubernetes-nmstate/pull/971
(In reply to Quique Llorente from comment #1)
> The u/s fixes:
> - https://github.com/nmstate/kubernetes-nmstate/pull/972
> - https://github.com/nmstate/kubernetes-nmstate/pull/971

We want to verify the fix on an OCP 4.10 cluster. Which build will include the fix? Please confirm. Thanks.
(In reply to Julie from comment #3)
> (In reply to Quique Llorente from comment #1)
> > The u/s fixes:
> > - https://github.com/nmstate/kubernetes-nmstate/pull/972
> > - https://github.com/nmstate/kubernetes-nmstate/pull/971
>
> We want to verify the fix on OCP 4.10 cluster. Which build will have the fix
> included? Please confirm.
> Thanks.

Both of the PRs listed above are merged. Will it be possible to verify the fix on OCP 4.10, or is it targeted for 4.11 only? Please confirm. Thanks.

@Quique Llorente
Tried to verify the fix on an OCP 4.11 cluster.

Nmstate operator version: 4.11.0-202203240844

Cluster version:

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc version
Client Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
Server Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
Kubernetes Version: v1.23.3+2a2851c

Issue 1: Some of the pods are still running in the BestEffort QoS class.

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       BestEffort
QoS Class:       Burstable
QoS Class:       Burstable

[root@rdr-shw-411nm-mon01-bastion-0]# oc get pod -n openshift-nmstate
NAME                                  READY   STATUS    RESTARTS   AGE
nmstate-cert-manager-854cf745-wzkg5   1/1     Running   0          16h
nmstate-handler-6jpch                 1/1     Running   0          16h
nmstate-handler-hvw6q                 1/1     Running   0          16h
nmstate-handler-nflxc                 1/1     Running   0          16h
nmstate-handler-prnc7                 1/1     Running   0          16h
nmstate-handler-zkp5q                 1/1     Running   0          16h
nmstate-operator-7797c67864-22l47     1/1     Running   0          16h
nmstate-webhook-6b77ddc9-654l5        1/1     Running   0          16h
nmstate-webhook-6b77ddc9-nv4t7        1/1     Running   0          16h

[root@rdr-shw-411nm-mon01-bastion-0]# oc describe pod nmstate-operator-7797c67864-22l47 -n openshift-nmstate
Name:         nmstate-operator-7797c67864-22l47
Namespace:    openshift-nmstate
Priority:     0
Node:         mon01-master-0.rdr-shw-411nm.redhat.com/193.168.200.116
Start Time:   Mon, 28 Mar 2022 07:47:41 -0400
Labels:       app=kubernetes-nmstate-operator
              name=kubernetes-nmstate-operator
              pod-template-hash=7797c67864
Annotations:  alm-examples:
                [{ "apiVersion": "nmstate.io/v1", "kind": "NMState", "metadata": { "name": "nmstate" } }]
              capabilities: Basic Install
              categories: OpenShift Optional
              certified: false
              containerImage: registry.redhat.io/openshift4/kubernetes-nmstate-rhel8-operator@sha256:cc9efa6f9644e1dea8c526a6ef9a8c5b89d6cdf6e4da00633cb6db4e066579c5
              createdAt: 2022-02-21 08:46:16
              description: Kubernetes NMState is a declaritive means of configuring NetworkManager.
              k8s.v1.cni.cncf.io/network-status:
                [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.130.0.68" ], "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status:
                [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.130.0.68" ], "default": true, "dns": {} }]
              olm.operatorGroup: openshift-nmstate-fgpbl
              olm.operatorNamespace: openshift-nmstate
              olm.skipRange: >=4.3.0 <4.11.0
              olm.targetNamespaces: openshift-nmstate
              openshift.io/scc: restricted
              operatorframework.io/properties: {"properties":[{"type":"olm.package","value":{"packageName":"kubernetes-nmstate-operator","version":"4.11.0-202203240844"}},{"type":"olm.g...
              operatorframework.io/suggested-namespace: openshift-nmstate
              operators.openshift.io/infrastructure-features: ["disconnected"]
              repository: https://github.com/openshift/kubernetes-nmstate
              support: Red Hat, Inc.
Status:       Running
IP:           10.130.0.68
IPs:
  IP:  10.130.0.68
Controlled By:  ReplicaSet/nmstate-operator-7797c67864
Containers:
  nmstate-operator:
    Container ID:  cri-o://cc415aba03529140d436b052855e57f22b2eaf8f153ec6b05d8cb32bea1d3850
    Image:         registry.redhat.io/openshift4/kubernetes-nmstate-rhel8-operator@sha256:cc9efa6f9644e1dea8c526a6ef9a8c5b89d6cdf6e4da00633cb6db4e066579c5
    Image ID:      registry.redhat.io/openshift4/kubernetes-nmstate-rhel8-operator@sha256:cc9efa6f9644e1dea8c526a6ef9a8c5b89d6cdf6e4da00633cb6db4e066579c5
    Port:          <none>
    Host Port:     <none>
    Command:
      manager
    State:          Running
      Started:      Mon, 28 Mar 2022 07:48:00 -0400
    Ready:          True
    Restart Count:  0
    Environment:
      WATCH_NAMESPACE:            (v1:metadata.annotations['olm.targetNamespaces'])
      OPERATOR_NAME:              kubernetes-nmstate-operator
      ENABLE_PROFILER:            False
      PROFILER_PORT:              6060
      RUN_OPERATOR:
      HANDLER_IMAGE:              registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel8@sha256:faea8a56a64429156160f67b002f087affbb17534e813f1dd246cd06d36bfacb
      HANDLER_IMAGE_PULL_POLICY:  Always
      HANDLER_NAMESPACE:          openshift-nmstate
      OPERATOR_CONDITION_NAME:    kubernetes-nmstate-operator.4.11.0-202203240844
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jcwjk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-jcwjk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

Issue 2: The managed cluster does not have CPU and memory requests.

[root@rdr-shw-411nm-mon01-bastion-0 origin]# cat second.txt | grep nmstate -n
35: apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a cpu request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[cpu]")
36: apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a memory request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[memory]")

Issue 3: Managed cluster should ensure platform components have system-* priority class associated.

1 pods found with invalid priority class (should be openshift-user-critical or begin with system-):
openshift-nmstate/nmstate-operator-7797c67864-22l47 (currently "")
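The BestEffort result above follows mechanically from the Kubernetes QoS rules: a pod whose containers set no requests and no limits at all is BestEffort, requests without matching limits give Burstable, and cpu+memory limits equal to requests give Guaranteed. A simplified sketch of that classification (ignores init/ephemeral containers and hugepages; not the actual kubelet code):

```python
def qos_class(containers):
    """Simplified Kubernetes QoS classification.

    `containers` is a list of dicts shaped like
    {"requests": {"cpu": "60m", ...}, "limits": {...}}.
    """
    any_resources = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests") or {}
        limits = c.get("limits") or {}
        if requests or limits:
            any_resources = True
        # Guaranteed requires cpu and memory limits on every container,
        # with requests (defaulted to limits) equal to limits.
        for res in ("cpu", "memory"):
            limit = limits.get(res)
            if limit is None or requests.get(res, limit) != limit:
                guaranteed = False
    if not any_resources:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"

# The broken operator pod: no requests or limits at all.
print(qos_class([{}]))  # BestEffort
# After the fix: requests only, no limits.
print(qos_class([{"requests": {"cpu": "60m", "memory": "30Mi"}}]))  # Burstable
```

This is why the conformance failures in Issue 1 and Issue 2 are really the same defect: adding the missing cpu/memory requests to the Deployment is what moves the pod out of BestEffort.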
(In reply to shweta from comment #16)
> Tried to verify fix on ocp4.11 cluster.
>
> Nmstate operator version: 4.11.0-202203240844
>
> Cluster version:
>
> [root@rdr-shw-411nm-mon01-bastion-0 ~]# oc version
> Client Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
> Server Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
> Kubernetes Version: v1.23.3+2a2851c

@cstabler The issue is reproduced on 4.11 on Power. Please check. Thanks.
Re-tested on OCP 4.11. The issue persists with 'Kubernetes NMState Operator 4.11.0-202203281806'. Describing the nmstate-operator pod shows that it is running on a master node with BestEffort QoS. FYI:
Tried to verify the fix on an OCP 4.11 cluster. @cstabler The issue is fixed. Thanks.

Nmstate operator version: 4.11.0-202204221007

Cluster version:

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc version
Client Version: 4.11.0-0.nightly-ppc64le-2022-04-26-072249
Kustomize Version: v4.5.4
Server Version: 4.11.0-0.nightly-ppc64le-2022-04-26-072249
Kubernetes Version: v1.23.3+d464c70

Now all pods are running in the Burstable QoS class.

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
Verified on:
oc version: 4.11.0-0.nightly-2022-04-26-181148
nmstate version: kubernetes-nmstate-operator.4.11.0-202204291648

[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
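For completeness, the priority-class rule that Issue 3 tripped over (quoted verbatim by the conformance test earlier in this bug) is simple to state in code. This is a sketch of the rule as the test message describes it, not the actual openshift/origin test implementation:

```python
def valid_platform_priority_class(name: str) -> bool:
    """Rule from the failing test: a platform pod's priority class must be
    'openshift-user-critical' or begin with 'system-'."""
    return name == "openshift-user-critical" or name.startswith("system-")

# Pre-fix operator pod had no priority class at all (currently "").
print(valid_platform_priority_class(""))                         # False
# The value the fix is expected to set.
print(valid_platform_priority_class("system-cluster-critical"))  # True
```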
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days