Bug 2046191

Summary: Operator pod is missing correct qosClass and priorityClass
Product: OpenShift Container Platform
Reporter: Quique Llorente <ellorent>
Component: Networking
Assignee: Christoph Stäbler <cstabler>
Networking sub component: kubernetes-nmstate-operator
QA Contact: Aleksandra Malykhin <amalykhi>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: amit.ugol, aos-bugs, augol, cstabler, mjulie, sbiragda
Version: 4.10
Flags: amalykhi: needinfo-
       cstabler: needinfo-
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: powerpc   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2059095 (view as bug list)
Environment:
Last Closed: 2022-08-10 10:43:43 UTC
Type: Bug
Regression: ---
Documentation: ---
Verified Versions:
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2059095    

Description Quique Llorente 2022-01-26 11:10:29 UTC
Description of problem:

Multiple issues were seen while running e2e tests on an OCP 4.10 cluster with the nmstate operator installed on the POWER platform.

Issue 1:
The e2e test fails because the nmstate-operator pod is running on a master node with BestEffort QoS.
[root@rdr-sh-4vlan-bastion-0 ~]# oc get pod -A -o wide | grep nmstate-operator
openshift-nmstate                                  nmstate-operator-678c5c4448-8b748                                 1/1     Running     0               26h     10.128.0.54    master-2   <none>           <none>


oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:                   BestEffort

Issue 2:
This test failed because the managed cluster does not set cpu and memory requests:
cat failed3.txt |grep nmstate -n
35:  apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a cpu request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[cpu]")
36:  apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a memory request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[memory]")

Issue 3:
This test failed: "Managed cluster should ensure platform components have system-* priority class associated".
1 pods found with invalid priority class (should be openshift-user-critical or begin with system-):
openshift-nmstate/nmstate-operator-678c5c4448-8b748 (currently "")
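For background, all three failures trace back to the same root cause: Kubernetes derives a pod's QoS class mechanically from its containers' cpu/memory requests and limits, and the operator Deployment set neither. A minimal sketch of the classification rules (a hypothetical helper for illustration, not part of the operator code):

```python
def qos_class(containers):
    """Classify a pod's QoS the way Kubernetes does, from its containers'
    cpu/memory requests and limits. Each container is modelled as a dict
    like {"requests": {"cpu": "100m"}, "limits": {...}} (simplified)."""
    resources = ("cpu", "memory")

    # BestEffort: no container sets any cpu/memory request or limit.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"

    def guaranteed(c):
        # Guaranteed: cpu and memory limits are set, and requests
        # (which default to the limits when omitted) equal the limits.
        limits, requests = c.get("limits", {}), c.get("requests", {})
        return all(r in limits for r in resources) and all(
            requests.get(r, limits[r]) == limits[r] for r in resources)

    if all(guaranteed(c) for c in containers):
        return "Guaranteed"
    # Anything in between: Burstable.
    return "Burstable"

# The original operator pod had no requests or limits at all:
print(qos_class([{}]))  # BestEffort
# Adding requests alone (no limits) is enough to reach Burstable:
print(qos_class([{"requests": {"cpu": "100m", "memory": "50Mi"}}]))  # Burstable
```

This is why the e2e rules flag the missing requests and the BestEffort class together: fixing the first automatically fixes the second.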


Version-Release number of selected component (if applicable):


How reproducible: Always


Steps to Reproduce:
1. Deploy kubernetes-nmstate-operator
2. Check priorityClass and qosClass


Actual results:
qosClass is BestEffort
There is no priorityClass


Expected results:
The operator pod's qosClass must be at least Burstable.
The operator pod's priorityClass must be system-cluster-critical.
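Taken together, the expected state corresponds to a Deployment spec along these lines (an illustrative sketch; the field names are standard Kubernetes, but the request values are assumptions and are not taken from the actual upstream PRs):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nmstate-operator
  namespace: openshift-nmstate
spec:
  template:
    spec:
      # Satisfies the "system-* priority class" e2e check.
      priorityClassName: system-cluster-critical
      containers:
      - name: nmstate-operator
        resources:
          requests:        # requests without limits => Burstable QoS
            cpu: 60m       # illustrative values only
            memory: 30Mi
```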


Additional info:

Comment 3 Julie 2022-02-01 07:27:36 UTC
(In reply to Quique Llorente from comment #1)
> The u/s fixes:
> - https://github.com/nmstate/kubernetes-nmstate/pull/972
> - https://github.com/nmstate/kubernetes-nmstate/pull/971


We want to verify the fix on an OCP 4.10 cluster. Which build will include the fix? Please confirm.
Thanks.

Comment 5 Julie 2022-02-21 14:03:23 UTC
(In reply to Julie from comment #3)
> (In reply to Quique Llorente from comment #1)
> > The u/s fixes:
> > - https://github.com/nmstate/kubernetes-nmstate/pull/972
> > - https://github.com/nmstate/kubernetes-nmstate/pull/971
> 
> 
> We want to verify the fix on OCP 4.10 cluster. Which build will have the fix
> included?, please confirm. 
> Thanks.


Both the PRs listed above are merged.
Will it be possible to verify the fix on OCP 4.10, or is it targeted for 4.11 only? Please confirm. Thanks.
@Quique Llorente

Comment 16 shweta 2022-03-29 05:13:53 UTC
Tried to verify the fix on an OCP 4.11 cluster.

Nmstate operator version: 4.11.0-202203240844

Cluster version:

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc version
Client Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
Server Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
Kubernetes Version: v1.23.3+2a2851c

Issue 1: Some of the pods are still running in the BestEffort QoS class

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   BestEffort
QoS Class:                   Burstable
QoS Class:                   Burstable

[root@rdr-shw-411nm-mon01-bastion-0]# oc get pod -n  openshift-nmstate
NAME                                  READY   STATUS    RESTARTS   AGE
nmstate-cert-manager-854cf745-wzkg5   1/1     Running   0          16h
nmstate-handler-6jpch                 1/1     Running   0          16h
nmstate-handler-hvw6q                 1/1     Running   0          16h
nmstate-handler-nflxc                 1/1     Running   0          16h
nmstate-handler-prnc7                 1/1     Running   0          16h
nmstate-handler-zkp5q                 1/1     Running   0          16h
nmstate-operator-7797c67864-22l47     1/1     Running   0          16h
nmstate-webhook-6b77ddc9-654l5        1/1     Running   0          16h
nmstate-webhook-6b77ddc9-nv4t7        1/1     Running   0          16h


[root@rdr-shw-411nm-mon01-bastion-0]# oc describe pod nmstate-operator-7797c67864-22l47  -n  openshift-nmstate
Name:         nmstate-operator-7797c67864-22l47
Namespace:    openshift-nmstate
Priority:     0
Node:         mon01-master-0.rdr-shw-411nm.redhat.com/193.168.200.116
Start Time:   Mon, 28 Mar 2022 07:47:41 -0400
Labels:       app=kubernetes-nmstate-operator
              name=kubernetes-nmstate-operator
              pod-template-hash=7797c67864
Annotations:  alm-examples:
                [{
                  "apiVersion": "nmstate.io/v1",
                  "kind": "NMState",
                  "metadata": {
                    "name": "nmstate"
                  }
                }]
              capabilities: Basic Install
              categories: OpenShift Optional
              certified: false
              containerImage:
                registry.redhat.io/openshift4/kubernetes-nmstate-rhel8-operator@sha256:cc9efa6f9644e1dea8c526a6ef9a8c5b89d6cdf6e4da00633cb6db4e066579c5
              createdAt: 2022-02-21 08:46:16
              description: Kubernetes NMState is a declaritive means of configuring NetworkManager.
              k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.130.0.68"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.130.0.68"
                    ],
                    "default": true,
                    "dns": {}
                }]
              olm.operatorGroup: openshift-nmstate-fgpbl
              olm.operatorNamespace: openshift-nmstate
              olm.skipRange: >=4.3.0 <4.11.0
              olm.targetNamespaces: openshift-nmstate
              openshift.io/scc: restricted
              operatorframework.io/properties:
                {"properties":[{"type":"olm.package","value":{"packageName":"kubernetes-nmstate-operator","version":"4.11.0-202203240844"}},{"type":"olm.g...
              operatorframework.io/suggested-namespace: openshift-nmstate
              operators.openshift.io/infrastructure-features: ["disconnected"]
              repository: https://github.com/openshift/kubernetes-nmstate
              support: Red Hat, Inc.
Status:       Running
IP:           10.130.0.68
IPs:
  IP:           10.130.0.68
Controlled By:  ReplicaSet/nmstate-operator-7797c67864
Containers:
  nmstate-operator:
    Container ID:  cri-o://cc415aba03529140d436b052855e57f22b2eaf8f153ec6b05d8cb32bea1d3850
    Image:         registry.redhat.io/openshift4/kubernetes-nmstate-rhel8-operator@sha256:cc9efa6f9644e1dea8c526a6ef9a8c5b89d6cdf6e4da00633cb6db4e066579c5
    Image ID:      registry.redhat.io/openshift4/kubernetes-nmstate-rhel8-operator@sha256:cc9efa6f9644e1dea8c526a6ef9a8c5b89d6cdf6e4da00633cb6db4e066579c5
    Port:          <none>
    Host Port:     <none>
    Command:
      manager
    State:          Running
      Started:      Mon, 28 Mar 2022 07:48:00 -0400
    Ready:          True
    Restart Count:  0
    Environment:
      WATCH_NAMESPACE:             (v1:metadata.annotations['olm.targetNamespaces'])
      OPERATOR_NAME:              kubernetes-nmstate-operator
      ENABLE_PROFILER:            False
      PROFILER_PORT:              6060
      RUN_OPERATOR:
      HANDLER_IMAGE:              registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel8@sha256:faea8a56a64429156160f67b002f087affbb17534e813f1dd246cd06d36bfacb
      HANDLER_IMAGE_PULL_POLICY:  Always
      HANDLER_NAMESPACE:          openshift-nmstate
      OPERATOR_CONDITION_NAME:    kubernetes-nmstate-operator.4.11.0-202203240844
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jcwjk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-jcwjk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              node-role.kubernetes.io/master=
Tolerations:                 node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>


Issue 2: The managed cluster still does not have cpu and memory requests

[root@rdr-shw-411nm-mon01-bastion-0 origin]# cat second.txt |grep nmstate -n
35:  apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a cpu request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[cpu]")
36:  apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a memory request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[memory]")

Issue 3: Managed cluster should ensure platform components have system-* priority class associated

1 pods found with invalid priority class (should be openshift-user-critical or begin with system-):
openshift-nmstate/nmstate-operator-7797c67864-22l47 (currently "")

Comment 17 shweta 2022-03-29 05:32:58 UTC
(In reply to shweta from comment #16)

> Tried to verify fix on ocp4.11 cluster.
> 
> Nmstate operator version: 4.11.0-202203240844
> 
> Cluster version:
> 
> [root@rdr-shw-411nm-mon01-bastion-0 ~]# oc version
> Client Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
> Server Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
> Kubernetes Version: v1.23.3+2a2851c


@cstabler The issue is reproduced on 4.11 on Power. Please check. Thanks.

Comment 21 Julie 2022-04-22 09:39:24 UTC
Re-tested on OCP 4.11.
The issue persists with 'Kubernetes NMState Operator 4.11.0-202203281806'.
Describing the nmstate-operator pod shows that it is still running on a master node with BestEffort QoS.

Comment 23 shweta 2022-04-27 04:39:27 UTC
Tried to verify the fix on an OCP 4.11 cluster.

@cstabler 
The issue is fixed. Thanks.
 
Nmstate operator version: 4.11.0-202204221007
 
Cluster version:
 
[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc version
Client Version: 4.11.0-0.nightly-ppc64le-2022-04-26-072249
Kustomize Version: v4.5.4
Server Version: 4.11.0-0.nightly-ppc64le-2022-04-26-072249
Kubernetes Version: v1.23.3+d464c70

Now all pods are running in the Burstable QoS class.

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable

Comment 24 Aleksandra Malykhin 2022-05-01 06:06:32 UTC
Verified on
oc version: 4.11.0-0.nightly-2022-04-26-181148
nmstate version: kubernetes-nmstate-operator.4.11.0-202204291648

[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable
QoS Class:                   Burstable

Comment 26 errata-xmlrpc 2022-08-10 10:43:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 27 Red Hat Bugzilla 2023-09-15 01:51:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days