Description of problem:

Multiple issues were seen on an OCP 4.10 cluster with the nmstate operator pod installed on the POWER platform while running the e2e tests.

Issue 1: The e2e test fails because the nmstate-operator pod is running on a master node with BestEffort QoS.

[root@rdr-sh-4vlan-bastion-0 ~]# oc get pod -A -o wide | grep nmstate-operator
openshift-nmstate   nmstate-operator-678c5c4448-8b748   1/1   Running   0   26h   10.128.0.54   master-2   <none>   <none>

[root@rdr-sh-4vlan-bastion-0 ~]# oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:       BestEffort

Issue 2: This test fails because the managed cluster does not have CPU and memory requests:

[root@rdr-sh-4vlan-bastion-0 ~]# cat failed3.txt | grep nmstate -n
35: apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a cpu request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[cpu]")
36: apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a memory request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[memory]")

Issue 3: This test fails: "Managed cluster should ensure platform components have system-* priority class associated".

1 pods found with invalid priority class (should be openshift-user-critical or begin with system-):
openshift-nmstate/nmstate-operator-678c5c4448-8b748 (currently "")

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Deploy kubernetes-nmstate-operator
2. Check priorityClass and qosClass

Actual results:
qosClass is BestEffort.
There is no priorityClass.

Expected results:
The qosClass of the operator pod has to be at least Burstable.
The priorityClass of the operator pod has to be system-cluster-critical.

Additional info:
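For reference, both expected results boil down to two fields on the operator Deployment. The fragment below is a hypothetical sketch of what a conforming spec would look like; the field names are standard Kubernetes, but the concrete request values are illustrative, not the ones chosen by the actual fix:

```
# Hypothetical fragment of the nmstate-operator Deployment spec.
# Request values (60m / 30Mi) are illustrative only.
spec:
  template:
    spec:
      priorityClassName: system-cluster-critical   # satisfies the system-* rule
      containers:
        - name: nmstate-operator
          resources:
            requests:
              cpu: 60m        # any non-zero request moves the pod out of BestEffort
              memory: 30Mi
```

With requests set but no limits, the pod is classified Burstable, which is what the e2e checks expect.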
The u/s fixes:
- https://github.com/nmstate/kubernetes-nmstate/pull/972
- https://github.com/nmstate/kubernetes-nmstate/pull/971
(In reply to Quique Llorente from comment #1)
> The u/s fixes:
> - https://github.com/nmstate/kubernetes-nmstate/pull/972
> - https://github.com/nmstate/kubernetes-nmstate/pull/971

We want to verify the fix on an OCP 4.10 cluster. Which build will include the fix? Please confirm. Thanks.
(In reply to Julie from comment #3)
> (In reply to Quique Llorente from comment #1)
> > The u/s fixes:
> > - https://github.com/nmstate/kubernetes-nmstate/pull/972
> > - https://github.com/nmstate/kubernetes-nmstate/pull/971
>
> We want to verify the fix on OCP 4.10 cluster. Which build will have the fix
> included? Please confirm.
> Thanks.

Both of the PRs listed above are merged. Will it be possible to verify the fix on OCP 4.10, or is it targeted for 4.11 only? Please confirm. Thanks.

@Quique Llorente
Tried to verify the fix on an OCP 4.11 cluster.

Nmstate operator version: 4.11.0-202203240844

Cluster version:

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc version
Client Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
Server Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
Kubernetes Version: v1.23.3+2a2851c

Issue 1: Some of the pods are still running in the BestEffort QoS class.

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       BestEffort
QoS Class:       Burstable
QoS Class:       Burstable

[root@rdr-shw-411nm-mon01-bastion-0]# oc get pod -n openshift-nmstate
NAME                                  READY   STATUS    RESTARTS   AGE
nmstate-cert-manager-854cf745-wzkg5   1/1     Running   0          16h
nmstate-handler-6jpch                 1/1     Running   0          16h
nmstate-handler-hvw6q                 1/1     Running   0          16h
nmstate-handler-nflxc                 1/1     Running   0          16h
nmstate-handler-prnc7                 1/1     Running   0          16h
nmstate-handler-zkp5q                 1/1     Running   0          16h
nmstate-operator-7797c67864-22l47     1/1     Running   0          16h
nmstate-webhook-6b77ddc9-654l5        1/1     Running   0          16h
nmstate-webhook-6b77ddc9-nv4t7        1/1     Running   0          16h

[root@rdr-shw-411nm-mon01-bastion-0]# oc describe pod nmstate-operator-7797c67864-22l47 -n openshift-nmstate
Name:         nmstate-operator-7797c67864-22l47
Namespace:    openshift-nmstate
Priority:     0
Node:         mon01-master-0.rdr-shw-411nm.redhat.com/193.168.200.116
Start Time:   Mon, 28 Mar 2022 07:47:41 -0400
Labels:       app=kubernetes-nmstate-operator
              name=kubernetes-nmstate-operator
              pod-template-hash=7797c67864
Annotations:  alm-examples:
                [{ "apiVersion": "nmstate.io/v1", "kind": "NMState", "metadata": { "name": "nmstate" } }]
              capabilities: Basic Install
              categories: OpenShift Optional
              certified: false
              containerImage: registry.redhat.io/openshift4/kubernetes-nmstate-rhel8-operator@sha256:cc9efa6f9644e1dea8c526a6ef9a8c5b89d6cdf6e4da00633cb6db4e066579c5
              createdAt: 2022-02-21 08:46:16
              description: Kubernetes NMState is a declaritive means of configuring NetworkManager.
              k8s.v1.cni.cncf.io/network-status:
                [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.130.0.68" ], "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status:
                [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.130.0.68" ], "default": true, "dns": {} }]
              olm.operatorGroup: openshift-nmstate-fgpbl
              olm.operatorNamespace: openshift-nmstate
              olm.skipRange: >=4.3.0 <4.11.0
              olm.targetNamespaces: openshift-nmstate
              openshift.io/scc: restricted
              operatorframework.io/properties: {"properties":[{"type":"olm.package","value":{"packageName":"kubernetes-nmstate-operator","version":"4.11.0-202203240844"}},{"type":"olm.g...
              operatorframework.io/suggested-namespace: openshift-nmstate
              operators.openshift.io/infrastructure-features: ["disconnected"]
              repository: https://github.com/openshift/kubernetes-nmstate
              support: Red Hat, Inc.
Status:       Running
IP:           10.130.0.68
IPs:
  IP:  10.130.0.68
Controlled By:  ReplicaSet/nmstate-operator-7797c67864
Containers:
  nmstate-operator:
    Container ID:  cri-o://cc415aba03529140d436b052855e57f22b2eaf8f153ec6b05d8cb32bea1d3850
    Image:         registry.redhat.io/openshift4/kubernetes-nmstate-rhel8-operator@sha256:cc9efa6f9644e1dea8c526a6ef9a8c5b89d6cdf6e4da00633cb6db4e066579c5
    Image ID:      registry.redhat.io/openshift4/kubernetes-nmstate-rhel8-operator@sha256:cc9efa6f9644e1dea8c526a6ef9a8c5b89d6cdf6e4da00633cb6db4e066579c5
    Port:          <none>
    Host Port:     <none>
    Command:
      manager
    State:          Running
      Started:      Mon, 28 Mar 2022 07:48:00 -0400
    Ready:          True
    Restart Count:  0
    Environment:
      WATCH_NAMESPACE:            (v1:metadata.annotations['olm.targetNamespaces'])
      OPERATOR_NAME:              kubernetes-nmstate-operator
      ENABLE_PROFILER:            False
      PROFILER_PORT:              6060
      RUN_OPERATOR:
      HANDLER_IMAGE:              registry.redhat.io/openshift4/ose-kubernetes-nmstate-handler-rhel8@sha256:faea8a56a64429156160f67b002f087affbb17534e813f1dd246cd06d36bfacb
      HANDLER_IMAGE_PULL_POLICY:  Always
      HANDLER_NAMESPACE:          openshift-nmstate
      OPERATOR_CONDITION_NAME:    kubernetes-nmstate-operator.4.11.0-202203240844
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jcwjk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-jcwjk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

Issue 2: The managed cluster does not have CPU and memory requests.

[root@rdr-shw-411nm-mon01-bastion-0 origin]# cat second.txt | grep nmstate -n
35: apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a cpu request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[cpu]")
36: apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator does not have a memory request (rule: "apps/v1/Deployment/openshift-nmstate/nmstate-operator/container/nmstate-operator/request[memory]")

Issue 3: Managed cluster should ensure platform components have system-* priority class associated.

1 pods found with invalid priority class (should be openshift-user-critical or begin with system-):
openshift-nmstate/nmstate-operator-7797c67864-22l47 (currently "")
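The BestEffort result above follows mechanically from the Kubernetes QoS rules: a pod whose containers set no requests and no limits at all is BestEffort, requests without matching limits give Burstable, and cpu+memory limits equal to requests give Guaranteed. A simplified sketch of that classification (ignores init/ephemeral containers and hugepages; not the actual kubelet code):

```python
def qos_class(containers):
    """Simplified Kubernetes QoS classification.

    `containers` is a list of dicts shaped like
    {"requests": {"cpu": "60m", ...}, "limits": {...}}.
    """
    any_resources = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests") or {}
        limits = c.get("limits") or {}
        if requests or limits:
            any_resources = True
        # Guaranteed requires cpu and memory limits on every container,
        # with requests (defaulted to limits) equal to limits.
        for res in ("cpu", "memory"):
            limit = limits.get(res)
            if limit is None or requests.get(res, limit) != limit:
                guaranteed = False
    if not any_resources:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"

# The broken operator pod: no requests or limits at all.
print(qos_class([{}]))  # BestEffort
# After the fix: requests only, no limits.
print(qos_class([{"requests": {"cpu": "60m", "memory": "30Mi"}}]))  # Burstable
```

This is why the conformance failures in Issue 1 and Issue 2 are really the same defect: adding the missing cpu/memory requests to the Deployment is what moves the pod out of BestEffort.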
(In reply to shweta from comment #16)
> Tried to verify fix on ocp4.11 cluster.
>
> Nmstate operator version: 4.11.0-202203240844
>
> Cluster version:
>
> [root@rdr-shw-411nm-mon01-bastion-0 ~]# oc version
> Client Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
> Server Version: 4.11.0-0.nightly-ppc64le-2022-03-27-144303
> Kubernetes Version: v1.23.3+2a2851c

@cstabler The issue is reproduced on 4.11 on Power. Please check. Thanks.
Re-tested on OCP 4.11. The issue persists with 'Kubernetes NMState Operator 4.11.0-202203281806'. Describing the nmstate-operator pod shows that it is running on a master node with BestEffort QoS. FYI:
Tried to verify the fix on an OCP 4.11 cluster. @cstabler The issue is fixed. Thanks.

Nmstate operator version: 4.11.0-202204221007

Cluster version:

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc version
Client Version: 4.11.0-0.nightly-ppc64le-2022-04-26-072249
Kustomize Version: v4.5.4
Server Version: 4.11.0-0.nightly-ppc64le-2022-04-26-072249
Kubernetes Version: v1.23.3+d464c70

Now all pods are running in the Burstable QoS class.

[root@rdr-shw-411nm-mon01-bastion-0 ~]# oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
Verified on:
oc version: 4.11.0-0.nightly-2022-04-26-181148
nmstate version: kubernetes-nmstate-operator.4.11.0-202204291648

[kni@provisionhost-0-0 ~]$ oc get pods -n openshift-nmstate -o name | (xargs -I% oc describe % -n openshift-nmstate | grep QoS)
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
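For completeness, the priority-class rule that Issue 3 tripped over (quoted verbatim by the conformance test earlier in this bug) is simple to state in code. This is a sketch of the rule as the test message describes it, not the actual openshift/origin test implementation:

```python
def valid_platform_priority_class(name: str) -> bool:
    """Rule from the failing test: a platform pod's priority class must be
    'openshift-user-critical' or begin with 'system-'."""
    return name == "openshift-user-critical" or name.startswith("system-")

# Pre-fix operator pod had no priority class at all (currently "").
print(valid_platform_priority_class(""))                         # False
# The value the fix is expected to set.
print(valid_platform_priority_class("system-cluster-critical"))  # True
```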
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days