Bug 1936710 - network-metrics-daemon not associated with a priorityClassName
Summary: network-metrics-daemon not associated with a priorityClassName
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.6.z
Assignee: dofinn
QA Contact: zhaozhanqi
URL:
Whiteboard: wip
Depends On: 1936719
Blocks:
 
Reported: 2021-03-09 00:27 UTC by dofinn
Modified: 2021-05-12 12:18 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-12 12:18:10 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 1006 0 None open Bug 1936710: OSD-6600 network-metrics missing priorityClass 2021-03-09 01:06:09 UTC
Red Hat Product Errata RHBA-2021:1487 0 None None None 2021-05-12 12:18:29 UTC

Description dofinn 2021-03-09 00:27:29 UTC
Description of problem:

The network-metrics-daemon does not have an associated priorityClassName. This causes problems when it competes for scheduling with OSD addons such as RHOAM that specify a priorityClass. Although the RHOAM priority is only 1000000000, its pods still schedule ahead of network-metrics-daemon, which has none. This causes upgrades to fail, along with any other operation that requires consecutive node drains.

```
oc get pc
NAME                      VALUE        GLOBAL-DEFAULT   AGE
rhoam-pod-priority        1000000000   false            34d
system-cluster-critical   2000000000   false            34d
system-node-critical      2000001000   false            34d
```
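
To illustrate why a pod with no priorityClassName loses out here, the following is a minimal sketch of how Kubernetes resolves pod priority. It assumes the three classes listed above are the only ones on the cluster and that none is a global default, as the GLOBAL-DEFAULT column shows. The helper name effective_priority and the pod name rhoam-component are made up for illustration.

```python
# Priority values copied from the `oc get pc` output above.
PRIORITY_CLASSES = {
    "rhoam-pod-priority": 1000000000,
    "system-cluster-critical": 2000000000,
    "system-node-critical": 2000001000,
}


def effective_priority(priority_class_name, global_default=None):
    """Resolve a pod's priority: a named class wins, otherwise the
    global default class (if one exists), otherwise 0."""
    if priority_class_name is not None:
        return PRIORITY_CLASSES[priority_class_name]
    if global_default is not None:
        return PRIORITY_CLASSES[global_default]
    return 0


# Pending pods as (name, priorityClassName); "rhoam-component" is a
# hypothetical stand-in for any RHOAM pod.
pending = [
    ("network-metrics-daemon", None),
    ("rhoam-component", "rhoam-pod-priority"),
]

# The scheduler tries higher-priority pending pods first.
queue = sorted(pending, key=lambda p: effective_priority(p[1]), reverse=True)
for name, pc in queue:
    print(name, effective_priority(pc))
```

With no global default class defined, network-metrics-daemon resolves to priority 0 and is placed behind every RHOAM pod.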


How reproducible:
Partially. It depends on instance resource capacity.


Steps to Reproduce:
1. Upgrade a RHOAM cluster using MUO https://github.com/openshift/managed-upgrade-operator 
2. PostUpgradeVerification will fail if a worker instance is at resource capacity as RHOAM components will be prioritized ahead of network-metrics-daemon.

Actual results:

```
[~ {production} (ocp-prod:default)]$ oc describe node ip-10-0-174-68.ec2.internal
Name:               ip-10-0-174-68.ec2.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-174-68
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m5.xlarge
                    node.openshift.io/os_id=rhcos
                    topology.ebs.csi.aws.com/zone=us-east-1a
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1a
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0e137b48264ac449c"}
                    machine.openshift.io/machine: openshift-machine-api/ocp-prod-6hh5f-worker-us-east-1a-8n8tn
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-afc7cd321aebda60669cdcadeb31712a
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-afc7cd321aebda60669cdcadeb31712a
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 26 Jan 2021 00:15:24 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-174-68.ec2.internal
  AcquireTime:     <unset>
  RenewTime:       Fri, 26 Feb 2021 23:17:26 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.174.68
  Hostname:     ip-10-0-174-68.ec2.internal
  InternalDNS:  ip-10-0-174-68.ec2.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         4
  ephemeral-storage:           314020844Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      15944120Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         3
  ephemeral-storage:           288327867528
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      14793144Ki
  pods:                        250
System Info:
  Machine ID:                                ec2cf991b5843eb46a577717802a1afd
  System UUID:                               ec2cf991-b584-3eb4-6a57-7717802a1afd
  Boot ID:                                   01232971-35f4-4774-84e8-01731cb95aaf
  Kernel Version:                            4.18.0-193.41.1.el8_2.x86_64
  OS Image:                                  Red Hat Enterprise Linux CoreOS 46.82.202102051640-0 (Ootpa)
  Operating System:                          linux
  Architecture:                              amd64
  Container Runtime Version:                 cri-o://1.19.1-7.rhaos4.6.git6377f68.el8
  Kubelet Version:                           v1.19.0+e405995
  Kube-Proxy Version:                        v1.19.0+e405995
ProviderID:                                  aws:///us-east-1a/i-0e137b48264ac449c
Non-terminated Pods:                         (25 in total)
  Namespace                                  Name                                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                                  ----                                     ------------  ----------  ---------------  -------------  ---
  openshift-cloud-ingress-operator           cloud-ingress-operator-registry-vchhn    10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3h
  openshift-cluster-csi-drivers              aws-ebs-csi-driver-node-fzv6v            30m (1%)      0 (0%)      150Mi (1%)       0 (0%)         3h42m
  openshift-cluster-node-tuning-operator     tuned-s2sp2                              10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3h43m
  openshift-dns                              dns-default-d6tfq                        65m (2%)      0 (0%)      110Mi (0%)       512Mi (3%)     3h23m
  openshift-image-registry                   node-ca-dlcs4                            10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         3h42m
  openshift-machine-config-operator          machine-config-daemon-hdrzb              40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         3h20m
  openshift-monitoring                       node-exporter-htlfc                      9m (0%)       0 (0%)      210Mi (1%)       0 (0%)         3h42m
  openshift-monitoring                       sre-dns-latency-exporter-zb9dd           0 (0%)        0 (0%)      0 (0%)           0 (0%)         31d
  openshift-multus                           multus-862j5                             10m (0%)      0 (0%)      150Mi (1%)       0 (0%)         3h38m
  openshift-sdn                              ovs-khqsn                                100m (3%)     0 (0%)      400Mi (2%)       0 (0%)         3h32m
  openshift-sdn                              sdn-wjnfr                                110m (3%)     0 (0%)      220Mi (1%)       0 (0%)         3h38m
  openshift-security                         splunkforwarder-ds-hmvwf                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         31d
  redhat-rhoam-3scale                        backend-listener-3-xf4s9                 500m (16%)    1 (33%)     550Mi (3%)       700Mi (4%)     3h
  redhat-rhoam-3scale                        backend-worker-3-cm6b9                   150m (5%)     1 (33%)     50Mi (0%)        300Mi (2%)     3h
  redhat-rhoam-3scale                        backend-worker-3-lvstv                   150m (5%)     1 (33%)     50Mi (0%)        300Mi (2%)     3h
  redhat-rhoam-3scale                        system-app-5-kpgr5                       150m (5%)     3 (100%)    1800Mi (12%)     2400Mi (16%)   3h
  redhat-rhoam-3scale                        system-sidekiq-5-26b6c                   100m (3%)     1 (33%)     500Mi (3%)       2Gi (14%)      3h
  redhat-rhoam-3scale                        zync-database-2-qddv9                    50m (1%)      250m (8%)   250M (1%)        2G (13%)       3h
  redhat-rhoam-3scale                        zync-que-3-2vv6p                         250m (8%)     1 (33%)     250M (1%)        512Mi (3%)     3h
  redhat-rhoam-customer-monitoring-operator  grafana-deployment-5c56f5565d-hmjr6      250m (8%)     1 (33%)     256Mi (1%)       1Gi (7%)       3h
  redhat-rhoam-marin3r-operator              marin3r-operator-57b984bcbc-vblfg        0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-marin3r                       marin3r-instance-67f94d8466-lbp65        0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-marin3r                       ratelimit-649b469f6f-88qh9               0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-rhsso-operator                keycloak-operator-557546f88f-tlmz8       0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-user-sso                      keycloak-2                               1 (33%)       1 (33%)     2G (13%)         2G (13%)       3h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests          Limits
  --------                    --------          ------
  cpu                         2994m (99%)       10250m (341%)
  memory                      7382169856 (48%)  11889354Ki (80%)
  ephemeral-storage           0 (0%)            0 (0%)
  hugepages-1Gi               0 (0%)            0 (0%)
  hugepages-2Mi               0 (0%)            0 (0%)
  attachable-volumes-aws-ebs  0                 0
Events:                       <none>
```
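
The CPU figures above can be cross-checked with a short sketch. The request values are copied from the Non-terminated Pods table. The 20m request used for network-metrics-daemon is an assumption for illustration only, since its actual request is not shown in this report.

```python
ALLOCATABLE_CPU_M = 3000  # "Allocatable: cpu: 3", in millicores

# (namespace, CPU request in millicores), one entry per pod listed above.
REQUESTS = [
    ("openshift-cloud-ingress-operator", 10),
    ("openshift-cluster-csi-drivers", 30),
    ("openshift-cluster-node-tuning-operator", 10),
    ("openshift-dns", 65),
    ("openshift-image-registry", 10),
    ("openshift-machine-config-operator", 40),
    ("openshift-monitoring", 9),
    ("openshift-monitoring", 0),
    ("openshift-multus", 10),
    ("openshift-sdn", 100),
    ("openshift-sdn", 110),
    ("openshift-security", 0),
    ("redhat-rhoam-3scale", 500),
    ("redhat-rhoam-3scale", 150),
    ("redhat-rhoam-3scale", 150),
    ("redhat-rhoam-3scale", 150),
    ("redhat-rhoam-3scale", 100),
    ("redhat-rhoam-3scale", 50),
    ("redhat-rhoam-3scale", 250),
    ("redhat-rhoam-customer-monitoring-operator", 250),
    ("redhat-rhoam-marin3r-operator", 0),
    ("redhat-rhoam-marin3r", 0),
    ("redhat-rhoam-marin3r", 0),
    ("redhat-rhoam-rhsso-operator", 0),
    ("redhat-rhoam-user-sso", 1000),
]

total_m = sum(m for _, m in REQUESTS)
rhoam_m = sum(m for ns, m in REQUESTS if ns.startswith("redhat-rhoam"))
headroom_m = ALLOCATABLE_CPU_M - total_m

# Assumed request for the missing pod; not taken from this report.
HYPOTHETICAL_DAEMON_REQUEST_M = 20
fits = HYPOTHETICAL_DAEMON_REQUEST_M <= headroom_m

print(f"total requested: {total_m}m, RHOAM share: {rhoam_m}m")
print(f"headroom: {headroom_m}m, daemon fits: {fits}")
```

The total matches the 2994m (99%) reported under Allocated resources, which leaves only 6m of schedulable CPU on the node. Most of the requested CPU belongs to RHOAM pods.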


Expected results:
The node description above should list the network-metrics-daemon pod among its non-terminated pods.

Additional info:
The upgrade issue was resolved by manually deleting a RHOAM pod to relieve capacity restrictions, enabling network-metrics-daemon to schedule.
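
For reference, a longer-term fix would set priorityClassName in the DaemonSet's pod template. The sketch below only builds the shape of such a merge patch. The choice of system-cluster-critical is an assumption taken from the classes listed earlier, not from the linked pull request, and the helper name priority_class_patch is made up. Because the DaemonSet is managed by the cluster-network-operator, a manual patch would likely be reverted, so this is an illustration rather than a workaround.

```python
import json


def priority_class_patch(priority_class_name):
    """Build a merge patch that sets priorityClassName on the pod
    template of a workload such as a DaemonSet."""
    return {
        "spec": {
            "template": {
                "spec": {"priorityClassName": priority_class_name}
            }
        }
    }


# The class chosen here is an assumption, not the value from the actual fix.
patch = priority_class_patch("system-cluster-critical")
print(json.dumps(patch, indent=2))
```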

Comment 5 errata-xmlrpc 2021-05-12 12:18:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.28 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1487

