+++ This bug was initially created as a clone of Bug #1936719 +++

This bug was initially created as a copy of Bug #1936710

I am copying this bug because:

Description of problem:
The network-metrics-daemon does not have an associated priorityClassName. This causes issues when it is prioritized against OSD addons like RHOAM that have a specified priorityClass. Although the RHOAM value is only 1000000000, its pods still schedule ahead of network-metrics-daemon, which has none. This causes upgrades to fail, along with any other operation that requires consecutive node drains.

```
oc get pc
NAME                      VALUE        GLOBAL-DEFAULT   AGE
rhoam-pod-priority        1000000000   false            34d
system-cluster-critical   2000000000   false            34d
system-node-critical      2000001000   false            34d
```

How reproducible:
Partially. Dependent on instance resource capacity.

Steps to Reproduce:
1. Upgrade a RHOAM cluster using MUO https://github.com/openshift/managed-upgrade-operator
2. PostUpgradeVerification will fail if a worker instance is at resource capacity, as RHOAM components will be prioritized ahead of network-metrics-daemon.
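The ordering problem described above can be illustrated with a small model. This is a simplified sketch, not the Kubernetes scheduler: it assumes a pod with no priorityClassName resolves to priority 0 (the Kubernetes default when no PriorityClass is marked global-default, as in the `oc get pc` output) and that pending pods are considered in descending priority. The pod name `rhoam-pod` and the helpers `effective_priority` and `scheduling_order` are illustrative, not part of any real API.

```python
# Simplified model of priority-based ordering; not the real scheduler.
# Values are copied from the `oc get pc` output in this report.
PRIORITY_CLASSES = {
    "rhoam-pod-priority": 1_000_000_000,
    "system-cluster-critical": 2_000_000_000,
    "system-node-critical": 2_000_001_000,
}


def effective_priority(priority_class_name):
    """Return the numeric priority for a pod; 0 when no class is set."""
    if priority_class_name is None:
        return 0  # assumption: no global-default PriorityClass exists
    return PRIORITY_CLASSES[priority_class_name]


def scheduling_order(pending):
    """Sort (pod_name, priority_class_name) pairs, highest priority first."""
    return sorted(pending, key=lambda p: effective_priority(p[1]), reverse=True)


pending = [
    ("network-metrics-daemon", None),     # no priorityClassName, as reported
    ("rhoam-pod", "rhoam-pod-priority"),  # hypothetical RHOAM pod
]
order = [name for name, _ in scheduling_order(pending)]
print(order)  # ['rhoam-pod', 'network-metrics-daemon']
```

In this model, giving network-metrics-daemon any class valued above rhoam-pod-priority would move it to the front of the list. This report does not state which class the fix assigns.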
Actual results:

```
[~ {production} (ocp-prod:default)]$ oc describe node ip-10-0-174-68.ec2.internal
Name:               ip-10-0-174-68.ec2.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-174-68
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m5.xlarge
                    node.openshift.io/os_id=rhcos
                    topology.ebs.csi.aws.com/zone=us-east-1a
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1a
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0e137b48264ac449c"}
                    machine.openshift.io/machine: openshift-machine-api/ocp-prod-6hh5f-worker-us-east-1a-8n8tn
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-afc7cd321aebda60669cdcadeb31712a
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-afc7cd321aebda60669cdcadeb31712a
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 26 Jan 2021 00:15:24 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-174-68.ec2.internal
  AcquireTime:     <unset>
  RenewTime:       Fri, 26 Feb 2021 23:17:26 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.174.68
  Hostname:     ip-10-0-174-68.ec2.internal
  InternalDNS:  ip-10-0-174-68.ec2.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         4
  ephemeral-storage:           314020844Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      15944120Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         3
  ephemeral-storage:           288327867528
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      14793144Ki
  pods:                        250
System Info:
  Machine ID:                 ec2cf991b5843eb46a577717802a1afd
  System UUID:                ec2cf991-b584-3eb4-6a57-7717802a1afd
  Boot ID:                    01232971-35f4-4774-84e8-01731cb95aaf
  Kernel Version:             4.18.0-193.41.1.el8_2.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 46.82.202102051640-0 (Ootpa)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.19.1-7.rhaos4.6.git6377f68.el8
  Kubelet Version:            v1.19.0+e405995
  Kube-Proxy Version:         v1.19.0+e405995
ProviderID:                   aws:///us-east-1a/i-0e137b48264ac449c
Non-terminated Pods:          (25 in total)
  Namespace                                  Name                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                                  ----                                   ------------  ----------  ---------------  -------------  ---
  openshift-cloud-ingress-operator           cloud-ingress-operator-registry-vchhn  10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3h
  openshift-cluster-csi-drivers              aws-ebs-csi-driver-node-fzv6v          30m (1%)      0 (0%)      150Mi (1%)       0 (0%)         3h42m
  openshift-cluster-node-tuning-operator     tuned-s2sp2                            10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3h43m
  openshift-dns                              dns-default-d6tfq                      65m (2%)      0 (0%)      110Mi (0%)       512Mi (3%)     3h23m
  openshift-image-registry                   node-ca-dlcs4                          10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         3h42m
  openshift-machine-config-operator          machine-config-daemon-hdrzb            40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         3h20m
  openshift-monitoring                       node-exporter-htlfc                    9m (0%)       0 (0%)      210Mi (1%)       0 (0%)         3h42m
  openshift-monitoring                       sre-dns-latency-exporter-zb9dd         0 (0%)        0 (0%)      0 (0%)           0 (0%)         31d
  openshift-multus                           multus-862j5                           10m (0%)      0 (0%)      150Mi (1%)       0 (0%)         3h38m
  openshift-sdn                              ovs-khqsn                              100m (3%)     0 (0%)      400Mi (2%)       0 (0%)         3h32m
  openshift-sdn                              sdn-wjnfr                              110m (3%)     0 (0%)      220Mi (1%)       0 (0%)         3h38m
  openshift-security                         splunkforwarder-ds-hmvwf               0 (0%)        0 (0%)      0 (0%)           0 (0%)         31d
  redhat-rhoam-3scale                        backend-listener-3-xf4s9               500m (16%)    1 (33%)     550Mi (3%)       700Mi (4%)     3h
  redhat-rhoam-3scale                        backend-worker-3-cm6b9                 150m (5%)     1 (33%)     50Mi (0%)        300Mi (2%)     3h
  redhat-rhoam-3scale                        backend-worker-3-lvstv                 150m (5%)     1 (33%)     50Mi (0%)        300Mi (2%)     3h
  redhat-rhoam-3scale                        system-app-5-kpgr5                     150m (5%)     3 (100%)    1800Mi (12%)     2400Mi (16%)   3h
  redhat-rhoam-3scale                        system-sidekiq-5-26b6c                 100m (3%)     1 (33%)     500Mi (3%)       2Gi (14%)      3h
  redhat-rhoam-3scale                        zync-database-2-qddv9                  50m (1%)      250m (8%)   250M (1%)        2G (13%)       3h
  redhat-rhoam-3scale                        zync-que-3-2vv6p                       250m (8%)     1 (33%)     250M (1%)        512Mi (3%)     3h
  redhat-rhoam-customer-monitoring-operator  grafana-deployment-5c56f5565d-hmjr6    250m (8%)     1 (33%)     256Mi (1%)       1Gi (7%)       3h
  redhat-rhoam-marin3r-operator              marin3r-operator-57b984bcbc-vblfg      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-marin3r                       marin3r-instance-67f94d8466-lbp65      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-marin3r                       ratelimit-649b469f6f-88qh9             0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-rhsso-operator                keycloak-operator-557546f88f-tlmz8     0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-user-sso                      keycloak-2                             1 (33%)       1 (33%)     2G (13%)         2G (13%)       3h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests          Limits
  --------                    --------          ------
  cpu                         2994m (99%)       10250m (341%)
  memory                      7382169856 (48%)  11889354Ki (80%)
  ephemeral-storage           0 (0%)            0 (0%)
  hugepages-1Gi               0 (0%)            0 (0%)
  hugepages-2Mi               0 (0%)            0 (0%)
  attachable-volumes-aws-ebs  0                 0
Events:                       <none>
```

Expected results:
The node description above should include the network-metrics-daemon pod.

Additional info:
The upgrade issue was resolved by manually deleting a RHOAM pod to relieve the capacity restriction, which enabled network-metrics-daemon to schedule.
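The node output shows why the pod could not be placed: CPU requests already total 2994m against an allocatable `cpu: 3` (3000m), leaving 6m. A quick check of that arithmetic follows. It is a sketch that looks only at CPU requests and assumes network-metrics-daemon requests 20m CPU; that figure comes from the later verification comment on a 4.8 nightly and may differ on the affected version. The `fits` helper is illustrative.

```python
def fits(allocatable_m, requested_m, new_request_m):
    """True if a new pod's CPU request fits in the remaining allocatable CPU."""
    return requested_m + new_request_m <= allocatable_m


ALLOCATABLE_M = 3000   # "cpu: 3" under Allocatable
REQUESTED_M = 2994     # "cpu 2994m (99%)" under Allocated resources
DAEMON_REQUEST_M = 20  # assumption, taken from the 4.8 verification output

# Only 6m remain, so a 20m request cannot be placed.
print(fits(ALLOCATABLE_M, REQUESTED_M, DAEMON_REQUEST_M))  # False

# Deleting one RHOAM pod frees its request. The report does not say which
# pod was deleted; backend-worker-3-cm6b9 (150m) is used as an example.
print(fits(ALLOCATABLE_M, REQUESTED_M - 150, DAEMON_REQUEST_M))  # True
```

This matches the workaround in the additional info: removing a single RHOAM pod frees enough requested CPU for the daemon to schedule.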
Verified this bug on 4.8.0-0.nightly-2021-03-10-142839

oc describe node ip-10-0-187-162.us-east-2.compute.internal | grep openshift-multus
  openshift-multus    multus-hcc4w                    10m (0%)    0 (0%)    150Mi (2%)    0 (0%)    3h25m
  openshift-multus    network-metrics-daemon-74r5t    20m (1%)    0 (0%)    120Mi (1%)    0 (0%)    3h25m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438