Description of problem:

The network-metrics-daemon does not have an associated priorityClassName. This causes issues when it competes for scheduling with OSD addons such as RHOAM that do specify a priorityClass. Although the RHOAM priority value is only 1000000000, its pods are still scheduled ahead of network-metrics-daemon, which has none. This causes upgrades to fail, along with any other operation that requires consecutive node drains.

```
oc get pc
NAME                      VALUE        GLOBAL-DEFAULT   AGE
rhoam-pod-priority        1000000000   false            34d
system-cluster-critical   2000000000   false            34d
system-node-critical      2000001000   false            34d
```

How reproducible:

Partially. Dependent on instance resource capacity.

Steps to Reproduce:

1. Upgrade a RHOAM cluster using MUO (https://github.com/openshift/managed-upgrade-operator).
2. PostUpgradeVerification will fail if a worker instance is at resource capacity, because RHOAM components are prioritized ahead of network-metrics-daemon.

Actual results:

```
[~ {production} (ocp-prod:default)]$ oc describe node ip-10-0-174-68.ec2.internal
Name:               ip-10-0-174-68.ec2.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-1
                    failure-domain.beta.kubernetes.io/zone=us-east-1a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-174-68
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m5.xlarge
                    node.openshift.io/os_id=rhcos
                    topology.ebs.csi.aws.com/zone=us-east-1a
                    topology.kubernetes.io/region=us-east-1
                    topology.kubernetes.io/zone=us-east-1a
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-0e137b48264ac449c"}
                    machine.openshift.io/machine: openshift-machine-api/ocp-prod-6hh5f-worker-us-east-1a-8n8tn
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-afc7cd321aebda60669cdcadeb31712a
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-afc7cd321aebda60669cdcadeb31712a
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Tue, 26 Jan 2021 00:15:24 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-174-68.ec2.internal
  AcquireTime:     <unset>
  RenewTime:       Fri, 26 Feb 2021 23:17:26 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 26 Feb 2021 23:12:35 +0000   Fri, 26 Feb 2021 20:16:22 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.174.68
  Hostname:     ip-10-0-174-68.ec2.internal
  InternalDNS:  ip-10-0-174-68.ec2.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         4
  ephemeral-storage:           314020844Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      15944120Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         3
  ephemeral-storage:           288327867528
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      14793144Ki
  pods:                        250
System Info:
  Machine ID:                 ec2cf991b5843eb46a577717802a1afd
  System UUID:                ec2cf991-b584-3eb4-6a57-7717802a1afd
  Boot ID:                    01232971-35f4-4774-84e8-01731cb95aaf
  Kernel Version:             4.18.0-193.41.1.el8_2.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 46.82.202102051640-0 (Ootpa)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.19.1-7.rhaos4.6.git6377f68.el8
  Kubelet Version:            v1.19.0+e405995
  Kube-Proxy Version:         v1.19.0+e405995
ProviderID:                   aws:///us-east-1a/i-0e137b48264ac449c
Non-terminated Pods:          (25 in total)
  Namespace                                  Name                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                                  ----                                   ------------  ----------  ---------------  -------------  ---
  openshift-cloud-ingress-operator           cloud-ingress-operator-registry-vchhn  10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3h
  openshift-cluster-csi-drivers              aws-ebs-csi-driver-node-fzv6v          30m (1%)      0 (0%)      150Mi (1%)       0 (0%)         3h42m
  openshift-cluster-node-tuning-operator     tuned-s2sp2                            10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         3h43m
  openshift-dns                              dns-default-d6tfq                      65m (2%)      0 (0%)      110Mi (0%)       512Mi (3%)     3h23m
  openshift-image-registry                   node-ca-dlcs4                          10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         3h42m
  openshift-machine-config-operator          machine-config-daemon-hdrzb            40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         3h20m
  openshift-monitoring                       node-exporter-htlfc                    9m (0%)       0 (0%)      210Mi (1%)       0 (0%)         3h42m
  openshift-monitoring                       sre-dns-latency-exporter-zb9dd         0 (0%)        0 (0%)      0 (0%)           0 (0%)         31d
  openshift-multus                           multus-862j5                           10m (0%)      0 (0%)      150Mi (1%)       0 (0%)         3h38m
  openshift-sdn                              ovs-khqsn                              100m (3%)     0 (0%)      400Mi (2%)       0 (0%)         3h32m
  openshift-sdn                              sdn-wjnfr                              110m (3%)     0 (0%)      220Mi (1%)       0 (0%)         3h38m
  openshift-security                         splunkforwarder-ds-hmvwf               0 (0%)        0 (0%)      0 (0%)           0 (0%)         31d
  redhat-rhoam-3scale                        backend-listener-3-xf4s9               500m (16%)    1 (33%)     550Mi (3%)       700Mi (4%)     3h
  redhat-rhoam-3scale                        backend-worker-3-cm6b9                 150m (5%)     1 (33%)     50Mi (0%)        300Mi (2%)     3h
  redhat-rhoam-3scale                        backend-worker-3-lvstv                 150m (5%)     1 (33%)     50Mi (0%)        300Mi (2%)     3h
  redhat-rhoam-3scale                        system-app-5-kpgr5                     150m (5%)     3 (100%)    1800Mi (12%)     2400Mi (16%)   3h
  redhat-rhoam-3scale                        system-sidekiq-5-26b6c                 100m (3%)     1 (33%)     500Mi (3%)       2Gi (14%)      3h
  redhat-rhoam-3scale                        zync-database-2-qddv9                  50m (1%)      250m (8%)   250M (1%)        2G (13%)       3h
  redhat-rhoam-3scale                        zync-que-3-2vv6p                       250m (8%)     1 (33%)     250M (1%)        512Mi (3%)     3h
  redhat-rhoam-customer-monitoring-operator  grafana-deployment-5c56f5565d-hmjr6    250m (8%)     1 (33%)     256Mi (1%)       1Gi (7%)       3h
  redhat-rhoam-marin3r-operator              marin3r-operator-57b984bcbc-vblfg      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-marin3r                       marin3r-instance-67f94d8466-lbp65      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-marin3r                       ratelimit-649b469f6f-88qh9             0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-rhsso-operator                keycloak-operator-557546f88f-tlmz8     0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h
  redhat-rhoam-user-sso                      keycloak-2                             1 (33%)       1 (33%)     2G (13%)         2G (13%)       3h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests          Limits
  --------                    --------          ------
  cpu                         2994m (99%)       10250m (341%)
  memory                      7382169856 (48%)  11889354Ki (80%)
  ephemeral-storage           0 (0%)            0 (0%)
  hugepages-1Gi               0 (0%)            0 (0%)
  hugepages-2Mi               0 (0%)            0 (0%)
  attachable-volumes-aws-ebs  0                 0
Events:                       <none>
```

Expected results:

The node description above should list the network-metrics-daemon pod.

Additional info:

The upgrade issue was resolved by manually deleting a RHOAM pod to relieve the capacity restriction, which allowed network-metrics-daemon to schedule.
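To make the capacity argument concrete, here is a minimal Python sketch of the arithmetic behind the node description above. The CPU request values and the allocatable figure are copied from that output. Everything else is an assumption for illustration only: the 20m request for network-metrics-daemon, the 150m RHOAM pod, the 160m of free CPU, and the `place` helper, which is a deliberately simplified stand-in for the scheduler and not how kube-scheduler is implemented. Priority 0 reflects Kubernetes' behaviour for pods with no priorityClassName when no PriorityClass is marked as global default, as is the case in the `oc get pc` output.

```python
# CPU requests in millicores, copied row by row from "Non-terminated Pods".
CPU_REQUESTS_M = [10, 30, 10, 65, 10, 40, 9, 0, 10, 100, 110, 0,
                  500, 150, 150, 150, 100, 50, 250, 250, 0, 0, 0, 0, 1000]

ALLOCATABLE_CPU_M = 3 * 1000  # "Allocatable: cpu: 3"


def cpu_headroom_m(requests, allocatable):
    """Millicores still available for new pod requests on the node."""
    return allocatable - sum(requests)


def place(pending, free_m):
    """Simplified model: try pods in descending priority until CPU runs out."""
    placed, unplaced = [], []
    for pod in sorted(pending, key=lambda p: p["priority"], reverse=True):
        if pod["cpu_m"] <= free_m:
            placed.append(pod["name"])
            free_m -= pod["cpu_m"]
        else:
            unplaced.append(pod["name"])
    return placed, unplaced


# Hypothetical pending pods after a node drain; cpu_m values are assumptions.
PENDING = [
    {"name": "network-metrics-daemon", "priority": 0, "cpu_m": 20},
    {"name": "rhoam-pod", "priority": 1_000_000_000, "cpu_m": 150},
]

if __name__ == "__main__":
    print("requested:", sum(CPU_REQUESTS_M), "m")
    print("headroom: ", cpu_headroom_m(CPU_REQUESTS_M, ALLOCATABLE_CPU_M), "m")
    print("placement with 160m free:", place(PENDING, 160))
```

The requests sum to the 2994m shown under "Allocated resources", which leaves 6m on this node. Under the assumed figures, the RHOAM pod is placed first and the remaining CPU is too small for network-metrics-daemon, which matches the behaviour reported here.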
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.28 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:1487