Since bug 1967614 failed to verify, moving this back to ASSIGNED.
Will test this bug since the PDB is not in 4.7.
Followed the steps:
• OCP 4.6 cluster
• node A is in AZ 1, nodes B and C are in AZ 2
• prom0 and prom1 scheduled on node A with persistent volumes
• upgrade to 4.7
• CMO goes unavailable/degraded because the hard affinity makes it impossible to schedule prom1 (or prom0) on the nodes in AZ 2 (the PV sticks to AZ 1)

Bound PVs only for the alertmanager/prometheus pods, upgraded to 4.7, and saw no "volume node affinity conflict" errors.

# oc get node --show-labels
NAME                                         STATUS   ROLES    AGE   VERSION           LABELS
ip-10-0-157-240.us-west-2.compute.internal   Ready    master   25m   v1.19.0+c3e2e69   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-2,failure-domain.beta.kubernetes.io/zone=us-west-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-157-240,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-2a,topology.kubernetes.io/region=us-west-2,topology.kubernetes.io/zone=us-west-2a
ip-10-0-177-194.us-west-2.compute.internal   Ready    worker   16m   v1.19.0+c3e2e69   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-2,failure-domain.beta.kubernetes.io/zone=us-west-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-177-194,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-2a,topology.kubernetes.io/region=us-west-2,topology.kubernetes.io/zone=us-west-2a
ip-10-0-178-213.us-west-2.compute.internal   Ready    worker   16m   v1.19.0+c3e2e69   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-2,failure-domain.beta.kubernetes.io/zone=us-west-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-178-213,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-2a,topology.kubernetes.io/region=us-west-2,topology.kubernetes.io/zone=us-west-2a
ip-10-0-191-201.us-west-2.compute.internal   Ready    master   26m   v1.19.0+c3e2e69   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-2,failure-domain.beta.kubernetes.io/zone=us-west-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-191-201,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-2a,topology.kubernetes.io/region=us-west-2,topology.kubernetes.io/zone=us-west-2a
ip-10-0-200-131.us-west-2.compute.internal   Ready    master   26m   v1.19.0+c3e2e69   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-2,failure-domain.beta.kubernetes.io/zone=us-west-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-200-131,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-2b,topology.kubernetes.io/region=us-west-2,topology.kubernetes.io/zone=us-west-2b
ip-10-0-254-9.us-west-2.compute.internal     Ready    worker   16m   v1.19.0+c3e2e69   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-west-2,failure-domain.beta.kubernetes.io/zone=us-west-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-254-9,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-west-2b,topology.kubernetes.io/region=us-west-2,topology.kubernetes.io/zone=us-west-2b
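A more compact way to confirm the zone spread (not part of the captured output above; it just relies on the standard topology.kubernetes.io/zone label shown in the listing) is to print the zone as a dedicated column:

  # show only the zone label for each node
  oc get nodes -L topology.kubernetes.io/zone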
Worker node ip-10-0-254-9.us-west-2.compute.internal has the label topology.kubernetes.io/zone=us-west-2b, while the other two worker nodes are labeled topology.kubernetes.io/zone=us-west-2a. Bound PVs for the alertmanager/prometheus pods and scheduled these pods to node ip-10-0-254-9.us-west-2.compute.internal.

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-06-07-054625   True        False         6m50s   Cluster version is 4.6.0-0.nightly-2021-06-07-054625

#
alertmanager-main-0   5/5   Running   0   9m14s   10.128.2.27   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>
alertmanager-main-1   5/5   Running   0   9m14s   10.128.2.28   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>
alertmanager-main-2   5/5   Running   0   9m14s   10.128.2.29   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>
prometheus-k8s-0      6/6   Running   1   9m24s   10.128.2.25   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>
prometheus-k8s-1      6/6   Running   1   9m24s   10.128.2.26   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>

# oc -n openshift-monitoring get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-0a328708-e447-4e80-8547-7f524d3d9d2b   4Gi        RWO            gp2            9m31s
alertmanager-main-db-alertmanager-main-1   Bound    pvc-035ca65b-2bd4-4a5a-ad93-373543dca3e3   4Gi        RWO            gp2            9m31s
alertmanager-main-db-alertmanager-main-2   Bound    pvc-e9fc6f9b-389d-44c6-9493-91b250110266   4Gi        RWO            gp2            9m31s
prometheus-prometheus-k8s-0                Bound    pvc-700d2f6f-f341-4208-8702-4ffe5e334f10   10Gi       RWO            gp2            9m41s
prometheus-prometheus-k8s-1                Bound    pvc-4e6e5805-a0b7-49ee-aa34-9ff25b6685c5   10Gi       RWO            gp2            9m41s

Upgraded to 4.7:

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-06-07-203428   True        False         27m     Cluster version is 4.7.0-0.nightly-2021-06-07-203428

# oc -n openshift-monitoring get po -o wide | grep -E "prometheus-k8s|alertmanager-main"
alertmanager-main-0   5/5   Running   0   46m   10.128.2.15   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>
alertmanager-main-1   5/5   Running   0   46m   10.128.2.14   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>
alertmanager-main-2   5/5   Running   0   46m   10.128.2.12   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>
prometheus-k8s-0      7/7   Running   1   46m   10.128.2.13   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>
prometheus-k8s-1      7/7   Running   1   46m   10.128.2.11   ip-10-0-254-9.us-west-2.compute.internal   <none>   <none>

# oc -n openshift-monitoring get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-0a328708-e447-4e80-8547-7f524d3d9d2b   4Gi        RWO            gp2            105m
alertmanager-main-db-alertmanager-main-1   Bound    pvc-035ca65b-2bd4-4a5a-ad93-373543dca3e3   4Gi        RWO            gp2            105m
alertmanager-main-db-alertmanager-main-2   Bound    pvc-e9fc6f9b-389d-44c6-9493-91b250110266   4Gi        RWO            gp2            105m
prometheus-prometheus-k8s-0                Bound    pvc-700d2f6f-f341-4208-8702-4ffe5e334f10   10Gi       RWO            gp2            106m
prometheus-prometheus-k8s-1                Bound    pvc-4e6e5805-a0b7-49ee-aa34-9ff25b6685c5   10Gi       RWO            gp2            106m
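For completeness, a minimal sketch of how the zone pinning can be cross-checked (not part of the captured output above; the PV name is taken from the VOLUME column, and the grep pattern only assumes the usual "Node Affinity" section printed by oc describe pv):

  # zone the prometheus-k8s-0 PV is pinned to via its node affinity
  oc describe pv pvc-700d2f6f-f341-4208-8702-4ffe5e334f10 | grep -A6 "Node Affinity"
  # a "volume node affinity conflict" would surface as FailedScheduling events
  oc -n openshift-monitoring get events --field-selector reason=FailedScheduling

If the PVs' node-affinity zone matches the zone of ip-10-0-254-9.us-west-2.compute.internal and no FailedScheduling events are reported, the scenario from the bug description does not reproduce.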
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.16 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2286