Description of problem:

In 4.9 we introduced a new alert, 'HighlyAvailableWorkloadIncorrectlySpread', which detects clusters where highly available workloads backed by PVCs are incorrectly spread across nodes. The goal was to have customers fix their clusters so that the monitoring stack can move from soft affinity on hostname to hard affinity without experiencing node affinity scheduling issues. To make sure that clusters are fixed before the next minor upgrade, we should set the `Upgradeable` status to false when an HA workload is incorrectly spread.

Version-Release number of selected component (if applicable):
4.9

How reproducible:
Always

Steps to Reproduce:
1. Configure persistent storage for prometheus-k8s
2. Cordon all worker nodes except the one running prometheus-k8s-0
3. Delete the PVC bound to prometheus-k8s-1
4. oc delete pod -n openshift-monitoring prometheus-k8s-1
5. Check that both prometheus-k8s pods are scheduled on the same node and that the `HighlyAvailableWorkloadIncorrectlySpread` alert is pending
6. See that CMO is still reporting `Upgradeable` status true

Actual results:
CMO reports `Upgradeable` status true

Expected results:
CMO should report `Upgradeable` status false
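For step 1, persistent storage for prometheus-k8s is typically enabled through the cluster-monitoring-config ConfigMap. The sketch below is only illustrative (the storage class and size are assumptions, and the resulting PVC names depend on the volumeClaimTemplate), followed by one way to carry out steps 2-4:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2        # assumption: any storage class available in the cluster
          resources:
            requests:
              storage: 10Gi

# steps 2-4: pin both prometheus-k8s pods to one node
oc adm cordon <worker-node>                                             # repeat for every worker except the one running prometheus-k8s-0
oc -n openshift-monitoring delete pvc <pvc-bound-to-prometheus-k8s-1>   # name as reported by 'oc -n openshift-monitoring get pvc'
oc -n openshift-monitoring delete pod prometheus-k8s-1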
Updating this bugzilla to urgent priority/severity to be in line with its dependent bug: https://bugzilla.redhat.com/show_bug.cgi?id=1933847
Tested with a 4.10.0-0.nightly-2021-10-12-002740 cluster: cordoned all workers except one, bound PVs for Prometheus, and scheduled both prometheus-k8s pods on the same node. Upgradeable is now False.

# oc get node | grep worker
ip-10-0-132-142.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   29m   v1.22.1+9312243
ip-10-0-174-65.us-east-2.compute.internal    Ready,SchedulingDisabled   worker   33m   v1.22.1+9312243
ip-10-0-255-240.us-east-2.compute.internal   Ready                      worker   33m   v1.22.1+9312243

# oc -n openshift-monitoring get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-prometheus-k8s-0   Bound    pvc-38fd8f1c-5698-4ef3-8fd0-a9c2232c2a7f   10Gi       RWO            gp2            8m12s
prometheus-prometheus-k8s-1   Bound    pvc-44e78437-3f4f-40fa-88cf-9397604cf1a1   10Gi       RWO            gp2            8m12s

# oc -n openshift-monitoring get pod -o wide | grep prometheus-k8s
prometheus-k8s-0   7/7   Running   0   7m41s   10.128.2.17   ip-10-0-255-240.us-east-2.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   0   7m41s   10.128.2.18   ip-10-0-255-240.us-east-2.compute.internal   <none>   <none>

# oc get co monitoring
NAME         VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
monitoring   4.10.0-0.nightly-2021-10-12-002740   True        False         False      26m

# oc get co monitoring -oyaml
...
status:
  conditions:
  - lastTransitionTime: "2021-10-12T07:49:48Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2021-10-12T08:08:23Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-10-12T08:08:23Z"
    message: |-
      Highly-available workload in namespace openshift-monitoring, with label map["app.kubernetes.io/name":"prometheus"] and persistent storage enabled has a single point of failure.
      Manual intervention is needed to upgrade to the next minor version. For each highly-available workload that has a single point of failure please mark at least one of their PersistentVolumeClaim for deletion by annotating them with map["openshift.io/cluster-monitoring-drop-pvc":"yes"].
    reason: WorkloadSinglePointOfFailure
    status: "False"
    type: Upgradeable
  - lastTransitionTime: "2021-10-12T07:49:48Z"
    message: Successfully rolled out the stack.
    reason: RollOutDone
    status: "True"
    type: Available

# oc adm upgrade
Cluster version is 4.10.0-0.nightly-2021-10-12-002740

Upgradeable=False

  Reason: ClusterOperatorsNotUpgradeable
  Message: Multiple cluster operators should not be upgraded between minor versions:
* Cluster operator monitoring should not be upgraded between minor versions: WorkloadSinglePointOfFailure: Highly-available workload in namespace openshift-monitoring, with label map["app.kubernetes.io/name":"prometheus"] and persistent storage enabled has a single point of failure. Manual intervention is needed to upgrade to the next minor version. For each highly-available workload that has a single point of failure please mark at least one of their PersistentVolumeClaim for deletion by annotating them with map["openshift.io/cluster-monitoring-drop-pvc":"yes"].
* Cluster operator machine-config should not be upgraded between minor versions: PoolUpdating: One or more machine config pools are updating, please see `oc get mcp` for further details

warning: Cannot display available updates:
  Reason: NoChannel
  Message: The update channel has not been configured.
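As a shortcut for reading the full YAML, the Upgradeable condition can be pulled out directly; a minimal sketch using standard oc/kubectl JSONPath:

oc get co monitoring -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].status}{"\n"}'
oc get co monitoring -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].message}{"\n"}'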
Reverting the changes since we have noticed a lot of CI failures [1] caused by this addition.

[1] https://search.ci.openshift.org/?search=monitoring.*Upgradeable%3DFalse&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Setting the bug back to ASSIGNED, as the change has been reverted.
Tested with 4.10.0-0.nightly-2021-10-13-001151: bound PVs for the alertmanager/prometheus pods and scheduled them all on the same node. Upgradeable is True since the code has been reverted; if we want to track the revert, please file a new bug.

# oc -n openshift-monitoring get pod -o wide | grep -E "prometheus-k8s|alertmanager-main"
alertmanager-main-0   5/5   Running   0   19m   10.128.2.10   ip-10-0-174-1.ec2.internal   <none>   <none>
alertmanager-main-1   5/5   Running   0   19m   10.128.2.14   ip-10-0-174-1.ec2.internal   <none>   <none>
alertmanager-main-2   5/5   Running   0   19m   10.128.2.11   ip-10-0-174-1.ec2.internal   <none>   <none>
prometheus-k8s-0      7/7   Running   0   19m   10.128.2.12   ip-10-0-174-1.ec2.internal   <none>   <none>
prometheus-k8s-1      7/7   Running   0   19m   10.128.2.13   ip-10-0-174-1.ec2.internal   <none>   <none>

# oc -n openshift-monitoring get pvc
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-alertmanager-main-0   Bound    pvc-15e0bbd5-4331-4991-8038-d0a42bf02fa7   4Gi        RWO            gp2            20m
alertmanager-alertmanager-main-1   Bound    pvc-a76783eb-1ac6-4b22-ab4c-d1beb1d411d3   4Gi        RWO            gp2            20m
alertmanager-alertmanager-main-2   Bound    pvc-7321a16a-0fd5-4346-9d92-dea3e0aef6d3   4Gi        RWO            gp2            20m
prometheus-prometheus-k8s-0        Bound    pvc-c4a23f2d-29c5-4050-9dd3-778ce12a30e4   10Gi       RWO            gp2            21m
prometheus-prometheus-k8s-1        Bound    pvc-f21e76af-42fe-48a6-8580-8a88c2eeeee0   10Gi       RWO            gp2            21m

ALERTS{alertname="HighlyAvailableWorkloadIncorrectlySpread"}
ALERTS{alertname="HighlyAvailableWorkloadIncorrectlySpread", alertstate="pending", namespace="openshift-monitoring", severity="warning", workload="alertmanager-main"}   1
ALERTS{alertname="HighlyAvailableWorkloadIncorrectlySpread", alertstate="pending", namespace="openshift-monitoring", severity="warning", workload="prometheus-k8s"}      1

# oc get co monitoring -oyaml
...
status:
  conditions:
  - lastTransitionTime: "2021-10-13T04:03:17Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-10-13T03:24:27Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  - lastTransitionTime: "2021-10-13T03:36:48Z"
    message: Successfully rolled out the stack.
    reason: RollOutDone
    status: "True"
    type: Available
  - lastTransitionTime: "2021-10-13T03:36:48Z"
    status: "False"
    type: Progressing
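For reference, the ALERTS instant vector above can be queried from inside the prometheus-k8s pod; a minimal sketch, assuming $token holds a bearer token authorized to query the in-cluster Prometheus (the same approach as the /api/v1/alerts call in a later comment):

# query pending/firing instances of the alert through the in-pod Prometheus API
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
  curl -sk -G -H "Authorization: Bearer $token" \
  --data-urlencode 'query=ALERTS{alertname="HighlyAvailableWorkloadIncorrectlySpread"}' \
  'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query' | jq '.data.result'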
As the bug title is "CMO should report `Upgradeable: false` when HA workload is incorrectly spread", moving the case back to Assigned is expected.
Tested with PR openshift/cluster-monitoring-operator#1431 on top of 4.9.0-0.ci.test-2021-10-18-074221-ci-ln-2w3bhdk-latest.

hongyli@hongyli-mac Downloads % oc -n openshift-monitoring get pod -owide |grep -E 'alertmanager-main|prometheus-k8s'
alertmanager-main-0   5/5   Running   0   26m   10.131.0.36   ci-ln-2w3bhdk-f76d1-vsbkj-worker-a-6c8xq   <none>   <none>
alertmanager-main-1   5/5   Running   0   26m   10.131.0.34   ci-ln-2w3bhdk-f76d1-vsbkj-worker-a-6c8xq   <none>   <none>
alertmanager-main-2   5/5   Running   0   26m   10.131.0.35   ci-ln-2w3bhdk-f76d1-vsbkj-worker-a-6c8xq   <none>   <none>
prometheus-k8s-0      6/6   Running   0   26m   10.131.0.32   ci-ln-2w3bhdk-f76d1-vsbkj-worker-a-6c8xq   <none>   <none>
prometheus-k8s-1      6/6   Running   0   26m   10.131.0.33   ci-ln-2w3bhdk-f76d1-vsbkj-worker-a-6c8xq   <none>   <none>

hongyli@hongyli-mac Downloads % oc -n openshift-monitoring get pvc
NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
alertmanager-main-db-alertmanager-main-0   Bound    pvc-3726479e-6e6a-4583-8420-73a4457b6bc9   1Gi        RWO            standard       26m
alertmanager-main-db-alertmanager-main-1   Bound    pvc-c5a4d85c-ec52-41b0-8ced-eba55281a769   1Gi        RWO            standard       26m
alertmanager-main-db-alertmanager-main-2   Bound    pvc-d80f9ae3-2509-483f-aa67-542b14bb096f   1Gi        RWO            standard       26m
prometheus-k8s-db-prometheus-k8s-0         Bound    pvc-f1c04c5a-b22e-43a2-8442-9ce670b116bc   2Gi        RWO            standard       26m
prometheus-k8s-db-prometheus-k8s-1         Bound    pvc-4baf3853-6a1e-4332-8da2-6b82e0c86047   2Gi        RWO            standard       26m

% oc get co monitoring
NAME         VERSION                                                  AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
monitoring   4.9.0-0.ci.test-2021-10-18-074221-ci-ln-2w3bhdk-latest   True        False         False      40m

% oc get co monitoring -oyaml
...
status:
  conditions:
  - lastTransitionTime: "2021-10-18T08:07:46Z"
    status: "False"
    type: Progressing
  - lastTransitionTime: "2021-10-18T08:26:24Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-10-18T08:27:55Z"
    message: |-
      Highly-available workload in namespace openshift-monitoring, with label map["app.kubernetes.io/name":"prometheus"] and persistent storage enabled has a single point of failure.
      Highly-available workload in namespace openshift-monitoring, with label map["app.kubernetes.io/name":"alertmanager"] and persistent storage enabled has a single point of failure.
      Highly-available workload in namespace openshift-user-workload-monitoring, with label map["app.kubernetes.io/name":"prometheus"] and persistent storage enabled has a single point of failure.
      Highly-available workload in namespace openshift-user-workload-monitoring, with label map["app.kubernetes.io/name":"thanos-ruler"] and persistent storage enabled has a single point of failure.
      Manual intervention is needed to upgrade to the next minor version. For each highly-available workload that has a single point of failure please mark at least one of their PersistentVolumeClaim for deletion by annotating them with map["openshift.io/cluster-monitoring-drop-pvc":"yes"].
    reason: WorkloadSinglePointOfFailure
    status: "False"
    type: Upgradeable
  - lastTransitionTime: "2021-10-18T08:07:46Z"
    message: Successfully rolled out the stack.
    reason: RollOutDone
    status: "True"
    type: Available

% oc adm upgrade
Cluster version is 4.9.0-0.ci.test-2021-10-18-074221-ci-ln-2w3bhdk-latest

Upgradeable=False

  Reason: WorkloadSinglePointOfFailure
  Message: Cluster operator monitoring should not be upgraded between minor versions: Highly-available workload in namespace openshift-monitoring, with label map["app.kubernetes.io/name":"prometheus"] and persistent storage enabled has a single point of failure.
Highly-available workload in namespace openshift-monitoring, with label map["app.kubernetes.io/name":"alertmanager"] and persistent storage enabled has a single point of failure.
Highly-available workload in namespace openshift-user-workload-monitoring, with label map["app.kubernetes.io/name":"prometheus"] and persistent storage enabled has a single point of failure.
Highly-available workload in namespace openshift-user-workload-monitoring, with label map["app.kubernetes.io/name":"thanos-ruler"] and persistent storage enabled has a single point of failure.
Manual intervention is needed to upgrade to the next minor version. For each highly-available workload that has a single point of failure please mark at least one of their PersistentVolumeClaim for deletion by annotating them with map["openshift.io/cluster-monitoring-drop-pvc":"yes"].

warning: Cannot display available updates:
  Reason: NoChannel
  Message: The update channel has not been configured.

oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/alerts' | jq
...
      {
        "labels": {
          "alertname": "HighlyAvailableWorkloadIncorrectlySpread",
          "namespace": "openshift-monitoring",
          "severity": "warning",
          "workload": "alertmanager-main"
        },
        "annotations": {
          "description": "Workload openshift-monitoring/alertmanager-main is incorrectly spread across multiple nodes which breaks high-availability requirements. Since the workload is using persistent volumes, manual intervention is needed. Please follow the guidelines provided in the runbook of this alert to fix this issue.",
          "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/HighlyAvailableWorkloadIncorrectlySpread.md",
          "summary": "Highly-available workload is incorrectly spread across multiple nodes and manual intervention is needed."
        },
        "state": "pending",
        "activeAt": "2021-10-18T08:27:35.421613011Z",
        "value": "1e+00"
      },
      {
        "labels": {
          "alertname": "HighlyAvailableWorkloadIncorrectlySpread",
          "namespace": "openshift-monitoring",
          "severity": "warning",
          "workload": "prometheus-k8s"
        },
        "annotations": {
          "description": "Workload openshift-monitoring/prometheus-k8s is incorrectly spread across multiple nodes which breaks high-availability requirements. Since the workload is using persistent volumes, manual intervention is needed. Please follow the guidelines provided in the runbook of this alert to fix this issue.",
          "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/HighlyAvailableWorkloadIncorrectlySpread.md",
          "summary": "Highly-available workload is incorrectly spread across multiple nodes and manual intervention is needed."
        },
        "state": "pending",
        "activeAt": "2021-10-18T08:27:35.421613011Z",
        "value": "1e+00"
      },
      {
        "labels": {
          "alertname": "HighlyAvailableWorkloadIncorrectlySpread",
          "namespace": "openshift-user-workload-monitoring",
          "severity": "warning",
          "workload": "prometheus-user-workload"
        },
        "annotations": {
          "description": "Workload openshift-user-workload-monitoring/prometheus-user-workload is incorrectly spread across multiple nodes which breaks high-availability requirements. Since the workload is using persistent volumes, manual intervention is needed. Please follow the guidelines provided in the runbook of this alert to fix this issue.",
          "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/HighlyAvailableWorkloadIncorrectlySpread.md",
          "summary": "Highly-available workload is incorrectly spread across multiple nodes and manual intervention is needed."
        },
        "state": "pending",
        "activeAt": "2021-10-18T08:28:05.421613011Z",
        "value": "1e+00"
      },
      {
        "labels": {
          "alertname": "HighlyAvailableWorkloadIncorrectlySpread",
          "namespace": "openshift-user-workload-monitoring",
          "severity": "warning",
          "workload": "thanos-ruler-user-workload"
        },
        "annotations": {
          "description": "Workload openshift-user-workload-monitoring/thanos-ruler-user-workload is incorrectly spread across multiple nodes which breaks high-availability requirements. Since the workload is using persistent volumes, manual intervention is needed. Please follow the guidelines provided in the runbook of this alert to fix this issue.",
          "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/HighlyAvailableWorkloadIncorrectlySpread.md",
          "summary": "Highly-available workload is incorrectly spread across multiple nodes and manual intervention is needed."
        },
        "state": "pending",
        "activeAt": "2021-10-18T08:28:05.421613011Z",
        "value": "1e+00"
      }
    ]
  }
}
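The same Upgradeable=False signal that `oc adm upgrade` summarizes above can also be read directly from the ClusterVersion object; a minimal sketch using standard JSONPath (the object name `version` is the default ClusterVersion resource):

oc get clusterversion version -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].reason}{"\n"}'
oc get clusterversion version -o jsonpath='{.status.conditions[?(@.type=="Upgradeable")].message}{"\n"}'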
Checked with 4.10.0-0.nightly-2021-11-11-170956: bound PVs for Prometheus and scheduled both prometheus pods on the same node. Upgradeable is now False.

# oc -n openshift-monitoring get pod -o wide | grep prometheus-k8s
prometheus-k8s-0   6/6   Running   0   6m11s   10.129.2.44   ip-10-0-185-227.us-east-2.compute.internal   <none>   <none>
prometheus-k8s-1   6/6   Running   0   6m11s   10.129.2.45   ip-10-0-185-227.us-east-2.compute.internal   <none>   <none>

# oc -n openshift-monitoring get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-prometheus-k8s-0   Bound    pvc-8df5236e-0142-4ff4-972e-98ead2aee5f4   10Gi       RWO            gp2            7m2s
prometheus-prometheus-k8s-1   Bound    pvc-b17950b3-af81-4bcb-9d22-b29c921d89f8   10Gi       RWO            gp2            7m2s

# oc get co monitoring -oyaml
...
  - lastTransitionTime: "2021-11-12T08:52:33Z"
    message: |-
      Highly-available workload in namespace openshift-monitoring, with label map["app.kubernetes.io/name":"prometheus"] and persistent storage enabled has a single point of failure.
      Manual intervention is needed to upgrade to the next minor version. For each highly-available workload that has a single point of failure please mark at least one of their PersistentVolumeClaim for deletion by annotating them with map["openshift.io/cluster-monitoring-drop-pvc":"yes"].
    reason: WorkloadSinglePointOfFailure
    status: "False"
    type: Upgradeable

# oc adm upgrade
Cluster version is 4.10.0-0.nightly-2021-11-11-170956

Upgradeable=False

  Reason: WorkloadSinglePointOfFailure
  Message: Cluster operator monitoring should not be upgraded between minor versions: Highly-available workload in namespace openshift-monitoring, with label map["app.kubernetes.io/name":"prometheus"] and persistent storage enabled has a single point of failure. Manual intervention is needed to upgrade to the next minor version. For each highly-available workload that has a single point of failure please mark at least one of their PersistentVolumeClaim for deletion by annotating them with map["openshift.io/cluster-monitoring-drop-pvc":"yes"].

Upstream: https://amd64.ocp.releases.ci.openshift.org/graph
Channel: stable-4.10
I suppose this bug also needs a documentation update: only annotating the PVC with map["openshift.io/cluster-monitoring-drop-pvc":"yes"] can't make Upgradeable true, and the PVC gets recreated quickly.
Correcting comment 18: annotating the PVC with map["openshift.io/cluster-monitoring-drop-pvc":"yes"] does make Upgradeable true. I added the annotation by editing one PVC.
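For completeness, the annotation named in the condition message can also be applied with `oc annotate` instead of editing the PVC by hand; a minimal sketch, using one of the PVC names from the verification output above (pick whichever claim of the affected workload should be dropped):

# mark one PVC of the single-point-of-failure workload for deletion, as the condition message asks
oc -n openshift-monitoring annotate pvc prometheus-prometheus-k8s-1 \
  openshift.io/cluster-monitoring-drop-pvc=yes

Once the annotation is reconciled, the monitoring ClusterOperator's Upgradeable condition should return to True, as confirmed in the previous comment.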
See https://bugzilla.redhat.com/show_bug.cgi?id=2008540#c8: the HighlyAvailableWorkloadIncorrectlySpread alert has been removed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056