During a 4.8-to-4.8 PR job upgrade, Prometheus had both instances down at the same time (the pods were on the same node). Prometheus needs anti-affinity and a PDB, otherwise availability of metrics is violated.

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/25904/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1366424416887508992

Mar 01 18:14:32.764 W ns/openshift-monitoring pod/prometheus-k8s-1 node/ci-op-rzz100ym-db044-jl74q-worker-d-xzb54 reason/Deleted
Mar 01 18:14:32.885 W ns/openshift-monitoring pod/prometheus-k8s-0 node/ci-op-rzz100ym-db044-jl74q-worker-d-xzb54 reason/Deleted
Mar 01 18:14:32.910 I ns/openshift-monitoring pod/prometheus-k8s-1 node/ reason/Created
Mar 01 18:14:32.932 W ns/openshift-monitoring pod/prometheus-k8s-1 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:14:32.932 I ns/openshift-monitoring statefulset/prometheus-k8s reason/SuccessfulCreate create Pod prometheus-k8s-1 in StatefulSet prometheus-k8s successful
Mar 01 18:14:32.974 W ns/openshift-monitoring pod/prometheus-k8s-1 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:14:33.070 I ns/openshift-monitoring pod/prometheus-k8s-0 node/ reason/Created
Mar 01 18:14:33.080 W ns/openshift-monitoring pod/prometheus-k8s-0 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:14:33.081 I ns/openshift-monitoring statefulset/prometheus-k8s reason/SuccessfulCreate create Pod prometheus-k8s-0 in StatefulSet prometheus-k8s successful
Mar 01 18:14:33.112 W ns/openshift-monitoring pod/prometheus-k8s-0 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:14:35.369 W ns/openshift-monitoring pod/prometheus-k8s-1 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:15:05.240 W ns/openshift-monitoring pod/prometheus-k8s-0 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:15:32.883 - 216s W ns/openshift-monitoring pod/prometheus-k8s-1 node/ pod has been pending longer than a minute
Mar 01 18:15:33.883 - 217s W ns/openshift-monitoring pod/prometheus-k8s-0 node/ pod has been pending longer than a minute
Mar 01 18:18:49.576 W ns/openshift-monitoring pod/prometheus-k8s-1 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:18:49.703 W ns/openshift-monitoring pod/prometheus-k8s-0 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:19:00.433 W ns/openshift-monitoring pod/prometheus-k8s-1 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:19:00.433 W ns/openshift-monitoring pod/prometheus-k8s-0 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.
Mar 01 18:19:10.491 I ns/openshift-monitoring pod/prometheus-k8s-1 node/ci-op-rzz100ym-db044-jl74q-worker-d-xzb54 reason/Scheduled
Mar 01 18:19:10.883 - 11s W ns/openshift-monitoring pod/prometheus-k8s-1 node/ci-op-rzz100ym-db044-jl74q-worker-d-xzb54 pod has been pending longer than a minute

The PR job in question tests that the Thanos querier reports continuous availability of the Watchdog alert. That check failed because for 4 minutes there was no Prometheus instance scheduled:

Mar 01 18:19:10.491 I ns/openshift-monitoring pod/prometheus-k8s-1 node/ci-op-rzz100ym-db044-jl74q-worker-d-xzb54 reason/Scheduled
Mar 01 18:19:11.445 I ns/openshift-monitoring pod/prometheus-k8s-0 node/ci-op-rzz100ym-db044-jl74q-worker-d-xzb54 reason/Scheduled

So the test correctly caught the lack of availability of Prometheus. If this bug is duped onto another bug, please ensure both the PDB and anti-affinity are set, not just one or the other.
The Prometheus StatefulSet has:

spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: prometheus
              operator: In
              values:
              - k8s
          namespaces:
          - openshift-monitoring
          topologyKey: kubernetes.io/hostname
        weight: 100

and Alertmanager has:

spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: alertmanager
              operator: In
              values:
              - main
          namespaces:
          - openshift-monitoring
          topologyKey: kubernetes.io/hostname
        weight: 100

thanos-querier has:

spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
              - thanos-query
          namespaces:
          - openshift-monitoring
          topologyKey: kubernetes.io/hostname
        weight: 100

prometheus-adapter has no anti-affinity at all, so:

1) possibly we need it on the adapter?
2) possibly the existing anti-affinity rules need to be weighted higher, or something else happened on the cluster that meant anti-affinity could not be enforced?

On the 6-node (3 worker, 3 master) cluster I'm looking at, the pods are mostly spread across nodes, but Alertmanager does have 2 pods on one node (and the third on another node), which would imply its anti-affinity request was not respected:

alertmanager-main-0   5/5   Running   0   18m   10.128.2.8    ip-10-0-191-156.us-east-2.compute.internal   <none>   <none>
alertmanager-main-1   5/5   Running   0   18m   10.131.0.19   ip-10-0-192-222.us-east-2.compute.internal   <none>   <none>
alertmanager-main-2   5/5   Running   0   18m   10.128.2.9    ip-10-0-191-156.us-east-2.compute.internal   <none>   <none>
Mar 01 18:14:33.112 W ns/openshift-monitoring pod/prometheus-k8s-0 reason/FailedScheduling 0/6 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had volume node affinity conflict, 2 node(s) were unschedulable.

It seems this is related to storage ("volume node affinity conflict").
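One way to confirm the storage angle (a sketch, not taken from the original report) is to look at the node affinity recorded on the PersistentVolumes; with zonal volumes such as GCE PDs, a PV can only be attached on nodes in its zone:

# List each PV together with the node selector terms it is pinned to.
# A "volume node affinity conflict" means the only nodes satisfying these
# terms were unschedulable (e.g. being drained) at scheduling time.
oc get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeAffinity.required.nodeSelectorTerms}{"\n"}{end}'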
See: https://github.com/openshift/enhancements/pull/718/files

I guess the issue here is that we are using "preferredDuringSchedulingIgnoredDuringExecution" and not "requiredDuringSchedulingIgnoredDuringExecution": https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#always-co-located-in-the-same-node

But it sounds like a pod disruption budget also needs to be set, to ensure that at least 1 instance is kept running in cases where there is nowhere to schedule the 2nd instance.
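For illustration only (not necessarily the exact change that eventually merged), a hard anti-affinity rule for the Prometheus pods could look something like the following, reusing the label selector from the existing soft rule quoted above:

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # Refuse to schedule a replica onto a node that already hosts a pod
      # matching this selector, instead of merely preferring not to.
      - labelSelector:
          matchExpressions:
          - key: prometheus
            operator: In
            values:
            - k8s
        namespaces:
        - openshift-monitoring
        topologyKey: kubernetes.io/hostname

Note that required rules carry no weight and their entries are pod affinity terms directly, unlike the preferred form which wraps the term under podAffinityTerm.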
Yes, a PDB is necessary to survive dual disruption during a normal upgrade, and spreading is necessary to survive a single host failure.
As Clayton is alluding to, we allow configuration of the parallelism in normal upgrades, so it is not uncommon to have 2+ nodes evicted / upgraded at once. Without a PDB that could easily end up hitting both Prometheus pods at the same time.
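A minimal sketch of the kind of PodDisruptionBudget being discussed (selector borrowed from the prometheus: k8s label shown earlier in this bug; the object CMO eventually ships may carry additional labels):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-k8s
  namespace: openshift-monitoring
spec:
  # Keep at least one Prometheus replica running during voluntary disruptions
  # such as the node drains performed by a parallel upgrade.
  minAvailable: 1
  selector:
    matchLabels:
      prometheus: k8s

With this in place, a drain that would take the second replica down while the first is still pending is blocked until the eviction can proceed without violating minAvailable.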
That, and the current configuration of "preferred" instead of "required", means 2 pods can actually still land on the same node.
Bug 1949262 targets the scheduling configuration for Prometheus and Alertmanager (e.g. switching from soft to hard affinity). This bug is about setting up the pod disruption budget. As far as I understand, Pawel has done the groundwork in kube-prometheus but he needs to wire it up in the cluster monitoring operator.
*** Bug 1953647 has been marked as a duplicate of this bug. ***
Since the discussion has been moved to upstream prometheus-operator, would it be meaningful to extend the scope of this BZ to thanos-ruler?
Yes. If Pawel thinks that is feasible, this BZ can cover all the resources managed by prometheus-operator (not just prometheus).
The solution that works for OpenShift already landed in CMO, so if needed let's open a new BZ for thanos-ruler. For a more generic solution we have an open issue[1] in prometheus-operator to move the PDB setting logic into the operator itself. I am setting MODIFIED as, from the OpenShift perspective, it should be fixed.

[1]: https://github.com/prometheus-operator/prometheus-operator/issues/3917
Upgraded from 4.8.0-0.nightly-2021-05-21-200728 to 4.8.0-0.nightly-2021-05-21-233425; no Prometheus instance went unavailable.

# oc -n openshift-monitoring get PodDisruptionBudget
NAME                 MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
prometheus-adapter   1               N/A               1                     162m
prometheus-k8s       1               N/A               1                     158m

# oc -n openshift-monitoring get prometheus k8s -oyaml | grep podAntiAffinity -A10
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/component: prometheus
            app.kubernetes.io/name: prometheus
            app.kubernetes.io/part-of: openshift-monitoring
            prometheus: k8s
        namespaces:
        - openshift-monitoring
        topologyKey: kubernetes.io/hostname

# oc -n openshift-monitoring get deploy prometheus-adapter -oyaml | grep podAntiAffinity -A10
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: metrics-adapter
                app.kubernetes.io/name: prometheus-adapter
                app.kubernetes.io/part-of: openshift-monitoring
            namespaces:
            - openshift-monitoring
            topologyKey: kubernetes.io/hostname
      containers:
Moving this BZ back to assigned since the fix was reverted in [1], because of 4.8 blocker bug 1967614. [1] https://github.com/openshift/cluster-monitoring-operator/pull/1204
Can we get an updated summary of what's happening with this bug? For the time being I'm marking it as a blocker for 4.9 given it was reported well before 4.8 GA and we've had 3 months to come up with a plan by now.
We have a few prerequisites that need to happen before we can fix this specific issue. This is the plan we've discussed with Damien so far:

1. The cluster monitoring operator sets the Upgradeable condition to false whenever it detects that stateful pods aren't correctly spread across nodes (bug 1995924, addressed by https://github.com/openshift/cluster-monitoring-operator/pull/1330).
2. We're adding a script to the upgrade CI jobs ensuring that pods are correctly spread before the upgrade happens (see https://github.com/openshift/release/pull/21258, and the sketch below).
3. We backport bug 1995924 to 4.8.z so CMO blocks the upgrade to 4.9 when monitoring pods aren't correctly balanced.
4. We enable hard anti-affinity + PDB for the monitoring stateful workloads that can support it (*): bug 1933847, bug 1949262, bug 1955490. I'll work on a work-in-progress PR ASAP but the bits should already be there (we did the implementation in 4.8 already but we had to revert).

We believe that steps 1, 2 and 3 are doable before 4.9 GA. Step 4 might be more risky (changing the scheduling constraints can break things, as we experienced during the 4.8 dev cycle) but if we notice regressions, a revert is always possible.

(*) Which excludes Alertmanager because we run 3 replicas of it and the minimal number of workers is 2, so there might be environments where 2 pods end up on the same node no matter what. The plan (as described in bug 1955489) is to scale down to 2 replicas, but we don't have all the features needed yet in Prometheus operator to make that happen in 4.9.
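Regarding step 2, the actual CI script lives in the linked release PR; as a rough sketch (assuming the prometheus=k8s pod label used throughout this bug), the spread check amounts to something like:

# Print any node name that hosts more than one prometheus-k8s replica.
# Empty output means the pods are correctly spread before the upgrade starts.
oc -n openshift-monitoring get pods -l prometheus=k8s \
  -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -d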
*** Bug 1992446 has been marked as a duplicate of this bug. ***
https://github.com/openshift/cluster-monitoring-operator/pull/1341 has been merged
Setting this bug as verified, since bug https://bugzilla.redhat.com/show_bug.cgi?id=1995924 has been verified.
Tested with payload 4.10.0-0.nightly-2021-11-23-090522:

Label one node, schedule prometheus-k8s to that node with a node selector, and create PVCs at the same time.
Remove the label from the node and delete the node selector from the config map.
Annotate one PVC with "openshift.io/cluster-monitoring-drop-pvc": "yes" by editing the PVC (see the sketch after this comment).

% oc -n openshift-monitoring get pod -owide |grep prometheus-k8s
prometheus-k8s-0   7/7   Running   0   8m4s   10.129.2.41    ip-10-0-189-252.ap-northeast-2.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   0   100m   10.128.2.192   ip-10-0-147-182.ap-northeast-2.compute.internal   <none>   <none>

% oc adm upgrade
Cluster version is 4.10.0-0.nightly-2021-11-23-090522

Upstream: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph
Channel: stable-4.10
Updates:

VERSION   IMAGE

Annotating the PVC solved the single point of failure.
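For reference, the drop-pvc annotation from the steps above can presumably also be applied with oc annotate instead of editing the PVC (PVC name taken from the output in the next comment; pick whichever replica's volume should be dropped):

# Mark one of the co-located Prometheus PVCs for deletion so that its pod can
# be rescheduled onto a different node.
oc -n openshift-monitoring annotate pvc prometheus-k8s-db-prometheus-k8s-0 \
  openshift.io/cluster-monitoring-drop-pvc=yes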
Correcting comment 33:

Set up a cluster with OCP 4.9.8.
Label one worker node, schedule prometheus-k8s to the node with a node selector, and create PVCs at the same time. Two instances of prometheus-k8s are running on the same node:

hongyli@hongyli-mac Downloads % oc -n openshift-monitoring get pod -owide |grep prometheus-k8s
prometheus-k8s-0   7/7   Running   0   61m   10.128.2.188   ip-10-0-147-182.ap-northeast-2.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   0   72m   10.128.2.187   ip-10-0-147-182.ap-northeast-2.compute.internal   <none>   <none>

hongyli@hongyli-mac Downloads % oc -n openshift-monitoring get pvc
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-202c790a-5de0-438f-89d0-a6f4a0455c2c   10Gi       RWO            gp2            61m
prometheus-k8s-db-prometheus-k8s-1   Bound    pvc-6e9245d3-899e-4d70-9a12-14b2a6f3e5b6   10Gi       RWO            gp2            72m

hongyli@hongyli-mac Downloads % oc adm upgrade
Cluster version is 4.9.8

Upgradeable=False

  Reason: WorkloadSinglePointOfFailure
  Message: Cluster operator monitoring should not be upgraded between minor versions: Highly-available workload in namespace openshift-monitoring, with label map["app.kubernetes.io/name":"prometheus"] and persistent storage enabled has a single point of failure. Manual intervention is needed to upgrade to the next minor version. For each highly-available workload that has a single point of failure please mark at least one of their PersistentVolumeClaim for deletion by annotating them with map["openshift.io/cluster-monitoring-drop-pvc":"yes"].

Upstream: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph
Channel: stable-4.10
Updates:

VERSION                         IMAGE
4.10.0-0.ci-2021-11-20-192530   registry.ci.openshift.org/ocp/release@sha256:b59c2a8b6347e3bd91ce8920d5b1c84c4d267e7ea3085ddbd0419d4d6e95843c

Remove the node selector from the config map and remove the label from the node.
Annotate one PVC with "openshift.io/cluster-monitoring-drop-pvc": "yes" by editing the PVC.

% oc -n openshift-monitoring get pod -owide |grep prometheus-k8s
prometheus-k8s-0   7/7   Running   0   8m4s   10.129.2.41    ip-10-0-189-252.ap-northeast-2.compute.internal   <none>   <none>
prometheus-k8s-1   7/7   Running   0   100m   10.128.2.192   ip-10-0-147-182.ap-northeast-2.compute.internal   <none>   <none>

% oc adm upgrade
Cluster version is 4.9.8

Upstream: https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/graph
Channel: stable-4.10
Updates:

VERSION                         IMAGE
4.10.0-0.ci-2021-11-20-192530   registry.ci.openshift.org/ocp/release@sha256:b59c2a8b6347e3bd91ce8920d5b1c84c4d267e7ea3085ddbd0419d4d6e95843c

% oc adm upgrade --to='4.10.0-0.nightly-2021-11-23-090522'

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-11-23-090522   True        False         12h     Cluster version is 4.10.0-0.nightly-2021-11-23-090522

% oc -n openshift-monitoring get PodDisruptionBudget prometheus-k8s -oyaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2021-11-23T12:57:14Z"
  generation: 1
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: openshift-monitoring
    app.kubernetes.io/version: 2.30.3
  name: prometheus-k8s
  namespace: openshift-monitoring
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: openshift-monitoring
      prometheus: k8s
status:

% oc -n openshift-monitoring get sts prometheus-k8s -oyaml|grep podAntiAffinity -A10
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: prometheus
                app.kubernetes.io/name: prometheus
                app.kubernetes.io/part-of: openshift-monitoring
                prometheus: k8s
            namespaces:
            - openshift-monitoring

% oc -n openshift-user-workload-monitoring get poddisruptionbudget prometheus-user-workload -oyaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2021-11-24T02:39:50Z"
  generation: 1
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: openshift-monitoring
    app.kubernetes.io/version: 2.30.3
  name: prometheus-user-workload
  namespace: openshift-user-workload-monitoring
  resourceVersion: "533928"
  uid: efa467ec-2a00-4c6d-a2f3-636fbc543425
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: openshift-monitoring
      prometheus: user-workload

% oc -n openshift-user-workload-monitoring get poddisruptionbudget thanos-ruler-user-workload -oyaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: "2021-11-24T02:39:55Z"
  generation: 1
  labels:
    thanosRulerName: user-workload
  name: thanos-ruler-user-workload
  namespace: openshift-user-workload-monitoring
  resourceVersion: "534024"
  uid: d2a70bc9-34d8-4eac-bc93-0d31e675eec3
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-ruler
      thanos-ruler: user-workload

% oc -n openshift-user-workload-monitoring get sts prometheus-user-workload -oyaml|grep affinity -A10
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/component: prometheus
                app.kubernetes.io/name: prometheus
                app.kubernetes.io/part-of: openshift-monitoring
                prometheus: user-workload
            namespaces:
            - openshift-user-workload-monitoring

% oc -n openshift-user-workload-monitoring get sts thanos-ruler-user-workload -oyaml|grep affinity -A10
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/name: thanos-ruler
                thanos-ruler: user-workload
            namespaces:
            - openshift-user-workload-monitoring
            topologyKey: kubernetes.io/hostname
      containers:
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056