Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1957704

Summary: Prometheus Statefulsets should have 2 replicas and hard affinity set
Product: OpenShift Container Platform
Reporter: Damien Grisonnet <dgrisonn>
Component: Monitoring
Assignee: Damien Grisonnet <dgrisonn>
Status: CLOSED WONTFIX
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.6
CC: anpicker, aos-bugs, dgrisonn, erooth, jeder, juzhao, lcosic, rgudimet, spasquie, sraje, vjaypurk, wking
Target Milestone: ---
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1957703
Environment:
Last Closed: 2021-10-15 12:34:48 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1957703
Bug Blocks:

Comment 1 Damien Grisonnet 2021-05-25 17:29:21 UTC
Backport to 4.7 is still waiting for patch manager approval.

Comment 2 Junqi Zhao 2021-05-31 12:08:31 UTC
Tested with the not-yet-merged PR: hard anti-affinity is added to the Prometheus StatefulSets, and the prometheus-k8s and prometheus-user-workload pods are scheduled on different nodes:
# oc -n openshift-monitoring get sts prometheus-k8s -oyaml | grep podAntiAffinity -A10
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: prometheus
                operator: In
                values:
                - k8s
            namespaces:
            - openshift-monitoring
            topologyKey: kubernetes.io/hostname
# oc -n openshift-user-workload-monitoring get sts prometheus-user-workload -oyaml  | grep podAntiAffinity -A10
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: prometheus
                operator: In
                values:
                - user-workload
            namespaces:
            - openshift-user-workload-monitoring
            topologyKey: kubernetes.io/hostname
# oc -n openshift-monitoring get pod -o wide | grep prometheus-k8s
prometheus-k8s-0                               6/6     Running   1          41m   10.129.2.10   ci-ln-ckgb6k2-f76d1-ks7lc-worker-d-dccrt   <none>           <none>
prometheus-k8s-1                               6/6     Running   1          41m   10.128.2.6    ci-ln-ckgb6k2-f76d1-ks7lc-worker-c-5t7fg   <none>           <none>
# oc -n openshift-user-workload-monitoring get po -o wide 
NAME                                  READY   STATUS    RESTARTS   AGE    IP            NODE                                       NOMINATED NODE   READINESS GATES
prometheus-operator-7f9c94d4b-4jdnb   2/2     Running   0          102s   10.129.0.51   ci-ln-ckgb6k2-f76d1-ks7lc-master-2         <none>           <none>
prometheus-user-workload-0            4/4     Running   1          96s    10.128.2.9    ci-ln-ckgb6k2-f76d1-ks7lc-worker-c-5t7fg   <none>           <none>
prometheus-user-workload-1            4/4     Running   1          96s    10.129.2.26   ci-ln-ckgb6k2-f76d1-ks7lc-worker-d-dccrt   <none>           <none>
thanos-ruler-user-workload-0          3/3     Running   0          94s    10.129.2.27   ci-ln-ckgb6k2-f76d1-ks7lc-worker-d-dccrt   <none>           <none>
thanos-ruler-user-workload-1          3/3     Running   0          94s    10.128.2.10   ci-ln-ckgb6k2-f76d1-ks7lc-worker-c-5t7fg   <none>           <none>

Comment 5 W. Trevor King 2021-06-22 23:23:49 UTC
Regarding the move back from POST and dropping PR 1186: the issue was bug 1967966, which showed problems with the transition to hard anti-affinity with a PDB guard when Prometheus was backed by a persistent volume [1]. So this whole hard-anti-affinity bug chain, descended from bug 1949262, is back to pre-POST while we work out a fix that we can backport without surprising folks who currently have volumes that would block an attempt to push Prometheus out to separate nodes. Bug 1974832 may be the next thing that moves in this space, but we'll see.

[1]: https://github.com/openshift/cluster-monitoring-operator/pull/1186#issuecomment-860766266
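
For context on the "PDB guard" mentioned in comment 5: pairing hard anti-affinity with a PodDisruptionBudget limits voluntary evictions so that at most one Prometheus replica is down at a time. The sketch below is illustrative only and assumes the object name, namespace, and `prometheus: k8s` label seen in the StatefulSet output above; the operator's actual manifest may differ:

```yaml
# Hypothetical PDB guard for the 2-replica prometheus-k8s StatefulSet.
# apiVersion is policy/v1 on current clusters (policy/v1beta1 on older ones).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-k8s          # assumed name, not confirmed by this bug
  namespace: openshift-monitoring
spec:
  minAvailable: 1               # with 2 replicas, only one pod may be evicted at a time
  selector:
    matchLabels:
      prometheus: k8s           # matches the label used by the anti-affinity rule above
```

The conflict bug 1967966 exposed: with `requiredDuringSchedulingIgnoredDuringExecution` anti-affinity, a replica whose persistent volume is bound to a specific node cannot be rescheduled onto a different node, so existing single-node deployments with volumes could get stuck rather than spread out.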