Bug 1997948 - should schedule thanos-ruler pods to different nodes
Summary: should schedule thanos-ruler pods to different nodes
Keywords:
Status: CLOSED DUPLICATE of bug 1955490
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-26 07:00 UTC by Junqi Zhao
Modified: 2021-08-26 07:52 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-26 07:52:52 UTC
Target Upstream Version:
Embargoed:


Attachments
thanos-ruler-user-workload StatefulSet file (7.97 KB, text/plain)
2021-08-26 07:00 UTC, Junqi Zhao

Description Junqi Zhao 2021-08-26 07:00:22 UTC
Created attachment 1817730
thanos-ruler-user-workload StatefulSet file

Description of problem:
Sometimes the thanos-ruler pods are scheduled to the same node. If PVs are attached to thanos-ruler, this fires the HighlyAvailableWorkloadIncorrectlySpread alert. Checking the attached thanos-ruler-user-workload StatefulSet file, there is no podAntiAffinity setting (a sketch of the missing stanza follows the pod listing below).
# oc -n openshift-user-workload-monitoring get pod -o wide
NAME                                   READY   STATUS             RESTARTS         AGE   IP            NODE                                     NOMINATED NODE   READINESS GATES
prometheus-operator-5688b4798d-5jdt5   2/2     Running            0                47m   10.129.0.55   juzhao-share1-2sqt7-master-1             <none>           <none>
prometheus-user-workload-0             5/5     Running            0                47m   10.128.4.13   juzhao-share1-2sqt7-juzhao-share1-79-0   <none>           <none>
prometheus-user-workload-1             5/5     Running            0                47m   10.131.2.15   juzhao-share1-2sqt7-juzhao-share1-79-1   <none>           <none>
thanos-ruler-user-workload-0           3/3     Running            0                33m   10.131.2.21   juzhao-share1-2sqt7-juzhao-share1-79-1   <none>           <none>
thanos-ruler-user-workload-1           2/3     Running            0                33m   10.131.2.22   juzhao-share1-2sqt7-juzhao-share1-79-1   <none>           <none>
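
For reference, a minimal sketch of the kind of podAntiAffinity stanza that is missing from the StatefulSet (the label selector values below are assumptions based on the usual thanos-ruler pod labels, not the actual fix; a preferredDuringSchedulingIgnoredDuringExecution rule would be the softer alternative):

    spec:
      template:
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: thanos-ruler
                    thanos-ruler: user-workload
                topologyKey: kubernetes.io/hostname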

*************************
        - alert: HighlyAvailableWorkloadIncorrectlySpread
          annotations:
            description: Workload {{ $labels.namespace }}/{{ $labels.workload }} is incorrectly
              spread across multiple nodes which breaks high-availability requirements.
              Since the workload is using persistent volumes, manual intervention is needed.
              Please follow the guidelines provided in the runbook of this alert to fix
              this issue.
            runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/HighlyAvailableWorkloadIncorrectlySpread.md
            summary: Highly-available workload is incorrectly spread across multiple nodes
              and manual intervention is needed.
          expr: |
            count without (node)
            (
              group by (node, workload, namespace)
              (
                kube_pod_info{node!=""}
                * on(namespace,pod) group_left(workload)
                (
                  kube_pod_spec_volumes_persistentvolumeclaims_info
                  * on(namespace,pod) group_left(workload)
                  (
                    namespace_workload_pod:kube_pod_owner:relabel
                    * on(namespace,workload,workload_type) group_left()
                    (
                      count without(pod) (namespace_workload_pod:kube_pod_owner:relabel{namespace=~"(openshift-.*|kube-.*|default)"}) > 1
                    )
                  )
                )
              )
            ) == 1
          for: 1h
          labels:
            severity: warning
*************************
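To double-check the spread manually, a query along these lines can be run against the platform Prometheus (a sketch; the namespace and pod regex are taken from the listing above):

    group by (node) (
      kube_pod_info{namespace="openshift-user-workload-monitoring", pod=~"thanos-ruler-user-workload-.*"}
    )

If only a single node is returned for the two replicas, they are co-located.
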
Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-25-185404

How reproducible:
sometimes

Steps to Reproduce:
1. See the description.

Actual results:
Both thanos-ruler-user-workload pods are scheduled to the same node (see the pod listing above), and the HighlyAvailableWorkloadIncorrectlySpread alert fires once PVs are attached.

Expected results:
The thanos-ruler pods are spread across different nodes.

Additional info:
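A quick way to confirm that the StatefulSet defines no affinity at all (a sketch using a jsonpath query; empty output means nothing is set):

    oc -n openshift-user-workload-monitoring get statefulset thanos-ruler-user-workload \
      -o jsonpath='{.spec.template.spec.affinity}'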

Comment 1 Simon Pasquier 2021-08-26 07:52:52 UTC

*** This bug has been marked as a duplicate of bug 1955490 ***

