Bug 2016753 - when enabling user workload monitoring, there is no anti affinity rule for the thanos-ruler
Summary: when enabling user workload monitoring, there is no anti affinity rule for th...
Keywords:
Status: CLOSED DUPLICATE of bug 1955490
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Simon Pasquier
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-23 10:41 UTC by Kai-Uwe Rommel
Modified: 2021-10-25 07:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-25 07:55:56 UTC
Target Upstream Version:
Embargoed:



Description Kai-Uwe Rommel 2021-10-23 10:41:37 UTC
Description of problem:
When user workload monitoring is enabled, a "prometheus" object and a "thanosruler" object are created in the openshift-user-workload-monitoring project, and the associated controller creates statefulsets etc. from them. Two replicas are created for each.
The "prometheus" object has an anti-affinity rule in its specification; the "thanosruler" object does not.
As a result, both thanos-ruler pods are often scheduled onto the same node, which triggers "HighlyAvailableWorkloadIncorrectlySpread" alerts. It would be easy to add such a rule to the "thanosruler" object as well, but the operator needs to do it.
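
To illustrate the condition described above, here is a minimal Python sketch that flags nodes hosting more than one replica of the same workload. It is not the alert's actual rule expression; the function name, the pod names, and the node names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def colocated_replicas(pod_to_node):
    """Return {node: [pods]} for every node that hosts more than one replica.

    pod_to_node is a plain mapping of pod name to node name; how it is
    obtained (for example from `oc get pods -o wide`) is left out here.
    """
    by_node = defaultdict(list)
    for pod, node in pod_to_node.items():
        by_node[node].append(pod)
    # Only nodes with two or more replicas are a spreading problem.
    return {node: sorted(pods) for node, pods in by_node.items() if len(pods) > 1}


# Hypothetical placement: both thanos-ruler replicas landed on worker-1.
placement = {
    "thanos-ruler-user-workload-0": "worker-1",
    "thanos-ruler-user-workload-1": "worker-1",
}
print(colocated_replicas(placement))
```

With the placement shown, the result contains a single entry for worker-1 listing both pods; with the replicas on different nodes, the result is empty.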

How reproducible:
100%

Steps to Reproduce:
1. Enable user workload monitoring.
2. Sometimes both thanos-ruler instances run on the same node.
3. This causes an alert.

Actual results:
Both thanos-ruler instances sometimes run on the same node.

Expected results:
The two thanos-ruler instances should always run on different nodes.

Additional info:
The thanosruler API object supports an anti affinity rule:
https://docs.openshift.com/container-platform/4.9/rest_api/monitoring_apis/thanosruler-monitoring-coreos-com-v1.html
But the monitoring operator needs to use it.
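
As a rough sketch of what such a rule could look like, the following Python snippet builds a pod anti-affinity stanza for the "affinity" field of the thanosruler spec and prints it as JSON. The label keys and values in the selector are assumptions about how the thanos-ruler pods are labelled, not taken from this report, and the helper name is hypothetical. A manual change to the object would presumably be reverted by the monitoring operator, so this only shows the shape of the rule the operator would need to set.

```python
import json


def anti_affinity_patch(match_labels, namespace, hard=True):
    """Build a spec.affinity.podAntiAffinity stanza keyed on the node hostname."""
    term = {
        "labelSelector": {"matchLabels": dict(match_labels)},
        "namespaces": [namespace],
        # One replica per node: the topology domain is the individual host.
        "topologyKey": "kubernetes.io/hostname",
    }
    if hard:
        # Hard rule: the scheduler refuses to co-locate matching pods.
        rule = {"requiredDuringSchedulingIgnoredDuringExecution": [term]}
    else:
        # Soft rule: the scheduler only prefers to keep matching pods apart.
        rule = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "podAffinityTerm": term}
            ]
        }
    return {"spec": {"affinity": {"podAntiAffinity": rule}}}


# Assumed pod labels; verify against the actual thanos-ruler pods.
labels = {"app.kubernetes.io/name": "thanos-ruler", "thanos-ruler": "user-workload"}
patch = anti_affinity_patch(labels, "openshift-user-workload-monitoring")
print(json.dumps(patch, indent=2))
```

Passing hard=False yields the "preferred" variant, which avoids unschedulable pods on clusters with too few nodes but does not guarantee that the replicas end up on different nodes.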

Comment 1 Simon Pasquier 2021-10-25 07:55:56 UTC

*** This bug has been marked as a duplicate of bug 1955490 ***

