Bug 2016753

Summary: when enabling user workload monitoring, there is no anti affinity rule for the thanos-ruler
Product: OpenShift Container Platform
Reporter: Kai-Uwe Rommel <kai-uwe.rommel>
Component: Monitoring
Assignee: Simon Pasquier <spasquie>
Status: CLOSED DUPLICATE
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: unspecified
Version: 4.8
CC: amuller, anpicker, aos-bugs, erooth
Hardware: All   
OS: All   
Last Closed: 2021-10-25 07:55:56 UTC
Type: Bug

Description Kai-Uwe Rommel 2021-10-23 10:41:37 UTC
Description of problem:
When user workload monitoring is enabled, a "prometheus" object and a "thanosruler" object are created in the openshift-user-workload-monitoring project, and the associated controller creates StatefulSets and related resources from them. Both objects are configured with two replicas.
The spec of the "prometheus" object contains an anti-affinity rule; the spec of the "thanosruler" object does not.
As a result, the two thanos-ruler pods are often scheduled on the same node, which triggers "HighlyAvailableWorkloadIncorrectlySpread" alerts. Adding such a rule to the "thanosruler" object would be easy, but the operator needs to do it.

How reproducible:
100%

Steps to Reproduce:
1. Enable user workload monitoring.
2. Observe that both thanos-ruler instances sometimes run on the same node.
3. This causes an alert.

Actual results:
Both thanos-ruler instances sometimes run on the same node.

Expected results:
The two thanos-ruler instances should always run on different nodes.

Additional info:
The ThanosRuler API object supports an anti-affinity rule:
https://docs.openshift.com/container-platform/4.9/rest_api/monitoring_apis/thanosruler-monitoring-coreos-com-v1.html
However, the monitoring operator has to set it.
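For illustration only, a minimal sketch of the kind of rule requested, written as a Python function that emits a ThanosRuler spec fragment. It uses the standard Kubernetes podAntiAffinity schema with topologyKey kubernetes.io/hostname, which keeps matching pods on different nodes. The pod label thanos-ruler=user-workload is an assumption and should be checked against the real pods. Choosing a required (hard) rule over a preferred one follows the expected result stated in this report, not anything the operator is known to implement.

```python
import json

# ASSUMPTION: the user workload ThanosRuler pods carry this label. Check the
# real labels (e.g. `oc get pods -n openshift-user-workload-monitoring --show-labels`).
ASSUMED_POD_LABELS = {"thanos-ruler": "user-workload"}


def thanos_ruler_anti_affinity(pod_labels):
    """Return a ThanosRuler spec fragment with a hard pod anti-affinity rule.

    Pods matching pod_labels may not share a kubernetes.io/hostname value,
    i.e. no two of them may be scheduled onto the same node.
    """
    return {
        "spec": {
            "affinity": {
                "podAntiAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": [
                        {
                            "labelSelector": {"matchLabels": dict(pod_labels)},
                            "topologyKey": "kubernetes.io/hostname",
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the fragment as JSON, e.g. for use as a merge patch body.
    print(json.dumps(thanos_ruler_anti_affinity(ASSUMED_POD_LABELS), indent=2))
```

A required rule guarantees the spread but leaves the second replica Pending when only one node is schedulable. A preferredDuringSchedulingIgnoredDuringExecution rule would avoid that at the cost of occasionally co-locating the replicas again.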

Comment 1 Simon Pasquier 2021-10-25 07:55:56 UTC

*** This bug has been marked as a duplicate of bug 1955490 ***