Bug 1564939 - RFE: Cassandra pods for metrics are not failure domain aware
Summary: RFE: Cassandra pods for metrics are not failure domain aware
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.11.z
Assignee: John Sanda
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-09 01:33 UTC by raffaele spazzoli
Modified: 2018-09-10 18:57 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-10 18:57:31 UTC
Target Upstream Version:


Attachments: none

Description raffaele spazzoli 2018-04-09 01:33:00 UTC
Description of problem:
When installing metrics in a cluster that has failure domains, the Cassandra pods should be failure-domain aware and distribute themselves across those domains.

The conventional labels that identify the failure domains are:
failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a

These labels are set automatically when running in a cloud environment, but can also be set manually for on-premise deployments.
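
For on-premise deployments, a minimal sketch of what a manually labeled node would look like (the node name is hypothetical; the label keys and values mirror the conventional ones above):

# Sketch only: labels as they would appear on an on-premise Node after
# labeling it by hand. The node name is an assumption.
apiVersion: v1
kind: Node
metadata:
  name: node-1.example.com
  labels:
    failure-domain.beta.kubernetes.io/region: us-central1
    failure-domain.beta.kubernetes.io/zone: us-central1-a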


Version-Release number of selected component (if applicable):
metrics-cassandra:v3.9.14


How reproducible:
100%


Steps to Reproduce:
1. Deploy a Cassandra cluster in a multi-AZ OCP cluster



Actual results:

The multiple Cassandra ReplicationControllers ignore the failure domain.
The pods may happen to distribute themselves across domains by accident, but there is no directive creating anti-affinity behavior based on the failure domain. A sketch of such a directive follows.
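
A minimal sketch of the kind of directive being requested, assuming the Cassandra pods carry the metrics-infra=hawkular-cassandra label (an assumption based on the metrics deployer's labeling; adjust to the actual pod labels). Added to the pod spec, it asks the scheduler to prefer placing Cassandra pods in different zones:

# Sketch only: pod-spec-level anti-affinity keyed on the zone label.
# The metrics-infra=hawkular-cassandra selector is an assumption.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            metrics-infra: hawkular-cassandra
        topologyKey: failure-domain.beta.kubernetes.io/zone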



Additional info:
If a StatefulSet were used to deploy the Cassandra cluster, this behavior would be easier to configure; see the sketch below.
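
For illustration, a hypothetical StatefulSet skeleton (the object names are assumptions, not the deployer's actual objects); a single template carries the zone-level anti-affinity rule for every replica, instead of one ReplicationController per Cassandra node:

# Sketch only: hypothetical StatefulSet for the Cassandra nodes.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hawkular-cassandra              # hypothetical name
spec:
  serviceName: hawkular-cassandra-nodes # hypothetical headless service
  replicas: 3
  selector:
    matchLabels:
      metrics-infra: hawkular-cassandra
  template:
    metadata:
      labels:
        metrics-infra: hawkular-cassandra
    spec:
      affinity:
        podAntiAffinity:
          # Same zone-level rule as sketched in the description above.
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  metrics-infra: hawkular-cassandra
              topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: cassandra
        image: metrics-cassandra:v3.9.14  # image tag from this report; registry path omitted
  # volumeClaimTemplates for Cassandra storage omitted from this sketch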

Comment 1 John Sanda 2018-05-29 19:44:03 UTC
Is this the same as bug 1563853? If so, can we close this ticket?

Comment 2 raffaele spazzoli 2018-05-29 20:35:49 UTC
John, it is not the same.
Bug 1564939 asks for two pods not to be started in the same failure domain.
Bug 1563853 asks for two pods not to be started on the same node. In manifest terms the difference is just the anti-affinity topologyKey, as the fragment below shows.
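
Assuming the anti-affinity term sketched in the description, the two requests differ only in the value of topologyKey:

topologyKey: kubernetes.io/hostname                    # bug 1563853: at most one pod per node
topologyKey: failure-domain.beta.kubernetes.io/zone    # bug 1564939: spread pods across failure domains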

