Bug 1564939

Summary: RFE: Cassandra pods for metrics are not failure domain aware
Product: OpenShift Container Platform
Reporter: raffaele spazzoli <rspazzol>
Component: Hawkular
Assignee: John Sanda <jsanda>
Status: CLOSED WONTFIX
QA Contact: Junqi Zhao <juzhao>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.9.0
CC: aos-bugs, rspazzol, rvargasp
Target Milestone: ---
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-10 18:57:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host: ---
Cloudforms Team: ---
Target Upstream Version: ---
Embargoed:

Description raffaele spazzoli 2018-04-09 01:33:00 UTC
Description of problem:
When installing metrics in a cluster that has failure domains, the Cassandra pods should be failure-domain aware and distribute themselves across those domains.

The conventional labels that identify the failure domains are:
failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a

These labels are set automatically when running in a cloud environment, but they can also be set manually for on-premises deployments.
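
For illustration, a node labeled manually for an on-premises deployment would carry the labels in its metadata roughly as in the sketch below (the node name is hypothetical; the label values mirror the cloud example above):

  apiVersion: v1
  kind: Node
  metadata:
    name: node1.example.com        # hypothetical node name
    labels:
      # conventional failure-domain labels, set by hand on premises
      failure-domain.beta.kubernetes.io/region: us-central1
      failure-domain.beta.kubernetes.io/zone: us-central1-a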


Version-Release number of selected component (if applicable):
metrics-cassandra:v3.9.14


How reproducible:
100%


Steps to Reproduce:
1. Deploy a Cassandra cluster in a multi-AZ OCP cluster



Actual results:

The multiple Cassandra replication controllers ignore the failure domains.
The pods may happen to distribute themselves correctly by chance, but there is no directive that creates anti-affinity behavior based on the failure domain; a sketch of such a directive follows.
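
A minimal sketch of the kind of pod anti-affinity rule being requested (this is illustrative only; the pod label selector is hypothetical and no such rule exists in the shipped templates):

  spec:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              name: hawkular-cassandra   # hypothetical pod label
          topologyKey: failure-domain.beta.kubernetes.io/zone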



Additional info:
If a StatefulSet were used to deploy the Cassandra cluster, this behavior would be easier to configure.

Comment 1 John Sanda 2018-05-29 19:44:03 UTC
Is this the same as bug 1563853? If so, can we close this ticket?

Comment 2 raffaele spazzoli 2018-05-29 20:35:49 UTC
John, it is not the same.
Bug 1564939 asks that two pods not be started in the same failure domain.
Bug 1563853 asks that two pods not be started on the same node.
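
In anti-affinity terms, the difference between the two requests is only the topologyKey; a sketch, assuming the conventional labels mentioned in the description:

  # bug 1564939: spread pods across failure domains (zones)
  topologyKey: failure-domain.beta.kubernetes.io/zone
  # bug 1563853: spread pods across nodes
  topologyKey: kubernetes.io/hostname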