Bug 1462985

Summary:	Cassandra pods may enter the ready state prematurely when multiple pods are restarted with new IP addresses
Product:	OpenShift Container Platform	Reporter:	Matt Wringe <mwringe>
Component:	Hawkular	Assignee:	Matt Wringe <mwringe>
Status:	CLOSED DEFERRED	QA Contact:	Junqi Zhao <juzhao>
Severity:	high	Docs Contact:
Priority:	low
Version:	3.5.1	CC:	aos-bugs, erjones, gbaufake, jcantril, pweil, snegrea, stwalter
Target Milestone:	---
Target Release:	3.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-10-05 18:51:51 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Matt Wringe 2017-06-19 19:48:56 UTC

Description of problem:
In certain situations, when we scale Cassandra down and bring it back up again, the cluster does not form properly.

The pods in the cluster try to join the older, previously known IP addresses instead of connecting to the pods in the newly sized cluster.

Comment 2 Guilherme Baufaker Rêgo 2017-06-19 20:49:51 UTC

This problem occurs when you scale Cassandra up (e.g., to two pods) and then scale it back down to one, and also when you scale the Cassandra pods to zero and then create new ones.


It is very difficult to reproduce on OpenShift, but it seems to be related to the bug logged at https://bugzilla.redhat.com/show_bug.cgi?id=1459345.

Comment 3 Matt Wringe 2017-06-20 19:06:33 UTC

(In reply to Guilherme Baufaker Rêgo from comment #2)
> This problem occurs when you scale up cassandra (eg: Two pods and then scale
> it back to one) and when you scale cassandra pods to zero and then create a
> new ones.

That would be a different situation. We have an issue open to clarify in the docs what you need to do to scale Cassandra pods up and down. Currently you cannot scale down without encountering problems.

> 
> It is very difficult to reproduce it on Openshift, but it seems to be
> related to the bug that logged
> (https://bugzilla.redhat.com/show_bug.cgi?id=1459345).

It's not related to this issue.

Comment 4 Matt Wringe 2017-06-20 19:09:37 UTC

I am lowering the priority of this issue.

When you bring the Cassandra pods back up, Cassandra briefly tries to connect to the old IP addresses, but it eventually resolves the correct IP addresses of the new pods.

There are a few things we need to figure out here:

1) Whether there is an option we can pass to Cassandra so that it uses only the seed list to determine the other members of the cluster, ignoring any IP addresses it knew about in the past.

2) We should update our readiness probes to take this situation into account. Currently the probes report the pod as ready while the cluster is still working out its membership.
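As a rough illustration of item 2, a readiness check could require that `nodetool status` lists exactly the expected number of nodes and that every one of them is Up/Normal (`UN`), so a pod stays unready while entries for old IP addresses (typically shown as extra, often `DN`, rows) are still present. This is a minimal sketch under stated assumptions, not the probe shipped in the Hawkular/Cassandra images: the helper names `parse_status`, `cluster_ready` and `probe` are hypothetical, it assumes `nodetool` is on the PATH inside the pod, and it assumes the expected cluster size is supplied to the probe somehow (for example from the deployment configuration).

```python
import re
import subprocess

# Matches node rows in `nodetool status` output, e.g. "UN  10.128.0.5  ...".
# The first column is Status (U=Up, D=Down) followed by State
# (N=Normal, L=Leaving, J=Joining, M=Moving); the second is the address.
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_status(output: str) -> dict:
    """Return a mapping of node address -> two-letter status code (e.g. "UN")."""
    nodes = {}
    for line in output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            code, address = match.groups()
            nodes[address] = code
    return nodes


def cluster_ready(output: str, expected_nodes: int) -> bool:
    """Ready only if exactly `expected_nodes` nodes are listed and all are UN.

    A stale row for an old pod IP makes the node count too high (or is not UN),
    so the check keeps failing until the ring view settles.
    """
    nodes = parse_status(output)
    return len(nodes) == expected_nodes and all(c == "UN" for c in nodes.values())


def probe(expected_nodes: int) -> int:
    """Exit code for a hypothetical exec readiness probe: 0 = ready, 1 = not ready."""
    try:
        result = subprocess.run(
            ["nodetool", "status"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        # nodetool missing, not executable, or hanging: treat as not ready.
        return 1
    if result.returncode != 0:
        return 1
    return 0 if cluster_ready(result.stdout, expected_nodes) else 1
```

A wrapper script used as the exec probe would then call `sys.exit(probe(expected_nodes))`. The node-count check matters as much as the state check: a pod that has discovered only part of the cluster could otherwise report all of its (few) listed nodes as `UN` and be marked ready too early.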

Comment 10 Matt Wringe 2017-08-04 16:17:27 UTC

This will be resolved when we move our Cassandra pods over to StatefulSets.

Comment 11 Matt Wringe 2017-10-05 18:51:51 UTC

Deferring, as this should be resolved when we move to StatefulSets.