Bug 1462985 - Cassandra pods may enter the ready state prematurely when multiple pods are restarted with new ip addresses
Status: CLOSED DEFERRED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Metrics
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: 3.8.0
Assigned To: Matt Wringe
QA Contact: Junqi Zhao
Depends On:
Blocks:
Reported: 2017-06-19 15:48 EDT by Matt Wringe
Modified: 2017-10-05 14:51 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-05 14:51:51 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None

Description Matt Wringe 2017-06-19 15:48:56 EDT
Description of problem:
In certain situations, when we scale down Cassandra and bring it back up again, the cluster will not properly form.

The pods in the cluster try to join using the older IP addresses they already know about, rather than connecting to the pods at their new addresses, so the cluster does not reach its new size.
Comment 2 Guilherme Baufaker Rêgo 2017-06-19 16:49:51 EDT
This problem occurs when you scale Cassandra up and then back down (e.g., to two pods and then back to one), and when you scale the Cassandra pods to zero and then create new ones.


It is very difficult to reproduce on OpenShift, but it seems to be related to the bug logged at https://bugzilla.redhat.com/show_bug.cgi?id=1459345.
Comment 3 Matt Wringe 2017-06-20 15:06:33 EDT
(In reply to Guilherme Baufaker Rêgo from comment #2)
> This problem occurs when you scale up cassandra (eg: Two pods and then scale
> it back to one) and when you scale cassandra pods to zero and then create a
> new ones.

That is a separate situation. We have an issue open to clarify in the docs what you need to do to scale Cassandra pods up and down. You can't simply scale down without encountering problems.

> 
> It is very difficult to reproduce it on Openshift, but it seems to be
> related to the bug that logged
> (https://bugzilla.redhat.com/show_bug.cgi?id=1459345).

It's not related to this.
Comment 4 Matt Wringe 2017-06-20 15:09:37 EDT
I am lowering the priority of this issue.

When you bring the Cassandra pods back up, they will briefly try to connect to the old IP addresses, but they will eventually resolve the proper IP addresses of the new pods.

There are a few things we need to figure out here:

1) Whether there is some option we can give to Cassandra so that it uses only the seed list when determining what else is in the cluster, and ignores any IP addresses it knew about in the past.

2) We should update our readiness probes to take this situation into account. Currently the probes report the ready state while the cluster is still trying to figure things out.
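
As a rough illustration of point 2, here is a minimal sketch of a stricter readiness check. It is not the probe shipped with the metrics images. It assumes the probe can capture the text output of `nodetool status` and knows how many Cassandra nodes are expected; the function names and the `expected_nodes` parameter are hypothetical. The idea is to report ready only when the node count matches and every node is Up/Normal ("UN"), so a stale address still listed as down keeps the pod out of the ready state.

```python
import re

# Matches node rows in `nodetool status` output, e.g. "UN  10.1.2.3  ...".
# First letter: U(p) or D(own). Second: N(ormal), L(eaving), J(oining), M(oving).
NODE_ROW = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_nodes(status_output):
    """Return a dict mapping each node address to its two-letter state."""
    nodes = {}
    for line in status_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            nodes[match.group(2)] = match.group(1)
    return nodes


def cluster_is_settled(status_output, expected_nodes):
    """Hypothetical readiness rule: the ring lists exactly the expected
    number of nodes and all of them are Up/Normal."""
    nodes = parse_nodes(status_output)
    all_normal = all(state == "UN" for state in nodes.values())
    return len(nodes) == expected_nodes and all_normal
```

A probe script built on this would exit non-zero whenever cluster_is_settled returns False, for example while an old pod IP is still listed as "DN" next to the new pod.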
Comment 10 Matt Wringe 2017-08-04 12:17:27 EDT
This will be resolved when we move over to stateful sets for our Cassandra pods.
Comment 11 Matt Wringe 2017-10-05 14:51:51 EDT
Deferring as this should be resolved when we move to stateful sets.
