Bug 1494673 - Cassandra readiness probe can incorrectly fail in multi node setup
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Matt Wringe
QA Contact: Junqi Zhao
Depends On:
Blocks: 1496228 1511627 1511628 1511629 1511631
Reported: 2017-09-22 15:15 EDT by Matt Wringe
Modified: 2017-11-28 17:12 EST

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1496228 1511627 1511628 1511629 1511631
Environment:
Last Closed: 2017-11-28 17:12:28 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:3188
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update
Last Updated: 2017-11-28 21:34:54 EST

Description Matt Wringe 2017-09-22 15:15:04 EDT
Our Cassandra readiness probe will parse the output of 'nodetool status' to determine if the Cassandra instance is in the 'up' and 'normal' state.

Our string parsing of the output can fail in certain situations. If the string value of the current host's IP address is contained within the IP address of another node in the cluster, then we will try to parse two lines of the output instead of just one.

For instance, consider the case where we have two nodes in our Cassandra cluster with the IP addresses '172.17.0.3' and '172.17.0.33' ('72.17.0.3' and '172.17.0.3' would cause the same problem).

Because of how we parse this output, our script would incorrectly try to handle both entries from 'nodetool status' instead of just the one.

This will cause the readiness probe to get unexpected information and fail.

If the pod is brought down and restarted, it should be granted a new IP address, which should no longer conflict with the second IP address, and the pod should then be able to continue.
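
The failure mode described above can be reproduced with a minimal Python sketch. This is not the probe script itself: the sample 'nodetool status' rows, the host IDs, and the helper name lines_matching_substring are all made up for illustration, and the sketch assumes the probe selects lines by plain substring containment.

```python
from typing import List

# Illustrative 'nodetool status' output; every value here is made up.
SAMPLE_STATUS = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns   Host ID    Rack
UN  172.17.0.3   1.1 MiB   256     50.1%  aaaa-1111  rack1
UJ  172.17.0.33  0.9 MiB   256     49.9%  bbbb-2222  rack1
"""


def lines_matching_substring(status_output: str, host_ip: str) -> List[str]:
    """Hypothetical naive selection: keep every line that contains host_ip."""
    return [line for line in status_output.splitlines() if host_ip in line]


# '172.17.0.3' is a substring of '172.17.0.33', so both rows are selected
# even though only the first one belongs to the current host.
matches = lines_matching_substring(SAMPLE_STATUS, "172.17.0.3")
for line in matches:
    print(line)
```

With two rows selected, any later step that expects a single status line receives unexpected input, which is consistent with the probe failure reported here.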
Comment 1 Matt Wringe 2017-09-22 15:16:20 EDT
Simple PR which fixes this issue by checking for whitespace before and after the IP address, which prevents the script from matching another node's address that merely contains the host's IP address: https://github.com/openshift/origin-metrics/pull/380
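
As a rough illustration of the whitespace check, here is a minimal Python sketch. It is not the code from the PR: the function name is_up_and_normal, the sample output, and the use of a regular expression are assumptions made for this example.

```python
import re

# Illustrative 'nodetool status' output; every value here is made up.
SAMPLE_STATUS = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns   Host ID    Rack
UN  172.17.0.3   1.1 MiB   256     50.1%  aaaa-1111  rack1
UJ  172.17.0.33  0.9 MiB   256     49.9%  bbbb-2222  rack1
"""


def is_up_and_normal(status_output: str, host_ip: str) -> bool:
    """Hypothetical check: exactly one row for host_ip, and it starts with 'UN'."""
    # Require whitespace on both sides of the address so that '172.17.0.3'
    # no longer matches inside '172.17.0.33'. re.escape keeps the dots literal.
    pattern = re.compile(r"\s" + re.escape(host_ip) + r"\s")
    rows = [line for line in status_output.splitlines() if pattern.search(line)]
    return len(rows) == 1 and rows[0].split()[0] == "UN"


print(is_up_and_normal(SAMPLE_STATUS, "172.17.0.3"))
print(is_up_and_normal(SAMPLE_STATUS, "172.17.0.33"))
```

In this sample the first call selects only the host's own row, which is in the 'UN' state, while the second selects only the joining node's row.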
Comment 2 Junqi Zhao 2017-09-30 05:29:33 EDT
@Matt
Which image version contains the fix?
Do we still need to verify it failed with previous versions?
Comment 3 Junqi Zhao 2017-09-30 06:07:23 EDT
Tested with the current latest image: metrics-cassandra-v3.7.0-0.135.0.0; it returned "Cassandra is in the up and normal state"
Comment 4 Matt Wringe 2017-10-06 13:24:36 EDT
(In reply to Junqi Zhao from comment #2)
> @Matt
> Which image version contains the fix?

The latest 3.7 release should have this fixed.

> Do we still need to verify it failed with previous versions?

It's the exact same change as https://bugzilla.redhat.com/show_bug.cgi?id=1496228, which I believe was verified there.
Comment 5 Junqi Zhao 2017-10-08 20:38:48 EDT
Closed based on Comment 3 and Comment 4
Comment 9 errata-xmlrpc 2017-11-28 17:12:28 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
