Bug 1377239 - Cassandra Connection ERROR encountered in Hawkular-metrics pod when IAAS is responding too slow
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Matt Wringe
QA Contact: Peng Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-19 09:37 UTC by Xia Zhao
Modified: 2016-10-12 06:46 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
This fixes an issue where, if Hawkular Metrics was restarted while it was still creating its initial schema in Cassandra, it could not properly connect to Cassandra on subsequent restarts. The issue could be encountered if Hawkular Metrics was stopped manually or if the Hawkular Metrics instance was restarted automatically because of a lifecycle script timeout.
Clone Of:
Environment:
Last Closed: 2016-10-06 09:36:39 UTC
Target Upstream Version:
Embargoed:


Attachments
hawkular_metrics_log (29.23 KB, text/plain), 2016-09-19 09:39 UTC, Xia Zhao, no flags
events (23.10 KB, text/plain), 2016-09-19 09:39 UTC, Xia Zhao, no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | HWKMETRICS-458 | 0 | None | None | None | Never
Red Hat Product Errata | RHBA-2016:2015 | 0 | normal | SHIPPED_LIVE | OpenShift Container Platform 3.3 metrics-hawkular-metrics image bug fix | 2016-10-05 19:55:00 UTC

Comment 1 Xia Zhao 2016-09-19 09:39:03 UTC
Created attachment 1202409
hawkular_metrics_log

Comment 2 Xia Zhao 2016-09-19 09:39:31 UTC
Created attachment 1202410
events

Comment 4 Xia Zhao 2016-09-19 10:07:15 UTC
@Matt, hmm... The behavior is really weird. I've reproduced the issue on another environment where all the metrics pods are deployed on the same node, so I no longer think it occurs only when they run on separate nodes.
So far, the only thing I can confirm is that this occurs only with images from registry.ops.openshift.com.

Comment 15 Peng Li 2016-09-28 06:48:48 UTC
@mwringe @tdawson We hit a similar issue on AWS today when trying to deploy metrics 3.3.0. Could you help build the images and sync them to registry.ops.openshift.com/openshift3?

Comment 16 Matt Wringe 2016-09-28 15:51:50 UTC
@penli Yep, but it will most likely take a few days for it to be available.

Comment 17 Peng Li 2016-09-29 06:16:35 UTC
@mwringe thanks.

Comment 19 John Sanda 2016-10-04 20:37:13 UTC
I think you are running into HWKMETRICS-458. It is a schema installation/upgrade issue that can occur if hawkular-metrics is shut down before the schema updates have finished being applied. When hawkular-metrics starts back up, it resumes the schema updates but incorrectly tries to apply them to the system keyspace. The workaround for now is to shut down both Cassandra and hawkular-metrics, purge Cassandra's data and commit log directories, and then restart them. To avoid this error for now, you will have to let hawkular-metrics fully initialize before shutting it down; otherwise, you will run into this again.
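The failure mode described in HWKMETRICS-458 can be illustrated with a small model. This sketch is hypothetical and is not Hawkular Metrics code: `FakeSession`, `apply_schema`, the placeholder statements, and the `hawkular_metrics` keyspace name are assumptions made for illustration, and the real root cause may differ in detail. It assumes the install switches into the application keyspace in an early step, so a run resumed after that step in a new session writes to the wrong keyspace unless it selects the keyspace again.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SYSTEM_KEYSPACE = "system"


@dataclass
class FakeSession:
    """Hypothetical stand-in for a Cassandra session.

    For simplicity a new session is modeled as starting in the system
    keyspace; "USE <ks>" switches it, and every other statement is
    recorded together with the keyspace it ran against.
    """
    keyspace: str = SYSTEM_KEYSPACE
    applied: List[Tuple[str, str]] = field(default_factory=list)

    def execute(self, statement: str) -> None:
        if statement.startswith("USE "):
            self.keyspace = statement[len("USE "):]
        else:
            self.applied.append((self.keyspace, statement))


def apply_schema(session: FakeSession, keyspace: str, steps: List[str],
                 done: int, stop_after: Optional[int] = None,
                 select_keyspace_on_resume: bool = True) -> int:
    """Apply steps[done:] in order and return how many steps are complete.

    stop_after simulates the process being shut down before that step.
    With select_keyspace_on_resume=False, a resumed run (done > 0) relies
    on a "USE" step that ran in an earlier, now-dead session, so its
    statements land in whatever keyspace the new session is in.
    """
    if done > 0 and select_keyspace_on_resume:
        session.execute(f"USE {keyspace}")  # re-select before resuming
    for i in range(done, len(steps)):
        if stop_after is not None and i >= stop_after:
            break
        session.execute(steps[i])
        done = i + 1
    return done


# Placeholder statements (hypothetical), not real Hawkular Metrics CQL.
steps = [
    "CREATE KEYSPACE hawkular_metrics",
    "USE hawkular_metrics",
    "CREATE TABLE data",
    "CREATE TABLE metrics_idx",
]

# First start: the process is shut down after the first two steps.
first = FakeSession()
done = apply_schema(first, "hawkular_metrics", steps, done=0, stop_after=2)

# Restart without re-selecting the keyspace (the buggy behavior).
buggy = FakeSession()
apply_schema(buggy, "hawkular_metrics", steps, done,
             select_keyspace_on_resume=False)
print(buggy.applied)
# [('system', 'CREATE TABLE data'), ('system', 'CREATE TABLE metrics_idx')]

# Restart that re-selects the keyspace before resuming.
fixed = FakeSession()
apply_schema(fixed, "hawkular_metrics", steps, done)
print(fixed.applied)
# [('hawkular_metrics', 'CREATE TABLE data'), ('hawkular_metrics', 'CREATE TABLE metrics_idx')]
```

Re-selecting the keyspace on every resume (or using fully qualified `keyspace.table` names in each statement) keeps the remaining steps independent of session state left behind by a process that no longer exists.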

Comment 22 errata-xmlrpc 2016-10-06 09:36:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2015

