Bug 1377239

Summary: Cassandra Connection ERROR encountered in Hawkular-metrics pod when IAAS is responding too slow
Product: OpenShift Container Platform Reporter: Xia Zhao <xiazhao>
Component: Hawkular Assignee: Matt Wringe <mwringe>
Status: CLOSED ERRATA QA Contact: Peng Li <penli>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.3.0 CC: aos-bugs, jsanda, mwringe, penli, tdawson
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
This fixes an issue where, if Hawkular Metrics was restarted while it was initially creating its schema in Cassandra, it could not properly connect to Cassandra on subsequent restarts. This issue could be encountered if Hawkular Metrics was manually stopped or if the Hawkular Metrics instance was automatically restarted due to a lifecycle script timeout.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-06 09:36:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
hawkular_metrics_log none
events none

Comment 1 Xia Zhao 2016-09-19 09:39:03 UTC
Created attachment 1202409 [details]
hawkular_metrics_log

Comment 2 Xia Zhao 2016-09-19 09:39:31 UTC
Created attachment 1202410 [details]
events

Comment 4 Xia Zhao 2016-09-19 10:07:15 UTC
@Matt, hmm... The behavior is really weird. I've reproduced the issue in another environment where all metrics pods are deployed on the same node, so I withdraw my earlier conclusion that this only occurs when the pods are on separate nodes.
So far, the only thing I can confirm is that this only occurs with images from registry.ops.openshift.com.

Comment 15 Peng Li 2016-09-28 06:48:48 UTC
@mwringe @tdawson We hit a similar issue on AWS today when trying to deploy metrics 3.3.0. Could you help build the images and sync them to registry.ops.openshift.com/openshift3?

Comment 16 Matt Wringe 2016-09-28 15:51:50 UTC
@penli Yep, but it will most likely take a few days for it to be available.

Comment 17 Peng Li 2016-09-29 06:16:35 UTC
@mwringe thanks.

Comment 19 John Sanda 2016-10-04 20:37:13 UTC
I think you are running into HWKMETRICS-458. It is a schema installation/upgrade issue that can occur if hawkular-metrics is shut down before the schema updates have finished being applied. When hawkular-metrics starts back up, it resumes schema updates but incorrectly tries to apply them to the system keyspace. The workaround for now is to shut down both Cassandra and hawkular-metrics, purge Cassandra's data and commit log directories, and then restart them. To avoid this error for now, you will have to let hawkular-metrics fully initialize before shutting it down; otherwise, you will run into this again.
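
The purge step of the workaround could be sketched roughly as follows. This is only an illustration, not part of the fix: the function names are hypothetical, the directory paths are assumptions that must be replaced with the locations configured for your Cassandra instance, and it must only be run after both Cassandra and hawkular-metrics have been shut down.

```python
import shutil
from pathlib import Path


def purge_directory(directory: Path) -> int:
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries that were removed.
    """
    removed = 0
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed


def purge_cassandra_state(data_dir: Path, commitlog_dir: Path) -> dict:
    """Empty the data and commit log directories (hypothetical helper).

    Both paths are supplied by the caller; nothing here discovers them.
    Raises FileNotFoundError before deleting anything if either is missing.
    """
    targets = (data_dir, commitlog_dir)
    for target in targets:
        if not target.is_dir():
            raise FileNotFoundError(f"not a directory: {target}")
    return {str(target): purge_directory(target) for target in targets}
```

A call might look like `purge_cassandra_state(Path("/path/to/data"), Path("/path/to/commitlog"))`, where both paths are placeholders. The directories themselves are kept so that Cassandra finds them when it is restarted.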

Comment 22 errata-xmlrpc 2016-10-06 09:36:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2015