Bug 1377239
| Summary: | Cassandra Connection ERROR encountered in Hawkular-metrics pod when IAAS is responding too slow | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Xia Zhao <xiazhao> | ||||||
| Component: | Hawkular | Assignee: | Matt Wringe <mwringe> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Peng Li <penli> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 3.3.0 | CC: | aos-bugs, jsanda, mwringe, penli, tdawson | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: |
This fixes an issue where, if Hawkular Metrics was restarted while it was initially creating its schema in Cassandra, it could not properly connect to Cassandra on subsequent restarts. This issue could be encountered if Hawkular Metrics was manually stopped or if the Hawkular Metrics instance was automatically restarted due to a lifecycle script timeout.
|
Story Points: | --- | ||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2016-10-06 09:36:39 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
Created attachment 1202410 [details] events
@ Matt, Hmm... The behavior is really weird. I've reproduced the issue on another env where all metrics pods are deployed on the same node, so I'm dropping the conclusion that this occurs on separate nodes. So far, the only thing I can confirm is: this only occurs with images on registry.ops.openshift.com

@mwringe @tdawson we hit a similar issue on AWS today when trying to deploy metrics 3.3.0. Could you help build the images and sync them to registry.ops.openshift.com/openshift3?

@penli Yep, but it will most likely take a few days for it to be available.

@mwringe thanks.

I think you are running into HWKMETRICS-458. It is a schema installation/upgrade issue which can occur if hawkular-metrics is shut down before the schema updates have finished being applied. When hawkular-metrics starts back up, it resumes schema updates but incorrectly tries to apply them to the system keyspace. The workaround for now is to shut down both Cassandra and hawkular-metrics, purge Cassandra's data and commit log directories, and then restart them. To avoid this error for now, you will have to let hawkular-metrics fully initialize before shutting it down; otherwise, you will run into this again.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:2015
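
For anyone applying the workaround from the comments above, the purge step (emptying Cassandra's data and commit log directories while both Cassandra and hawkular-metrics are stopped) might look roughly like the sketch below. This is only an illustration: the function names are hypothetical, the directory paths are not given in this report and must be supplied by the caller, and stopping and restarting the pods is not covered.

```python
import shutil
from pathlib import Path


def purge_directory_contents(directory: Path) -> int:
    """Delete everything inside `directory` but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a whole subdirectory tree
        else:
            entry.unlink()  # remove a plain file or a symlink
        removed += 1
    return removed


def purge_cassandra_state(data_dir: Path, commitlog_dir: Path) -> dict:
    """Empty both directories (hypothetical helper, not part of any product).

    Assumption: the caller has already stopped Cassandra and
    hawkular-metrics, and passes the real paths for its deployment.
    """
    targets = {"data": data_dir, "commitlog": commitlog_dir}
    # Validate both paths first so nothing is deleted if one is wrong.
    for name, directory in targets.items():
        if not directory.is_dir():
            raise FileNotFoundError(f"{name} directory not found: {directory}")
    return {name: purge_directory_contents(d) for name, d in targets.items()}
```

Both paths are checked before anything is deleted, so a mistyped path does not leave one directory purged and the other untouched.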
Created attachment 1202409 [details] hawkular_metrics_log