Bug 1386405 - Hawkular-Metrics can excessively log errors.
Summary: Hawkular-Metrics can excessively log errors.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Importance: medium low
Target Milestone: ---
Target Release: ---
Assignee: Matt Wringe
QA Contact: Peng Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-18 20:45 UTC by Eric Jones
Modified: 2019-12-16 07:09 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-05 18:47:19 UTC
Target Upstream Version:



Description Eric Jones 2016-10-18 20:45:16 UTC
Description of problem:
The hawkular-metrics pod is repeating [0] over and over (dozens of times every second). Rather than signaling that the pod is not healthy and needs to restart, it just sits there retrying indefinitely.

[0]
ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (RxComputationScheduler-4) HAWKMETRICS200010: Failed to process request: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)

Additional Information:
The node the pod was deployed to ran out of Docker storage, in case that introduces an additional factor.

Comment 1 Matt Wringe 2016-10-18 21:36:25 UTC
We should probably decrease the number of error messages being generated in this case. I created a JIRA for this here: https://issues.jboss.org/browse/HWKMETRICS-513
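
As an illustration only (plain Java, not the actual Hawkular Metrics code), something along these lines could cap how often the same error message is written, so a recurring failure shows up once per interval instead of dozens of times per second:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical helper: logs each distinct error message at most once per
// interval, dropping duplicates so a recurring failure (e.g. a
// NoHostAvailableException on every request) does not flood the log.
public class ThrottledErrorLogger {

    private static final Logger LOG = Logger.getLogger(ThrottledErrorLogger.class.getName());
    private static final long INTERVAL_MS = 60_000L; // at most one entry per message per minute

    private final ConcurrentMap<String, AtomicLong> lastLogged = new ConcurrentHashMap<>();

    public void error(String message, Throwable cause) {
        long now = System.currentTimeMillis();
        AtomicLong last = lastLogged.computeIfAbsent(message, key -> new AtomicLong(0L));
        long previous = last.get();
        // Only the thread that wins the compareAndSet actually logs;
        // all other duplicates within the interval are dropped.
        if (now - previous >= INTERVAL_MS && last.compareAndSet(previous, now)) {
            LOG.severe(message + ": " + cause);
        }
    }
}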

The Hawkular Metrics pod should not be restarted in this situation. The problem is that it is not able to connect to the Cassandra instance, and as such restarting the Hawkular Metrics pod is not going to resolve that. When the Cassandra connection is valid again, Hawkular Metrics should just be able to reconnect and continue to function.

If this is not the case (e.g. restarting the Hawkular Metrics pod did fix this issue), then we will need to fix that situation.
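
For reference, here is a minimal sketch of how a DataStax Java driver (3.x) client recovers on its own; the contact point and delay values are hypothetical, not what Hawkular Metrics actually configures. With a reconnection policy, the driver keeps retrying in the background and reconnects once Cassandra is reachable again, which is why a pod restart should not be needed:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.ExponentialReconnectionPolicy;

// Illustrative only: the reconnection policy makes the client retry the
// Cassandra node with growing delays and recover automatically once it is
// reachable again.
public class CassandraReconnectExample {

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("hawkular-cassandra") // hypothetical service name
                // retry with exponentially growing delays: 1s, 2s, 4s, ... capped at 5 minutes
                .withReconnectionPolicy(new ExponentialReconnectionPolicy(1_000L, 300_000L))
                .build();
             Session session = cluster.connect()) {
            // While no host is reachable, queries fail with NoHostAvailableException,
            // but the driver keeps reconnecting in the background.
            session.execute("SELECT release_version FROM system.local");
        }
    }
}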

When you say "Node the pod was deployed to ran out of Docker Storage" are you talking about the Cassandra Pod or the Hawkular Metrics pod?

Comment 2 Matt Wringe 2016-10-27 14:23:25 UTC
I am requesting more information about which pod ran out of storage.

The Hawkular Metrics pod not restarting is expected and desired behaviour; it should automatically reconnect once the Cassandra instance is available again.

The constant logging is a more pressing concern that needs to be looked into.

Comment 3 Matt Wringe 2016-10-31 20:23:27 UTC
Lowering the priority of this issue as it's more of a problem with logging in specific error conditions than with reduced functionality. It is something we need to handle in a better fashion, though.

Comment 4 Eric Jones 2016-11-10 16:41:08 UTC
My apologies for the delay. 

The storage I referenced was not the storage provided to the pod but the Docker storage set up for the node that the Cassandra and Hawkular Metrics pods were running on.

Unfortunately the associated case is closed, but I will shortly attach the logs that were provided at the start of the case.

Comment 6 Matt Wringe 2017-02-09 20:27:40 UTC
Upstream tracking https://issues.jboss.org/browse/HWKMETRICS-513

