Bug 1386405 - Hawkular-Metrics can excessively log errors.
Summary: Hawkular-Metrics can excessively log errors.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Importance: medium low
Target Milestone: ---
Target Release: ---
Assignee: Matt Wringe
QA Contact: Peng Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-18 20:45 UTC by Eric Jones
Modified: 2019-12-16 07:09 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-05 18:47:19 UTC
Target Upstream Version:



Description Eric Jones 2016-10-18 20:45:16 UTC
Description of problem:
The hawkular-metrics pod is repeating [0] over and over (dozens of times every second). Rather than signaling that the pod is not healthy and needs to restart, it just sits there retrying indefinitely.

[0]
ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (RxComputationScheduler-4) HAWKMETRICS200010: Failed to process request: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)

Additional Information:
The node the pod was deployed to ran out of Docker storage, in case that introduces an additional factor.

Comment 1 Matt Wringe 2016-10-18 21:36:25 UTC
We should probably decrease the number of error messages being generated in this case. I created a JIRA for this here: https://issues.jboss.org/browse/HWKMETRICS-513
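
As an illustration only (plain Java, not the actual Hawkular Metrics code), something along these lines could cap how often the same error message is written, so a recurring failure shows up once per interval instead of dozens of times per second:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical helper: logs each distinct error message at most once per
// interval, dropping duplicates so a recurring failure (e.g. a
// NoHostAvailableException on every request) does not flood the log.
public class ThrottledErrorLogger {

    private static final Logger LOG = Logger.getLogger(ThrottledErrorLogger.class.getName());
    private static final long INTERVAL_MS = 60_000L; // at most one entry per message per minute

    private final ConcurrentMap<String, AtomicLong> lastLogged = new ConcurrentHashMap<>();

    public void error(String message, Throwable cause) {
        long now = System.currentTimeMillis();
        AtomicLong last = lastLogged.computeIfAbsent(message, key -> new AtomicLong(0L));
        long previous = last.get();
        // Only the thread that wins the compareAndSet actually logs;
        // all other duplicates within the interval are dropped.
        if (now - previous >= INTERVAL_MS && last.compareAndSet(previous, now)) {
            LOG.severe(message + ": " + cause);
        }
    }
}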

The Hawkular Metrics pod should not be restarted in this situation. The problem is that it is not able to connect to the Cassandra instance, and as such restarting the Hawkular Metrics pod is not going to resolve that. When the Cassandra connection is valid again, Hawkular Metrics should just be able to reconnect and continue to function.

If this is not the case (e.g. restarting the Hawkular Metrics pod did fix this issue), then we will need to fix that situation.
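
For reference, here is a minimal sketch of how a DataStax Java driver (3.x) client recovers on its own; the contact point and delay values are hypothetical, not what Hawkular Metrics actually configures. With a reconnection policy, the driver keeps retrying in the background and reconnects once Cassandra is reachable again, which is why a pod restart should not be needed:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.ExponentialReconnectionPolicy;

// Illustrative only: the reconnection policy makes the client retry the
// Cassandra node with growing delays and recover automatically once it is
// reachable again.
public class CassandraReconnectExample {

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("hawkular-cassandra") // hypothetical service name
                // retry with exponentially growing delays: 1s, 2s, 4s, ... capped at 5 minutes
                .withReconnectionPolicy(new ExponentialReconnectionPolicy(1_000L, 300_000L))
                .build();
             Session session = cluster.connect()) {
            // While no host is reachable, queries fail with NoHostAvailableException,
            // but the driver keeps reconnecting in the background.
            session.execute("SELECT release_version FROM system.local");
        }
    }
}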

When you say "Node the pod was deployed to ran out of Docker Storage" are you talking about the Cassandra Pod or the Hawkular Metrics pod?

Comment 2 Matt Wringe 2016-10-27 14:23:25 UTC
I am requesting more information about which pod ran out of storage.

The Hawkular Metrics pod not restarting is expected and desired behaviour; it should automatically reconnect once the Cassandra instance is available again.

The constant logging is a more pressing concern that needs to be looked into.

Comment 3 Matt Wringe 2016-10-31 20:23:27 UTC
Lowering the priority of this issue as it's more of a problem with logging in specific error conditions than with reduced functionality. It is something we need to handle in a better fashion, though.

Comment 4 Eric Jones 2016-11-10 16:41:08 UTC
My apologies for the delay. 

The storage I referenced was not the storage provided to the pod but the Docker storage set up for the node that the Cassandra and Hawkular Metrics pods were running on.

Unfortunately the associated case is closed, but I will shortly attach the logs that were provided at the start of the case.

Comment 6 Matt Wringe 2017-02-09 20:27:40 UTC
Upstream tracking https://issues.jboss.org/browse/HWKMETRICS-513

