Bug 1584552
| Summary: | All host(s) tried for query failed (com.datastax.driver.core.exceptions.ConnectionException: Write attempt on defunct connection) | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Min Woo Park <mpark> |
| Component: | Hawkular | Assignee: | John Sanda <jsanda> |
| Status: | CLOSED DEFERRED | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.4.1 | CC: | aos-bugs, jsanda, mpark, rvargasp |
| Target Milestone: | --- | | |
| Target Release: | 3.4.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-08-29 14:42:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

When the charts are blank or have gaps in them, it is usually because data points are missing over the time range, missing in the sense that they were never persisted. In the latest log there are a lot of request timeouts triggered by the compression job, which runs in the background every two hours. This job reads raw, uncompressed data out of Cassandra, compresses it, and writes it back to a different table.

I would like the customer to disable the job to see if we can get things more stable. The job can be disabled by adding the following environment variable to the hawkular-metrics RC:

COMPRESSION_JOB_ENABLED=false

Then scale hawkular-metrics down and back up to pick up the changes. Please collect all logs, including hawkular-metrics, cassandra, and heapster, after hawkular-metrics has been running for at least 6 hours. Thank you.
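
For reference, the steps above could be scripted roughly as follows. This is only a sketch under stated assumptions, not a verified procedure for this cluster: it assumes the replication controller is named hawkular-metrics, that metrics are deployed in the openshift-infra project, that the original replica count is 1, and that the installed client supports `oc set env`. The function name `disable_compression_job` is hypothetical.

```shell
#!/bin/sh
# Sketch only. The namespace, RC name, and replica count below are
# assumptions; adjust them to match the actual deployment.
disable_compression_job() {
    ns="${1:-openshift-infra}"      # assumed metrics project
    rc="${2:-hawkular-metrics}"     # assumed replication controller name
    replicas="${3:-1}"              # assumed original replica count

    # Add the environment variable to the RC's pod template.
    oc set env "rc/${rc}" COMPRESSION_JOB_ENABLED=false -n "${ns}" || return 1

    # Running pods keep their old environment, so scale down and back up
    # to have new pods created from the updated template.
    oc scale "rc/${rc}" --replicas=0 -n "${ns}" || return 1
    oc scale "rc/${rc}" --replicas="${replicas}" -n "${ns}"
}
```

Calling `disable_compression_job` with no arguments uses the assumed defaults; pass the project, RC name, and replica count explicitly if they differ, for example `disable_compression_job my-metrics-project hawkular-metrics 2`.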
Created attachment 1446142
hawkular-cassandra log, hawkular-cassandra tablestats, hawkular-cassandra information

Description of problem:
Getting the below errors in hawkular-metrics pod log.

~~~~~
ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (RxComputationScheduler-11) HAWKMETRICS200010: Failed to process request: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.xx.xx.xx:9042 (com.datastax.driver.core.exceptions.ConnectionException: [hawkular-cassandra/172.xx.xx.xx] Write attempt on defunct connection))
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at rx.observable.ListenableFutureObservable$2$1.run(ListenableFutureObservable.java:78)
    at rx.observable.ListenableFutureObservable$1$1.call(ListenableFutureObservable.java:50)
    at rx.internal.schedulers.EventLoopsScheduler$EventLoopWorker$1.call(EventLoopsScheduler.java:172)
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.xx.xx.xx:9042 (com.datastax.driver.core.exceptions.ConnectionException: [hawkular-cassandra/172.xx.xx.xx] Write attempt on defunct connection))
    at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:211)
    at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:43)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:277)
    at com.datastax.driver.core.RequestHandler.startNewExecution(RequestHandler.java:115)
    at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:91)
    at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:132)
    at org.hawkular.rx.cassandra.driver.RxSessionImpl.execute(RxSessionImpl.java:103)
    at org.hawkular.metrics.core.service.DataAccessImpl.addTags(DataAccessImpl.java:582)
    at org.hawkular.metrics.core.service.MetricsServiceImpl.addTags(MetricsServiceImpl.java:581)
    at sun.reflect.GeneratedMethodAccessor78.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.jboss.weld.bean.proxy.AbstractBeanInstance.invoke(AbstractBeanInstance.java:38)
    at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:100)
    at org.hawkular.metrics.core.service.MetricsService$428231947$Proxy$_$$_WeldClientProxy.addTags(Unknown Source)
    at org.hawkular.metrics.api.jaxrs.handler.GaugeHandler.updateMetricTags(GaugeHandler.java:227)
    at org.hawkular.metrics.api.jaxrs.handler.GaugeHandler$Proxy$_$$_WeldClientProxy.updateMetricTags(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:139)
    at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:295)
    at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:249)
    at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:236)
    at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:402)
    at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:209)
    at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:221)
~~~~~

The error disappeared after restarting the hawkular-metrics pod.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
Want to find the RCA and see no errors.

Additional info: