Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1584552

Summary: All host(s) tried for query failed (com.datastax.driver.core.exceptions.ConnectionException: Write attempt on defunct connection)
Product: OpenShift Container Platform
Reporter: Min Woo Park <mpark>
Component: Hawkular
Assignee: John Sanda <jsanda>
Status: CLOSED DEFERRED
QA Contact: Junqi Zhao <juzhao>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.4.1
CC: aos-bugs, jsanda, mpark, rvargasp
Target Milestone: ---
Target Release: 3.4.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-08-29 14:42:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: hawkular-cassandra log, hawkular-cassandra tablestats, hawkular-cassandra information
Flags: none

Description Min Woo Park 2018-05-31 07:59:28 UTC
Created attachment 1446142
hawkular-cassandra log, hawkular-cassandra tablestats, hawkular-cassandra information

Description of problem:

The hawkular-metrics pod log shows the following errors:

~~~~~
ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (RxComputationScheduler-11) HAWKMETRICS200010: Failed to process request: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.xx.xx.xx:9042 (com.datastax.driver.core.exceptions.ConnectionException: [hawkular-cassandra/172.xx.xx.xx] Write attempt on defunct connection))
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
	at rx.observable.ListenableFutureObservable$2$1.run(ListenableFutureObservable.java:78)
	at rx.observable.ListenableFutureObservable$1$1.call(ListenableFutureObservable.java:50)
	at rx.internal.schedulers.EventLoopsScheduler$EventLoopWorker$1.call(EventLoopsScheduler.java:172)
	at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.xx.xx.xx:9042 (com.datastax.driver.core.exceptions.ConnectionException: [hawkular-cassandra/172.xx.xx.xx] Write attempt on defunct connection))
	at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:211)
	at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:43)
	at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:277)
	at com.datastax.driver.core.RequestHandler.startNewExecution(RequestHandler.java:115)
	at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:91)
	at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:132)
	at org.hawkular.rx.cassandra.driver.RxSessionImpl.execute(RxSessionImpl.java:103)
	at org.hawkular.metrics.core.service.DataAccessImpl.addTags(DataAccessImpl.java:582)
	at org.hawkular.metrics.core.service.MetricsServiceImpl.addTags(MetricsServiceImpl.java:581)
	at sun.reflect.GeneratedMethodAccessor78.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.jboss.weld.bean.proxy.AbstractBeanInstance.invoke(AbstractBeanInstance.java:38)
	at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:100)
	at org.hawkular.metrics.core.service.MetricsService$428231947$Proxy$_$$_WeldClientProxy.addTags(Unknown Source)
	at org.hawkular.metrics.api.jaxrs.handler.GaugeHandler.updateMetricTags(GaugeHandler.java:227)
	at org.hawkular.metrics.api.jaxrs.handler.GaugeHandler$Proxy$_$$_WeldClientProxy.updateMetricTags(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor79.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:139)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:295)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:249)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:236)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:402)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:209)
	at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:221)
~~~~~

The error disappeared after restarting the hawkular-metrics pod.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
No errors in the hawkular-metrics pod log. A root cause analysis (RCA) of this error is requested.

Additional info:

Comment 32 John Sanda 2018-07-06 11:00:29 UTC
When the charts are blank or when there are gaps in them, it is usually because data points are missing over the time range, missing in the sense that they were never persisted. In the latest log there are a lot of request timeouts triggered by the compression job, which runs in the background every two hours. This job reads raw, uncompressed data out of Cassandra, compresses it, and writes it back to a different table. I would like the customer to disable the job to see if we can get things more stable. The job can be disabled by adding the following environment variable to the hawkular-metrics RC:

COMPRESSION_JOB_ENABLED=false

Then scale hawkular-metrics down and back up to pick up the changes. Please collect all logs, including hawkular-metrics, cassandra, and heapster after hawkular-metrics has been running for at least 6 hours. Thank you.
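
For reference, the steps above could be scripted roughly as follows. This is only a sketch under assumptions: the project name openshift-infra, the single replica count, and the helper names (run, disable_compression_job, DRY_RUN) are not taken from this bug and should be adjusted to match the actual deployment. By default the script only prints the oc commands it would run.

```shell
#!/bin/sh
# Sketch only. NAMESPACE and REPLICAS are assumptions; check your cluster first.
NAMESPACE="${NAMESPACE:-openshift-infra}"  # assumed project for the metrics stack
RC_NAME="${RC_NAME:-hawkular-metrics}"     # replication controller named in this bug
REPLICAS="${REPLICAS:-1}"                  # assumed replica count to scale back up to
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

disable_compression_job() {
    # Add the environment variable to the hawkular-metrics RC.
    run oc set env "rc/$RC_NAME" COMPRESSION_JOB_ENABLED=false -n "$NAMESPACE"
    # Scale down and back up so the new pod picks up the change.
    run oc scale "rc/$RC_NAME" --replicas=0 -n "$NAMESPACE"
    run oc scale "rc/$RC_NAME" --replicas="$REPLICAS" -n "$NAMESPACE"
}

disable_compression_job
```

Running with DRY_RUN=0 executes the commands against the cluster that oc is currently logged in to. The variable can be removed again later with `oc set env rc/hawkular-metrics COMPRESSION_JOB_ENABLED-` followed by another scale down and up.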