Description of problem:
Deployed metrics 3.6; the hawkular-metrics pod restarts four times before it reaches Running status.

Images:
metrics-cassandra-v3.6.173.0.130-1
metrics-hawkular-metrics-v3.6.173.0.130-3
metrics-heapster-v3.6.173.0.130-1

# oc -n openshift-infra get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-bfbzj   1/1       Running   0          16m
hawkular-metrics-zcg7z       1/1       Running   4          16m
heapster-s6bb3               1/1       Running   1          16m

Error in the hawkular-metrics pod: it failed to connect to hawkular-cassandra
***************************************************************
2018-09-21 02:10:12,192 FATAL [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200006: An error occurred trying to connect to the Cassandra cluster: java.lang.RuntimeException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.236.36:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [hawkular-cassandra/172.30.236.36:9042] Timed out waiting for server response))
	at org.hawkular.metrics.api.jaxrs.DistributedLock.lockAndThen(DistributedLock.java:111)
	at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.initSchema(MetricsServiceLifecycle.java:648)
	at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.startMetricsService(MetricsServiceLifecycle.java:405)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.236.36:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [hawkular-cassandra/172.30.236.36:9042] Timed out waiting for server response))
	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37)
	at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
	at com.datastax.driver.core.Session$execute$0.call(Unknown Source)
	at org.cassalog.core.CassalogImpl.executeCQL(CassalogImpl.groovy:351)
	at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:384)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1021)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:69)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:166)
	at org.cassalog.core.CassalogImpl$_applyChangeSet_closure16.doCall(CassalogImpl.groovy:323)
	at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1021)
	at groovy.lang.Closure.call(Closure.java:426)
	at groovy.lang.Closure.call(Closure.java:442)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2030)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2015)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2056)
	at org.codehaus.groovy.runtime.dgm$162.invoke(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.cassalog.core.CassalogImpl.applyChangeSet(CassalogImpl.groovy:323)
	at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:384)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1021)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:69)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:182)
	at org.cassalog.core.CassalogImpl$_execute_closure3.doCall(CassalogImpl.groovy:135)
	at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1021)
	at groovy.lang.Closure.call(Closure.java:426)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:1946)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:1926)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:1976)
	at org.codehaus.groovy.runtime.dgm$174.invoke(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.cassalog.core.CassalogImpl.execute(CassalogImpl.groovy:109)
	at org.hawkular.metrics.schema.SchemaService.run(SchemaService.java:67)
	at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.lambda$initSchema$2(MetricsServiceLifecycle.java:650)
	at org.hawkular.metrics.api.jaxrs.DistributedLock.lockAndThen(DistributedLock.java:109)
	... 9 more
***************************************************************

How reproducible:

Steps to Reproduce:
1. Deploy metrics 3.6
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 1485376 [details] metrics logs - metrics-hawkular-metrics:v3.6.173.0.130-3
Could you please also attach Cassandra logs?
Can you also provide details on what kind of disk is being used for Cassandra's persistent volume?
Are you still seeing this issue? If not, can I close this BZ?
OK, closing it now; the issue does not reproduce every time.
It seems to be related to OpenStack dynamic PVs: when one is used for Cassandra, the hawkular-metrics pod restarts a few times before it becomes ready. The error is the same as in Comment 0.

# oc -n openshift-infra get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-pls3b   1/1       Running   0          21m
hawkular-metrics-w4chw       1/1       Running   4          21m
heapster-8kp5f               1/1       Running   1          21m

Parameters:
openshift_metrics_install_metrics=true
openshift_metrics_image_prefix=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/
openshift_metrics_image_version=v3.6.173.0.140
openshift_metrics_cassandra_storage_type=dynamic

Metrics logs are attached.
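For anyone trying to reproduce this across storage backends, a small helper along these lines may make the restart counts easier to compare between runs. It is only a sketch: the function name `pods_with_restarts` and the `SAMPLE` constant are hypothetical, and it assumes the plain whitespace-separated table that `oc get pod` prints in the output above (a RESTARTS column containing anything other than a bare integer would need extra handling).

```python
def pods_with_restarts(oc_output, threshold=1):
    """Return (pod name, restart count) pairs with restarts >= threshold.

    Assumes the whitespace-separated table printed by `oc get pod`,
    with a header row containing NAME and RESTARTS columns.
    """
    rows = [line.split() for line in oc_output.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    name_col = header.index("NAME")
    restarts_col = header.index("RESTARTS")
    return [
        (row[name_col], int(row[restarts_col]))
        for row in body
        if int(row[restarts_col]) >= threshold
    ]


# Pod listing copied from the OpenStack dynamic PV run reported above.
SAMPLE = """\
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-pls3b   1/1       Running   0          21m
hawkular-metrics-w4chw       1/1       Running   4          21m
heapster-8kp5f               1/1       Running   1          21m
"""

if __name__ == "__main__":
    for name, restarts in pods_with_restarts(SAMPLE):
        print(f"{name}: {restarts} restart(s)")
```

In practice the text would come from saving the output of `oc -n openshift-infra get pod` for each deployment and passing it to the function, so that runs with different Cassandra storage types can be compared side by side.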
Created attachment 1508689 [details] metrics logs with openstack dynamic pv for cassandra

There is no such issue when an AWS dynamic PV is used.
OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift