Bug 1631598 - [3.6] hawkular-metrics would restart a few times to reach Running status
Summary: [3.6] hawkular-metrics would restart a few times to reach Running status
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.6.z
Assignee: Ruben Vargas Palma
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-09-21 03:08 UTC by Junqi Zhao
Modified: 2019-11-20 19:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-20 19:10:07 UTC
Target Upstream Version:
Embargoed:


Attachments
metrics logs - metrics-hawkular-metrics:v3.6.173.0.130-3 (81.47 KB, text/plain)
2018-09-21 03:21 UTC, Junqi Zhao
metrics logs with openstack dynamic pv for cassandra (32.18 KB, application/x-gzip)
2018-11-27 03:14 UTC, Junqi Zhao

Description Junqi Zhao 2018-09-21 03:08:43 UTC
Description of problem:
After deploying metrics 3.6, the hawkular-metrics pod restarts four times before reaching Running status.

images
metrics-cassandra-v3.6.173.0.130-1
metrics-hawkular-metrics-v3.6.173.0.130-3
metrics-heapster-v3.6.173.0.130-1

# oc -n openshift-infra get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-bfbzj   1/1       Running   0          16m
hawkular-metrics-zcg7z       1/1       Running   4          16m
heapster-s6bb3               1/1       Running   1          16m
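
For triage, the restart counts in a listing like the one above can be checked mechanically. The following is a minimal Python sketch and was not part of the original report. The function name `pods_with_restarts` is hypothetical, and the code assumes the whitespace-separated column layout of `oc get pod` exactly as shown above.

```python
def pods_with_restarts(listing):
    """Return {pod name: restart count} for pods that restarted at least once.

    Assumes the first non-empty line is the header row and that the columns
    are separated by whitespace, as in the listing from this report.
    """
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    restarts_col = header.index("RESTARTS")
    result = {}
    for ln in lines[1:]:
        fields = ln.split()
        count = int(fields[restarts_col])
        if count > 0:
            result[fields[name_col]] = count
    return result


# Listing copied from this report.
SAMPLE = """\
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-bfbzj   1/1       Running   0          16m
hawkular-metrics-zcg7z       1/1       Running   4          16m
heapster-s6bb3               1/1       Running   1          16m
"""

if __name__ == "__main__":
    for pod, count in pods_with_restarts(SAMPLE).items():
        print(pod, count)
```

Looking the columns up by header name rather than by fixed position keeps the sketch working if extra columns are present, provided `NAME` and `RESTARTS` keep their names.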

Error in the hawkular-metrics pod: it failed to connect to hawkular-cassandra.
***************************************************************
2018-09-21 02:10:12,192 FATAL [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200006: An error occurred trying to connect to the Cassandra cluster: java.lang.RuntimeException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.236.36:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [hawkular-cassandra/172.30.236.36:9042] Timed out waiting for server response))
	at org.hawkular.metrics.api.jaxrs.DistributedLock.lockAndThen(DistributedLock.java:111)
	at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.initSchema(MetricsServiceLifecycle.java:648)
	at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.startMetricsService(MetricsServiceLifecycle.java:405)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.236.36:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [hawkular-cassandra/172.30.236.36:9042] Timed out waiting for server response))
	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
	at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:37)
	at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
	at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
	at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68)
	at com.datastax.driver.core.Session$execute$0.call(Unknown Source)
	at org.cassalog.core.CassalogImpl.executeCQL(CassalogImpl.groovy:351)
	at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:384)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1021)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:69)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:166)
	at org.cassalog.core.CassalogImpl$_applyChangeSet_closure16.doCall(CassalogImpl.groovy:323)
	at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1021)
	at groovy.lang.Closure.call(Closure.java:426)
	at groovy.lang.Closure.call(Closure.java:442)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2030)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2015)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2056)
	at org.codehaus.groovy.runtime.dgm$162.invoke(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.cassalog.core.CassalogImpl.applyChangeSet(CassalogImpl.groovy:323)
	at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:384)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1021)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:69)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:182)
	at org.cassalog.core.CassalogImpl$_execute_closure3.doCall(CassalogImpl.groovy:135)
	at sun.reflect.GeneratedMethodAccessor62.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1021)
	at groovy.lang.Closure.call(Closure.java:426)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:1946)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:1926)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.eachWithIndex(DefaultGroovyMethods.java:1976)
	at org.codehaus.groovy.runtime.dgm$174.invoke(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.cassalog.core.CassalogImpl.execute(CassalogImpl.groovy:109)
	at org.hawkular.metrics.schema.SchemaService.run(SchemaService.java:67)
	at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.lambda$initSchema$2(MetricsServiceLifecycle.java:650)
	at org.hawkular.metrics.api.jaxrs.DistributedLock.lockAndThen(DistributedLock.java:109)
	... 9 more
***************************************************************
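
Most of the trace above is Groovy and reflection frames; the informative part is the `Caused by:` line. When scanning the attached logs, that line can be extracted with a small helper. This is a hedged Python sketch that was not part of the original report. The names `root_cause` and `SAMPLE_TRACE` are hypothetical, and the code assumes the standard Java stack trace layout in which each cause starts at the beginning of a line with `Caused by: `.

```python
CAUSED_BY = "Caused by: "


def root_cause(trace):
    """Return (exception class, message) from the last 'Caused by:' line.

    Returns None when the trace contains no 'Caused by:' line.
    """
    cause = None
    for line in trace.splitlines():
        if line.startswith(CAUSED_BY):
            # Split "<class>: <message>" at the first ": " after the prefix.
            cls, _, msg = line[len(CAUSED_BY):].partition(": ")
            cause = (cls, msg)
    return cause


# Three lines copied from the trace in this report.
SAMPLE_TRACE = (
    "\tat java.lang.Thread.run(Thread.java:748)\n"
    "Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: "
    "All host(s) tried for query failed (tried: hawkular-cassandra/172.30.236.36:9042 "
    "(com.datastax.driver.core.exceptions.OperationTimedOutException: "
    "[hawkular-cassandra/172.30.236.36:9042] Timed out waiting for server response))\n"
    "\tat com.datastax.driver.core.exceptions.NoHostAvailableException.copy"
    "(NoHostAvailableException.java:84)\n"
)

if __name__ == "__main__":
    print(root_cause(SAMPLE_TRACE))
```

For this report the helper returns the `NoHostAvailableException` class together with its message, which wraps an `OperationTimedOutException` against `hawkular-cassandra/172.30.236.36:9042`.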

How reproducible:


Steps to Reproduce:
1. Deploy metrics 3.6
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Junqi Zhao 2018-09-21 03:21:12 UTC
Created attachment 1485376
metrics logs - metrics-hawkular-metrics:v3.6.173.0.130-3

Comment 2 Ruben Vargas Palma 2018-09-21 04:34:25 UTC
Could you please also attach Cassandra logs?

Comment 3 John Sanda 2018-09-24 13:36:21 UTC
Can you also provide details on what kind of disk is being used for Cassandra's persistent volume?

Comment 4 Ruben Vargas Palma 2018-10-10 16:18:50 UTC
Are you still seeing this issue? If not, can I close this BZ?

Comment 5 Junqi Zhao 2018-10-11 06:28:37 UTC
OK, closing it now; the issue does not reproduce every time.

Comment 6 Junqi Zhao 2018-11-27 03:10:17 UTC
This appears to be related to the OpenStack dynamic PV: when it is used, the hawkular-metrics pod restarts a few times before becoming ready. The error is the same as in Comment 0.

# oc -n openshift-infra get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-pls3b   1/1       Running   0          21m
hawkular-metrics-w4chw       1/1       Running   4          21m
heapster-8kp5f               1/1       Running   1          21m

parameters:
openshift_metrics_install_metrics=true
openshift_metrics_image_prefix=brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/
openshift_metrics_image_version=v3.6.173.0.140
openshift_metrics_cassandra_storage_type=dynamic


Attached metrics logs

Comment 7 Junqi Zhao 2018-11-27 03:14:45 UTC
Created attachment 1508689
metrics logs with openstack dynamic pv for cassandra

There is no such issue when an AWS dynamic PV is used.

Comment 8 Stephen Cuppett 2019-11-20 19:10:07 UTC
OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift

