Bug 1611941
Summary: | [3.6]hawkular-metrics pod failed to start up due to unsuccessful version check | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Junqi Zhao <juzhao> | ||||||
Component: | Hawkular | Assignee: | Ruben Vargas Palma <rvargasp> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 3.6.1 | CC: | ahaile, aos-bugs, jsanda, wsun | ||||||
Target Milestone: | --- | Keywords: | Regression, TestBlocker | ||||||
Target Release: | 3.6.z | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | |||||||||
Clones: | 1612648 1612813 1619497 | Environment: | |||||||
Last Closed: | 2019-01-30 15:12:36 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1612648, 1612813, 1619497 | ||||||||
Attachments: |
https://github.com/openshift/openshift-ansible/pull/8961 will be backported to 3.6 as well. That is holding things at the moment.

Changed to ON_QA by errata; changing back to MODIFIED.

(In reply to John Sanda from comment #1)
> https://github.com/openshift/openshift-ansible/pull/8961 will be backported
> to 3.6 as well. That is holding things at the moment.

I don't think we should backport hawkular_schema_job to 3.6 or 3.7. It would add testing effort for metrics, and since this is a regression bug, we should not change too much.

Tested again with the same images; the keyspace hawkular_metrics actually does not exist.
metrics-cassandra-v3.6.173.0.128-2
metrics-hawkular-metrics-v3.6.173.0.128-3
metrics-heapster-v3.6.173.0.128-2

# oc get pod
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-bf8wp 1/1 Running 0 6m
hawkular-metrics-kk24s 0/1 Running 0 6m
heapster-nkp8h 0/1 Running 0 5m

# oc rsh hawkular-cassandra-1-bf8wp
sh-4.2$ cqlsh --ssl -e "select table_name from system_schema.tables where keyspace_name = 'hawkular_metrics'"

 table_name
------------

(0 rows)

(In reply to Junqi Zhao from comment #4)
> (In reply to John Sanda from comment #1)
> > https://github.com/openshift/openshift-ansible/pull/8961 will be backported
> > to 3.6 as well. That is holding things at the moment.
>
> I don't think we should backport hawkular_schema_job to 3.6 or 3.7. It would
> add testing effort for metrics, and since this is a regression bug, we should
> not change too much.

We decided to backport to address bug 1564681, which is linked to a customer case. At the time of our decision, the customer was on 3.6; they have since upgraded to 3.7. These changes also make hawkular-metrics more stable and should therefore help reduce maintenance. Before the schema installer, starting multiple replicas of hawkular-metrics could leave the schema in an inconsistent state that sometimes required manual intervention to fix.
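The keyspace checks in this thread are read by eye from cqlsh output (0 rows in the retest above, 15 tables in the original description). A minimal Python sketch of the same check for a script, assuming the text was captured from the `cqlsh --ssl -e "select table_name from system_schema.tables where keyspace_name = 'hawkular_metrics'"` command used here and follows cqlsh's default tabular layout; the helper name `parse_cqlsh_table_names` is hypothetical and not part of any OpenShift or Hawkular tooling.

```python
from __future__ import annotations

import re


def parse_cqlsh_table_names(output: str) -> list[str]:
    """Return the values of the single `table_name` column in cqlsh output.

    Assumes cqlsh's default layout: a header line, a dashed separator,
    one value per row, and a "(N rows)" footer.
    """
    lines = [line.strip() for line in output.strip().splitlines()]
    # Find the dashed line that separates the header from the rows.
    sep = next((i for i, line in enumerate(lines) if re.fullmatch(r"-+", line)), None)
    if sep is None:
        raise ValueError("no cqlsh result separator found")
    names: list[str] = []
    for line in lines[sep + 1:]:
        footer = re.fullmatch(r"\((\d+) rows?\)", line)
        if footer:
            # Cross-check against the row count cqlsh itself reports.
            if int(footer.group(1)) != len(names):
                raise ValueError("row count does not match the (N rows) footer")
            return names
        if line:
            names.append(line)
    raise ValueError("no '(N rows)' footer found; output may be truncated")
```

An empty list means no tables are registered under hawkular_metrics, which is what the failing runs showed; checking the footer guards against acting on a truncated capture.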
Tested with:
metrics-cassandra-v3.6.173.0.128-10
metrics-hawkular-metrics-v3.6.173.0.128-10
metrics-heapster-v3.6.173.0.128-10

The result is still "Version check unsuccessful after 30 attempts".

# oc get pod -n openshift-infra
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-gkp68 1/1 Running 0 16m
hawkular-metrics-438hl 0/1 Running 2 16m
heapster-3fzkd 0/1 Running 1 16m

Logs in the hawkular-metrics pod:
******************************************************************
2018-08-07 05:45:40,958 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-08-07 05:45:40,958 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-08-07 05:45:50,965 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-08-07 05:45:50,966 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-08-07 05:46:00,970 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-08-07 05:46:00,970 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-08-07 05:46:10,975 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-08-07 05:46:10,975 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-08-07 05:46:20,976 FATAL [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) The schema version check failed. Start up cannot proceed.: org.hawkular.metrics.api.jaxrs.util.SchemaVersionCheckException: Version check unsuccessful after 30 attempts
    at org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker.waitForSchemaUpdates(SchemaVersionChecker.java:72)
    at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.doSchemaVersionCheck(MetricsServiceLifecycle.java:522)
    at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.startMetricsService(MetricsServiceLifecycle.java:362)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
*** JBossAS process (462) received TERM signal ***
2018-08-07 05:46:49,525 INFO [org.jboss.as.server] (Thread-3) WFLYSRV0220: Server shutdown has been requested via an OS signal
2018-08-07 05:46:49,650 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 64) WFLYUT0022: Unregistered web context: /hawkular/metrics
2018-08-07 05:46:49,695 INFO [org.wildfly.extension.undertow] (MSC service thread 1-2) WFLYUT0019: Host default-host stopping
2018-08-07 05:46:49,705 INFO [org.jboss.weld.deployer] (MSC service thread 1-6) WFLYWELD0010: Stopping weld service for deployment hawkular-metrics.war
2018-08-07 05:46:49,762 INFO [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-6) WFLYJCA0010: Unbound data source [java:jboss/datasources/ExampleDS]
2018-08-07 05:46:49,790 INFO [org.jboss.as.server.deployment] (MSC service thread 1-8) WFLYSRV0028: Stopped deployment activemq-rar.rar (runtime-name: activemq-rar.rar) in 251ms
2018-08-07 05:46:49,815 INFO [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-7) WFLYJCA0019: Stopped Driver service with driver-name = h2
2018-08-07 05:46:49,826 INFO [org.wildfly.extension.undertow] (MSC service thread 1-8) WFLYUT0008: Undertow HTTPS listener https suspending
2018-08-07 05:46:49,830 INFO [org.wildfly.extension.undertow] (MSC service thread 1-4) WFLYUT0008: Undertow HTTP listener default suspending
2018-08-07 05:46:49,831 INFO [org.wildfly.extension.undertow] (MSC service thread 1-4) WFLYUT0007: Undertow HTTP listener default stopped, was bound to 0.0.0.0:8080
2018-08-07 05:46:49,836 INFO [org.wildfly.extension.undertow] (MSC service thread 1-8) WFLYUT0007: Undertow HTTPS listener https stopped, was bound to 0.0.0.0:8443
2018-08-07 05:46:49,837 INFO [org.wildfly.extension.undertow] (MSC service thread 1-7) WFLYUT0004: Undertow 1.3.31.Final-redhat-1 stopping
2018-08-07 05:46:49,904 INFO [org.jboss.as.server.deployment] (MSC service thread 1-1) WFLYSRV0028: Stopped deployment hawkular-metrics.war (runtime-name: hawkular-metrics.war) in 364ms
2018-08-07 05:46:49,911 INFO [org.jboss.as] (MSC service thread 1-7) WFLYSRV0050: JBoss EAP 7.0.8.GA (WildFly Core 2.1.18.Final-redhat-1) stopped in 368ms
*** JBossAS process (462) received TERM signal ***
******************************************************************

The keyspace hawkular_metrics does not exist:
# oc exec hawkular-cassandra-1-gkp68 -n openshift-infra -- cqlsh --ssl -e "select table_name from system_schema.tables where keyspace_name = 'hawkular_metrics'"

 table_name
------------

(0 rows)

Tested with:
metrics-cassandra-v3.6.173.0.128-11
metrics-hawkular-metrics-v3.6.173.0.128-11
metrics-heapster-v3.6.173.0.128-11
The issue is fixed, and metrics works well.

Please change to ON_QA.

The issue is not fixed with images:
metrics-cassandra-v3.6.173.0.129-2
metrics-hawkular-metrics-v3.6.173.0.129-2
metrics-heapster-v3.6.173.0.129-2

# oc get pod
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-b7mrf 1/1 Running 0 44m
hawkular-metrics-c8kd4 0/1 Running 7 44m
heapster-k38ch 0/1 Running 5 44m

# oc logs hawkular-metrics-c8kd4
2018-08-22 03:29:08,034 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-08-22 03:29:18,037 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-08-22 03:29:18,037 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-08-22 03:29:28,040 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-08-22 03:29:28,041 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-08-22 03:29:38,041 FATAL [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) The schema version check failed. Start up cannot proceed.: org.hawkular.metrics.api.jaxrs.util.SchemaVersionCheckException: Version check unsuccessful after 30 attempts
    at org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker.waitForSchemaUpdates(SchemaVersionChecker.java:72)
    at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.doSchemaVersionCheck(MetricsServiceLifecycle.java:522)
    at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.startMetricsService(MetricsServiceLifecycle.java:362)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-08-22 03:30:02,443 INFO [org.jboss.as.server] (Thread-5) WFLYSRV0220: Server shutdown has been requested via an OS signal
*** JBossAS process (463) received TERM signal ***
2018-08-22 03:30:02,487 INFO [org.wildfly.extension.undertow] (ServerService Thread Pool -- 66) WFLYUT0022: Unregistered web context: /hawkular/metrics
2018-08-22 03:30:02,531 INFO [org.wildfly.extension.undertow] (MSC service thread 1-5) WFLYUT0019: Host default-host stopping
2018-08-22 03:30:02,546 INFO [org.jboss.weld.deployer] (MSC service thread 1-4) WFLYWELD0010: Stopping weld service for deployment hawkular-metrics.war
2018-08-22 03:30:02,570 INFO [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-4) WFLYJCA0010: Unbound data source [java:jboss/datasources/ExampleDS]
2018-08-22 03:30:02,583 INFO [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-3) WFLYJCA0019: Stopped Driver service with driver-name = h2
2018-08-22 03:30:02,603 INFO [org.jboss.as.server.deployment] (MSC service thread 1-6) WFLYSRV0028: Stopped deployment activemq-rar.rar (runtime-name: activemq-rar.rar) in 151ms
2018-08-22 03:30:02,604 INFO [org.wildfly.extension.undertow] (MSC service thread 1-3) WFLYUT0008: Undertow HTTP listener default suspending
2018-08-22 03:30:02,605 INFO [org.wildfly.extension.undertow] (MSC service thread 1-1) WFLYUT0008: Undertow HTTPS listener https suspending
2018-08-22 03:30:02,605 INFO [org.wildfly.extension.undertow] (MSC service thread 1-3) WFLYUT0007: Undertow HTTP listener default stopped, was bound to 0.0.0.0:8080
2018-08-22 03:30:02,605 INFO [org.wildfly.extension.undertow] (MSC service thread 1-1) WFLYUT0007: Undertow HTTPS listener https stopped, was bound to 0.0.0.0:8443
2018-08-22 03:30:02,606 INFO [org.wildfly.extension.undertow] (MSC service thread 1-6) WFLYUT0004: Undertow 1.3.31.Final-redhat-1 stopping
2018-08-22 03:30:02,642 INFO [org.jboss.as.server.deployment] (MSC service thread 1-5) WFLYSRV0028: Stopped deployment hawkular-metrics.war (runtime-name: hawkular-metrics.war) in 191ms
2018-08-22 03:30:02,645 INFO [org.jboss.as] (MSC service thread 1-1) WFLYSRV0050: JBoss EAP 7.0.8.GA (WildFly Core 2.1.18.Final-redhat-1) stopped in 194ms
*** JBossAS process (463) received TERM signal ***

The issue is not fixed with images:
metrics-cassandra:v3.6.173.0.129-2
metrics-hawkular-metrics:v3.6.173.0.129-2
metrics-heapster:v3.6.173.0.129-2

The latest 3.6 image is v3.6.173.0.129, and the problem still exists because that image uses Hawkular Metrics 0.27.8. The schema installer was introduced in OCP 3.10, and we wanted to backport it to 3.6; the backported changes were introduced upstream in Hawkular Metrics 0.27.8. Because of the ongoing problems we have had with the backport, we are going to update the 3.6 image to use Hawkular Metrics 0.27.7.
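The log lines above show a fixed-delay retry: each failed check logs "Trying again in 10000 ms", and startup aborts with "Version check unsuccessful after 30 attempts". The following Python sketch reproduces that pattern as inferred from the logs only; it is not Hawkular Metrics' actual SchemaVersionChecker code, and `wait_for_schema` and `SchemaVersionCheckError` are hypothetical names. The sleep and log functions are parameters so the loop can be exercised without actually waiting.

```python
from __future__ import annotations

import time
from typing import Callable


class SchemaVersionCheckError(Exception):
    """Hypothetical stand-in for the SchemaVersionCheckException in the logs."""


def wait_for_schema(check: Callable[[], None],
                    max_attempts: int = 30,
                    delay_ms: int = 10_000,
                    sleep: Callable[[float], None] = time.sleep,
                    log: Callable[[str], None] = print) -> int:
    """Call `check` until it stops raising; return the 1-based attempt that succeeded.

    Mirrors the observed log pattern: every failure is followed by
    "Trying again in N ms" and a sleep, and once all attempts are used
    up the whole check fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            check()
            return attempt
        except Exception as exc:  # e.g. "Keyspace hawkular_metrics does not exist"
            log(f"Version check failed: {exc}")
            log(f"Trying again in {delay_ms} ms")
            sleep(delay_ms / 1000)
    raise SchemaVersionCheckError(
        f"Version check unsuccessful after {max_attempts} attempts")
```

With the defaults, a pod whose keyspace never appears spends 30 × 10 s = 300 s (about five minutes) in this loop before startup fails, which is consistent with the climbing RESTARTS counts in the `oc get pod` listings.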
Tested metrics-hawkular-metrics-v3.6.173.0.130-2 with:
metrics-cassandra-v3.6.173.0.130-1
metrics-heapster-v3.6.173.0.130-1

The hawkular-metrics pods failed to start up; the error from the pod logs is below. We need to add back the Infinispan configuration in standalone.xml in the image, as was done for the 3.7 and 3.9 images.
**********************************************************************
2018-09-20 04:26:49,352 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("deploy") failed - address: ([("deployment" => "hawkular-metrics.war")]) - failure description: {"WFLYCTL0180: Services with missing/unavailable dependencies" => [
    "jboss.naming.context.java.module.hawkular-metrics.hawkular-metrics.env.cache.locks is missing [jboss.naming.context.java.jboss.infinispan.cache.hawkular-metrics.locks]",
    "jboss.naming.context.java.module.hawkular-metrics.hawkular-metrics.env.container.hawkular-metrics is missing [jboss.naming.context.java.jboss.infinispan.container.hawkular-metrics]"
]}
2018-09-20 04:26:49,368 INFO [org.jboss.as.server] (ServerService Thread Pool -- 34) WFLYSRV0010: Deployed "hawkular-metrics.war" (runtime-name : "hawkular-metrics.war")
2018-09-20 04:26:49,369 INFO [org.jboss.as.server] (ServerService Thread Pool -- 34) WFLYSRV0010: Deployed "activemq-rar.rar" (runtime-name : "activemq-rar.rar")
2018-09-20 04:26:49,373 INFO [org.jboss.as.controller] (Controller Boot Thread) WFLYCTL0183: Service status report
WFLYCTL0184: New missing/unsatisfied dependencies:
      service jboss.naming.context.java.jboss.infinispan.cache.hawkular-metrics.locks (missing) dependents: [service jboss.naming.context.java.module.hawkular-metrics.hawkular-metrics.env.cache.locks]
      service jboss.naming.context.java.jboss.infinispan.container.hawkular-metrics (missing) dependents: [service jboss.naming.context.java.module.hawkular-metrics.hawkular-metrics.env.container.hawkular-metrics]
2018-09-20 04:26:49,453 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://127.0.0.1:9990/management
2018-09-20 04:26:49,453 INFO [org.jboss.as] (Controller Boot Thread) WFLYSRV0051: Admin console listening on http://127.0.0.1:9990
2018-09-20 04:26:49,453 ERROR [org.jboss.as] (Controller Boot Thread) WFLYSRV0026: JBoss EAP 7.0.8.GA (WildFly Core 2.1.18.Final-redhat-1) started (with errors) in 5114ms - Started 387 of 706 services (22 services failed or missing dependencies, 394 services are lazy, passive or on-demand)

Created attachment 1485008
metrics logs - metrics-hawkular-metrics:v3.6.173.0.130-2
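The WFLYCTL0184 report in the log above names the services WildFly could not find. A small Python sketch that pulls those names out of a captured log and maps each naming service back to the JNDI name it stands for. The mapping assumes WildFly's convention of deriving `jboss.naming.context.java.*` service names from the JNDI path (":" dropped, "/" turned into ".") and that the resource names themselves contain no dots; `missing_services` and `to_jndi` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Matches entries of the form "service <name> (missing)" in a WFLYCTL0184 report.
MISSING_RE = re.compile(r"service (\S+) \(missing\)")
NAMING_PREFIX = "jboss.naming.context.java."


def missing_services(log_text: str) -> list[str]:
    """Return missing service names in order of appearance, without duplicates."""
    seen: list[str] = []
    for name in MISSING_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def to_jndi(service: str) -> str:
    """Map a naming-context service name back to a JNDI name (assumes no dots in resource names)."""
    if not service.startswith(NAMING_PREFIX):
        raise ValueError(f"not a naming-context service: {service}")
    return "java:" + service[len(NAMING_PREFIX):].replace(".", "/")
```

For this log, the result points at a cache container expected at java:jboss/infinispan/container/hawkular-metrics and a `locks` cache inside it, which agrees with the diagnosis that the Infinispan configuration is missing from the image's standalone.xml.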
The issue is fixed with metrics-hawkular-metrics:v3.6.173.0.130-3; the other images were metrics-cassandra-v3.6.173.0.130-1 and metrics-heapster-v3.6.173.0.130-1. New Bug 1631598 has been filed, but it is not a test blocker. Please change to ON_QA so we can close this bug.

Per Comment 17, moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0101
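Results in this bug are reported per image build (v3.6.173.0.128-2, v3.6.173.0.128-10, v3.6.173.0.129-2, v3.6.173.0.130-3, and so on). Sorting those tags as plain strings puts `-10` before `-3`, so a numeric key helps when tracking which build first fixed the issue. A minimal Python sketch, assuming the `vMAJOR.MINOR...-BUILD` tag scheme used throughout this bug and accepting both the `name-vTAG` and `name:vTAG` spellings that appear in the comments; `tag_key` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Trailing "v<dotted numbers>-<build>" at the end of an image reference.
TAG_RE = re.compile(r"v(\d+(?:\.\d+)*)-(\d+)$")


def tag_key(image_ref: str) -> tuple[tuple[int, ...], int]:
    """Return a numeric sort key (version parts, build number) for an image reference."""
    m = TAG_RE.search(image_ref)
    if m is None:
        raise ValueError(f"no vX.Y...-N tag in {image_ref!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2))
```

Comparing tuples of integers orders 0.128-3 < 0.128-10 < 0.128-11 < 0.129-2 < 0.130-2 < 0.130-3, matching the chronology of the builds tested here.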
Created attachment 1472883
metrics pods log

Description of problem:
Deployed metrics 3.6; the hawkular-metrics pod failed to start up due to an unsuccessful version check. This blocks metrics installation.

# oc get po
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-hvhr0 1/1 Running 0 14m
hawkular-metrics-2r82k 0/1 Running 4 14m
heapster-8996q 0/1 Running 1 13m

Logs in the hawkular-metrics pod:
2018-08-03 05:13:24,048 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-08-03 05:13:24,048 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-08-03 05:13:34,052 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-08-03 05:13:34,052 INFO [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-08-03 05:13:44,053 FATAL [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) The schema version check failed. Start up cannot proceed.: org.hawkular.metrics.api.jaxrs.util.SchemaVersionCheckException: Version check unsuccessful after 30 attempts
    at org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker.waitForSchemaUpdates(SchemaVersionChecker.java:72)
    at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.doSchemaVersionCheck(MetricsServiceLifecycle.java:522)
    at org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle.startMetricsService(MetricsServiceLifecycle.java:362)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Although the log reports "Keyspace hawkular_metrics does not exist", the keyspace actually exists:
# oc rsh hawkular-cassandra-1-hvhr0
sh-4.2$ cqlsh --ssl -e "select table_name from system_schema.tables where keyspace_name = 'hawkular_metrics'"

 table_name
--------------------
 active_time_slices
 cassalog
 data
 data_compressed
 finished_jobs_idx
 jobs
 leases
 locks
 metrics_idx
 metrics_tags_idx
 retentions_idx
 scheduled_jobs_idx
 sys_config
 tasks
 tenants

(15 rows)

Version-Release number of selected component (if applicable):
metrics-cassandra-v3.6.173.0.128-2
metrics-hawkular-metrics-v3.6.173.0.128-3
metrics-heapster-v3.6.173.0.128-2

# openshift version
openshift v3.6.173.0.128
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy Metrics 3.6 with the parameters listed under Additional info.
Actual results:
The hawkular-metrics pod failed to start up due to an unsuccessful version check.

Expected results:

Additional info:
openshift_metrics_install_metrics=true
openshift_metrics_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_metrics_image_version=v3.6.173.0.128
openshift_metrics_cassandra_storage_type=dynamic
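In every `oc get po` listing in this bug, the hawkular-metrics and heapster pods show STATUS Running but READY 0/1, meaning the container is running but not reported ready. A minimal Python sketch that flags such pods from captured `oc get po` text; it assumes the default whitespace-separated column layout shown in the description, with the header row first, and `not_ready_pods` is a hypothetical name. For anything beyond a quick check, `oc get pods -o json` gives structured output that is safer to parse.

```python
from __future__ import annotations


def not_ready_pods(oc_output: str) -> dict[str, str]:
    """Return {pod_name: READY} for pods whose ready count is below their container count."""
    lines = [line for line in oc_output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_i, ready_i = header.index("NAME"), header.index("READY")
    result: dict[str, str] = {}
    for line in lines[1:]:
        cols = line.split()
        ready, total = (int(x) for x in cols[ready_i].split("/"))
        if ready < total:
            # STATUS may still say Running; READY is what reflects readiness.
            result[cols[name_i]] = cols[ready_i]
    return result
```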