Bug 1626908 - Metrics install leads to non-working metrics stack due to version check failure.
Keywords:
Status: CLOSED DUPLICATE of bug 1632870
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Ruben Vargas Palma
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-09-10 05:29 UTC by ggore
Modified: 2022-03-13 15:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-26 01:15:56 UTC
Target Upstream Version:
Embargoed:



Description ggore 2018-09-10 05:29:10 UTC
Description of problem:
Installing metrics leads to a non-working metrics stack, with the hawkular-cassandra pod showing the logs below:

# oc logs hawkular-cassandra-1-xxxxx |grep "hawkular_metrics"
2018-09-06 06:28:36,583 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-09-06 06:28:36,583 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-09-06 06:28:46,597 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-09-06 06:28:46,597 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms
2018-09-06 06:28:56,599 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Version check failed: Keyspace hawkular_metrics does not exist
2018-09-06 06:28:56,599 INFO  [org.hawkular.metrics.api.jaxrs.util.SchemaVersionChecker] (metricsservice-lifecycle-thread) Trying again in 10000 ms


Additional info:
openshift-ansible ==> v3.10.41

# oc version
oc v3.10.34
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://masters.lab.example.com:8443
openshift v3.10.34
kubernetes v1.10.0+b81c8f8:

Comment 1 Junqi Zhao 2018-09-10 07:04:36 UTC
Images:
registry.access.redhat.com/openshift3/metrics-cassandra:v3.10
registry.access.redhat.com/openshift3/metrics-hawkular-metrics:v3.10
registry.access.redhat.com/openshift3/metrics-heapster:v3.10


openshift3/metrics-cassandra:v3.10.14-12
openshift3/metrics-hawkular-metrics:v3.10.14-12
openshift3/metrics-heapster:v3.10.14-13

Comment 2 John Sanda 2018-09-10 19:41:08 UTC
This looks very similar to bug 1625417. See https://access.redhat.com/solutions/3606401 for a workaround.

Comment 5 John Sanda 2018-09-12 13:24:27 UTC
The problem appears to be that the schema installer job (see https://goo.gl/Ry6vPr) failed or did not run. Prior to 3.10, Hawkular Metrics would install/update the schema in Cassandra at startup. That changed in 3.10: the hawkular-metrics pod no longer applies any schema changes. Instead, the schema installer k8s job installs/updates the schema, and the hawkular-metrics pod polls Cassandra, waiting for the schema to be updated (if necessary).
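
The polling behaviour described above can be sketched in shell. This is only an illustration, not the Hawkular Metrics implementation: `wait_for` is a hypothetical helper, and the 10-second interval in the usage note is taken from the "Trying again in 10000 ms" log lines in the description.

```shell
# Hypothetical helper: run a check command until it succeeds or the
# attempts run out, sleeping between failed tries.
# Usage: wait_for <attempts> <interval_seconds> <command> [args...]
wait_for() {
  attempts="$1"
  interval="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only announce a retry and sleep if another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      echo "Check failed (attempt $i of $attempts); trying again in ${interval}s" >&2
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for 30 10 schema_is_ready` would retry every 10 seconds, where `schema_is_ready` stands for whatever command confirms that the `hawkular_metrics` keyspace exists (a placeholder, not a real command).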

You can check to see if there is a pod for the schema installer job. I suspect that it was never deployed. We will need to investigate more to figure out what caused the regression.
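
A minimal sketch of that check follows. It assumes metrics were deployed into the `openshift-infra` project (the openshift-ansible default; override `NAMESPACE` if yours differs) and that the job and its pod have "schema" in their names. The exact job name is not given in this report, so the filter is deliberately loose, and `filter_schema` is a hypothetical helper.

```shell
# Assumption: metrics live in openshift-infra unless NAMESPACE is set.
NAMESPACE="${NAMESPACE:-openshift-infra}"

# Hypothetical helper: keep only the lines that mention "schema".
# "|| true" keeps the exit status clean when nothing matches.
filter_schema() {
  grep -i 'schema' || true
}

if command -v oc >/dev/null 2>&1; then
  echo "Jobs mentioning schema in $NAMESPACE:"
  oc get jobs -n "$NAMESPACE" 2>/dev/null | filter_schema
  echo "Pods mentioning schema in $NAMESPACE:"
  oc get pods -n "$NAMESPACE" 2>/dev/null | filter_schema
else
  echo "oc not found; run this on a host with cluster access"
fi
```

If neither list shows anything, the job was probably never created, which would match the suspicion above.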

Comment 6 John Sanda 2018-09-12 13:28:47 UTC
Do you have the output from running the playbook? If not, can you run the playbook again and share the output?

Comment 15 Ruben Vargas Palma 2018-10-15 16:50:53 UTC
A new playbook that runs the schema job on demand was introduced to solve this BZ.

The PR for 3.11 was already merged:

 https://github.com/openshift/openshift-ansible/pull/10340

The PR for 3.10 is still in review:

 https://github.com/openshift/openshift-ansible/pull/10340

You will be able to run the schema installer job by running the following playbook:

ansible-playbook ./openshift-ansible/playbooks/openshift-metrics/schema.yml -i <inventory_file>

Comment 16 Ruben Vargas Palma 2018-10-31 15:53:10 UTC
Both PRs are now merged; those changes allow you to re-run the schema installer.

Can we close this BZ, or is there still something pending here?

Comment 23 Vijay Samanthapuri 2021-04-21 16:36:56 UTC
Hello,


The customer is using a persistent volume for the Cassandra pod but still gets the error, and has to apply the workaround from the KCS article below after every pod patch update.

https://access.redhat.com/solutions/3645682

The customer is currently using openshift-ansible version 3.11.374-1. I see that this issue was fixed in openshift-ansible-3.11.23-1.

Thanks,
Vijay

Comment 24 Vijay Samanthapuri 2021-04-26 01:15:56 UTC
The customer has confirmed that the issue is with the PV, so I am closing the bug.

*** This bug has been marked as a duplicate of bug 1632870 ***

