Bug 1610733

Summary: Hawkular metrics pods failing schema check and unable to start
Product: OpenShift Container Platform Reporter: stobin
Component: Hawkular Assignee: Ruben Vargas Palma <rvargasp>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.9.0 CC: aos-bugs, jsanda, mbarnes, rvargasp, stobin
Target Milestone: --- Keywords: OpsBlocker
Target Release: 3.9.z Flags: stobin: needinfo-
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-12-13 19:27:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1613095    
Bug Blocks:    

Description stobin 2018-08-01 10:49:12 UTC
Description of problem:

After updating to 3.9.38, the hawkular-metrics and heapster pods are unable to start. The hawkular-metrics pod fails with this error:

2018-08-01 10:23:01,775 FATAL [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) The schema version check failed. Start up cannot proceed.: org.hawkular.metrics.api.jaxrs.util.SchemaVersionCheckException: Version check unsuccessful after 30 attempts
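
For readers unfamiliar with this kind of failure: the message suggests a bounded retry loop that gives up once the expected schema version has not been seen after a fixed number of attempts. The following is only a minimal sketch of that pattern, not Hawkular's actual implementation; the names `wait_for_schema_version`, `SchemaVersionCheckError`, and the `check` callback are hypothetical, and only the attempt count of 30 is taken from the log line above.

```python
import time
from typing import Callable


class SchemaVersionCheckError(Exception):
    """Hypothetical error type, raised when the check never succeeds."""


def wait_for_schema_version(check: Callable[[], bool],
                            max_attempts: int = 30,
                            delay_seconds: float = 0.0) -> int:
    """Call check() until it returns True.

    Returns the 1-based attempt number that succeeded, or raises
    SchemaVersionCheckError once max_attempts checks have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            # Wait before retrying; no wait is needed after the final failure.
            time.sleep(delay_seconds)
    raise SchemaVersionCheckError(
        f"Version check unsuccessful after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated check that never passes, as if the schema was never updated.
    try:
        wait_for_schema_version(lambda: False)
    except SchemaVersionCheckError as err:
        print(err)
```

In this sketch the loop can only succeed if something else (for example, a schema installer) brings the schema to the expected version while the retries are still running; otherwise startup is aborted, which matches the symptom reported here.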


Version-Release number of selected component (if applicable): 3.9.38


How reproducible: After every upgrade


Steps to Reproduce:
1. Upgrade cluster from 3.7 to 3.9.38
2. Hawkular metrics pod doesn't start correctly

Actual results: Metrics pods do not start and metrics service is unavailable


Expected results: Metrics pods start and metrics service is available


Additional info: To work around the problem, the hawkular-metrics image was downgraded to 3.9.33.

Comment 1 John Sanda 2018-08-01 14:52:36 UTC
Please provide logs for the hawkular-metrics, schema-installer, and cassandra pods.

Comment 2 John Sanda 2018-08-01 15:16:14 UTC
Actually we don't need logs. https://github.com/openshift/openshift-ansible/pull/8961 is blocking.

Comment 4 Junqi Zhao 2018-10-17 13:01:32 UTC
The issue is fixed with metrics-hawkular-metrics-v3.9.47-1.

Other images:
metrics-cassandra-v3.9.47-1
metrics-heapster-v3.9.47-1

# oc -n openshift-infra get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-k4wvd   1/1       Running   0          9m
hawkular-metrics-xfz8c       1/1       Running   0          9m
heapster-m6fvg               1/1       Running   0          9m
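
A verification like the one above can also be scripted. The sketch below assumes the default column layout of `oc get pod` (NAME, READY, STATUS, ...) as shown in the output above; the helper name `pods_not_ready` is hypothetical and the script only parses text, it does not call `oc` itself.

```python
from typing import List


def pods_not_ready(oc_output: str) -> List[str]:
    """Return names of pods that are not Running or not fully ready.

    Expects the plain-text table printed by `oc get pod`; blank lines,
    the header row, and shell prompt lines starting with '#' are skipped.
    """
    not_ready = []
    for line in oc_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] in ("NAME", "#"):
            continue
        name, ready, status = fields[:3]
        current, _, desired = ready.partition("/")
        if status != "Running" or current != desired:
            not_ready.append(name)
    return not_ready


if __name__ == "__main__":
    # Output copied from the verification in this comment.
    sample = """\
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-k4wvd   1/1       Running   0          9m
hawkular-metrics-xfz8c       1/1       Running   0          9m
heapster-m6fvg               1/1       Running   0          9m
"""
    print(pods_not_ready(sample))  # prints []
```

An empty list means every listed pod is Running with all containers ready, which is the state shown after the fix.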

Comment 7 errata-xmlrpc 2018-12-13 19:27:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748