Bug 1560695

Summary: Implement standalone schema installer
Product: OpenShift Container Platform
Reporter: John Sanda <jsanda>
Component: Hawkular
Assignee: John Sanda <jsanda>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.10.0
CC: aos-bugs, cstark, dzhukous, jsanda
Target Milestone: ---
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1590449 1590451 1592966
Environment:
Last Closed: 2018-07-30 19:11:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1469423, 1540413, 1590449, 1590451, 1592966

Description John Sanda 2018-03-26 19:08:30 UTC
Description of problem:
At startup, hawkular-metrics applies schema updates to Cassandra if necessary. Schema updates should generally be done serially in Cassandra to avoid inconsistencies between Cassandra nodes. In theory, concurrent schema updates to a Cassandra cluster should not be a problem; in reality, they are often a source of problems.

If the replica count for hawkular-metrics is greater than one, concurrent schema updates are possible. We use an Infinispan cache at startup in hawkular-metrics to coordinate schema updates. In hindsight, introducing Infinispan just for this one small use case seems like overkill, but at the time it seemed like a reasonable approach because it could also be used in environments other than OpenShift.

As it turns out, OpenShift is the only environment we need to worry about for hawkular-metrics. The Infinispan integration has been a source of several problems (see bug 1469423). Most importantly, I do not think it has actually prevented concurrent schema updates.

To properly address (or prevent) concurrent schema updates and the problems with Infinispan/JGroups, we will move schema updates out of the hawkular-metrics server and into a separate standalone installer that runs as a Kubernetes job. Running schema updates in a Kubernetes job ensures we do not have to worry about concurrent updates, so there is no longer a need for Infinispan and JGroups.
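
The installer's core loop (apply each pending change one at a time, in order) might look roughly like the following. This is a minimal Python sketch under stated assumptions, not the actual schema installer: schema changes are modelled as hypothetical (version, statement) pairs, a dict stands in for the Cassandra table that records applied versions, and `execute` is a hypothetical callback that would send one CQL statement.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: each schema change is a (version, CQL statement) pair.
SchemaChange = Tuple[int, str]


def apply_schema_updates(changes: List[SchemaChange],
                         applied: Dict[int, str],
                         execute: Callable[[str], None]) -> List[int]:
    """Apply pending schema changes one at a time, in ascending version order.

    `applied` stands in for the Cassandra table that records finished versions;
    `execute` is a hypothetical hook that would send one CQL statement.
    Returns the versions applied during this run.
    """
    newly_applied: List[int] = []
    for version, statement in sorted(changes):
        if version in applied:
            continue  # applied by an earlier run; skipping makes re-runs safe
        execute(statement)            # an exception here stops the run unrecorded
        applied[version] = statement  # record the version only after it succeeded
        newly_applied.append(version)
    return newly_applied


if __name__ == "__main__":
    executed: List[str] = []
    applied = {1: "CREATE TABLE data (...)"}
    pending = [
        (3, "CREATE TABLE tenants (...)"),
        (1, "CREATE TABLE data (...)"),
        (2, "ALTER TABLE data ADD tags map<text, text>"),
    ]
    print(apply_schema_updates(pending, applied, executed.append))  # [2, 3]
    print(apply_schema_updates(pending, applied, executed.append))  # []
```

Assuming the job runs a single pod, statements are issued strictly one after another, which is the serialization the Infinispan cache was meant to provide. Recording each version only after it succeeds lets a failed or repeated job run pick up where the previous one stopped.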

There is no guarantee about the start-up order of pods, so hawkular-metrics will poll Cassandra for a property that the installer sets. The installer sets the property only after all schema updates are done, at which point hawkular-metrics can proceed with its startup.
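
The start-up handshake described above can be sketched as a flag written by the installer and a poll loop in the server. This is a hypothetical Python illustration, not hawkular-metrics code: a dict stands in for the Cassandra table that holds the property, and the key name `SCHEMA_READY_KEY`, the poll interval, and the timeout are all invented for the example.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical property name; the report does not say what the real key is.
SCHEMA_READY_KEY = "schema-installer.done"


def mark_schema_ready(store: Dict[str, str], version: str) -> None:
    """Installer side: call only after every schema update has succeeded."""
    store[SCHEMA_READY_KEY] = version


def wait_for_schema(read_property: Callable[[str], Optional[str]],
                    poll_interval: float = 1.0,
                    timeout: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Server side: block start-up until the installer has set the property.

    Returns the property value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        value = read_property(SCHEMA_READY_KEY)
        if value is not None:
            return value  # schema is in place; start-up can continue
        if clock() >= deadline:
            raise TimeoutError("schema installer has not finished")
        sleep(poll_interval)  # installer still running (or not started yet)
```

`sleep` and `clock` are injectable so the handshake can be exercised without real waiting; a `dict`'s `get` method works as `read_property`. Raising on timeout lets the pod fail and be restarted by Kubernetes. Whether the real server gives up or keeps polling indefinitely is not stated in the report.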

Comment 4 Junqi Zhao 2018-05-17 03:42:41 UTC
Tested with default and non-default values for openshift_metrics_hawkular_replicas and openshift_metrics_cassandra_replicas; all metrics pods were running well, and sanity testing passed.

openshift-ansible version
openshift-ansible-3.10.0-0.47.0.git.0.c018c8f.el7.noarch

Images:
metrics-heapster/images/v3.10.0-0.47.0.0
metrics-hawkular-metrics/images/v3.10.0-0.47.0.0
metrics-cassandra/images/v3.10.0-0.47.0.0
metrics-schema-installer/images/v3.10.0-0.47.0.0

Comment 10 errata-xmlrpc 2018-07-30 19:11:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

Comment 11 Red Hat Bugzilla 2023-09-15 00:07:08 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days