Bug 1560695 - Implement standalone schema installer
Summary: Implement standalone schema installer
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.10.0
Assignee: John Sanda
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1469423 1540413 1590449 1590451 1592966
Reported: 2018-03-26 19:08 UTC by John Sanda
Modified: 2023-09-15 00:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1590449 1590451 1592966 (view as bug list)
Environment:
Last Closed: 2018-07-30 19:11:31 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | HWKMETRICS-756 | 0 | Major | Resolved | Create stand alone schema installer | 2020-03-31 12:32:32 UTC
Red Hat Issue Tracker | HWKMETRICS-783 | 0 | Major | Closed | Schema installer should not use git hash for schema version | 2020-03-31 12:32:36 UTC
Red Hat Product Errata | RHBA-2018:1816 | 0 | None | None | None | 2018-07-30 19:12:11 UTC

Description John Sanda 2018-03-26 19:08:30 UTC
Description of problem:
At startup, hawkular-metrics applies schema updates to Cassandra if necessary. Schema updates should generally be applied serially in Cassandra to avoid inconsistencies between Cassandra nodes. In theory, concurrent schema updates to a Cassandra cluster should not be a problem. In practice, they are often a source of problems.

If the replica count for hawkular-metrics is greater than one, concurrent schema updates are possible. We use an Infinispan cache at startup in hawkular-metrics to coordinate schema updates. On the one hand, it seems like overkill to introduce Infinispan just for this one small use case. On the other hand, at the time it seemed like a reasonable approach because it could be used in environments other than OpenShift.

As it turns out now, OpenShift is the only environment we need to worry about for hawkular-metrics. The Infinispan integration has been a source of some problems (see bug 1469423). Most importantly, I do not think it has prevented concurrent schema updates.

To properly address (or prevent) concurrent schema updates and the problems with Infinispan/JGroups, we will move schema updates out of the hawkular-metrics server and into a separate standalone installer that runs as a Kubernetes job. Running schema updates in a Kubernetes job means we do not have to worry about concurrent updates; therefore, there is no longer a need for Infinispan and JGroups.
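
To illustrate the serial-update idea, here is a minimal sketch, not the actual installer: a single process applies pending schema changes one at a time, in order. The names `SchemaUpdate`, `apply_pending`, and the `execute` callback are hypothetical; a real installer would run each statement against Cassandra through a driver session.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class SchemaUpdate:
    """One schema change (hypothetical structure, for illustration only)."""
    change_id: str
    cql: str


def apply_pending(
    updates: Iterable[SchemaUpdate],
    already_applied: Set[str],
    execute: Callable[[str], None],
) -> List[str]:
    """Apply updates strictly one after another, skipping ones already applied.

    `execute` stands in for whatever runs a CQL statement against Cassandra.
    Returns the ids of the updates applied in this run, in order.
    """
    applied_now = []
    for update in updates:
        if update.change_id in already_applied:
            continue
        # Each call returns before the next update starts, so updates never overlap.
        execute(update.cql)
        already_applied.add(update.change_id)
        applied_now.append(update.change_id)
    return applied_now
```

Because only one such process runs, the ordering guarantee comes from the loop itself rather than from any cross-node coordination.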

There is no guarantee about the startup order of pods, so hawkular-metrics will poll Cassandra for a property that the installer sets. The installer sets the property only after all schema updates are done, at which point hawkular-metrics can proceed with its startup.
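
The polling side of this handshake could look roughly like the sketch below. It is an illustration under assumptions, not the hawkular-metrics code: `wait_for_schema`, `read_marker`, and the `expected` value are hypothetical names, and `read_marker` stands in for whatever query reads the installer's property from Cassandra.

```python
import time
from typing import Callable, Optional


def wait_for_schema(
    read_marker: Callable[[], Optional[str]],
    expected: str,
    poll_interval: float = 5.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the installer's property holds the expected value.

    Returns True once the property matches, or False if `max_attempts`
    reads all fail. With `max_attempts=None` it polls indefinitely.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        if read_marker() == expected:
            return True
        attempts += 1
        sleep(poll_interval)
    return False
```

The server would call this before doing any other startup work. The installer must write the property as its very last step, so a match implies that every schema update has finished.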


Comment 4 Junqi Zhao 2018-05-17 03:42:41 UTC
Tested with default and non-default values for openshift_metrics_hawkular_replicas and openshift_metrics_cassandra_replicas; all metrics pods were running well, and sanity testing passed.

openshift-ansible version:
openshift-ansible-3.10.0-0.47.0.git.0.c018c8f.el7.noarch

Images:
metrics-heapster/images/v3.10.0-0.47.0.0
metrics-hawkular-metrics/images/v3.10.0-0.47.0.0
metrics-cassandra/images/v3.10.0-0.47.0.0
metrics-schema-installer/images/v3.10.0-0.47.0.0

Comment 10 errata-xmlrpc 2018-07-30 19:11:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

Comment 11 Red Hat Bugzilla 2023-09-15 00:07:08 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

