Bug 1306678 - handle Cluster Metrics updates
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Assigned To: Jeff Cantrill
QA Contact: Peng Li
Keywords: NeedsTestCase
Depends On: 1420229
Blocks: 1267746
Reported: 2016-02-11 09:58 EST by Evgheni Dereveanchin
Modified: 2017-07-24 10 EDT
CC List: 11 users

Fixed In Version: openshift-ansible-3.0.40-1.git.1.4385281.el7aos
Doc Type: Bug Fix
Last Closed: 2017-04-12 14:47:12 EDT
Type: Bug


Description Evgheni Dereveanchin 2016-02-11 09:58:22 EST
Description of problem:
Currently, the cluster metrics image version is hard-coded in the template, so deployment creates a specific version (currently 3.1.0, see bz#1306665) that stays at that version, because there is no related imageStream, deploymentConfig, or similar object through which a newer image could be rolled out.

Version-Release number of selected component (if applicable):
OpenShift Enterprise 3.1.1

How reproducible:
always

Steps to Reproduce:
1. Install OpenShift 3.1.0
2. Install Cluster Metrics following the official documentation:
https://access.redhat.com/documentation/en/openshift-enterprise/3.1/installation-and-configuration/chapter-18-enabling-cluster-metrics#creating-the-deployer-template
3. Version 3.1.0 is deployed.
4. Upgrade to 3.1.1

Actual results:
Metrics stay at version 3.1.0

Expected results:
Metrics are updated, just like the Registry and Router.

Additional info:
The documentation suggests using the image version "latest" as a workaround (it is the default in Origin), but this can lead to inconsistent results: after node restarts and similar events, different pods from the Metrics deployment may end up running different versions at the same time. We need to handle this systematically.
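A minimal sketch of how the mixed-version state described above could be detected, assuming Python 3.7+ and an `oc` client already logged in to the cluster. With a floating tag such as `latest`, every pod reports the same tag string, so the sketch compares the `imageID` that each running container actually reports instead. `load_pods` and `find_image_skew` are hypothetical helper names for illustration; they are not part of openshift-ansible or the metrics deployer.

```python
import json
import subprocess
from collections import defaultdict


def load_pods(namespace):
    """Return the parsed output of `oc get pods -n <namespace> -o json`.

    Hypothetical helper: assumes the oc client is installed and logged in.
    """
    result = subprocess.run(
        ["oc", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def find_image_skew(pod_list):
    """Map each image reference to the set of image IDs it resolves to,
    keeping only references that resolve to more than one ID.

    Hypothetical helper: a tag like `latest` looks identical on every pod,
    so the running image ID (digest) is what reveals mixed versions.
    """
    ids_by_image = defaultdict(set)
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Completed pods (e.g. a finished deployer) are not part of the live deployment.
        if status.get("phase") != "Running":
            continue
        for container in status.get("containerStatuses", []):
            image_id = container.get("imageID")
            if image_id:  # empty for a container that has not started yet
                ids_by_image[container["image"]].add(image_id)
    return {image: ids for image, ids in ids_by_image.items() if len(ids) > 1}
```

For example, `find_image_skew(load_pods("openshift-infra"))` returns an empty dict when all running pods that use a given image reference run the same image, and otherwise lists the references whose pods disagree. It only detects the problem; it does not pin or roll out a version.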
Comment 4 Brenton Leanhardt 2016-02-11 15:37:12 EST
For now, the plan is to make the 3.1.1 images the default for all logging and metrics deployments.  We've verified it works in development and we'll have QE make sure there aren't any regressions for 3.1.0 environments.

In the medium term, we've added a task to https://trello.com/c/pQ2cmWhG/123-8-openshift-ansible-playbook-for-logging-metrics-installation to solve this more cleanly in Ansible.  That's one of our highest-priority backlog items.
Comment 6 Brenton Leanhardt 2016-02-11 15:49:01 EST
Actually, I was referring to installing 3.1.1 and getting the 3.1.0 logging/metrics images.  I'm going to move this back to ASSIGNED and mark it for an upcoming release, since it's technically the same thing as the card I mentioned.
Comment 12 Jeff Cantrill 2017-02-01 14:40:43 EST
Assuming 3.4 updates the template based on: 
https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_hosted_templates/files/v1.4/enterprise/metrics-deployer.yaml#L108

and/or 

https://github.com/openshift/openshift-ansible/blob/openshift-ansible-3.4.58-1/roles/openshift_hosted_templates/files/v1.4/enterprise/metrics-deployer.yaml

I would expect metrics to be upgraded to the correct version.  Additionally, in 3.5, the openshift_metrics Ansible role will let you pass the correct value from your host inventory file.
Comment 13 Peng Li 2017-02-20 04:21:51 EST
@jcantril, I tested the scenario below; I'm not sure whether it is sufficient to verify this bug.

1. Deploy a previous version (3.4.1) of Metrics using the deployer.

2. Use Ansible to deploy Metrics 3.5.0 with the following inventory:

[oo_first_master]
$MASTER  ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="~/.ssh/libra.pem" openshift_public_hostname=MASTER

[oo_first_master:vars]
deployment_type=openshift-enterprise
openshift_release=v3.5.0

openshift_metrics_install_metrics=true

openshift_metrics_hawkular_hostname=hawkular-metrics.$SUBDOMAIN
openshift_metrics_project=openshift-infra

openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=3.5.0

openshift_metrics_cassandra_storage_type=pv
openshift_metrics_cassandra_pvc_size=10Gi

3. Check that the pods are updated to 3.5.0, the PVC is still present, and the previous metrics data (gathered by Metrics 3.4.1) is still there.

# oc get pod
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-tnk3t   1/1       Running     0          1m
hawkular-metrics-g5svn       1/1       Running     0          1m
heapster-m2nng               1/1       Running     0          1m
metrics-deployer-djdbc       0/1       Completed   0          13m

# oc get pvc
NAME                  STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
metrics-cassandra-1   Bound     pv1       10Gi       RWO           13m
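The `oc get pod` output above shows the pods running but not which image versions they use. A minimal sketch of a stricter check for step 3, assuming the parsed JSON from `oc get pods -n openshift-infra -o json`: it reports every running container whose image tag differs from the expected version. `image_tag` and `pods_not_at_version` are hypothetical helpers, and completed pods (such as the old `metrics-deployer` pod) are skipped because they are not part of the running deployment.

```python
def image_tag(image):
    """Return the tag of an image reference, or None if it has no tag."""
    name = image.split("@", 1)[0]     # drop any @sha256:... digest
    last = name.rsplit("/", 1)[-1]    # the tag lives in the last path segment,
    # so a registry port such as host:5000 is not mistaken for a tag
    return last.split(":", 1)[1] if ":" in last else None


def pods_not_at_version(pod_list, expected):
    """List (pod name, image) pairs for running pods whose tag != expected."""
    mismatches = []
    for pod in pod_list.get("items", []):
        # Skip completed or pending pods; only the live deployment matters here.
        if pod.get("status", {}).get("phase") != "Running":
            continue
        for container in pod["spec"]["containers"]:
            if image_tag(container["image"]) != expected:
                mismatches.append((pod["metadata"]["name"], container["image"]))
    return mismatches
```

For example, after saving the output with `oc get pods -n openshift-infra -o json > pods.json`, `pods_not_at_version(json.load(open("pods.json")), "3.5.0")` returns an empty list when every running metrics container uses a `3.5.0` tag. This compares tags only; it does not confirm which digest each tag resolved to.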
Comment 14 Jeff Cantrill 2017-02-20 15:18:10 EST
@Wei, this seems like a reasonable test to me.
Comment 15 Peng Li 2017-02-20 19:05:28 EST
Setting to VERIFIED based on comment #12 and comment #14.
Comment 17 errata-xmlrpc 2017-04-12 14:47:12 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903
