Bug 1306678

Summary: handle Cluster Metrics updates
Product: OpenShift Container Platform
Reporter: Evgheni Dereveanchin <ederevea>
Component: Installer
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA
QA Contact: Peng Li <penli>
Severity: high
Docs Contact:
Priority: medium
Version: 3.1.0
CC: aos-bugs, bleanhar, erich, jcantril, jdetiber, jokerman, juzhao, mmccomas, myllynen, tdawson, wsun, xiazhao
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openshift-ansible-3.0.40-1.git.1.4385281.el7aos
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-12 18:47:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1420229
Bug Blocks: 1267746

Description Evgheni Dereveanchin 2016-02-11 14:58:22 UTC
Description of problem:
Currently the cluster metrics image version is hard-coded in the template, so deployment creates a specific version (currently 3.1.0, see bz#1306665). Metrics then stay at that version because no related imageStream, deploymentConfig, etc. exists.

Version-Release number of selected component (if applicable):
OpenShift Enterprise 3.1.1

How reproducible:
always

Steps to Reproduce:
1. Install OpenShift 3.1.0
2. Install Cluster Metrics following the official documentation:
https://access.redhat.com/documentation/en/openshift-enterprise/3.1/installation-and-configuration/chapter-18-enabling-cluster-metrics#creating-the-deployer-template
3. version 3.1.0 is deployed
4. Upgrade to 3.1.1

Actual results:
Metrics stay at version 3.1.0

Expected results:
Metrics are updated just like the Registry and Router

Additional info:
The documentation suggests using image version "latest" as a workaround (which is the default in Origin), yet this may lead to inconsistent results: after node restarts and similar events, different pods from the Metrics deployment can end up running different versions at the same time. So we need to handle this systematically.
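
To illustrate the "latest" concern, here is a minimal Python sketch (not part of the original report) that flags image references resolving to more than one image ID across pods. It assumes the pod list comes from something like `oc get pods -n openshift-infra -o json` and relies on the standard `status.containerStatuses[].image` and `imageID` fields; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def find_mixed_images(pod_list):
    """Map each image reference to its resolved image IDs, keeping only
    references that resolve to more than one ID across the listed pods."""
    ids_by_image = defaultdict(set)
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            image_id = status.get("imageID")
            if image_id:  # empty until the image has actually been pulled
                ids_by_image[status["image"]].add(image_id)
    return {img: ids for img, ids in ids_by_image.items() if len(ids) > 1}


# Hypothetical sample: two pods share the "latest" tag but run different images.
sample = {
    "items": [
        {"status": {"containerStatuses": [
            {"image": "example/metrics:latest", "imageID": "sha256:aaa"}]}},
        {"status": {"containerStatuses": [
            {"image": "example/metrics:latest", "imageID": "sha256:bbb"}]}},
    ]
}

for image, ids in find_mixed_images(sample).items():
    print(image, "->", sorted(ids))
```

Comparing image IDs rather than tags matters here, because pods pulled at different times can all report the tag "latest" while running different builds.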

Comment 4 Brenton Leanhardt 2016-02-11 20:37:12 UTC
For now the plan is to have the default for all logging and metrics deployments be the 3.1.1 images.  We've verified it works in development and we'll have QE make sure there aren't any regressions for 3.1.0 environments.

In the medium-term we've added a task to https://trello.com/c/pQ2cmWhG/123-8-openshift-ansible-playbook-for-logging-metrics-installation to solve this in a cleaner way in ansible.  That's one of our highest priority backlog items.

Comment 6 Brenton Leanhardt 2016-02-11 20:49:01 UTC
Actually, I was referring to installing 3.1.1 and getting the 3.1.0 logging/metrics images.  I'm going to move this back to assigned and mark it for the upcoming release since it's technically the same thing as the card I mentioned.

Comment 12 Jeff Cantrill 2017-02-01 19:40:43 UTC
Assuming 3.4 updates the template based on: 
https://github.com/openshift/openshift-ansible/blob/release-1.4/roles/openshift_hosted_templates/files/v1.4/enterprise/metrics-deployer.yaml#L108

and/or 

https://github.com/openshift/openshift-ansible/blob/openshift-ansible-3.4.58-1/roles/openshift_hosted_templates/files/v1.4/enterprise/metrics-deployer.yaml

I would expect metrics to be upgraded to the correct version.  Additionally, in 3.5, using the openshift_metrics ansible role will allow you to pass the right value from your host inventory file.

Comment 13 Peng Li 2017-02-20 09:21:51 UTC
@jcantril, I tested the scenario below; I'm not sure it's sufficient to verify this bug.

1. Deploy a previous version (3.4.1) of Metrics using the deployer.

2. Use Ansible to deploy 3.5.0 Metrics with the following inventory:

[oo_first_master]
$MASTER  ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="~/.ssh/libra.pem" openshift_public_hostname=MASTER

[oo_first_master:vars]
deployment_type=openshift-enterprise
openshift_release=v3.5.0

openshift_metrics_install_metrics=true

openshift_metrics_hawkular_hostname=hawkular-metrics.$SUBDOMAIN
openshift_metrics_project=openshift-infra

openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=3.5.0

openshift_metrics_cassandra_storage_type=pv
openshift_metrics_cassandra_pvc_size=10Gi

3. Check that the pods are updated to 3.5.0, the PVC is there, and the previous metrics data (gathered by 3.4.1 Metrics) is still there.

# oc get pod
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-tnk3t   1/1       Running     0          1m
hawkular-metrics-g5svn       1/1       Running     0          1m
heapster-m2nng               1/1       Running     0          1m
metrics-deployer-djdbc       0/1       Completed   0          13m

# oc get pvc
NAME                  STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
metrics-cassandra-1   Bound     pv1       10Gi       RWO           13m
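
The `oc get pod` listing in step 3 shows pod status but not image versions. As a rough sketch of how the version check could be scripted, the Python below inspects pod list JSON (for example from `oc get pods -n openshift-infra -o json`) and reports running pods whose container image tag differs from the expected one. The function names are hypothetical, and the image names in the sample data are assumptions for illustration only.

```python
def image_tag(image):
    """Return the tag of an image reference, or None if it has no tag."""
    image = image.split("@", 1)[0]   # drop any digest suffix
    name = image.rsplit("/", 1)[-1]  # a colon before the last "/" is a registry port
    return name.split(":", 1)[1] if ":" in name else None


def pods_not_at_version(pod_list, expected_tag):
    """List (pod name, image) pairs for running pods whose image tag differs."""
    mismatches = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("phase") != "Running":
            continue  # skip completed pods such as the earlier deployer
        for container in pod["spec"]["containers"]:
            if image_tag(container["image"]) != expected_tag:
                mismatches.append((pod["metadata"]["name"], container["image"]))
    return mismatches


# Hypothetical sample data; the image names are assumed, not taken from the report.
sample = {
    "items": [
        {"metadata": {"name": "heapster-m2nng"},
         "status": {"phase": "Running"},
         "spec": {"containers": [{"image": "registry.example/metrics-heapster:3.5.0"}]}},
        {"metadata": {"name": "metrics-deployer-djdbc"},
         "status": {"phase": "Succeeded"},
         "spec": {"containers": [{"image": "registry.example/metrics-deployer:3.4.1"}]}},
    ]
}

print(pods_not_at_version(sample, "3.5.0"))
```

An empty result would mean every running pod carries the expected tag. Pods that are not in the Running phase are skipped because the completed deployer pod from step 1 is expected to remain at the older version.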

Comment 14 Jeff Cantrill 2017-02-20 20:18:10 UTC
@Wei this seems a reasonable test to me.

Comment 15 Peng Li 2017-02-21 00:05:28 UTC
Set to verified based on comment #12 and comment #14.

Comment 17 errata-xmlrpc 2017-04-12 18:47:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903