Bug 1590748 - Metrics pods are deployed to openshift-metrics project instead of openshift-infra
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Ruben Vargas Palma
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1570583
Blocks:
Reported: 2018-06-13 10:21 UTC by Weinan Liu
Modified: 2018-11-26 16:09 UTC (History)
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-26 16:09:30 UTC
Target Upstream Version:
Embargoed:


Description Weinan Liu 2018-06-13 10:21:56 UTC
Description of problem:
Metrics pods are deployed to the openshift-metrics project instead of openshift-infra. As a result, HPA fails to get metrics data, and the console fails to get data as well.

Version-Release number of selected component (if applicable):
Last login: Wed Jun 13 02:59:47 2018 from 119.254.120.72
[root@ip-172-18-7-72 ~]# oc version
oc v3.10.0-0.66.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-7-72.ec2.internal:8443
openshift v3.10.0-0.66.0
kubernetes v1.10.0+b81c8f8
[root@ip-172-18-7-72 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 (Maipo)

[nathan@localhost openshift-ansible]$ git log
commit a1634c352a0ebc4476c9d961a74f2c3817ad35e8
Merge: 31696e5 f69b1aa
Author: Paul Weil <pweil>
Date:   Tue Jun 12 15:00:38 2018 -0500

    Merge pull request #8732 from kwoodson/azure_ci_url_fix
    
    Adding etcd image variables to fix deployments.

How reproducible:
always

Steps to Reproduce:
1. Update qe-inventory-host-file to include metrics install parameters:
openshift_metrics_install_metrics=True
openshift_metrics_image_version="v3.10.0-0.28.0"
2. Install metrics by running:
$ ansible-playbook -i qe-inventory-host-file playbooks/openshift-metrics/config.yml 

Actual results:
1. Metrics pods are deployed to the openshift-metrics project.
2. Metrics data cannot be retrieved:
[root@ip-172-18-7-72 ~]# oc adm top node
Error from server (NotFound): the server could not find the requested resource (get services https:heapster:)


Expected results:
1. Metrics pods are deployed to the openshift-infra project.
2. Metrics data can be retrieved.

Additional info:
N/A

Comment 2 John Sanda 2018-06-13 14:08:08 UTC
What am I supposed to do with this? As discussed in bug 1570583, metrics can no longer be deployed in openshift-infra.

Comment 3 DeShuai Ma 2018-06-13 14:48:48 UTC
Before we move metrics to a different namespace, we first need to change the components that depend on metrics (such as HPA; I'm not sure whether the console also gets metrics from openshift-infra).
We also need to verify whether this works well for upgrades.

Comment 4 Solly Ross 2018-06-14 16:43:24 UTC
Heapster has to be deployed into openshift-infra (at least, at the moment). When we switch to metrics-server, that won't be the case any more. In the meantime, we'd have to change where the HPA controller looks for Heapster, but that would cause HPA outages during upgrade.
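
To illustrate why the namespace matters: the `oc adm top node` error in the description names the service reference `https:heapster:`, which has the `scheme:name:port` form used in API server service-proxy paths. Below is a minimal Python sketch of how such a path is assembled, assuming the standard Kubernetes service-proxy URL layout. The function name and its defaults are hypothetical and are not taken from the HPA controller source.

```python
def heapster_proxy_path(namespace, scheme="https", service="heapster", port=""):
    """Build an API server proxy path for a namespaced service.

    Assumes the standard Kubernetes layout:
    /api/v1/namespaces/<ns>/services/<scheme>:<name>:<port>/proxy/
    """
    service_ref = f"{scheme}:{service}:{port}"
    return f"/api/v1/namespaces/{namespace}/services/{service_ref}/proxy/"


# A client that is hard-wired to openshift-infra asks for this path...
expected = heapster_proxy_path("openshift-infra")
# ...while the playbook in this report created the service here instead.
deployed = heapster_proxy_path("openshift-metrics")

print(expected)
print(deployed)
```

Because the two paths differ only in the namespace segment, a lookup against openshift-infra returns NotFound when the heapster service exists only in openshift-metrics.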

Comment 5 Jessica Forrester 2018-06-14 16:52:03 UTC
For the console, it shouldn't matter what namespace it's in; it's just a URL that gets dumped into the console's config. Make sure you have separately accepted the cert in the browser for the Hawkular URL.

Comment 12 Weinan Liu 2018-06-21 05:58:17 UTC
Cassandra is still deployed to the openshift-metrics namespace on OpenShift v3.10.2:

[root@qe-weinliu-310-2-master-etcd-1 ~]# oc get pod -n openshift-metrics
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-4jvx4      1/1       Running     0          4m
hawkular-metrics-4bfld          1/1       Running     0          4m
hawkular-metrics-schema-l58g6   0/1       Completed   0          5m
heapster-wfbpp                  1/1       Running     0          4m
[root@qe-weinliu-310-2-master-etcd-1 ~]# oc version
oc v3.10.2
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-weinliu-310-2-master-etcd-1:8443
openshift v3.10.2
kubernetes v1.10.0+b81c8f8
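
For comparing listings like the one above across namespaces, here is a small Python sketch that extracts the component names from `oc get pod` tabular output. It assumes the default column layout shown in this report and that each pod name ends in a generated suffix. Both helper names are hypothetical.

```python
def pod_names(table_text):
    """Return pod names from `oc get pod` tabular output (header row skipped)."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    return [ln.split()[0] for ln in lines[1:]]


def metrics_components(names):
    """Drop the generated suffix (e.g. "-wfbpp") to get the component names."""
    return sorted({name.rsplit("-", 1)[0] for name in names})


# Listing copied from the openshift-metrics namespace output in this comment.
SAMPLE = """\
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-4jvx4      1/1       Running     0          4m
hawkular-metrics-4bfld          1/1       Running     0          4m
hawkular-metrics-schema-l58g6   0/1       Completed   0          5m
heapster-wfbpp                  1/1       Running     0          4m
"""

print(metrics_components(pod_names(SAMPLE)))
```

Running the same helpers on the output of `oc get pod -n openshift-infra` shows whether heapster and the Hawkular pods are present in the namespace that HPA expects.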

Comment 13 Weinan Liu 2018-06-21 11:29:42 UTC
Verified as fixed on the checkout below:

$ git checkout openshift-ansible-3.10.2-1
$ git branch
* (HEAD detached at openshift-ansible-3.10.2-1)
  master
  release-3.9

$ git log
<...snip...>
[root@qe-weinliu-310-2-master-etcd-1 ~]# oc get pod -n openshift-infra
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-hc42l      1/1       Running     0          2m
hawkular-metrics-lmg25          1/1       Running     0          2m
hawkular-metrics-schema-j98qv   0/1       Completed   0          3m
heapster-27cpd                  1/1       Running     0          2m



commit eb744428280460b8b5ca5a80625aba03e14baf21
Merge: 4f45f04 3817da3
Author: Scott Dodson <sdodson>
Date:   Wed Jun 20 08:37:37 2018 -0400

    Merge pull request #8861 from openshift-cherrypick-robot/cherry-pick-8850-to-release-3.10
...skipping...
    Revert "Migrate hawkular metrics to a new namespace"
    
    This reverts commit 125d8f3d922bae71482f317d3664a52595c53ec0.

<...snip...>

Comment 14 Anping Li 2018-07-06 11:15:21 UTC
@jsanda, @jliggitt
A follow-up question: what should we do during the OCP upgrade (from v3.9 to v3.10)?

Option A: Deploy v3.10 metrics on OCP v3.9 prior to the OCP upgrade.

Option B: Provide documentation/scripts to fix the permission issue prior to the OCP upgrade, and then deploy metrics v3.10 once OCP has been upgraded.

Option C: Warn about the gap, and ask the customer to redeploy metrics immediately once OCP has been upgraded.

Comment 15 Anping Li 2018-07-09 08:11:48 UTC
Option A failed with the following error.

We could not deploy v3.10 metrics on v3.9:
TASK [openshift_version : assert openshift_release in openshift_image_tag] *****
Monday 09 July 2018  05:30:05 +0000 (0:00:00.057)       0:00:25.980 *********** 
fatal: [qe-anlimaster-etcd-1.0709-rwi.qe.rhcloud.com]: FAILED! => {
    "assertion": "openshift_release in openshift_image_tag", 
    "changed": false, 
    "evaluated_to": false, 
    "failed": true, 
    "msg": "openshift_image_tag must match same major version as openshift_release. You provided: 3.10 and v3.9.31\n"
}
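
The assertion that failed above is the expression `openshift_release in openshift_image_tag`. On two strings this is a plain substring test. Here is a minimal Python sketch of that check, using the values from the error message; the function name is hypothetical.

```python
def release_matches_tag(openshift_release, openshift_image_tag):
    # Mirrors the expression "openshift_release in openshift_image_tag":
    # on two strings, `in` is a substring test.
    return openshift_release in openshift_image_tag


# Values from the failure above: release 3.10 against the v3.9 image tag.
print(release_matches_tag("3.10", "v3.9.31"))

# A release that matches the installed image tag passes the check.
print(release_matches_tag("3.9", "v3.9.31"))
```

The first call evaluates to False, which matches `"evaluated_to": false` in the task output. The second evaluates to True.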

I will open a doc bug to guide users during upgrade, so I am closing the discussion here.

Comment 19 Scott Dodson 2018-11-26 16:09:30 UTC
This bug was fixed in openshift-ansible-3.10.2

