Description of problem:
Metrics pods are deployed to the openshift-metrics project instead of openshift-infra. This causes HPA to fail to get metrics data, and the console fails to get data too.

Version-Release number of selected component (if applicable):
Last login: Wed Jun 13 02:59:47 2018 from 119.254.120.72
[root@ip-172-18-7-72 ~]# oc version
oc v3.10.0-0.66.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-7-72.ec2.internal:8443
openshift v3.10.0-0.66.0
kubernetes v1.10.0+b81c8f8

[root@ip-172-18-7-72 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.5 (Maipo)

[nathan@localhost openshift-ansible]$ git log
commit a1634c352a0ebc4476c9d961a74f2c3817ad35e8
Merge: 31696e5 f69b1aa
Author: Paul Weil <pweil>
Date:   Tue Jun 12 15:00:38 2018 -0500

    Merge pull request #8732 from kwoodson/azure_ci_url_fix

    Adding etcd image variables to fix deployments.

How reproducible:
always

Steps to Reproduce:
1. Update qe-inventory-host-file to include the metrics install parameters:
openshift_metrics_install_metrics=True
openshift_metrics_image_version="v3.10.0-0.28.0"
2. Install metrics with
$ ansible-playbook -i qe-inventory-host-file playbooks/openshift-metrics/config.yml

Actual results:
1. Metrics pods are deployed to the openshift-metrics project.
2. Metrics data fails to be retrieved:
[root@ip-172-18-7-72 ~]# oc adm top node
Error from server (NotFound): the server could not find the requested resource (get services https:heapster:)

Expected results:
Metrics pods are deployed to the openshift-infra project.
Metrics data can be retrieved.

Additional info:
N/A
What am I supposed to do with this? As discussed in bug 1570583, metrics can no longer be deployed in openshift-infra.
Before we move metrics to a different namespace, we first need to change the components that depend on metrics (such as HPA; I am not sure whether the console also gets metrics from openshift-infra). We also need to verify whether this works for upgrades.
Heapster has to be deployed into openshift-infra (at least for the moment). When we switch to metrics-server, that won't be the case any more. In the meantime, we'd have to change where the HPA controller looks for Heapster, but that would cause HPA outages during upgrade.
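For anyone checking a cluster against this constraint, here is a minimal sketch that reports which namespace Heapster landed in. It assumes the usual `oc get pods --all-namespaces` column layout (NAMESPACE first, NAME second) and that the pod name starts with `heapster-`, as in the listings in this bug. The function names `heapster_namespace` and `heapster_in_infra` are hypothetical helpers, not part of any OpenShift tooling.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods --all-namespaces` output on stdin
# and print the namespace of every pod whose name starts with "heapster-".
# The header row is skipped naturally because its NAME column is "NAME".
heapster_namespace() {
    awk '$2 ~ /^heapster-/ { print $1 }'
}

# Succeeds only if Heapster is found in openshift-infra and nowhere else,
# which is where this thread says the HPA controller currently expects it.
heapster_in_infra() {
    [ "$(heapster_namespace)" = "openshift-infra" ]
}
```

Possible usage on a master: `oc get pods --all-namespaces | heapster_in_infra || echo "heapster is not in openshift-infra"`.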
For the console it shouldn't matter what namespace it's in; it's just a URL that gets dumped into the console's config. Make sure you have separately accepted the cert in the browser for the hawkular URL.
PR https://github.com/openshift/openshift-ansible/pull/8831
Cassandra is still deployed to the openshift-metrics namespace on openshift v3.10.2.

[root@qe-weinliu-310-2-master-etcd-1 ~]# oc get pod -n openshift-metrics
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-4jvx4      1/1       Running     0          4m
hawkular-metrics-4bfld          1/1       Running     0          4m
hawkular-metrics-schema-l58g6   0/1       Completed   0          5m
heapster-wfbpp                  1/1       Running     0          4m

[root@qe-weinliu-310-2-master-etcd-1 ~]# oc version
oc v3.10.2
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-weinliu-310-2-master-etcd-1:8443
openshift v3.10.2
kubernetes v1.10.0+b81c8f8
Verified to be fixed on the branch below.

$ git checkout openshift-ansible-3.10.2-1
$ git branch
* (HEAD detached at openshift-ansible-3.10.2-1)
  master
  release-3.9
$ git log
<...snip...>
commit eb744428280460b8b5ca5a80625aba03e14baf21
Merge: 4f45f04 3817da3
Author: Scott Dodson <sdodson>
Date:   Wed Jun 20 08:37:37 2018 -0400

    Merge pull request #8861 from openshift-cherrypick-robot/cherry-pick-8850-to-release-3.10

...skipping...

    Revert "Migrate hawkular metrics to a new namespace"

    This reverts commit 125d8f3d922bae71482f317d3664a52595c53ec0.
<...snip...>

[root@qe-weinliu-310-2-master-etcd-1 ~]# oc get pod -n openshift-infra
NAME                            READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-hc42l      1/1       Running     0          2m
hawkular-metrics-lmg25          1/1       Running     0          2m
hawkular-metrics-schema-j98qv   0/1       Completed   0          3m
heapster-27cpd                  1/1       Running     0          2m
@jsanda, @jliggitt A follow-up question: what should we do during the OCP upgrade (from v3.9 to v3.10)?

Option A: Deploy v3.10 metrics on OCP v3.9 prior to the OCP upgrade.
Option B: Provide documentation/scripts to fix the permission issue before OCP is upgraded, and then deploy metrics v3.10 once OCP has been upgraded.
Option C: Warn about the gap, and ask the customer to redeploy metrics immediately once OCP has been upgraded.
Option A failed with the following error. We could not deploy v3.10 metrics on v3.9.

TASK [openshift_version : assert openshift_release in openshift_image_tag] *****
Monday 09 July 2018 05:30:05 +0000 (0:00:00.057) 0:00:25.980 ***********
fatal: [qe-anlimaster-etcd-1.0709-rwi.qe.rhcloud.com]: FAILED! => {
    "assertion": "openshift_release in openshift_image_tag",
    "changed": false,
    "evaluated_to": false,
    "failed": true,
    "msg": "openshift_image_tag must match same major version as openshift_release. You provided: 3.10 and v3.9.31\n"
}

I will open a doc bug to guide users during the upgrade, so I am closing the discussion here.
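The failed assertion above, `openshift_release in openshift_image_tag`, can be restated as a plain substring test. This is only a sketch: it assumes both values are strings, so `in` behaves as a substring check, and the function name `release_matches_tag` is hypothetical and not part of openshift-ansible.

```shell
#!/bin/sh
# Hypothetical re-statement of the playbook assertion
# "openshift_release in openshift_image_tag" as a substring test.
# $1 = openshift_release (e.g. 3.10), $2 = openshift_image_tag (e.g. v3.9.31)
release_matches_tag() {
    release=$1
    image_tag=$2
    case "$image_tag" in
        *"$release"*) return 0 ;;   # release string occurs inside the tag
        *)            return 1 ;;   # no match: the playbook would abort here
    esac
}
```

With the values from the error message, `release_matches_tag 3.10 v3.9.31` returns non-zero, which mirrors why running the 3.10 playbooks against a v3.9.31 cluster stops before any metrics component is deployed.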
This bug was fixed in openshift-ansible-3.10.2