Bug 1372614 - OpenShift metrics does not work after upgrade from OpenShift 3.2 to OpenShift 3.3
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Matt Wringe
QA Contact: Peng Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-02 08:32 UTC by Elvir Kuric
Modified: 2018-03-26 11:06 UTC (History)

Fixed In Version: tstclair@redhat.com
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-27 16:14:04 UTC
Target Upstream Version:



Description Elvir Kuric 2016-09-02 08:32:51 UTC
Description of problem:

OpenShift Metrics fails to populate the graphs with data after upgrading an OpenShift cluster from v3.2 to v3.3.

Metrics worked fine on the OpenShift 3.2 cluster with the packages below:

atomic-openshift-clients-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-3.2.1.12-1.git.0.516a127.el7.x86_64
tuned-profiles-atomic-openshift-node-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-node-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-sdn-ovs-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-master-3.2.1.12-1.git.0.516a127.el7.x86_64

The issue is visible after upgrading to the packages below:

Version-Release number of selected component (if applicable):


tuned-profiles-atomic-openshift-node-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-clients-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-node-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-sdn-ovs-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-master-3.3.0.28-1.git.0.c6f1247.el7.x86_64

and the latest upstream OpenShift Metrics images.


How reproducible:

Case 1) Metrics pods already exist in the cluster:

On an OpenShift v3.2 cluster with OpenShift Metrics running:
- upgrade atomic-openshift-master
- upgrade atomic-openshift-node
- restart services
-> check the heapster pod and the Metrics tab in the OpenShift web console (the network/cpu/memory graphs will be empty)

or

Case 2) Metrics pods are created after the upgrade:

- upgrade atomic-openshift-master
- upgrade atomic-openshift-node
- restart services

-> create the metrics pods following https://github.com/openshift/origin-metrics. Specifically, I ran:

--- 
oc create -f metrics-deployer-setup.yaml -n openshift-infra
oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer -n openshift-infra
oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster -n openshift-infra
oc secrets new metrics-deployer nothing=/dev/null -n openshift-infra
oc process -f metrics.yaml -v HAWKULAR_METRICS_HOSTNAME=<hostname>,USE_PERSISTENT_STORAGE=false |  oc create -n openshift-infra -f -
---
After the Metrics pods start:
-> check the heapster pod and the Metrics tab in the OpenShift web console (the network/cpu/memory graphs will be empty, even after waiting 1 day)

Steps to Reproduce:
See above 

Actual results:
After upgrading an OpenShift v3.2 cluster to v3.3, Metrics does not show data in the graphs.

-> OpenShift Metrics pods are running in the openshift-infra project
-> the Heapster pod log shows the following errors:
---
E0902 08:21:05.098752       1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.63.12:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
E0902 08:21:05.099193       1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.60.26:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
E0902 08:21:05.103503       1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.63.15:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
E0902 08:21:05.106422       1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.63.14:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
--- 
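
To see at a glance which nodes are rejecting Heapster, the log lines above can be reduced to a list of unique kubelet endpoints. This is only a minimal sketch: the function name unauthorized_kubelets is made up for illustration, and it assumes the log format is exactly the one pasted above.

```shell
# Hypothetical helper: print the unique kubelet endpoints (host:port)
# that answered Heapster with "401 Unauthorized".
# Reads Heapster log text on stdin; assumes the line format shown above.
unauthorized_kubelets() {
  grep '401 Unauthorized' \
    | sed -n 's|.*Kubelet URL "https://\([^/"]*\)/.*|\1|p' \
    | sort -u
}
```

A possible invocation, with the pod name taken from the oc get pods output, would be: oc logs heapster-f1kgn -n openshift-infra | unauthorized_kubelets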

I hit this issue on two different clusters: OpenShift Metrics stops working after the upgrade from OpenShift v3.2 to v3.3.
Please note that this was not a new installation of OpenShift v3.3, but an upgrade to v3.3.

Expected results:

OpenShift Metrics should keep working after the cluster is upgraded from v3.2 to v3.3.

Additional info:

# oc get pods -n openshift-infra -o wide
NAME                         READY     STATUS      RESTARTS   AGE       IP            NODE
hawkular-cassandra-1-d8swe   1/1       Running     0          11m       172.20.11.2   ip-172-31-63-8.us-west-2.compute.internal
hawkular-metrics-evqzj       1/1       Running     0          11m       172.20.13.3   ip-172-31-63-7.us-west-2.compute.internal
heapster-f1kgn               1/1       Running     0          11m       172.20.4.2    ip-172-31-63-16.us-west-2.compute.internal
metrics-deployer-k53y9       0/1       Completed   0          12m       172.20.13.2   ip-172-31-63-7.us-west-2.compute.internal


-> tested the upgrade on two different OpenShift clusters
-> Metrics was installed following the instructions at https://github.com/openshift/origin-metrics

Comment 1 Matt Wringe 2016-09-02 14:46:24 UTC
Are you also updating Metrics to version 3.3? The console uses APIs which are only available with the Metrics version meant for 3.3, which could explain why the console is failing.

When updating the version of OpenShift, you must also update the metrics components. Please see the docs: https://docs.openshift.com/enterprise/3.2/install_config/upgrading/automated_upgrades.html#automated-upgrading-cluster-metrics

[those are the 3.2 docs, but it's the same update procedure for 3.3]

Comment 3 Pete MacKinnon 2016-09-30 12:33:16 UTC
Seeing the same thing after a yum update to 3.3. I noticed that it pulled the Docker Hub metrics/cassandra/heapster images for some reason. Is that correct?

e2d419c54f56 openshift/origin-metrics-heapster:latest "heapster-wrapper.sh " 14 hours ago Up 14 hours k8s_heapster.4ce7cca2_heapster-xjpe1_openshift-infra_c8e6ba62-868e-11e6-8311-246e960f19fc_baa53e21

# docker images | grep heapster
docker.io/openshift/origin-metrics-heapster latest d10568760a84 3 days ago 994.8 MB

Comment 4 Pete MacKinnon 2016-09-30 13:42:57 UTC
Sorry, I guess I had grabbed the origin version of metrics-deployer.yaml. I corrected that and followed the updated docs for 3.3, but the issue is the same.

oadm diagnostics MetricsApiProxy reports no warnings or errors.

Comment 5 Pete MacKinnon 2016-09-30 20:05:55 UTC
Looks like the cluster roles needed to be fixed up this way after the yum upgrade:

# oadm policy reconcile-cluster-roles --confirm -o name
clusterrole/sudoer
clusterrole/cluster-reader
clusterrole/system:build-strategy-jenkinspipeline
clusterrole/admin
clusterrole/edit
clusterrole/view
clusterrole/basic-user
clusterrole/self-access-reviewer
clusterrole/cluster-status
clusterrole/system:image-builder
clusterrole/system:image-pruner
clusterrole/system:image-signer
clusterrole/system:deployer
clusterrole/system:router
clusterrole/system:registry
clusterrole/system:node
clusterrole/system:sdn-reader
clusterrole/system:discovery
clusterrole/registry-admin
clusterrole/registry-editor

Comment 6 Matt Wringe 2016-10-27 16:14:04 UTC
Closing this as not a bug, as it appears the cluster role update step was skipped during the upgrade. After running the oadm policy reconcile-cluster-roles command from Comment 5, things appear to function again.
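
For anyone landing here after a similar upgrade, the fix and a follow-up check could be scripted roughly as below. This is a sketch under stated assumptions, not an official procedure: the function names are hypothetical, the --tail value is an arbitrary choice, and the check only looks for the "401 Unauthorized" text seen in the Heapster log in the Description. Heapster may need a polling interval or two before new log lines reflect the reconciled roles.

```shell
# Hypothetical helper: succeeds (exit 0) when the log text on stdin
# contains no "401 Unauthorized" lines, fails (exit 1) otherwise.
no_kubelet_401() {
  ! grep -q '401 Unauthorized'
}

# Hypothetical wrapper around the command from Comment 5.
# $1 is the Heapster pod name as shown by: oc get pods -n openshift-infra
fix_and_verify_metrics() {
  # Re-apply the default cluster roles skipped during the upgrade.
  oadm policy reconcile-cluster-roles --confirm -o name || return 1
  # Inspect only the most recent log lines (20 is an assumed value),
  # since older 401 errors from before the fix remain in the log.
  oc logs "$1" -n openshift-infra --tail=20 | no_kubelet_401
}
```

Usage would be something like: fix_and_verify_metrics <heapster-pod-name> && echo "no recent 401 errors"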

