Description of problem:
OpenShift Metrics fails to populate data in graphs after upgrading an OpenShift cluster from OpenShift v3.2 to OpenShift v3.3.

Metrics worked fine on the OpenShift 3.2 cluster with the packages below:
atomic-openshift-clients-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-3.2.1.12-1.git.0.516a127.el7.x86_64
tuned-profiles-atomic-openshift-node-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-node-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-sdn-ovs-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-master-3.2.1.12-1.git.0.516a127.el7.x86_64

The issue appears after upgrading to the packages below.

Version-Release number of selected component (if applicable):
tuned-profiles-atomic-openshift-node-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-clients-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-node-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-sdn-ovs-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-master-3.3.0.28-1.git.0.c6f1247.el7.x86_64
and the latest upstream OpenShift Metrics images.

How reproducible:

Case 1) Existing metrics pods in the cluster:
On an OpenShift v3.2 cluster with OpenShift Metrics running, upgrade
- atomic-openshift-master
- atomic-openshift-node
- restart services
-> check the heapster pod and the Metrics tab in the OpenShift web console (the network/CPU/memory graphs will be empty)

or

Case 2) Create the metrics pods after the upgrade, e.g. upgrade
- atomic-openshift-master
- atomic-openshift-node
- restart services
-> create the metrics pods following https://github.com/openshift/origin-metrics. Specifically, I ran:
---
oc create -f metrics-deployer-setup.yaml -n openshift-infra
oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer -n openshift-infra
oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster -n openshift-infra
oc secrets new metrics-deployer nothing=/dev/null -n openshift-infra
oc process -f metrics.yaml -v HAWKULAR_METRICS_HOSTNAME=<hostname>,USE_PERSISTENT_STORAGE=false | oc create -n openshift-infra -f -
---
After the metrics pods start -> check the heapster pod and the Metrics tab in the OpenShift web console (the network/CPU/memory graphs will still be empty after one day of waiting).

Steps to Reproduce:
See above.

Actual results:
After upgrading an OpenShift v3.2 cluster to OpenShift v3.3, metrics will not show data in the graphs.
-> The OpenShift Metrics pods are running in the openshift-infra project.
-> The heapster pod logs show this error:
---
E0902 08:21:05.098752 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.63.12:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
E0902 08:21:05.099193 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.60.26:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
E0902 08:21:05.103503 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.63.15:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
E0902 08:21:05.106422 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.63.14:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
---
I hit this issue on two different clusters where OpenShift Metrics fails to work after the upgrade from OpenShift v3.2 to OpenShift v3.3. Please note that this was not a new installation of OpenShift v3.3, but an upgrade to OpenShift v3.3.

Expected results:
OpenShift Metrics works after upgrading the OpenShift cluster from v3.2 to v3.3.

Additional info:
# oc get pods -n openshift-infra -o wide
NAME                         READY     STATUS      RESTARTS   AGE       IP            NODE
hawkular-cassandra-1-d8swe   1/1       Running     0          11m       172.20.11.2   ip-172-31-63-8.us-west-2.compute.internal
hawkular-metrics-evqzj       1/1       Running     0          11m       172.20.13.3   ip-172-31-63-7.us-west-2.compute.internal
heapster-f1kgn               1/1       Running     0          11m       172.20.4.2    ip-172-31-63-16.us-west-2.compute.internal
metrics-deployer-k53y9       0/1       Completed   0          12m       172.20.13.2   ip-172-31-63-7.us-west-2.compute.internal

-> Tested the upgrade on two different OpenShift clusters.
-> The metrics installation was done following the instructions at https://github.com/openshift/origin-metrics.
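For reference, a quick way to confirm you are seeing the same symptom (the heapster pod name below is the one from the oc get pods output above; substitute your own):
---
oc get pods -n openshift-infra
oc logs heapster-f1kgn -n openshift-infra | grep "401 Unauthorized"
oadm diagnostics MetricsApiProxy
---
The first two commands list the metrics pods and filter the heapster log for the kubelet 401 errors; the diagnostics check exercises the console's metrics API proxy path.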
Are you also updating Metrics to version 3.3? The console uses APIs that are only available with the Metrics components meant for 3.3, which could explain why the console is failing. When updating the version of OpenShift you must also update the metrics components. Please see the docs: https://docs.openshift.com/enterprise/3.2/install_config/upgrading/automated_upgrades.html#automated-upgrading-cluster-metrics [those are the 3.2 docs, but it's the same update procedure for 3.3]
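Roughly, the metrics update amounts to re-running the deployer template pinned to the matching image version, something like the sketch below. The parameter names (IMAGE_VERSION, MODE) come from the metrics-deployer template; verify what your copy of the template actually exposes with `oc process --parameters -f metrics-deployer.yaml` before relying on them, and follow the linked docs for the authoritative procedure.
---
oc process -f metrics-deployer.yaml \
  -v HAWKULAR_METRICS_HOSTNAME=<hostname>,USE_PERSISTENT_STORAGE=false,IMAGE_VERSION=<3.3 image tag>,MODE=refresh \
  | oc create -n openshift-infra -f -
---
The intent of a refresh-style redeploy is to replace the metrics components with the new version without discarding stored data.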
Seeing the same thing with a yum update to 3.3. Noticed that it grabbed the Docker Hub metrics/cassandra/heapster images for some reason. Is that correct?

e2d419c54f56   openshift/origin-metrics-heapster:latest   "heapster-wrapper.sh "   14 hours ago   Up 14 hours   k8s_heapster.4ce7cca2_heapster-xjpe1_openshift-infra_c8e6ba62-868e-11e6-8311-246e960f19fc_baa53e21

# docker images | grep heapster
docker.io/openshift/origin-metrics-heapster   latest   d10568760a84   3 days ago   994.8 MB
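One way to see which images the metrics pods were actually told to use (as opposed to what docker happened to pull) is to read them out of the pod specs, for example:

# oc get pods -n openshift-infra -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'

If an older oc does not support jsonpath output, `oc describe pod <heapster pod> -n openshift-infra | grep Image` gives the same information. Seeing docker.io/openshift/origin-metrics-* images on an enterprise cluster would suggest the deployer was run with the origin template or image prefix rather than the OSE one.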
Sorry, I guess I had grabbed the origin version of metrics-deployer.yaml. Corrected that and followed the updated docs for 3.3, but the same issue persists. oadm diagnostics MetricsApiProxy reports no warnings or errors.
Looks like the cluster roles needed to be fixed up this way post yum upgrade:

# oadm policy reconcile-cluster-roles --confirm -o name
clusterrole/sudoer
clusterrole/cluster-reader
clusterrole/system:build-strategy-jenkinspipeline
clusterrole/admin
clusterrole/edit
clusterrole/view
clusterrole/basic-user
clusterrole/self-access-reviewer
clusterrole/cluster-status
clusterrole/system:image-builder
clusterrole/system:image-pruner
clusterrole/system:image-signer
clusterrole/system:deployer
clusterrole/system:router
clusterrole/system:registry
clusterrole/system:node
clusterrole/system:sdn-reader
clusterrole/system:discovery
clusterrole/registry-admin
clusterrole/registry-editor
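For the record, my understanding of why this helps: heapster authenticates to the kubelet stats endpoint with the heapster service account, which the deployer instructions bind to cluster-reader, and reconciling the roles presumably brings cluster-reader up to the 3.3 definition that heapster needs, so the 401s stop. A rough post-upgrade sequence (the upgrade docs describe additional flags for preserving local policy customizations; check them before running this against a cluster with modified roles or bindings):

# oadm policy reconcile-cluster-roles --confirm
# oadm policy reconcile-cluster-role-bindings --confirm

The first command updates the default cluster role definitions, the second the default cluster role bindings; run without --confirm, both only print what they would change. After that the heapster log should stop reporting the kubelet 401 errors within a scrape interval or two, without restarting the pod.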
Closing this as not a bug, as it appears the cluster role update step was skipped during the upgrade. After running the reconcile command above, metrics function again.