Description of problem:
OpenShift Metrics fails to populate data in graphs after upgrading an OpenShift cluster from OpenShift v3.2 to OpenShift v3.3.

Metrics worked fine on the OpenShift 3.2 cluster with the packages below:
atomic-openshift-clients-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-3.2.1.12-1.git.0.516a127.el7.x86_64
tuned-profiles-atomic-openshift-node-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-node-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-sdn-ovs-3.2.1.12-1.git.0.516a127.el7.x86_64
atomic-openshift-master-3.2.1.12-1.git.0.516a127.el7.x86_64

The issue appears after upgrading to the packages below.

Version-Release number of selected component (if applicable):
tuned-profiles-atomic-openshift-node-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-clients-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-node-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-sdn-ovs-3.3.0.28-1.git.0.c6f1247.el7.x86_64
atomic-openshift-master-3.3.0.28-1.git.0.c6f1247.el7.x86_64
and the latest upstream OpenShift Metrics images.

How reproducible:

Case 1) Existing metrics pods in the cluster:
On an OpenShift v3.2 cluster with OpenShift Metrics running, upgrade
- atomic-openshift-master
- atomic-openshift-node
- restart services
-> check the heapster pod and the Metrics tab in the OpenShift web console (the network/CPU/memory graphs will be empty)

or

Case 2) Create the metrics pods after the upgrade, e.g. upgrade
- atomic-openshift-master
- atomic-openshift-node
- restart services
-> create the metrics pods following https://github.com/openshift/origin-metrics. Specifically, I ran:
---
oc create -f metrics-deployer-setup.yaml -n openshift-infra
oadm policy add-role-to-user edit system:serviceaccount:openshift-infra:metrics-deployer -n openshift-infra
oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:openshift-infra:heapster -n openshift-infra
oc secrets new metrics-deployer nothing=/dev/null -n openshift-infra
oc process -f metrics.yaml -v HAWKULAR_METRICS_HOSTNAME=<hostname>,USE_PERSISTENT_STORAGE=false | oc create -n openshift-infra -f -
---
After the metrics pods start -> check the heapster pod and the Metrics tab in the OpenShift web console (the network/CPU/memory graphs will still be empty after one day of waiting).

Steps to Reproduce:
See above.

Actual results:
After upgrading an OpenShift v3.2 cluster to OpenShift v3.3, metrics will not show data in the graphs.
-> The OpenShift Metrics pods are running in the openshift-infra project.
-> The heapster pod logs show this error:
---
E0902 08:21:05.098752 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.63.12:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
E0902 08:21:05.099193 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.60.26:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
E0902 08:21:05.103503 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.63.15:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
E0902 08:21:05.106422 1 kubelet.go:230] error while getting containers from Kubelet: failed to get all container stats from Kubelet URL "https://172.31.63.14:10250/stats/container/": request failed - "401 Unauthorized", response: "Unauthorized"
---
I hit this issue on two different clusters where OpenShift Metrics fails to work after the upgrade from OpenShift v3.2 to OpenShift v3.3. Please note that this was not a new installation of OpenShift v3.3, but an upgrade to OpenShift v3.3.

Expected results:
OpenShift Metrics works after upgrading the OpenShift cluster from v3.2 to v3.3.

Additional info:
# oc get pods -n openshift-infra -o wide
NAME                         READY     STATUS      RESTARTS   AGE       IP            NODE
hawkular-cassandra-1-d8swe   1/1       Running     0          11m       172.20.11.2   ip-172-31-63-8.us-west-2.compute.internal
hawkular-metrics-evqzj       1/1       Running     0          11m       172.20.13.3   ip-172-31-63-7.us-west-2.compute.internal
heapster-f1kgn               1/1       Running     0          11m       172.20.4.2    ip-172-31-63-16.us-west-2.compute.internal
metrics-deployer-k53y9       0/1       Completed   0          12m       172.20.13.2   ip-172-31-63-7.us-west-2.compute.internal

-> Tested the upgrade on two different OpenShift clusters.
-> The metrics installation was done following the instructions at https://github.com/openshift/origin-metrics.
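For reference, a quick way to confirm you are seeing the same symptom (the heapster pod name below is the one from the oc get pods output above; substitute your own):
---
oc get pods -n openshift-infra
oc logs heapster-f1kgn -n openshift-infra | grep "401 Unauthorized"
oadm diagnostics MetricsApiProxy
---
The first two commands list the metrics pods and filter the heapster log for the kubelet 401 errors; the diagnostics check exercises the console's metrics API proxy path.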
Are you also updating Metrics to version 3.3? The console uses APIs that are only available with the Metrics components meant for 3.3, which could explain why the console is failing. When updating the version of OpenShift you must also update the metrics components. Please see the docs: https://docs.openshift.com/enterprise/3.2/install_config/upgrading/automated_upgrades.html#automated-upgrading-cluster-metrics [those are the 3.2 docs, but it's the same update procedure for 3.3]
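Roughly, the metrics update amounts to re-running the deployer template pinned to the matching image version, something like the sketch below. The parameter names (IMAGE_VERSION, MODE) come from the metrics-deployer template; verify what your copy of the template actually exposes with `oc process --parameters -f metrics-deployer.yaml` before relying on them, and follow the linked docs for the authoritative procedure.
---
oc process -f metrics-deployer.yaml \
  -v HAWKULAR_METRICS_HOSTNAME=<hostname>,USE_PERSISTENT_STORAGE=false,IMAGE_VERSION=<3.3 image tag>,MODE=refresh \
  | oc create -n openshift-infra -f -
---
The intent of a refresh-style redeploy is to replace the metrics components with the new version without discarding stored data.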
Seeing the same thing with a yum update to 3.3. Noticed that it grabbed the Docker Hub metrics/cassandra/heapster images for some reason. Is that correct?

e2d419c54f56   openshift/origin-metrics-heapster:latest   "heapster-wrapper.sh "   14 hours ago   Up 14 hours   k8s_heapster.4ce7cca2_heapster-xjpe1_openshift-infra_c8e6ba62-868e-11e6-8311-246e960f19fc_baa53e21

# docker images | grep heapster
docker.io/openshift/origin-metrics-heapster   latest   d10568760a84   3 days ago   994.8 MB
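One way to see which images the metrics pods were actually told to use (as opposed to what docker happened to pull) is to read them out of the pod specs, for example:

# oc get pods -n openshift-infra -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'

If an older oc does not support jsonpath output, `oc describe pod <heapster pod> -n openshift-infra | grep Image` gives the same information. Seeing docker.io/openshift/origin-metrics-* images on an enterprise cluster would suggest the deployer was run with the origin template or image prefix rather than the OSE one.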
Sorry, I guess I had grabbed the origin version of metrics-deployer.yaml. Corrected that and followed the updated docs for 3.3, but the same issue persists. oadm diagnostics MetricsApiProxy reports no warnings or errors.
Looks like the cluster roles needed to be fixed up this way post yum upgrade:

# oadm policy reconcile-cluster-roles --confirm -o name
clusterrole/sudoer
clusterrole/cluster-reader
clusterrole/system:build-strategy-jenkinspipeline
clusterrole/admin
clusterrole/edit
clusterrole/view
clusterrole/basic-user
clusterrole/self-access-reviewer
clusterrole/cluster-status
clusterrole/system:image-builder
clusterrole/system:image-pruner
clusterrole/system:image-signer
clusterrole/system:deployer
clusterrole/system:router
clusterrole/system:registry
clusterrole/system:node
clusterrole/system:sdn-reader
clusterrole/system:discovery
clusterrole/registry-admin
clusterrole/registry-editor
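For the record, my understanding of why this helps: heapster authenticates to the kubelet stats endpoint with the heapster service account, which the deployer instructions bind to cluster-reader, and reconciling the roles presumably brings cluster-reader up to the 3.3 definition that heapster needs, so the 401s stop. A rough post-upgrade sequence (the upgrade docs describe additional flags for preserving local policy customizations; check them before running this against a cluster with modified roles or bindings):

# oadm policy reconcile-cluster-roles --confirm
# oadm policy reconcile-cluster-role-bindings --confirm

The first command updates the default cluster role definitions, the second the default cluster role bindings; run without --confirm, both only print what they would change. After that the heapster log should stop reporting the kubelet 401 errors within a scrape interval or two, without restarting the pod.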
Closing this as not a bug, as it appears the cluster role update step was skipped during the upgrade. After running the reconcile command above, metrics function again.