Description of problem:
Heapster uses the InternalIP key of each node to retrieve metrics. With the vSphere cloud provider enabled, we upgraded from the latest 3.6 async release to 3.7.9. Heapster then broke because it could not retrieve the InternalIP key from the node object. The root cause turns out to be https://github.com/kubernetes/kubernetes/issues/48760

(Description from the GitHub issue)
After upgrading the Kubernetes cluster from 1.6.4 with the vSphere Cloud Provider enabled to 1.6.5, 1.6.6, or 1.6.7, the cluster nodes no longer have InternalIPs.

This is a request to backport the fix into 3.7.9.

Version-Release number of selected component (if applicable):
OpenShift Master: v3.7.9
Kubernetes Master: v1.7.5+a08f5ee

Expected behaviour:
Heapster to work.
Basically, the issue is that the kubelet calls out to the cloud provider to get the addresses of the instance on which it runs. That call is allowed to return multiple addresses of different types:

    // These are valid address type of node.
    const (
        NodeHostName    NodeAddressType = "Hostname"
        NodeExternalIP  NodeAddressType = "ExternalIP"
        NodeInternalIP  NodeAddressType = "InternalIP"
        NodeExternalDNS NodeAddressType = "ExternalDNS"
        NodeInternalDNS NodeAddressType = "InternalDNS"
    )

The issue is that setNodeAddress() only uses the first address returned, and if that address is not the InternalIP, it is skipped. vSphere's method returns the ExternalIP first, so the InternalIP is never set.

The only workaround would be to modify the vSphere instance.NodeAddresses() call to return the InternalIP first. Then the ExternalIP would not be set, but I'm not sure whether that is used for anything other than user-friendly metadata.
allowed to *return multiple addresses...
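To make the ordering problem concrete, here is a minimal Go sketch. It is not the actual kubelet code: the NodeAddress struct is a simplified stand-in, the two helper functions (firstOnlyInternalIP, anyInternalIP) are hypothetical names, and the IP addresses are made up. It only contrasts "look at the first address" with "scan every address" for a provider that returns the ExternalIP first.

```go
package main

import "fmt"

// NodeAddressType mirrors the address types quoted in the comment above.
type NodeAddressType string

const (
	NodeExternalIP NodeAddressType = "ExternalIP"
	NodeInternalIP NodeAddressType = "InternalIP"
)

// NodeAddress is a simplified stand-in for the Kubernetes node address struct.
type NodeAddress struct {
	Type    NodeAddressType
	Address string
}

// firstOnlyInternalIP (hypothetical) models the behaviour described above:
// only the first address is inspected, so an InternalIP that appears later
// in the list is never found.
func firstOnlyInternalIP(addrs []NodeAddress) string {
	if len(addrs) > 0 && addrs[0].Type == NodeInternalIP {
		return addrs[0].Address
	}
	return ""
}

// anyInternalIP (hypothetical) scans the whole list, so the order in which
// the cloud provider returns addresses does not matter.
func anyInternalIP(addrs []NodeAddress) string {
	for _, a := range addrs {
		if a.Type == NodeInternalIP {
			return a.Address
		}
	}
	return ""
}

func main() {
	// ExternalIP first, as described for the vSphere provider.
	// Both addresses are made-up example values.
	addrs := []NodeAddress{
		{Type: NodeExternalIP, Address: "203.0.113.10"},
		{Type: NodeInternalIP, Address: "10.0.0.5"},
	}
	fmt.Printf("first-only: %q\n", firstOnlyInternalIP(addrs))
	fmt.Printf("scan-all:   %q\n", anyInternalIP(addrs))
}
```

With the ExternalIP listed first, the first-only variant returns an empty string while the scanning variant returns the InternalIP, which matches the symptom Heapster ran into.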
Target release was incorrect. This PR is for 3.7. It has merged. Going to QE.
Checked with

# openshift version
openshift v3.7.31
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

on vSphere with the cloud provider enabled, and Heapster works well.

# oc get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-krr87   1/1       Running   0          13m
hawkular-metrics-4pnmr       1/1       Running   0          13m
heapster-g78wb               1/1       Running   0          9m

# oc adm top node
NAME                             CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
dhcp-66-146-181.nay.redhat.com   527m         13%       4188Mi          73%

# oc adm top pod
NAME                         CPU(cores)   MEMORY(bytes)
heapster-g78wb               2m           22Mi
hawkular-metrics-4pnmr       29m          1396Mi
hawkular-cassandra-1-krr87   322m         1754Mi
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0636
Can someone tell me which 3.9 release first included this fix?