Description of problem:
Heapster uses the InternalIP key value to retrieve metrics when using the vSphere cloud provider.
We upgraded to the latest async 3.6 release and then to 3.7.9. Heapster broke because it could no longer retrieve the InternalIP key from the node object.
Turns out the root cause is... https://github.com/kubernetes/kubernetes/issues/48760
(Description from gh issue)
After upgrading the Kubernetes cluster from 1.6.4 with the vSphere Cloud Provider enabled to 1.6.5, 1.6.6, or 1.6.7, the cluster nodes no longer have InternalIPs.
This is a request to backport the fix into 3.7.9
Version-Release number of selected component (if applicable):
OpenShift Master : v3.7.9
Kubernetes Master : v1.7.5+a08f5ee
Expected results:
Heapster to work
Basically, the issue is that the kubelet calls out to the cloud provider to get the addresses of the instance on which it runs. That call is allowed to return multiple addresses of different types:
// These are valid address type of node.
const (
    NodeHostName    NodeAddressType = "Hostname"
    NodeExternalIP  NodeAddressType = "ExternalIP"
    NodeInternalIP  NodeAddressType = "InternalIP"
    NodeExternalDNS NodeAddressType = "ExternalDNS"
    NodeInternalDNS NodeAddressType = "InternalDNS"
)
The issue is that setNodeAddress() is only using the first address returned, and if that address is not the InternalIP, then it is skipped. vSphere's method returns the ExternalIP first, thus the InternalIP is not set.
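The behaviour described above can be illustrated with a small, self-contained Go sketch. This is not the kubelet code: firstOnly and findByType are hypothetical helper names, the types are redefined locally, and the addresses are made up. It only contrasts inspecting the first returned address with scanning the whole list.

```go
package main

import "fmt"

// NodeAddressType and NodeAddress mirror the shape of the Kubernetes API
// types quoted above; they are redefined here so the sketch is self-contained.
type NodeAddressType string

const (
	NodeExternalIP NodeAddressType = "ExternalIP"
	NodeInternalIP NodeAddressType = "InternalIP"
)

type NodeAddress struct {
	Type    NodeAddressType
	Address string
}

// firstOnly (hypothetical) models the reported behaviour: only the first
// address is inspected, so a wanted type further down the list is missed.
func firstOnly(addrs []NodeAddress, want NodeAddressType) (string, bool) {
	if len(addrs) > 0 && addrs[0].Type == want {
		return addrs[0].Address, true
	}
	return "", false
}

// findByType (hypothetical) scans every returned address for the wanted type.
func findByType(addrs []NodeAddress, want NodeAddressType) (string, bool) {
	for _, a := range addrs {
		if a.Type == want {
			return a.Address, true
		}
	}
	return "", false
}

func main() {
	// Made-up addresses, ordered ExternalIP first as this report says
	// the vSphere provider returns them.
	addrs := []NodeAddress{
		{Type: NodeExternalIP, Address: "203.0.113.10"},
		{Type: NodeInternalIP, Address: "10.0.0.10"},
	}

	_, ok := firstOnly(addrs, NodeInternalIP)
	fmt.Println("first-only lookup found InternalIP:", ok)

	ip, ok := findByType(addrs, NodeInternalIP)
	fmt.Println("full-scan lookup found InternalIP:", ok, ip)
}
```

With the ExternalIP listed first, the first-only lookup reports no InternalIP, which matches the symptom Heapster ran into.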
The only workaround would be to modify the vSphere instance.NodeAddresses() call to return the InternalIP first. Then the ExternalIP would not be set, but I'm not sure whether that is used for anything other than user-friendly metadata.
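As a rough sketch of that workaround idea, the following Go snippet reorders a list of addresses so that InternalIP entries come first. The helper name internalFirst is hypothetical and the types are redefined locally; this is not the vSphere provider's code, only an illustration of the reordering under those assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Local stand-ins for the Kubernetes API types, so the sketch compiles alone.
type NodeAddressType string

const (
	NodeExternalIP NodeAddressType = "ExternalIP"
	NodeInternalIP NodeAddressType = "InternalIP"
)

type NodeAddress struct {
	Type    NodeAddressType
	Address string
}

// internalFirst (hypothetical) returns a copy of addrs with InternalIP
// entries moved to the front; the relative order of the rest is preserved.
func internalFirst(addrs []NodeAddress) []NodeAddress {
	out := make([]NodeAddress, len(addrs))
	copy(out, addrs)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Type == NodeInternalIP && out[j].Type != NodeInternalIP
	})
	return out
}

func main() {
	// Made-up addresses in the ExternalIP-first order described in this report.
	addrs := []NodeAddress{
		{Type: NodeExternalIP, Address: "203.0.113.10"},
		{Type: NodeInternalIP, Address: "10.0.0.10"},
	}
	for _, a := range internalFirst(addrs) {
		fmt.Println(a.Type, a.Address)
	}
}
```

A consumer that only looks at the first entry would then see the InternalIP, at the cost of never seeing the ExternalIP, as noted above.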
allowed to *return multiple addresses...
Target release was incorrect. This PR is for 3.7. It has merged. Going to QE.
# openshift version
Tested on vSphere with the cloud provider enabled, and Heapster works well.
# oc get pod
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-krr87   1/1       Running   0          13m
hawkular-metrics-4pnmr       1/1       Running   0          13m
heapster-g78wb               1/1       Running   0          9m
# oc adm top node
NAME                             CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
dhcp-66-146-181.nay.redhat.com   527m         13%       4188Mi          73%
# oc adm top pod
NAME                         CPU(cores)   MEMORY(bytes)
heapster-g78wb               2m           22Mi
hawkular-metrics-4pnmr       29m          1396Mi
hawkular-cassandra-1-krr87   322m         1754Mi
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
Can someone tell me in which 3.9 release this fix first appeared?