Bug 1527315 - Heapster is unable to connect to nodes due to InternalIP missing from node object [NEEDINFO]
Summary: Heapster is unable to connect to nodes due to InternalIP missing from node ob...
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.7.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.7.z
Assignee: Seth Jennings
QA Contact: DeShuai Ma
Depends On:
Reported: 2017-12-19 08:02 UTC by Takeshi Larsson
Modified: 2018-10-31 21:34 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Fixes an issue with the vSphere cloud provider where the InternalIP information is not populated for Nodes. This issue led to problems with Heapster, since it uses the InternalIP for gathering metrics.
Clone Of:
Last Closed: 2018-04-05 09:34:33 UTC
Target Upstream Version:
jack.ottofaro: needinfo?


System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0636 0 None None None 2018-04-05 09:35:02 UTC

Description Takeshi Larsson 2017-12-19 08:02:48 UTC
Description of problem:

Heapster uses the InternalIP value from the node object to retrieve metrics. When using the vSphere cloud provider, that value is missing.

We upgraded from the latest async 3.6 release to 3.7.9. Heapster broke because it was not able to retrieve the InternalIP key from the node object.
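
To illustrate the failure mode, here is a minimal sketch of how a metrics collector might look up a node's InternalIP and fail when it is absent. This is not Heapster's actual code: the `NodeAddress` struct is redeclared locally, and the `internalIP` helper, node names, and IP addresses are hypothetical.

```go
package main

import "fmt"

// NodeAddress is a local stand-in for the Kubernetes node address type,
// redeclared here so the sketch is self-contained.
type NodeAddress struct {
	Type    string
	Address string
}

// internalIP is a hypothetical helper: it returns the first InternalIP
// found in the node's address list, or an error when none is present.
func internalIP(nodeName string, addrs []NodeAddress) (string, error) {
	for _, a := range addrs {
		if a.Type == "InternalIP" && a.Address != "" {
			return a.Address, nil
		}
	}
	return "", fmt.Errorf("node %s has no InternalIP", nodeName)
}

func main() {
	// A node object as it might look when only the ExternalIP was recorded.
	broken := []NodeAddress{{Type: "ExternalIP", Address: "203.0.113.10"}}
	if _, err := internalIP("node-a", broken); err != nil {
		fmt.Println("cannot scrape:", err)
	}

	// A node object that carries both address types.
	healthy := []NodeAddress{
		{Type: "ExternalIP", Address: "203.0.113.10"},
		{Type: "InternalIP", Address: "10.0.0.10"},
	}
	ip, _ := internalIP("node-b", healthy)
	fmt.Println("scrape target:", ip)
}
```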

Turns out the root cause is... https://github.com/kubernetes/kubernetes/issues/48760

(Description from gh issue)
After upgrading the Kubernetes cluster from 1.6.4 with the vSphere Cloud Provider enabled to 1.6.5, 1.6.6, or 1.6.7, the cluster nodes no longer have InternalIPs.

This is a request to backport the fix into 3.7.9

Version-Release number of selected component (if applicable):
OpenShift Master  : v3.7.9
Kubernetes Master : v1.7.5+a08f5ee

Expected behaviour:
Heapster to work

Comment 3 Seth Jennings 2018-02-05 15:26:12 UTC
Basically, the issue is that the kubelet calls out to the cloud provider to get the addresses of the instance on which it runs.  That call is allowed to return multiple addresses of different types:

// These are valid address type of node.
const (
	NodeHostName    NodeAddressType = "Hostname"
	NodeExternalIP  NodeAddressType = "ExternalIP"
	NodeInternalIP  NodeAddressType = "InternalIP"
	NodeExternalDNS NodeAddressType = "ExternalDNS"
	NodeInternalDNS NodeAddressType = "InternalDNS"
)

The issue is that setNodeAddress() is only using the first address returned, and if that address is not the InternalIP, then it is skipped.  vSphere's method returns the ExternalIP first, thus the InternalIP is not set.
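
The behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the kubelet's real `setNodeAddress()` code: the types are redeclared locally, and `firstOnly`, `allAddresses`, `hasInternalIP`, and the IP addresses are hypothetical names and values chosen for the example.

```go
package main

import "fmt"

// Local copies of the address types so the sketch compiles on its own.
type NodeAddressType string

const (
	NodeExternalIP NodeAddressType = "ExternalIP"
	NodeInternalIP NodeAddressType = "InternalIP"
)

type NodeAddress struct {
	Type    NodeAddressType
	Address string
}

// firstOnly models the problem as described: only the first address is
// inspected, and it is kept only if it happens to be an InternalIP.
func firstOnly(addrs []NodeAddress) []NodeAddress {
	if len(addrs) > 0 && addrs[0].Type == NodeInternalIP {
		return []NodeAddress{addrs[0]}
	}
	return nil
}

// allAddresses models one possible alternative: keep every address the
// cloud provider reports, regardless of order.
func allAddresses(addrs []NodeAddress) []NodeAddress {
	out := make([]NodeAddress, len(addrs))
	copy(out, addrs)
	return out
}

// hasInternalIP reports whether any address in the list is an InternalIP.
func hasInternalIP(addrs []NodeAddress) bool {
	for _, a := range addrs {
		if a.Type == NodeInternalIP {
			return true
		}
	}
	return false
}

func main() {
	// Provider output with the ExternalIP listed first, as in the vSphere case.
	reported := []NodeAddress{
		{Type: NodeExternalIP, Address: "203.0.113.10"},
		{Type: NodeInternalIP, Address: "10.0.0.10"},
	}
	fmt.Println(hasInternalIP(firstOnly(reported)))    // prints "false"
	fmt.Println(hasInternalIP(allAddresses(reported))) // prints "true"
}
```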

The only workaround would be if the vSphere instance.NodeAddresses() call could be modified to return the InternalIP first.  Then the ExternalIP would not be set, but I'm not sure if that is used for anything other than user-friendly metadata.
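
A rough sketch of that reordering idea, assuming the provider already has the full address list in hand. The `internalFirst` helper and the sample addresses are hypothetical and are not part of the vSphere provider.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeAddress is a local stand-in for the Kubernetes node address type.
type NodeAddress struct {
	Type    string
	Address string
}

// internalFirst returns a copy of addrs with InternalIP entries moved to
// the front. The stable sort keeps the relative order of everything else.
func internalFirst(addrs []NodeAddress) []NodeAddress {
	out := make([]NodeAddress, len(addrs))
	copy(out, addrs)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Type == "InternalIP" && out[j].Type != "InternalIP"
	})
	return out
}

func main() {
	reported := []NodeAddress{
		{Type: "ExternalIP", Address: "203.0.113.10"},
		{Type: "Hostname", Address: "node-a"},
		{Type: "InternalIP", Address: "10.0.0.10"},
	}
	for _, a := range internalFirst(reported) {
		fmt.Println(a.Type, a.Address)
	}
}
```

With a consumer that reads only the first entry, this ordering would expose the InternalIP at the cost of the ExternalIP, which matches the trade-off noted above.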

Comment 4 Seth Jennings 2018-02-05 15:32:13 UTC
allowed to *return multiple addresses...

Comment 6 Seth Jennings 2018-02-09 02:41:56 UTC
Target release was incorrect.  This PR is for 3.7.  It has merged.  Going to QE.

Comment 8 weiwei jiang 2018-02-23 10:40:00 UTC
Checked with 
# openshift version 
openshift v3.7.31
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

on vsphere with cloudprovider enabled.

and Heapster works well.

# oc get pod 
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-krr87   1/1       Running   0          13m
hawkular-metrics-4pnmr       1/1       Running   0          13m
heapster-g78wb               1/1       Running   0          9m
# oc adm top node
NAME                             CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
dhcp-66-146-181.nay.redhat.com   527m         13%       4188Mi          73%       
# oc adm top pod
NAME                         CPU(cores)   MEMORY(bytes)   
heapster-g78wb               2m           22Mi            
hawkular-metrics-4pnmr       29m          1396Mi          
hawkular-cassandra-1-krr87   322m         1754Mi

Comment 13 errata-xmlrpc 2018-04-05 09:34:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 14 Jack Ottofaro 2018-10-31 21:34:38 UTC
Can someone tell me in which 3.9 release this fix first appeared?
