Bug 1527315 - Heapster is unable to connect to nodes due to InternalIP missing from node object [NEEDINFO]
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.7.z
Assigned To: Seth Jennings
QA Contact: DeShuai Ma
Depends On:
Blocks:
 
Reported: 2017-12-19 03:02 EST by Takeshi Larsson
Modified: 2018-10-31 17:34 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Fixes an issue with the vSphere cloud provider where the InternalIP information is not populated for Nodes. This issue led to problems with Heapster, since it uses the InternalIP for gathering metrics.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-05 05:34:33 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
jack.ottofaro: needinfo?




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0636 None None None 2018-04-05 05:35 EDT

Description Takeshi Larsson 2017-12-19 03:02:48 EST
Description of problem:

Heapster uses the InternalIP key value to retrieve metrics. This breaks when using the vSphere cloud provider.

We upgraded from the latest async 3.6 release to 3.7.9. Heapster broke because it could not retrieve the InternalIP key from the node object.

Turns out the root cause is... https://github.com/kubernetes/kubernetes/issues/48760

(Description from gh issue)
After upgrading the kubernetes cluster from 1.6.4 with the VSphere Cloud Provider enabled to 1.6.5/1.6.6 or 1.6.7, the cluster nodes don't have anymore InternalIPs
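
One way to confirm that a node is affected is to look for an InternalIP entry under status.addresses in the node object. The following is only a minimal sketch, assuming the JSON comes from something like `oc get node <name> -o json`; the helper name hasInternalIP and the sample data are made up for illustration and are not part of OpenShift or Heapster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node models only the part of the Node object this check needs:
// the list of typed addresses under status.addresses.
type node struct {
	Status struct {
		Addresses []struct {
			Type    string `json:"type"`
			Address string `json:"address"`
		} `json:"addresses"`
	} `json:"status"`
}

// hasInternalIP (hypothetical helper) reports whether the node JSON
// contains at least one non-empty address of type "InternalIP".
func hasInternalIP(nodeJSON []byte) (bool, error) {
	var n node
	if err := json.Unmarshal(nodeJSON, &n); err != nil {
		return false, err
	}
	for _, a := range n.Status.Addresses {
		if a.Type == "InternalIP" && a.Address != "" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Made-up sample resembling an affected node: no InternalIP entry.
	sample := []byte(`{"status":{"addresses":[
		{"type":"ExternalIP","address":"203.0.113.10"},
		{"type":"Hostname","address":"node-1"}]}}`)

	ok, err := hasInternalIP(sample)
	if err != nil {
		fmt.Println("could not parse node JSON:", err)
		return
	}
	fmt.Println("node has InternalIP:", ok)
}
```
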

This is a request to backport the fix into 3.7.9


Version-Release number of selected component (if applicable):
OpenShift Master  : v3.7.9
Kubernetes Master : v1.7.5+a08f5ee


Expected behaviour:
Heapster to work
Comment 3 Seth Jennings 2018-02-05 10:26:12 EST
Basically the issue is the kubelet calls out to the cloud provider to get the addresses of the instance on which it runs.  That call is allowed to multiple addresses of different types:

// These are valid address type of node.
const (
	NodeHostName    NodeAddressType = "Hostname"
	NodeExternalIP  NodeAddressType = "ExternalIP"
	NodeInternalIP  NodeAddressType = "InternalIP"
	NodeExternalDNS NodeAddressType = "ExternalDNS"
	NodeInternalDNS NodeAddressType = "InternalDNS"
)

The issue is that setNodeAddress() is only using the first address returned, and if that address is not the InternalIP, then it is skipped.  vSphere's method returns the ExternalIP first, thus the InternalIP is not set.

The only workaround would be to modify the vSphere instance.NodeAddresses() call to return the InternalIP first.  Then the ExternalIP would not be set, but I'm not sure whether that is used for anything other than user-friendly metadata.
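
To illustrate the behavior described above, here is a rough sketch, not the actual kubelet code. The types are redeclared locally so it compiles on its own, and firstOnlyInternalIP and anyInternalIP are hypothetical names: the first looks only at the first address, the second scans the whole list. The addresses and their order are assumed for illustration.

```go
package main

import "fmt"

// Local copies of the address types quoted above, so the sketch is self-contained.
type NodeAddressType string

const (
	NodeHostName   NodeAddressType = "Hostname"
	NodeExternalIP NodeAddressType = "ExternalIP"
	NodeInternalIP NodeAddressType = "InternalIP"
)

type NodeAddress struct {
	Type    NodeAddressType
	Address string
}

// firstOnlyInternalIP (hypothetical) mimics the problem: it inspects only
// the first address and gives up if that one is not an InternalIP.
func firstOnlyInternalIP(addrs []NodeAddress) (string, bool) {
	if len(addrs) == 0 || addrs[0].Type != NodeInternalIP {
		return "", false
	}
	return addrs[0].Address, true
}

// anyInternalIP (hypothetical) scans every returned address, so the
// order in which the cloud provider lists them no longer matters.
func anyInternalIP(addrs []NodeAddress) (string, bool) {
	for _, a := range addrs {
		if a.Type == NodeInternalIP {
			return a.Address, true
		}
	}
	return "", false
}

func main() {
	// Assumed order for illustration: ExternalIP first, as described for vSphere.
	addrs := []NodeAddress{
		{Type: NodeExternalIP, Address: "203.0.113.10"},
		{Type: NodeInternalIP, Address: "10.0.0.10"},
	}

	_, ok := firstOnlyInternalIP(addrs)
	fmt.Println("first-only lookup found InternalIP:", ok)

	ip, ok := anyInternalIP(addrs)
	fmt.Println("full scan found InternalIP:", ok, ip)
}
```

With the ExternalIP listed first, the first-only lookup finds nothing, while the full scan still picks up the InternalIP.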
Comment 4 Seth Jennings 2018-02-05 10:32:13 EST
allowed to *return multiple addresses...
Comment 6 Seth Jennings 2018-02-08 21:41:56 EST
Target release was incorrect.  This PR is for 3.7.  It has merged.  Going to QE.
Comment 8 weiwei jiang 2018-02-23 05:40:00 EST
Checked with 
# openshift version 
openshift v3.7.31
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

on vSphere with the cloud provider enabled, and Heapster works well.

# oc get pod 
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-krr87   1/1       Running   0          13m
hawkular-metrics-4pnmr       1/1       Running   0          13m
heapster-g78wb               1/1       Running   0          9m
# oc adm top node
NAME                             CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
dhcp-66-146-181.nay.redhat.com   527m         13%       4188Mi          73%       
# oc adm top pod
NAME                         CPU(cores)   MEMORY(bytes)   
heapster-g78wb               2m           22Mi            
hawkular-metrics-4pnmr       29m          1396Mi          
hawkular-cassandra-1-krr87   322m         1754Mi
Comment 13 errata-xmlrpc 2018-04-05 05:34:33 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636
Comment 14 Jack Ottofaro 2018-10-31 17:34:38 EDT
Can someone tell me in which 3.9 release this fix first appeared?
