Bug 1643348 - [vsphere] The "Internal IP/Host IP" of the infra nodes starts changing to the VIPs, and changes constantly/randomly all on its own, to any of these VIPs on eth0 (confirmed by oc get hostsubnet output).
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-26 04:49 UTC by Miheer Salunke
Modified: 2019-06-04 10:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A change introduced in Kubernetes 1.11 affected nodes with many IP addresses in vSphere deployments.
Consequence: Under vSphere, a node hosting several Egress IPs or Router HA addresses would sporadically "forget" which of the IPs was its official "node IP" (even if that node IP had been explicitly specified in the node configuration) and start using one of the other ones, causing networking problems.
Fix: If a "node IP" is specified in the node configuration, it will be used correctly, regardless of how many other IPs the node has.
Result: Networking should work reliably.
Clone Of:
: 1666820 (view as bug list)
Environment:
Last Closed: 2019-06-04 10:40:52 UTC
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | kubernetes kubernetes pull 70805 | 0 | None | closed | Fix a CloudProvider-vs-nodeIP edge case | 2020-04-20 11:07:34 UTC
Github | openshift origin pull 21807 | 0 | None | closed | UPSTREAM: 70805: Fix a CloudProvider-vs-nodeIP edge case | 2020-04-20 11:07:34 UTC
Red Hat Product Errata | RHBA-2019:0758 | 0 | None | None | None | 2019-06-04 10:40:58 UTC

Comment 11 Dan Winship 2018-11-01 15:35:55 UTC
Can you get "oc get node NODENAME -o yaml; oc get hostsubnet NODENAME -o yaml" for one of the nodes, both before and after a "flap". (There's lots of "oc get hostsubnet" output here, but no "oc get node" as far as I've seen.)

It seems like probably the kubelet code is getting confused about what the node's real IP is, and incorrectly updating its Node resource, which then causes other things to be updated incorrectly based on that.

Comment 14 Dan Winship 2018-11-05 20:19:54 UTC
OK... I'm provisionally calling this a kubelet bug (which I think corresponds to "Pod" in bugzilla?), though one could argue it was a vSphere CloudProvider bug instead.

Attachment 1500320 shows that the Node resource is being updated with an incorrect InternalIP (while still keeping the node's correct IP as its ExternalIP):

    status:
      addresses:
      - address: x.y.z.17  # Right
        type: ExternalIP
      - address: x.y.z.30  # Wrong
        type: InternalIP
      - address: ...
        type: Hostname

What is happening is that the kubelet periodically calls setNodeAddress() to update its node address. Since the node in question has a CloudProvider, setNodeAddress() first calls the cloud NodeAddresses() method, then keeps the first Address in the returned list that matches kl.nodeIP (the configmap-specified node IP), along with the first Address of each other type.
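
To make that selection rule concrete, here is a minimal Go sketch of it. This is not the kubelet source: the types are simplified stand-ins for the Kubernetes API types, selectFirstMatch is a hypothetical name, and the addresses are placeholder values from the documentation ranges.

```go
package main

import "fmt"

// AddressType and NodeAddress are simplified stand-ins for the Kubernetes
// API types, defined here only to keep the sketch self-contained.
type AddressType string

const (
	ExternalIP AddressType = "ExternalIP"
	InternalIP AddressType = "InternalIP"
	Hostname   AddressType = "Hostname"
)

type NodeAddress struct {
	Type    AddressType
	Address string
}

// selectFirstMatch (hypothetical name) keeps the first address equal to
// nodeIP, then the first address of every type that has not been kept yet.
func selectFirstMatch(cloud []NodeAddress, nodeIP string) []NodeAddress {
	var out []NodeAddress
	seen := map[AddressType]bool{}
	for _, a := range cloud {
		if a.Address == nodeIP {
			out = append(out, a)
			seen[a.Type] = true
			break // only the first match is kept
		}
	}
	if len(out) == 0 {
		return nil // the cloud provider did not report nodeIP at all
	}
	for _, a := range cloud {
		if !seen[a.Type] {
			out = append(out, a)
			seen[a.Type] = true
		}
	}
	return out
}

func main() {
	// A provider that reports each IP under exactly one type.
	cloud := []NodeAddress{
		{InternalIP, "192.0.2.17"},
		{ExternalIP, "203.0.113.5"},
		{Hostname, "node-1"},
	}
	for _, a := range selectFirstMatch(cloud, "192.0.2.17") {
		fmt.Printf("%s: %s\n", a.Type, a.Address)
	}
}
```

With a provider that reports each IP under a single type, the rule behaves as intended: the configured node IP is kept under its own type and the other types are filled in from the rest of the list.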

The vSphere provider's NodeAddresses() just pulls all of the IP addresses off the default interface, and then returns each one as both an ExternalIP and an InternalIP. For each IP address, the ExternalIP always appears first, which means it will always be the one that matches kubelet.setNodeAddress()'s "first Address that matches kl.nodeIP" rule. But that means that kubelet will only end up picking the right InternalIP if kl.nodeIP is the first IP in the list returned by NodeAddresses(). Apparently, due to the vagaries of either pkg/net or the kernel APIs, the node's oldest IP gets returned first up until there are more than 5 IPs on the interface, at which point the return value gets reordered for some reason and a different IP is listed first, throwing things into chaos.
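
The order dependence can be reproduced with a small Go sketch. It is a simplified model, not the provider or kubelet code: dualTypeAddresses and internalIPFirstMatch are hypothetical names, and the two addresses are placeholders standing in for the node IP and one VIP.

```go
package main

import "fmt"

type AddressType string

const (
	ExternalIP AddressType = "ExternalIP"
	InternalIP AddressType = "InternalIP"
)

type NodeAddress struct {
	Type    AddressType
	Address string
}

// dualTypeAddresses (hypothetical) models the described provider behaviour:
// every interface IP is reported as an ExternalIP and then as an InternalIP.
func dualTypeAddresses(ifaceIPs []string) []NodeAddress {
	var out []NodeAddress
	for _, ip := range ifaceIPs {
		out = append(out, NodeAddress{ExternalIP, ip}, NodeAddress{InternalIP, ip})
	}
	return out
}

// internalIPFirstMatch (hypothetical) returns the InternalIP that the
// "first match, then first address of each other type" rule ends up with.
// It returns "" if nodeIP is not in the list.
func internalIPFirstMatch(addrs []NodeAddress, nodeIP string) string {
	var matched AddressType
	for _, a := range addrs {
		if a.Address == nodeIP {
			matched = a.Type
			break
		}
	}
	switch matched {
	case "":
		return ""
	case InternalIP:
		return nodeIP
	}
	// The match was an ExternalIP, so the InternalIP slot is filled with
	// whichever InternalIP happens to come first in the list.
	for _, a := range addrs {
		if a.Type == InternalIP {
			return a.Address
		}
	}
	return ""
}

func main() {
	nodeIP := "192.0.2.17"
	inOrder := dualTypeAddresses([]string{"192.0.2.17", "192.0.2.30"})
	reordered := dualTypeAddresses([]string{"192.0.2.30", "192.0.2.17"})
	fmt.Println(internalIPFirstMatch(inOrder, nodeIP))   // 192.0.2.17
	fmt.Println(internalIPFirstMatch(reordered, nodeIP)) // 192.0.2.30
}
```

The same set of addresses yields a different InternalIP purely because of the order in which they were listed, which is the flapping seen in the hostsubnet output.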


I'm not sure if vSphere's behavior here is correct: most other cloud providers do not return the same IP as both InternalIP and ExternalIP. (AFAICT only ovirt does.) However, the docs do not appear to forbid this, and https://kubernetes.io/docs/concepts/architecture/nodes/#addresses outright declares that "The usage of these fields varies depending on your cloud provider".

So given that, I think that kubelet's logic should be changed so that instead of taking "the first address of any type that matches kl.nodeIP, followed by the first address of each other type", it should take "*every* address that matches kl.nodeIP, followed by the first address of each other type". And then in this case it would always return the kl.nodeIP-based ExternalIP and InternalIP, regardless of the order that the CloudProvider returned the addresses in.
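
A rough Go sketch of the proposed rule follows. It only illustrates the idea and is not the code of any actual patch; selectAllMatches is a hypothetical name, the types are simplified stand-ins, and the addresses are placeholders.

```go
package main

import "fmt"

type AddressType string

const (
	ExternalIP AddressType = "ExternalIP"
	InternalIP AddressType = "InternalIP"
	Hostname   AddressType = "Hostname"
)

type NodeAddress struct {
	Type    AddressType
	Address string
}

// selectAllMatches (hypothetical name) keeps every address equal to nodeIP,
// then the first address of each type that nodeIP did not already cover.
func selectAllMatches(cloud []NodeAddress, nodeIP string) []NodeAddress {
	var out []NodeAddress
	seen := map[AddressType]bool{}
	for _, a := range cloud {
		if a.Address == nodeIP {
			out = append(out, a) // no break: keep all matches
			seen[a.Type] = true
		}
	}
	if len(out) == 0 {
		return nil // the cloud provider did not report nodeIP at all
	}
	for _, a := range cloud {
		if !seen[a.Type] {
			out = append(out, a)
			seen[a.Type] = true
		}
	}
	return out
}

func main() {
	// The node IP (192.0.2.17) is deliberately not listed first.
	cloud := []NodeAddress{
		{ExternalIP, "192.0.2.30"},
		{InternalIP, "192.0.2.30"},
		{ExternalIP, "192.0.2.17"},
		{InternalIP, "192.0.2.17"},
		{Hostname, "node-1"},
	}
	for _, a := range selectAllMatches(cloud, "192.0.2.17") {
		fmt.Printf("%s: %s\n", a.Type, a.Address)
	}
}
```

Because both the ExternalIP and InternalIP entries for the node IP are claimed in the first pass, the second pass can no longer fill either slot with a VIP, whatever order the provider used.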

Alternatively, vSphere could be changed to not claim the IPs as both internal and external, but that would require doc updates to explain what it *should* be doing...

Comment 20 Miheer Salunke 2018-11-07 04:12:35 UTC
(In reply to Dan Winship from comment #14)
> So given that, I think that kubelet's logic should be changed so that
> instead of taking "the first address of any type that matches kl.nodeIP,
> followed by the first address of each other type", it should take "*every*
> address that matches kl.nodeIP, followed by the first address of each other
> type". And then in this case it would always return the kl.nodeIP-based
> ExternalIP and InternalIP, regardless of the order that the CloudProvider
> returned the addresses in.
> 

I think this will need a fix in the kubelet code, which might take some time.

> Alternatively, vSphere could be changed to not claim the IPs as both
> internal and external, but that would require doc updates to explain what it
> *should* be doing...

How can we achieve this? Any pointers on this will be highly appreciated.

Comment 21 Dan Winship 2018-11-07 13:50:05 UTC
(In reply to Miheer Salunke from comment #20)
> (In reply to Dan Winship from comment #14)
> > Alternatively, vSphere could be changed to not claim the IPs as both
> > internal and external, but that would require doc updates to explain what it
> > *should* be doing...
> 
> How can we achieve this? Any pointers on this will be highly appreciated.

No, that would also be a code change. As I commented in the support case, there is no workaround for the customer, other than limiting the number of failover/egress IPs on each node.

Comment 22 Dan Winship 2018-11-08 15:51:24 UTC
Filed https://github.com/kubernetes/kubernetes/pull/70805

Comment 35 errata-xmlrpc 2019-06-04 10:40:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

