Bug 1643348
Summary: [vsphere] The "Internal IP/Host IP" of the infra nodes changes to one of the VIPs on eth0, and then keeps changing constantly and randomly among those VIPs on its own (confirmed by `oc get hostsubnet` output).
| Product: | OpenShift Container Platform | Reporter: | Miheer Salunke <misalunk> |
|---|---|---|---|
| Component: | Cloud Compute | Assignee: | Dan Winship <danw> |
| Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> |
| Severity: | high | Docs Contact: | |
| Priority: | high | CC: | adeshpan, aos-bugs, danw, emahoney, jcrumple, jokerman, jrosenta, knakai, misalunk, mmccomas, openshift-bugs-escalate, wsun, zzhao |
| Version: | 3.11.0 | Target Milestone: | --- |
| Target Release: | 4.1.0 | Hardware: | Unspecified |
| OS: | Unspecified | Whiteboard: | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | (see below) | Story Points: | --- |
| Clone Of: | | Clones: | 1666820 (view as bug list) |
| Environment: | | Last Closed: | 2019-06-04 10:40:52 UTC |
| Type: | Bug | Regression: | --- |
| Mount Type: | --- | Documentation: | --- |
| CRM: | | Verified Versions: | |
| Category: | --- | oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | | Cloudforms Team: | --- |
| Target Upstream Version: | | | |

Doc Text:

Cause: A change introduced in Kubernetes 1.11 affected nodes with many IP addresses in vSphere deployments.

Consequence: Under vSphere, a node hosting several Egress IPs or Router HA addresses would sporadically "forget" which of those IPs was its official "node IP" (even if that node IP had been explicitly specified in the node configuration) and start using one of the others, causing networking problems.

Fix: If a "node IP" is specified in the node configuration, it is now used correctly, regardless of how many other IPs the node has.

Result: Networking should work reliably.
Comment 11
Dan Winship
2018-11-01 15:35:55 UTC
OK... I'm provisionally calling this a kubelet bug (which I think corresponds to "Pod" in bugzilla?), though one could argue it was a vSphere CloudProvider bug instead.

Attachment 1500320 shows that the Node resource is being updated with an incorrect InternalIP (while still keeping the node's correct IP as its ExternalIP):

    status:
      addresses:
      - address: x.y.z.17    # Right
        type: ExternalIP
      - address: x.y.z.30    # Wrong
        type: InternalIP
      - address: ...
        type: Hostname

What is happening is that the kubelet periodically calls setNodeAddress() to update its node address. Since the node in question has a CloudProvider, setNodeAddress() first calls the cloud's NodeAddresses() method, then keeps the first Address in the returned list that matches kl.nodeIP (the configmap-specified node IP), along with the first Address of each other type.

The vSphere provider's NodeAddresses() just pulls all of the IP addresses off the default interface and returns each one as both an ExternalIP and an InternalIP. For each IP address, the ExternalIP always appears first, which means it will always be the one that matches kubelet.setNodeAddress()'s "first Address that matches kl.nodeIP" rule. But that means the kubelet will only end up picking the right InternalIP if kl.nodeIP is the first IP in the list returned by NodeAddresses(). Apparently, due to the vagaries of either pkg/net or the kernel APIs, the node's oldest IP is returned first up until there are more than 5 IPs on the interface, at which point the return value gets reordered for some reason and a different IP is listed first, throwing things into chaos.

I'm not sure if vSphere's behavior here is correct: most other cloud providers do not return the same IP as both InternalIP and ExternalIP. (AFAICT only ovirt does.) However, the docs do not appear to forbid this, and https://kubernetes.io/docs/concepts/architecture/nodes/#addresses outright declares that "The usage of these fields varies depending on your cloud provider".

So given that, I think that kubelet's logic should be changed so that instead of taking "the first address of any type that matches kl.nodeIP, followed by the first address of each other type", it should take "*every* address that matches kl.nodeIP, followed by the first address of each other type". In this case it would then always return the kl.nodeIP-based ExternalIP and InternalIP, regardless of the order in which the CloudProvider returned the addresses.

Alternatively, vSphere could be changed to not claim the IPs as both internal and external, but that would require doc updates to explain what it *should* be doing...
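[Editor's note] To make the failure mode above concrete, here is a minimal, self-contained Go sketch of the interaction between the two pieces of code. It is an illustration under simplifying assumptions, not the actual kubelet or vSphere cloud-provider source: the types and the names nodeAddresses() and selectNodeAddresses() are hypothetical stand-ins.

```go
package main

import "fmt"

// Simplified stand-ins for the k8s.io/api/core/v1 node address types.
type addrType string

const (
	externalIP addrType = "ExternalIP"
	internalIP addrType = "InternalIP"
)

type nodeAddress struct {
	Type    addrType
	Address string
}

// vSphere-style NodeAddresses(): every IP on the default interface is
// reported as both ExternalIP and InternalIP, with the ExternalIP entry
// first for each IP.
func nodeAddresses(ifaceIPs []string) []nodeAddress {
	var addrs []nodeAddress
	for _, ip := range ifaceIPs {
		addrs = append(addrs,
			nodeAddress{externalIP, ip},
			nodeAddress{internalIP, ip},
		)
	}
	return addrs
}

// Pre-fix kubelet rule: keep the FIRST address that matches the configured
// node IP, then the first address of each other type.
func selectNodeAddresses(addrs []nodeAddress, nodeIP string) []nodeAddress {
	var out []nodeAddress
	seen := map[addrType]bool{}
	for _, a := range addrs {
		if a.Address == nodeIP {
			out = append(out, a)
			seen[a.Type] = true
			break // only the first match is kept
		}
	}
	for _, a := range addrs {
		if !seen[a.Type] {
			out = append(out, a)
			seen[a.Type] = true
		}
	}
	return out
}

func main() {
	// Simulate the reordering seen with more than 5 IPs on the interface:
	// an egress/VIP address (x.y.z.30) now comes back before the node IP.
	ifaceIPs := []string{"x.y.z.30", "x.y.z.17", "x.y.z.31"}
	nodeIP := "x.y.z.17" // the configured node IP

	for _, a := range selectNodeAddresses(nodeAddresses(ifaceIPs), nodeIP) {
		fmt.Printf("%-10s %s\n", a.Type, a.Address)
	}
	// Prints:
	//   ExternalIP x.y.z.17   <- matches nodeIP (ExternalIP listed first per IP)
	//   InternalIP x.y.z.30   <- first InternalIP in the list: WRONG
}
```

Because the match on kl.nodeIP always hits the ExternalIP entry, the InternalIP slot falls through to "first address of each other type" and picks up whichever IP the provider happened to list first.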
(In reply to Dan Winship from comment #14)
> So given that, I think that kubelet's logic should be changed so that
> instead of taking "the first address of any type that matches kl.nodeIP,
> followed by the first address of each other type", it should take "*every*
> address that matches kl.nodeIP, followed by the first address of each other
> type". [...]

I think this will need a fix in the kubelet code, which might need some time.

> Alternatively, vSphere could be changed to not claim the IPs as both
> internal and external, but that would require doc updates to explain what it
> *should* be doing...

How can we achieve this? Any pointers on this will be highly appreciated.

(In reply to Miheer Salunke from comment #20)
> (In reply to Dan Winship from comment #14)
> > Alternatively, vSphere could be changed to not claim the IPs as both
> > internal and external, but that would require doc updates to explain what it
> > *should* be doing...
>
> How can we achieve this? Any pointers on this will be highly appreciated.

No, that would also be a code change. As I commented in the support case, there is no workaround for the customer other than limiting the number of failover/egress IPs on each node.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
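[Editor's note] For completeness, here is the selection rule Dan proposed in comment #14 ("keep *every* address that matches kl.nodeIP"), continuing the earlier sketch (same simplified nodeAddress/addrType stand-ins; an illustration of the rule, not the patch that actually shipped in the errata):

```go
// Fixed rule: keep EVERY address that matches the configured node IP,
// then the first address of each remaining type. With this rule the
// nodeIP-based ExternalIP and InternalIP are both kept no matter how
// the cloud provider ordered its list.
func selectNodeAddressesFixed(addrs []nodeAddress, nodeIP string) []nodeAddress {
	var out []nodeAddress
	seen := map[addrType]bool{}
	for _, a := range addrs {
		if a.Address == nodeIP { // no break: all matches are kept
			out = append(out, a)
			seen[a.Type] = true
		}
	}
	for _, a := range addrs {
		if !seen[a.Type] {
			out = append(out, a)
			seen[a.Type] = true
		}
	}
	return out
}
```

With the same input as in the first sketch, this returns both ExternalIP x.y.z.17 and InternalIP x.y.z.17, regardless of which IP the cloud provider lists first.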