Can you get "oc get node NODENAME -o yaml; oc get hostsubnet NODENAME -o yaml" for one of the nodes, both before and after a "flap"? (There's lots of "oc get hostsubnet" output here, but no "oc get node" as far as I've seen.) It seems likely that the kubelet code is getting confused about what the node's real IP is and is incorrectly updating its Node resource, which then causes other things to be updated incorrectly based on that.
OK... I'm provisionally calling this a kubelet bug (which I think corresponds to "Pod" in bugzilla?), though one could argue it was a vSphere CloudProvider bug instead.

attachment 1500320 [details] shows that the Node resource is being updated with an incorrect InternalIP (while still keeping the node's correct IP as its ExternalIP):

  status:
    addresses:
    - address: x.y.z.17   # Right
      type: ExternalIP
    - address: x.y.z.30   # Wrong
      type: InternalIP
    - address: ...
      type: Hostname

What is happening is that the kubelet periodically calls setNodeAddress() to update its node addresses. Since the node in question has a CloudProvider, setNodeAddress() first calls the cloud's NodeAddresses() method, then keeps the first Address in the returned list that matches kl.nodeIP (the configmap-specified node IP), along with the first Address of each other type.

The vSphere provider's NodeAddresses() just pulls all of the IP addresses off the default interface and returns each one as both an ExternalIP and an InternalIP. For each IP address, the ExternalIP always appears first, which means it will always be the one that matches kubelet.setNodeAddress()'s "first Address that matches kl.nodeIP" rule. But that means the kubelet will only end up picking the right InternalIP if kl.nodeIP is the first IP in the list returned by NodeAddresses(). Apparently, due to the vagaries of either pkg/net or the kernel APIs, the node's oldest IP gets returned first up until there are more than 5 IPs on the interface, at which point the return value gets reordered for some reason and a different IP is listed first, throwing things into chaos.

I'm not sure if vSphere's behavior here is correct: most other cloud providers do not return the same IP as both InternalIP and ExternalIP. (AFAICT only ovirt does.)
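The failure mode above can be sketched in a few lines of Go. This is a simplified illustration, not the actual kubelet source: the NodeAddress type and pickAddresses function are hypothetical stand-ins for the real setNodeAddress() filtering, but they follow the "first matching Address, then first Address of each other type" rule described above.

```go
package main

import "fmt"

// NodeAddress is a stand-in for the k8s API type of the same name.
type NodeAddress struct {
	Type    string // "ExternalIP", "InternalIP", "Hostname", ...
	Address string
}

// pickAddresses mimics the current kubelet rule: keep the first
// address (of any type) whose IP matches nodeIP, then the first
// address of each remaining type.
func pickAddresses(cloudAddrs []NodeAddress, nodeIP string) []NodeAddress {
	var out []NodeAddress
	seenType := map[string]bool{}
	for _, a := range cloudAddrs {
		if a.Address == nodeIP {
			out = append(out, a)
			seenType[a.Type] = true
			break
		}
	}
	for _, a := range cloudAddrs {
		if !seenType[a.Type] {
			out = append(out, a)
			seenType[a.Type] = true
		}
	}
	return out
}

func main() {
	// vSphere returns every IP on the default interface as both an
	// ExternalIP and an InternalIP, ExternalIP first. When nodeIP
	// (x.y.z.17 here) is not the first IP in the list, the InternalIP
	// slot gets filled from the wrong address.
	addrs := []NodeAddress{
		{"ExternalIP", "x.y.z.30"}, {"InternalIP", "x.y.z.30"},
		{"ExternalIP", "x.y.z.17"}, {"InternalIP", "x.y.z.17"},
	}
	for _, a := range pickAddresses(addrs, "x.y.z.17") {
		fmt.Println(a.Type, a.Address)
	}
	// Prints:
	//   ExternalIP x.y.z.17
	//   InternalIP x.y.z.30   <- wrong: should be x.y.z.17
}
```

The match on x.y.z.17 hits the ExternalIP entry first, marking ExternalIP as "seen", so the InternalIP slot is then filled by the first InternalIP in the list, which belongs to a different address entirely.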
However, the docs do not appear to forbid this, and https://kubernetes.io/docs/concepts/architecture/nodes/#addresses outright declares that "The usage of these fields varies depending on your cloud provider".

So given that, I think that kubelet's logic should be changed so that instead of taking "the first address of any type that matches kl.nodeIP, followed by the first address of each other type", it should take "*every* address that matches kl.nodeIP, followed by the first address of each other type". In this case it would then always return the kl.nodeIP-based ExternalIP and InternalIP, regardless of the order in which the CloudProvider returned the addresses.

Alternatively, vSphere could be changed to not claim the IPs as both internal and external, but that would require doc updates to explain what it *should* be doing...
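The proposed rule can be sketched the same way. Again, this is a hypothetical simplification rather than the actual patch: pickAddressesFixed keeps *every* address matching nodeIP (one per type), then the first address of each remaining type, so vSphere's duplicated External/Internal entries for the configured node IP both survive regardless of ordering.

```go
package main

import "fmt"

// NodeAddress is a stand-in for the k8s API type of the same name.
type NodeAddress struct {
	Type    string
	Address string
}

// pickAddressesFixed sketches the proposed rule: keep every address
// whose IP matches nodeIP (at most one per type), then the first
// address of each remaining type.
func pickAddressesFixed(cloudAddrs []NodeAddress, nodeIP string) []NodeAddress {
	var out []NodeAddress
	seenType := map[string]bool{}
	for _, a := range cloudAddrs {
		if a.Address == nodeIP && !seenType[a.Type] {
			out = append(out, a)
			seenType[a.Type] = true
		}
	}
	for _, a := range cloudAddrs {
		if !seenType[a.Type] {
			out = append(out, a)
			seenType[a.Type] = true
		}
	}
	return out
}

func main() {
	// Same vSphere-style input as before: nodeIP (x.y.z.17) is not
	// first in the list, but now both its entries are kept.
	addrs := []NodeAddress{
		{"ExternalIP", "x.y.z.30"}, {"InternalIP", "x.y.z.30"},
		{"ExternalIP", "x.y.z.17"}, {"InternalIP", "x.y.z.17"},
	}
	for _, a := range pickAddressesFixed(addrs, "x.y.z.17") {
		fmt.Println(a.Type, a.Address)
	}
	// Prints:
	//   ExternalIP x.y.z.17
	//   InternalIP x.y.z.17
}
```

Because the first pass no longer stops after one match, the ExternalIP and InternalIP slots are both claimed by kl.nodeIP before any other address is considered, which is exactly the order-independence the comment above argues for.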
(In reply to Dan Winship from comment #14)
> So given that, I think that kubelet's logic should be changed so that
> instead of taking "the first address of any type that matches kl.nodeIP,
> followed by the first address of each other type", it should take "*every*
> address that matches kl.nodeIP, followed by the first address of each other
> type".

I think this will need a fix in the kubelet code, which might need some time.

> Alternatively, vSphere could be changed to not claim the IPs as both
> internal and external, but that would require doc updates to explain what it
> *should* be doing...

How can we achieve this? Any pointers on this will be highly appreciated.
(In reply to Miheer Salunke from comment #20)
> (In reply to Dan Winship from comment #14)
> > Alternatively, vSphere could be changed to not claim the IPs as both
> > internal and external, but that would require doc updates to explain what
> > it *should* be doing...
>
> How can we achieve this? Any pointers on this will be highly appreciated.

No, that would also be a code change. As I commented in the support case, there is no workaround for the customer other than limiting the number of failover/egress IPs on each node.
Filed https://github.com/kubernetes/kubernetes/pull/70805
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758