1797655 – In IPv6 bare metal deployment kubelet binds on a VIP instead of the local address

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Machine Config Operator
Sub Component:
Version:	4.3.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	4.3.z
Assignee:	Antoni Segura Puimedon
QA Contact:	Marius Cornea
Docs Contact:
URL:
Whiteboard:
Depends On:	1798788
Blocks:	1771572

Reported:	2020-02-03 15:05 UTC by Marius Cornea
Modified:	2020-03-10 23:53 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-03-10 23:53:17 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-config-operator pull 1466	0	None	closed	Bug 1797647: [release-4.3] [baremetal] Ipv6 non virtual ip fix	2020-05-12 19:35:26 UTC
Red Hat Product Errata	RHBA-2020:0676	0	None	None	None	2020-03-10 23:53:44 UTC

Description Marius Cornea 2020-02-03 15:05:00 UTC

Description of problem:
In IPv6 bare metal deployment kubelet binds on a VIP instead of the local address.

[root@master-1 core]# ip a s dev ens4
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:50:be:71 brd ff:ff:ff:ff:ff:ff
    inet6 fd2e:6f44:5dd8:c956::5/128 scope global nodad deprecated noprefixroute 
       valid_lft forever preferred_lft 0sec
    inet6 fd2e:6f44:5dd8:c956::2/128 scope global nodad deprecated noprefixroute 
       valid_lft forever preferred_lft 0sec
    inet6 fd2e:6f44:5dd8:c956::134/128 scope global dynamic noprefixroute 
       valid_lft 3378sec preferred_lft 3378sec
    inet6 fe80::5054:ff:fe50:be71/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

fd2e:6f44:5dd8:c956::5 and fd2e:6f44:5dd8:c956::2 are VIPs while fd2e:6f44:5dd8:c956::134 is the local address.

root      3013  7.3  0.5 3525436 169596 ?      Ssl  Jan31 289:18 /usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --node-ip :: --minimum-container-ttl-duration=6m0s --cloud-provider= --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --v=3

The kubernetes endpoints include the VIP:
[kni@provisionhost-0 ~]$ oc get endpoints
NAME         ENDPOINTS                       AGE
kubernetes   [fd2e:6f44:5dd8:c956::5]:6443   2d17h


Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-01-29-114541-ipv6.1

How reproducible:
100%

Steps to Reproduce:
1. Deploy bare metal env with IPv6 control plane
2. Check kubernetes endpoint

Actual results:
The kubernetes endpoints include VIPs instead of the interface's local addresses.

Expected results:
Kubelet should bind on the local address.

Additional info:

Comment 1 Dan Winship 2020-02-04 18:36:18 UTC

kubelet is behaving as requested here (selecting the first IPv6 address). There was some discussion of this on Slack the other day. I'm not sure if the decision was made to change the kubelet config or to tweak whatever is adding the addresses to keep them in the right order.

(The other possibility is to make kubelet recognize that the addresses with the "deprecated" flag are secondary and shouldn't be chosen, but that requires information that isn't available in the golang net API and so would require rewriting the k8s utilnet code to use netlink directly.)
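To illustrate the difference between "first IPv6 address" and "first non-deprecated IPv6 address", here is a minimal Python sketch that parses the `ip a s dev ens4` output from the description. It is not how kubelet or the k8s utilnet code selects addresses; the function names are hypothetical, and it works on the text output of `ip`, which is exactly the flag information the golang net API does not expose.

```python
import re

# `ip a s dev ens4` output copied from the bug description.
IP_OUTPUT = """\
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:50:be:71 brd ff:ff:ff:ff:ff:ff
    inet6 fd2e:6f44:5dd8:c956::5/128 scope global nodad deprecated noprefixroute
       valid_lft forever preferred_lft 0sec
    inet6 fd2e:6f44:5dd8:c956::2/128 scope global nodad deprecated noprefixroute
       valid_lft forever preferred_lft 0sec
    inet6 fd2e:6f44:5dd8:c956::134/128 scope global dynamic noprefixroute
       valid_lft 3378sec preferred_lft 3378sec
    inet6 fe80::5054:ff:fe50:be71/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
"""

INET6_GLOBAL = re.compile(r"\s*inet6\s+(\S+)/\d+\s+scope\s+global\s*(.*)$")


def global_inet6(ip_output):
    """Return (address, flags) for every global-scope inet6 line, in order."""
    results = []
    for line in ip_output.splitlines():
        match = INET6_GLOBAL.match(line)
        if match:
            results.append((match.group(1), set(match.group(2).split())))
    return results


def first_address(ip_output):
    """Hypothetical helper: pick the first global IPv6 address, flags ignored."""
    addrs = global_inet6(ip_output)
    return addrs[0][0] if addrs else None


def first_non_deprecated(ip_output):
    """Hypothetical helper: skip addresses carrying the 'deprecated' flag."""
    for addr, flags in global_inet6(ip_output):
        if "deprecated" not in flags:
            return addr
    return None


print(first_address(IP_OUTPUT))
print(first_non_deprecated(IP_OUTPUT))
```

On the sample output, the first helper returns the VIP and the second returns the local address, which is the behaviour difference discussed in this comment.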

Comment 2 Russell Bryant 2020-02-05 15:47:18 UTC

asegurap is looking at resolving this by passing the correct IP directly to kubelet through a change in MCO.

Comment 3 Russell Bryant 2020-02-07 16:14:47 UTC

It turns out this bug shows two separate issues.

Kubelet binding to a VIP is one problem, and we almost have a fix for that. You can see it by looking at `oc get nodes -o wide` or `oc get nodes -o yaml`.

When you see VIPs in `oc get endpoints`, that's actually kube-apiserver using the wrong IP.  We have to fix that separately.
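As a rough way to check for the kube-apiserver symptom, the sketch below compares endpoint strings, in the `[address]:port` form printed by `oc get endpoints`, against the two VIPs named in the description. The helper names and the hard-coded VIP set are assumptions for illustration only; in a real environment the endpoint strings would come from the `oc` output.

```python
import ipaddress

# The two VIPs identified in the bug description.
VIPS = {
    ipaddress.ip_address("fd2e:6f44:5dd8:c956::5"),
    ipaddress.ip_address("fd2e:6f44:5dd8:c956::2"),
}


def endpoint_host(endpoint):
    """Hypothetical helper: extract the address from '[addr]:port'."""
    host, _, _port = endpoint.rpartition(":")
    return ipaddress.ip_address(host.strip("[]"))


def vip_endpoints(endpoints, vips=VIPS):
    """Return the endpoints whose address is one of the VIPs."""
    return [e for e in endpoints if endpoint_host(e) in vips]


# Endpoint value taken from the `oc get endpoints` output in the description.
print(vip_endpoints(["[fd2e:6f44:5dd8:c956::5]:6443"]))
```

An empty result would mean no endpoint is advertising a VIP. Comparing parsed `ipaddress` objects rather than raw strings avoids false mismatches between differently abbreviated IPv6 spellings.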

Comment 4 Russell Bryant 2020-02-07 21:35:29 UTC

4.3.0-0.nightly-2020-02-06-120247-ipv6.6 should have fixes for both kubelet and kube-apiserver

Comment 8 errata-xmlrpc 2020-03-10 23:53:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0676
