Description of problem:
On a bare-metal, single-stack IPv6 deployment, testing any other CNI is impossible because kube-proxy is configured to bind on "0.0.0.0", which fails the deployment.

Version-Release number of selected component (if applicable):
OpenShift 4.4

How reproducible:
Deploying with any networkType other than OVNKubernetes on a single-stack IPv6 deployment results in a failed deployment when kube-proxy tries to bind to 0.0.0.0.

Steps to Reproduce:
1. Single-stack IPv6 deployment
2. networkType = Calico
3. Observe that kube-proxy cannot start because its configuration points to 0.0.0.0

Actual results:
Deployment fails.

Expected results:
We should be able to TEST other CNIs with a single-stack IPv6 deployment just like we can TEST them with IPv4.

Additional info:
These are the kube-proxy logs we get when trying a CNI other than OVNKubernetes on an IPv6 single-stack, bare-metal deployment:

2020-05-04T13:43:49.547255619+00:00 stderr F I0504 13:43:49.547248 1 server.go:536] Neither kubeconfig file nor master URL was specified. Falling back to in-cluster config.
2020-05-04T13:43:49.555761603+00:00 stderr F I0504 13:43:49.555662 1 node.go:135] Successfully retrieved node IP: 2001:4958:a:3e00:0:1:2:10
2020-05-04T13:43:49.555761603+00:00 stderr F I0504 13:43:49.555679 1 server_others.go:145] Using iptables Proxier.
2020-05-04T13:43:49.555761603+00:00 stderr F F0504 13:43:49.555715 1 server.go:485] unable to create proxier: clusterCIDR 2001:4958:a:3e00:0:1:100:0/104 has incorrect IP version: expect isIPv6=false

And this is the generated kube-proxy configuration that wrongly tries to bind on "0.0.0.0":

apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
clientConnection:
  acceptContentTypes: ""
  burst: 0
  contentType: ""
  kubeconfig: ""
  qps: 0
clusterCIDR: 2001:4958:a:3e00:0:1:100:0/104
configSyncPeriod: 0s
conntrack:
  maxPerCore: null
  min: null
  tcpCloseWaitTimeout: null
  tcpEstablishedTimeout: null
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: ""
iptables:
  masqueradeAll: false
  masqueradeBit: null
  minSyncPeriod: 0s
  syncPeriod: 0s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: ""
  strictARP: false
  syncPeriod: 0s
kind: KubeProxyConfiguration
metricsBindAddress: 0.0.0.0:9101
mode: iptables
nodePortAddresses: null
oomScoreAdj: null
portRange: ""
udpIdleTimeout: 0s
winkernel:
  enableDSR: false
  networkName: ""
  sourceVip: ""
Hi Marius, could you help verify this bug? It needs an IPv6 and Calico cluster.
Just some more information: we are in a lab at the biggest Canadian telco, trying to prove that OpenShift can be used to build an open-source edge platform for them. We are running single-stack IPv6 successfully, but they need to "try" and "test" Calico to integrate the pods into their network fabric. We had the same blocker on OCS, where the config generated by the operator was simply IPv4-only; OCS declined to fix it, so the customer is now using Rook/Ceph instead of OCS.

Please, please consider this bug important, and keep in mind that the fix could be as simple as generating a configuration that binds to "::/0" in the case of IPv6 and "0.0.0.0" for IPv4. It was as simple as that in the similar OCS issue (BZ 1831693, same principle: operator generating an IPv4-only config).
(In reply to Boris Deschenes from comment #6)
> just some more information: so basically we're in a lab in the biggest
> canadian telco, trying to prove that openshift can be used to build an open
> source edge platform for them, we're running single-stack IPv6 successfully
> but they need to "try" and "test" calico to integrate the pods into their
> network fabric.. We had the same blocker on OCS where the config generated
> by the operator was simply IPv4-only.. but OCS refused to fix so the
> customer is now using rook/ceph instead of OCS.
>
> Please, please consider this bug as important and keep in mind that it could
> be as simple as generating a configuration that binds to "::/0" in case of
> IPv6 and "0.0.0.0" for IPv4.. it was as simple as that in the similar OCS
> issue (BZ 1831693, same principle, operator generating IPv4-only config).

Hi Boris, could you help verify this issue with IPv6 and the Calico plugin? QE currently has no experience with IPv6 and Calico. Thanks.
I noticed that there is actually a configurable `bindAddress` for kube-proxy in the Network CRD for the cluster-network-operator. I had a chat with Boris and sent him the kube-proxy parameters in the Network CRD from https://github.com/openshift/cluster-network-operator/#configuring-kube-proxy

I also mocked up a sample usage of it (untested) in this pastebin: https://pastebin.com/Yf1qQbMK
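As a rough, untested illustration of that override (field names follow the cluster-network-operator README linked above; treat the exact shape as an assumption, not verified against this cluster), a Network CR forcing the IPv6 wildcard might look like:

```yaml
# Hypothetical sketch: override the kube-proxy bind address via the
# cluster Network CR, per the cluster-network-operator README.
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  kubeProxyConfig:
    bindAddress: "::"   # IPv6 wildcard instead of the default 0.0.0.0
```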
Hi Boris, have you tried the suggestion in comment 10? If it works for you, we can mark this bug verified. Thanks.
Hi, yes, we've tried adding the kube-proxy config parameters in the Network CRD, but it looks like these parameters are simply ignored by the operator: we see no change in the kube-proxy configuration whether we pass the additional parameters or not. I think Doug had the same result in his lab. So although overriding the kube-proxy configuration through the Network CRD could really be our way out, the mechanism does not currently appear to work.
I think this is a bug in kube-proxy: it should use the node IP if the bind address is an unspecified address. For Go there is actually no difference between binding to 0.0.0.0 and binding to :: — see https://github.com/kubernetes/kubernetes/issues/88458

In this case we can see in the log that it successfully retrieved an IPv6 node IP, so it should work in IPv6 mode:

> 2020-05-04T13:43:49.555761603+00:00 stderr F I0504 13:43:49.555662 1 node.go:135] Successfully retrieved node IP: 2001:4958:a:3e00:0:1:2:10

I'm new here; I'll fix it upstream and then follow up here for guidance. I don't know how the Bugzilla assignments/workflow work, though, or whether I should/can take over the ticket :-)
Submitted a PR to fix this upstream: https://github.com/kubernetes/kubernetes/pull/91725

I now have to check why https://bugzilla.redhat.com/show_bug.cgi?id=1831006#c10 didn't work.
In the meantime, as a workaround, the cluster-network-operator will configure the bind address with the same family as the ClusterCIDR. So omitting bindAddress from the configuration should make it work in IPv6 mode, provided the ClusterCIDR is IPv6.
As this PR is still open (https://github.com/openshift/sdn/pull/152), moving this bug to POST for now.
Moving to 4.6 so we can track the backport.
*** Bug 1847969 has been marked as a duplicate of this bug. ***
Could you help verify this bug with a Calico and IPv6 cluster?
Hi guys, here is the result of the same deployment with the patch https://github.com/kubernetes/kubernetes/pull/91725 in place. As we can see below, kube-proxy no longer tries to bind to 0.0.0.0 in an IPv6 environment and correctly assumes IPv6 operation. I still see "incorrect IP version" messages, but those could be coming from the Calico configuration. The deployment stalls without many errors in the logs. I end up with masters running 4 pods:

* calico-node
* calico-typha
* kube-proxy
* kube-multus

Since it stalls at this point and my only "errors" are in the kube-proxy logs, I'll investigate those.

kube-proxy logs:

2020-08-03T14:41:16.021578841+00:00 stderr F I0803 14:41:16.021371 1 server_others.go:96] IPv6 bind address (::), assume IPv6 operation
2020-08-03T14:41:16.025356057+00:00 stderr F W0803 14:41:16.025326 1 proxier.go:625] Failed to read file /lib/modules/4.18.0-193.13.2.el8_2.x86_64/modules.builtin with error open /lib/modules/4.18.0-193.13.2.el8_2.x86_64/modules.builtin: no such file or directory. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2020-08-03T14:41:16.026602419+00:00 stderr F W0803 14:41:16.026571 1 proxier.go:635] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2020-08-03T14:41:16.027429499+00:00 stderr F W0803 14:41:16.027410 1 proxier.go:635] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2020-08-03T14:41:16.028390299+00:00 stderr F W0803 14:41:16.028368 1 proxier.go:635] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2020-08-03T14:41:16.029178918+00:00 stderr F W0803 14:41:16.029160 1 proxier.go:635] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2020-08-03T14:41:16.030092526+00:00 stderr F W0803 14:41:16.030073 1 proxier.go:635] Failed to load kernel module nf_conntrack_ipv4 with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
2020-08-03T14:41:16.030165502+00:00 stderr F I0803 14:41:16.030148 1 server.go:548] Neither kubeconfig file nor master URL was specified. Falling back to in-cluster config.
2020-08-03T14:41:16.042493177+00:00 stderr F I0803 14:41:16.042464 1 node.go:136] Successfully retrieved node IP: 2001:4958:a:3e00:0:1:2:30
2020-08-03T14:41:16.042513420+00:00 stderr F I0803 14:41:16.042491 1 server_others.go:186] Using iptables Proxier.
2020-08-03T14:41:16.042765223+00:00 stderr F I0803 14:41:16.042742 1 server.go:583] Version: v0.0.0-master+$Format:%h$
2020-08-03T14:41:16.043081454+00:00 stderr F I0803 14:41:16.043063 1 conntrack.go:52] Setting nf_conntrack_max to 262144
2020-08-03T14:41:16.043172109+00:00 stderr F I0803 14:41:16.043155 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
2020-08-03T14:41:16.043224015+00:00 stderr F I0803 14:41:16.043208 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
2020-08-03T14:41:16.043442180+00:00 stderr F I0803 14:41:16.043401 1 config.go:133] Starting endpoints config controller
2020-08-03T14:41:16.043455412+00:00 stderr F I0803 14:41:16.043439 1 shared_informer.go:223] Waiting for caches to sync for endpoints config
2020-08-03T14:41:16.043495117+00:00 stderr F I0803 14:41:16.043479 1 config.go:315] Starting service config controller
2020-08-03T14:41:16.043505359+00:00 stderr F I0803 14:41:16.043494 1 shared_informer.go:223] Waiting for caches to sync for service config
2020-08-03T14:41:16.047663280+00:00 stderr F E0803 14:41:16.047635 1 utils.go:223] 192.0.2.2 in endpoints has incorrect IP version (service openshift-etcd/host-etcd).
2020-08-03T14:41:16.047695848+00:00 stderr F E0803 14:41:16.047678 1 utils.go:223] 192.0.2.200 in endpoints has incorrect IP version (service openshift-etcd/host-etcd).
2020-08-03T14:41:16.047695848+00:00 stderr F E0803 14:41:16.047691 1 utils.go:223] 192.0.2.3 in endpoints has incorrect IP version (service openshift-etcd/host-etcd).
2020-08-03T14:41:16.047706967+00:00 stderr F E0803 14:41:16.047698 1 utils.go:223] 192.0.2.4 in endpoints has incorrect IP version (service openshift-etcd/host-etcd).
2020-08-03T14:41:16.143769916+00:00 stderr F I0803 14:41:16.143660 1 shared_informer.go:230] Caches are synced for service config
2020-08-03T14:41:16.143769916+00:00 stderr F I0803 14:41:16.143672 1 shared_informer.go:230] Caches are synced for endpoints config
Thanks, Boris Deschenes. Can we move this bug to VERIFIED, since the original issue has already been fixed?
agreed, verified it is
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196