Created attachment 1758890 [details]
install-config.yaml

Description of problem:

When installing clusters with IPv6 networking and OVNKubernetes, operators such as etcd are unable to reach the API endpoint because of what looks like a routing loop.

install-config.yaml relevant sections:

networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 2002:db8::/53
    hostPrefix: 64
  machineNetwork:
  - cidr: 1001:db8::/120
  serviceNetwork:
  - 2003:db8::/112

Any pod that tries to access the API will try to use 2003:db8::1. For example:

# oc -n assisted-installer logs assisted-installer-controller-8xg7h
W0222 07:25:15.306683       1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2021/02/22 07:25:45 Failed to create k8 client failed to create runtime client: Get "https://[2003:db8::1]:443/api?timeout=32s": dial tcp [2003:db8::1]:443: i/o timeout

Routing:

$ ip -6 r
::1 dev lo proto kernel metric 256 pref medium
1001:db8::58 dev br-ex proto kernel metric 100 pref medium
1001:db8::64 dev br-ex proto kernel metric 256 pref medium
1001:db8::65 dev br-ex proto kernel metric 256 pref medium
1001:db8::/120 dev br-ex proto ra metric 100 pref medium
1001:db8:0:200::16 dev ens4 proto kernel metric 101 pref medium
1001:db8:0:200::/120 dev ens4 proto ra metric 101 pref medium
2002:db8:0:1::/64 dev ovn-k8s-mp0 proto kernel metric 256 pref medium
2002:db8::/53 via 2002:db8:0:1::1 dev ovn-k8s-mp0 metric 1024 pref medium
2003:db8::/112 via 2002:db8:0:1::1 dev ovn-k8s-mp0 metric 1024 pref medium
fd99::/64 dev ovn-k8s-gw0 proto kernel metric 256 pref medium
fe80::/64 dev br-ex proto kernel metric 100 pref medium
fe80::/64 dev ens4 proto kernel metric 101 pref medium
fe80::/64 dev ovn-k8s-mp0 proto kernel metric 256 pref medium
fe80::/64 dev br-local proto kernel metric 256 pref medium
fe80::/64 dev ovn-k8s-gw0 proto kernel metric 256 pref medium
fe80::/64 dev genev_sys_6081 proto kernel metric 256 pref medium
fe80::/64 dev 5989f2c92208697 proto kernel metric 256 pref medium
fe80::/64 dev 42b70f241959acf proto kernel metric 256 pref medium
fe80::/64 dev c13535f2eedf462 proto kernel metric 256 pref medium
fe80::/64 dev 47e6bc22263d67c proto kernel metric 256 pref medium
fe80::/64 dev b2632219bce6aee proto kernel metric 256 pref medium
default via fe80::5054:ff:fed8:2c97 dev br-ex proto ra metric 100 pref medium
default via fe80::5054:ff:fe9d:ef27 dev ens4 proto ra metric 101 pref medium

Version-Release number of selected component (if applicable):

# oc adm release info $OPENSHIFT_INSTALL_RELEASE_IMAGE | grep ovn-ku
  ovn-kubernetes    sha256:8a8467d768151f1070e76916b9fcc2c9e05ecf98b5517db558e8eaba615e5447
# oc adm release info $OPENSHIFT_INSTALL_RELEASE_IMAGE --pullspecs | grep ovn-ku
  ovn-kubernetes    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a8467d768151f1070e76916b9fcc2c9e05ecf98b5517db558e8eaba615e5447

Tracepath output (traffic to the service address bounces between the ovn-k8s-mp0 gateway 2002:db8:0:1::1 and fd99::1 until the hop limit is exhausted):

$ tracepath6 2003:db8::1 -n
 1?: [LOCALHOST]        0.018ms pmtu 1400
 1:  2002:db8:0:1::1    0.751ms asymm  2
 1:  2002:db8:0:1::1    0.392ms asymm  2
 2:  fd99::1            0.681ms
 3:  2002:db8:0:1::1    0.884ms asymm  4
 4:  fd99::1            0.680ms
 5:  2002:db8:0:1::1    0.947ms asymm  6
 6:  fd99::1            0.687ms
 7:  2002:db8:0:1::1    1.157ms asymm  8
 8:  fd99::1            0.898ms
 9:  2002:db8:0:1::1    1.344ms asymm 10
10:  fd99::1            0.954ms
11:  2002:db8:0:1::1    1.178ms asymm 12
12:  fd99::1            0.932ms
13:  2002:db8:0:1::1    1.417ms asymm 14
14:  fd99::1            0.682ms
15:  2002:db8:0:1::1    1.459ms asymm 16
16:  fd99::1            1.073ms
17:  2002:db8:0:1::1    1.202ms asymm 18
18:  fd99::1            0.984ms
19:  2002:db8:0:1::1    1.341ms asymm 20
20:  fd99::1            1.056ms
21:  2002:db8:0:1::1    1.251ms asymm 22
22:  fd99::1            0.777ms
23:  2002:db8:0:1::1    1.779ms asymm 24
24:  fd99::1            1.090ms
25:  2002:db8:0:1::1    1.292ms asymm 26
26:  fd99::1            1.013ms
27:  2002:db8:0:1::1    1.387ms asymm 28
28:  fd99::1            1.135ms
29:  2002:db8:0:1::1    1.466ms asymm 30
30:  fd99::1            0.589ms
     Too many hops: pmtu 1400
     Resume: pmtu 1400

How reproducible:

100%

Steps to Reproduce:
1. Define install-config.yaml with the provided file or with assisted-installer
2. Wait for the two masters to join the cluster
3. Check the logs of the etcd operator, or of any other operator that accesses the kube-apiserver endpoint

Actual results:

Operators and other pods are not able to reach the API endpoint.

Expected results:

Full installation of the cluster.
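The timeout can also be reproduced straight from a node shell, without going through a pod. A minimal sketch mirroring the failing client call above:

# -g keeps curl from globbing the IPv6 brackets, -k skips cert verification
curl -gk --max-time 10 'https://[2003:db8::1]:443/api?timeout=32s'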
Hi,

Could you provide me with a kubeconfig / must-gather for the cluster exhibiting the problems stated?

Thanks in advance,
Alexander
Also, please retry with the latest 4.8 version. We've had some IPv6 fixes coming in with the latest downstream merge: https://github.com/openshift/ovn-kubernetes/pull/440
Hi,

So, I've had a look at this with the help of a reproducer that Osher provided me with, and I saw the following:

[root@f13-h23-b04-5039ms assisted-test-infra]# oc get svc
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   2003:db8::1   <none>        443/TCP   126m
[root@f13-h23-b04-5039ms assisted-test-infra]# oc get ep
NAME         ENDPOINTS        AGE
kubernetes   10.88.0.1:6443   126m

I.e.: the kube-apiserver running on the bootstrap node during the cluster-creation phase has an IPv4 address on this IPv6 single-stack cluster; this is the reason ovnkube-master does not add the service to its load-balancer items.

I had a look at that kube-apiserver container and saw the following:

$ crictl inspect $KUBE_APISERVER_CONTAINER
"info": {
    "sandboxID": "b4f5090d01424f7629e97198176a6d473a1f7fdc1b6e561b57e6b91ebb8fed11",
    "pid": 38933,
    "runtimeSpec": {
      "ociVersion": "1.0.2-dev",
      "process": {
        "user": {
          "uid": 0,
          "gid": 0
        },
        "args": [
          "/bin/bash",
          "-ec",
          "hyperkube kube-apiserver --openshift-config=/etc/kubernetes/config/kube-apiserver-config.yaml --logtostderr=false --alsologtostderr --v=2 --log-file=/var/log/bootstrap-control-plane/kube-apiserver.log --advertise-address=${HOST_IP}\n"
        ],
        "env": [
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
          "TERM=xterm",
          "HOSTNAME=random-hostname-a49e0432-ee46-4e67-8f36-b6548cfe3c84",
          "HOST_IP=10.88.0.1",

So, presumably there's a misconfiguration of the HOST_IP env var, which results in the API server advertising an IPv4 address. I am thus re-assigning to the KNI team.

/Alex
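For anyone reproducing this, a minimal sketch to pull the advertised address straight out of the running container (assumes crictl and jq are available on the bootstrap node; the jq path follows the runtimeSpec layout shown above):

# Grab the first kube-apiserver container ID, then extract HOST_IP from its env
KUBE_APISERVER_CONTAINER=$(crictl ps --name kube-apiserver -q | head -n1)
crictl inspect "$KUBE_APISERVER_CONTAINER" \
  | jq -r '.info.runtimeSpec.process.env[] | select(startswith("HOST_IP="))'

On this cluster it prints HOST_IP=10.88.0.1, an IPv4 address rather than one from the IPv6 machineNetwork.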
kubelet is incorrectly picking the podman0 bridge's IPv4 address. We'll need to add the nodeip-configuration service to the bootstrap node.
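For context, a sketch of the selection logic the fix needs (illustrative only, not the actual nodeip-configuration implementation): instead of taking whichever interface enumerates first (here the podman0 bridge), pick the host address that falls inside the machineNetwork CIDR and hand it to kubelet as its node IP, which is the value that, per the previous comment, ends up in HOST_IP:

# machineNetwork from the install-config above
MACHINE_CIDR=1001:db8::/120
# List global IPv6 addresses and keep the first one inside the machine
# network. The textual prefix match below is a crude stand-in for the
# proper CIDR containment check the real service performs.
NODE_IP=$(ip -6 -o addr show scope global | awk '{print $4}' | cut -d/ -f1 \
          | grep -m1 '^1001:db8::')
printf 'KUBELET_NODE_IP=%s\n' "$NODE_IP"

If I understand the service correctly, it also persists the result as a kubelet systemd drop-in so the advertised address stays stable across restarts.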
Verified using assisted-installer to install 4.8.0-0.nightly-2021-04-18-203506

oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-18-203506    True        False         105s    Cluster version is 4.8.0-0.nightly-2021-04-18-203506

oc get ep
NAME         ENDPOINTS                                 AGE
kubernetes   [1001:db8::4f]:6443,[1001:db8::55]:6443   41m

oc get ep
NAME         ENDPOINTS                                 AGE
kubernetes   [1001:db8::4f]:6443,[1001:db8::55]:6443   42m
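For completeness, a scripted version of the same endpoint check (a sketch assuming jq is available wherever the kubeconfig lives):

# Every printed address should be IPv6 and fall inside machineNetwork
oc get ep kubernetes -o json | jq -r '.subsets[].addresses[].ip'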
https://github.com/openshift/installer/pull/4756 refers to https://github.com/openshift/cluster-kube-apiserver-operator/pull/1042 (which is a release-4.7 backport), so do we need to backport the installer PR to 4.7? This is also under discussion via https://github.com/openshift/installer/pull/5013, where we need the TemplateData; I'm wondering whether it makes sense to just backport 4756.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438