Description of problem:
The cluster uses the ovn-k8s-mp0 IP as the node InternalIP on the baremetal platform; the cluster cannot work well in this condition.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-25-222652

How reproducible:
Always

Steps to Reproduce:
1. Launch an OVN baremetal cluster.

Actual Result:
oc get nodes -o wide
NAME                              STATUS                     ROLES    AGE     VERSION                      INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION          CONTAINER-RUNTIME
huir-0826-6q5g9-compute-0         Ready,SchedulingDisabled   worker   4h10m   v1.19.0-rc.2+aaf4ce1-dirty   10.0.99.77    <none>        Red Hat Enterprise Linux CoreOS 46.82.202008251840-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-90.rhaos4.6.git4a0ac05.el8-rc.1
huir-0826-6q5g9-compute-1         Ready                      worker   4h10m   v1.19.0-rc.2+aaf4ce1-dirty   10.128.2.2    <none>        Red Hat Enterprise Linux CoreOS 46.82.202008251840-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-90.rhaos4.6.git4a0ac05.el8-rc.1
huir-0826-6q5g9-compute-2         Ready                      worker   4h11m   v1.19.0-rc.2+aaf4ce1-dirty   10.131.0.2    <none>        Red Hat Enterprise Linux CoreOS 46.82.202008251840-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-90.rhaos4.6.git4a0ac05.el8-rc.1
huir-0826-6q5g9-control-plane-0   Ready                      master   4h24m   v1.19.0-rc.2+aaf4ce1-dirty   10.129.0.2    <none>        Red Hat Enterprise Linux CoreOS 46.82.202008251840-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-90.rhaos4.6.git4a0ac05.el8-rc.1
huir-0826-6q5g9-control-plane-1   Ready                      master   4h24m   v1.19.0-rc.2+aaf4ce1-dirty   10.130.0.2    <none>        Red Hat Enterprise Linux CoreOS 46.82.202008251840-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-90.rhaos4.6.git4a0ac05.el8-rc.1
huir-0826-6q5g9-control-plane-2   Ready                      master   4h24m   v1.19.0-rc.2+aaf4ce1-dirty   10.128.0.2    <none>        Red Hat Enterprise Linux CoreOS 46.82.202008251840-0 (Ootpa)   4.18.0-211.el8.x86_64   cri-o://1.19.0-90.rhaos4.6.git4a0ac05.el8-rc.1

We can see that for node huir-0826-6q5g9-compute-1, the INTERNAL-IP is 10.128.2.2, not 10.0.96.111.
[core@huir-0826-6q5g9-compute-1 ~]$ ip a show ovn-k8s-mp0
6: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 06:95:73:d4:8b:99 brd ff:ff:ff:ff:ff:ff
    inet 10.128.2.2/23 brd 10.128.3.255 scope global ovn-k8s-mp0
       valid_lft forever preferred_lft forever
[core@huir-0826-6q5g9-compute-1 ~]$ ip a show br-ex
9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:7a:67:d1 brd ff:ff:ff:ff:ff:ff
    inet 10.0.96.111/22 brd 10.0.99.255 scope global dynamic noprefixroute br-ex
       valid_lft 64060sec preferred_lft 64060sec
    inet6 2620:52:0:60:520a:2628:349e:2e36/64 scope global dynamic noprefixroute
       valid_lft 2591919sec preferred_lft 604719sec
    inet6 fe80::9ee7:8f6e:a96f:a26b/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

Some OVN pods also use the ovn-k8s-mp0 IP rather than the br-ex IP.

oc get pods -n openshift-ovn-kubernetes -o wide
NAME                           READY   STATUS             RESTARTS   AGE     IP            NODE                              NOMINATED NODE   READINESS GATES
ovnkube-master-metrics-859sn   1/1     Running            0          6h49m   10.0.98.220   huir-0826-6q5g9-control-plane-1   <none>           <none>
ovnkube-master-metrics-csjck   1/1     Running            0          6h49m   10.0.97.89    huir-0826-6q5g9-control-plane-2   <none>           <none>
ovnkube-master-metrics-p8vjr   1/1     Running            0          6h49m   10.0.98.14    huir-0826-6q5g9-control-plane-0   <none>           <none>
ovnkube-master-rsvm7           2/4     Running            0          6h11m   10.129.0.2    huir-0826-6q5g9-control-plane-0   <none>           <none>
ovnkube-master-t8h47           4/4     Running            0          6h14m   10.0.97.89    huir-0826-6q5g9-control-plane-2   <none>           <none>
ovnkube-master-xjs95           4/4     Running            0          6h17m   10.0.98.220   huir-0826-6q5g9-control-plane-1   <none>           <none>
ovnkube-node-fbp4v             2/2     Running            0          6h12m   10.129.0.2    huir-0826-6q5g9-control-plane-0   <none>           <none>
ovnkube-node-ghq5l             1/2     CrashLoopBackOff   61         6h11m   10.0.99.77    huir-0826-6q5g9-compute-0         <none>           <none>
ovnkube-node-lrc95             2/2     Running            0          6h12m   10.131.0.2    huir-0826-6q5g9-compute-2         <none>           <none>
ovnkube-node-metrics-4gxgn     1/1     Running            0          6h34m   10.0.96.111   huir-0826-6q5g9-compute-1         <none>           <none>
ovnkube-node-metrics-4n7bd     1/1     Running            0          6h49m   10.0.97.89    huir-0826-6q5g9-control-plane-2   <none>           <none>
ovnkube-node-metrics-6d8jk     1/1     Running            0          6h49m   10.0.98.14    huir-0826-6q5g9-control-plane-0   <none>           <none>
ovnkube-node-metrics-dfp7q     1/1     Running            0          6h49m   10.0.98.220   huir-0826-6q5g9-control-plane-1   <none>           <none>
ovnkube-node-metrics-v4qzz     1/1     Running            0          6h35m   10.0.99.77    huir-0826-6q5g9-compute-0         <none>           <none>
ovnkube-node-metrics-x5g6k     1/1     Running            0          6h35m   10.0.97.10    huir-0826-6q5g9-compute-2         <none>           <none>
ovnkube-node-qws9s             2/2     Running            0          125m    10.128.2.2    huir-0826-6q5g9-compute-1         <none>           <none>
ovnkube-node-w596b             2/2     Running            0          6h11m   10.130.0.2    huir-0826-6q5g9-control-plane-1   <none>           <none>
ovnkube-node-xr9p5             2/2     Running            1          6h14m   10.0.97.89    huir-0826-6q5g9-control-plane-2   <none>           <none>
ovs-node-9z7nz                 1/1     Running            0          6h35m   10.0.99.77    huir-0826-6q5g9-compute-0         <none>           <none>
ovs-node-c4p9p                 1/1     Running            0          6h49m   10.0.98.220   huir-0826-6q5g9-control-plane-1   <none>           <none>
ovs-node-h7snd                 1/1     Running            0          6h34m   10.0.96.111   huir-0826-6q5g9-compute-1         <none>           <none>
ovs-node-jbwtm                 1/1     Running            0          6h35m   10.0.97.10    huir-0826-6q5g9-compute-2         <none>           <none>
ovs-node-rn77c                 1/1     Running            0          6h49m   10.0.98.14    huir-0826-6q5g9-control-plane-0   <none>           <none>
ovs-node-vcn4s                 1/1     Running            0          6h49m   10.0.97.89    huir-0826-6q5g9-control-plane-2   <none>           <none>

Expected Result:
The InternalIP and the OVN pod IPs should be the br-ex IP of the nodes.
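To spot which nodes are affected, one rough check (a sketch, not part of any official tooling, assuming the default OVN clusterNetwork of 10.128.0.0/14) is to flag InternalIPs that fall inside the pod-network CIDR, since such an address belongs to ovn-k8s-mp0 rather than br-ex:

```shell
#!/bin/bash
# Sketch: flag node InternalIPs that fall inside the OVN cluster network
# (assumes the default clusterNetwork 10.128.0.0/14 -- adjust if the
# cluster was installed with a different CIDR).

ip_to_int() {
  # Convert a dotted-quad IPv4 address to a 32-bit integer.
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

in_cidr() {
  # in_cidr IP NETWORK PREFIXLEN -> success if IP lies inside NETWORK/PREFIXLEN
  local ip net mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# In practice the name/IP pairs would come from something like:
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name} {.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
# Here two pairs from the output above are used as a fixed sample.
while read -r node addr; do
  if in_cidr "$addr" 10.128.0.0 14; then
    echo "$node: InternalIP $addr is a pod-network (ovn-k8s-mp0) address"
  fi
done <<'EOF'
huir-0826-6q5g9-compute-1 10.128.2.2
huir-0826-6q5g9-compute-0 10.0.99.77
EOF
```

Running it against the sample pairs flags only huir-0826-6q5g9-compute-1, matching the broken node above.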
It seems we have a big problem. In shared gateway mode we attach br-ex to the primary NIC, e.g.:

9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:5e:5a:d7 brd ff:ff:ff:ff:ff:ff
    inet 10.0.97.89/22 brd 10.0.99.255 scope global noprefixroute dynamic br-ex
       valid_lft 52609sec preferred_lft 52609sec
    inet6 2620:52:0:60:1932:bc68:2020:af5d/64 scope global noprefixroute dynamic
       valid_lft 2591995sec preferred_lft 604795sec
    inet6 fe80::5064:6d88:5776:2c54/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

The IP 10.0.97.89 is not the InternalIP address of the node; it's considered the ExternalIP, as seen in the AWS console (see attached picture).

The problem is that the CNO bootstraps OVN with the list of master nodes according to their InternalIP address representation (as retrieved from the API server), here:
https://github.com/openshift/cluster-network-operator/blob/92e466db53cc9e741084c3697fe893e7496ba61d/pkg/network/ovn_kubernetes.go#L260

This would not be a problem in itself; it would be a mere change in the CNO to bootstrap with the ExternalIP instead.
However, those fields are not set in the API server's node representation:

oc get node -o yaml huir-0827-mnfkm-control-plane-0
apiVersion: v1
kind: Node
metadata:
  annotations:
    k8s.ovn.org/l3-gateway-config: '{"default":{"mode":"shared","interface-id":"br-ex_huir-0827-mnfkm-control-plane-0","mac-address":"fa:16:3e:5e:5a:d7","ip-addresses":["10.0.97.89/22"],"ip-address":"10.0.97.89/22","next-hops":["10.0.99.254"],"next-hop":"10.0.99.254","node-port-enable":"true","vlan-id":"0"}}'
    k8s.ovn.org/node-chassis-id: 273c77f9-6f1f-4747-8f3d-542e1a8724f6
    k8s.ovn.org/node-join-subnets: '{"default":"100.64.2.0/29"}'
    k8s.ovn.org/node-local-nat-ip: '{"default":["169.254.12.13"]}'
    k8s.ovn.org/node-mgmt-port-mac-address: ce:fe:f3:93:76:22
    k8s.ovn.org/node-primary-ifaddr: '{"ipv4":"10.0.97.89/22","ipv6":"2620:52:0:60:1932:bc68:2020:af5d/64"}'
    k8s.ovn.org/node-subnets: '{"default":"10.129.0.0/23"}'
    machineconfiguration.openshift.io/currentConfig: rendered-master-bda270c04531b48aef1e5493c3b78844
    machineconfiguration.openshift.io/desiredConfig: rendered-master-bda270c04531b48aef1e5493c3b78844
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/state: Done
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2020-08-27T00:50:55Z"
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: huir-0827-mnfkm-control-plane-0
    kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
    node.openshift.io/os_id: rhcos
  name: huir-0827-mnfkm-control-plane-0
  resourceVersion: "770952"
  selfLink: /api/v1/nodes/huir-0827-mnfkm-control-plane-0
  uid: 1b11603e-b37a-481c-8705-d05f22b29f55
spec:
  podCIDR: 10.128.1.0/24
  podCIDRs:
  - 10.128.1.0/24
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
status:
  addresses:
  - address: 10.129.0.2
    type: InternalIP
  - address: huir-0827-mnfkm-control-plane-0
    type: Hostname

I need to discuss this with Tim to check what he thinks we should do about this. We could have a look at changing the API server to set the ExternalIP field on all nodes, but I am afraid that it might be too complex/close to the final freeze.
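As an aside, one hypothetical node-level workaround (not the proposed fix, and the file path and variable name below are purely illustrative, not taken from the actual MCO-rendered units) would be to pin the address the kubelet reports via its real --node-ip flag through a systemd drop-in:

```shell
# Hypothetical drop-in, e.g. /etc/systemd/system/kubelet.service.d/20-node-ip.conf
# (illustrative path), forcing kubelet to report the br-ex address instead of
# whatever it auto-detects:
#
#   [Service]
#   Environment="KUBELET_NODE_IP=10.0.97.89"
#
# with the kubelet invocation then passing:
#
#   --node-ip="${KUBELET_NODE_IP}"
#
# This only masks the symptom per node; it does not address why br-ex gets the
# address it does on this platform.
```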
Created attachment 1712808 [details] AWS console with cluster nodes
Excuse me, that's not the AWS console. It's the OpenStack one.
FYI: I just created a cluster on GCP (which works fine); it seems we attach the InternalIP to br-ex on that platform. I have attached the output of ovs-configuration.service for both cases for comparison. But there is no error in the OpenStack case, so nothing is evident to me as to what causes the difference.

GCP

journalctl -u ovs-configuration.service > tmp
sh-4.4# cat tmp
-- Logs begin at Thu 2020-08-27 12:00:24 UTC, end at Thu 2020-08-27 13:13:40 UTC. --
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 systemd[1]: Starting Configures OVS with proper host networking configuration...
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + iface=
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + counter=0
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + '[' 0 -lt 12 ']'
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: ++ ip -j route show default
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: ++ jq -r '.[0].dev'
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + iface=ens4
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + [[ -n ens4 ]]
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + [[ ens4 != \n\u\l\l ]]
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + echo 'IPv4 Default gateway interface found: ens4'
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: IPv4 Default gateway interface found: ens4
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + break
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + '[' ens4 = br-ex ']'
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + '[' -z ens4 ']'
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + iface_mac=42:01:0a:00:00:03
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + echo 'MAC address found for iface: ens4: 42:01:0a:00:00:03'
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: MAC address found for iface: ens4: 42:01:0a:00:00:03
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: ++ ip -j link show ens4
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: ++ jq -r '.[0].mtu'
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + iface_mtu=1460
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + [[ -z 1460 ]]
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + [[ 1460 == \n\u\l\l ]]
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + echo 'MTU found for iface: ens4: 1460'
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: MTU found for iface: ens4: 1460
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli connection show br-ex
Aug 27 12:06:23 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli c add type ovs-bridge conn.interface br-ex con-name br-ex 802-3-ethernet.mtu 1460 802-3-ethernet.cloned-mac-address 42:01:0a:00:00:03
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: Connection 'br-ex' (4ed3ad4b-17d5-4e6b-87ad-eddd8d3eaccb) successfully added.
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: ++ nmcli --fields UUID,DEVICE conn show --active
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: ++ grep ens4
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: ++ awk '{print $1}'
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + old_conn=67de8da7-74d3-4af6-b30d-659ed36212d0
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli connection show ovs-port-phys0
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli c add type ovs-port conn.interface ens4 master br-ex con-name ovs-port-phys0
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: Connection 'ovs-port-phys0' (e3f63661-f500-4885-83af-a2401fa8613f) successfully added.
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli connection show ovs-port-br-ex
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli c add type ovs-port conn.interface br-ex master br-ex con-name ovs-port-br-ex
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: Connection 'ovs-port-br-ex' (155140bc-2a94-4058-94bc-a5abf78d5b1f) successfully added.
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli device disconnect ens4
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: Device 'ens4' successfully disconnected.
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli connection show ovs-if-phys0
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli c add type 802-3-ethernet conn.interface ens4 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1460
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: Connection 'ovs-if-phys0' (a91fafd9-c4cf-4730-b2e0-15ff81a71723) successfully added.
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli conn up ovs-if-phys0
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli connection show ovs-if-br-ex
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli c add type ovs-interface slave-type ovs-port conn.interface br-ex master ovs-port-br-ex con-name ovs-if-br-ex 802-3-ethernet.mtu 1460 802-3-ethernet.cloned-mac-address 42:01:0a:00:00:03
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: Connection 'ovs-if-br-ex' (acff4245-05dc-444d-8d9f-b9755bbcdd3f) successfully added.
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + counter=0
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + '[' 0 -lt 5 ']'
Aug 27 12:06:24 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + sleep 5
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + grep -i activated
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: GENERAL.STATE: activated
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + echo 'OVS successfully configured'
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: OVS successfully configured
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + ip a show br-ex
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: 4: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue state UNKNOWN group default qlen 1000
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: link/ether 42:01:0a:00:00:03 brd ff:ff:ff:ff:ff:ff
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: inet 10.0.0.3/32 scope global dynamic noprefixroute br-ex
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: valid_lft 86396sec preferred_lft 86396sec
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: inet6 fe80::8839:f9c0:a4ac:3964/64 scope link noprefixroute
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: valid_lft forever preferred_lft forever
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 configure-ovs.sh[1419]: + exit 0
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 systemd[1]: Started Configures OVS with proper host networking configuration.
Aug 27 12:06:29 ci-ln-crxcz5b-f76d1-wmk6z-master-2 systemd[1]: ovs-configuration.service: Consumed 378ms CPU time

OpenStack

cat tmp
-- Logs begin at Thu 2020-08-27 00:46:53 UTC, end at Thu 2020-08-27 12:52:15 UTC. --
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 systemd[1]: Starting Configures OVS with proper host networking configuration...
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + iface=
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + counter=0
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + '[' 0 -lt 12 ']'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ ip -j route show default
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ jq -r '.[0].dev'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + iface=ens3
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + [[ -n ens3 ]]
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + [[ ens3 != \n\u\l\l ]]
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + echo 'IPv4 Default gateway interface found: ens3'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: IPv4 Default gateway interface found: ens3
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + break
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + '[' ens3 = br-ex ']'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + '[' -z ens3 ']'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + iface_mac=fa:16:3e:5e:5a:d7
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + echo 'MAC address found for iface: ens3: fa:16:3e:5e:5a:d7'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: MAC address found for iface: ens3: fa:16:3e:5e:5a:d7
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ ip -j link show ens3
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ jq -r '.[0].mtu'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + iface_mtu=1500
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + [[ -z 1500 ]]
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + [[ 1500 == \n\u\l\l ]]
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + echo 'MTU found for iface: ens3: 1500'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: MTU found for iface: ens3: 1500
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli connection show br-ex
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli c add type ovs-bridge conn.interface br-ex con-name br-ex 802-3-ethernet.mtu 1500 802-3-ethernet.cloned-mac-address fa:16:3e:5e:5a:d7
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: Connection 'br-ex' (385fc9ac-c2a2-45fe-8e79-cd042153ae1d) successfully added.
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ nmcli --fields UUID,DEVICE conn show --active
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ grep ens3
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ awk '{print $1}'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + old_conn=21d47e65-8523-1a06-af22-6f121086f085
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli connection show ovs-port-phys0
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli c add type ovs-port conn.interface ens3 master br-ex con-name ovs-port-phys0
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: Connection 'ovs-port-phys0' (c78224c9-4357-41fc-9627-7b44fecc3a87) successfully added.
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli connection show ovs-port-br-ex
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli c add type ovs-port conn.interface br-ex master br-ex con-name ovs-port-br-ex
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: Connection 'ovs-port-br-ex' (d8b99bc6-d7f1-4cc5-8a04-3ebb271c5f83) successfully added.
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli device disconnect ens3
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: Device 'ens3' successfully disconnected.
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli connection show ovs-if-phys0
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli c add type 802-3-ethernet conn.interface ens3 master ovs-port-phys0 con-name ovs-if-phys0 connection.autoconnect-priority 100 802-3-ethernet.mtu 1500
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: Connection 'ovs-if-phys0' (86644bf1-7ee0-4370-aea7-20a60688f63f) successfully added.
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli conn up ovs-if-phys0
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli connection show ovs-if-br-ex
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli c add type ovs-interface slave-type ovs-port conn.interface br-ex master ovs-port-br-ex con-name ovs-if-br-ex 802-3-ethernet.mtu 1500 802-3-ethernet.cloned-mac-address fa:16:3e:5e:5a:d7
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: Connection 'ovs-if-br-ex' (3521dd05-3d8a-4251-8466-f88d7db84209) successfully added.
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + counter=0
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + '[' 0 -lt 5 ']'
Aug 27 00:50:21 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + sleep 5
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + nmcli --fields GENERAL.STATE conn show ovs-if-br-ex
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + grep -i activated
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: GENERAL.STATE: activated
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + echo 'OVS successfully configured'
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: OVS successfully configured
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + ip a show br-ex
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: 4: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: link/ether fa:16:3e:5e:5a:d7 brd ff:ff:ff:ff:ff:ff
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: inet 10.0.97.89/22 brd 10.0.99.255 scope global dynamic noprefixroute br-ex
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: valid_lft 86396sec preferred_lft 86396sec
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: inet6 2620:52:0:60:1932:bc68:2020:af5d/64 scope global tentative dynamic noprefixroute
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: valid_lft 2592000sec preferred_lft 604800sec
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: inet6 fe80::5064:6d88:5776:2c54/64 scope link noprefixroute
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: valid_lft forever preferred_lft forever
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 systemd[1]: Started Configures OVS with proper host networking configuration.
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + configure_driver_options ens3
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + intf=ens3
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 systemd[1]: ovs-configuration.service: Consumed 309ms CPU time
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ cat /sys/class/net/ens3/device/uevent
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ grep DRIVER
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: ++ awk -F = '{print $2}'
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + driver=virtio_net
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + echo 'Driver name is' virtio_net
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: Driver name is virtio_net
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + '[' virtio_net = vmxnet3 ']'
Aug 27 00:50:28 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1448]: + exit 0
-- Reboot --
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 systemd[1]: Starting Configures OVS with proper host networking configuration...
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + iface=
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + counter=0
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + '[' 0 -lt 12 ']'
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: ++ ip -j route show default
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: ++ jq -r '.[0].dev'
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + iface=br-ex
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + [[ -n br-ex ]]
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + [[ br-ex != \n\u\l\l ]]
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + echo 'IPv4 Default gateway interface found: br-ex'
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: IPv4 Default gateway interface found: br-ex
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + break
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + '[' br-ex = br-ex ']'
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: ++ ovs-vsctl list-ifaces br-ex
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + ifaces='ens3
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: patch-br-ex_huir-0827-mnfkm-control-plane-0-to-br-int'
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + for intf in $ifaces
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + configure_driver_options ens3
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + intf=ens3
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: ++ cat /sys/class/net/ens3/device/uevent
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 systemd[1]: Started Configures OVS with proper host networking configuration.
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: ++ grep DRIVER
Aug 27 01:19:09 huir-0827-mnfkm-control-plane-0 systemd[1]: ovs-configuration.service: Consumed 69ms CPU time
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: ++ awk -F = '{print $2}'
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + driver=virtio_net
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + echo 'Driver name is' virtio_net
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: Driver name is virtio_net
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + '[' virtio_net = vmxnet3 ']'
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + for intf in $ifaces
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + configure_driver_options patch-br-ex_huir-0827-mnfkm-control-plane-0-to-br-int
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + intf=patch-br-ex_huir-0827-mnfkm-control-plane-0-to-br-int
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: ++ cat /sys/class/net/patch-br-ex_huir-0827-mnfkm-control-plane-0-to-br-int/device/uevent
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: ++ grep DRIVER
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: ++ awk -F = '{print $2}'
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: cat: /sys/class/net/patch-br-ex_huir-0827-mnfkm-control-plane-0-to-br-int/device/uevent: No such file or directory
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + driver=
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + echo 'Driver name is'
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: Driver name is
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + '[' '' = vmxnet3 ']'
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + echo 'Networking already configured and up for br-ex!'
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: Networking already configured and up for br-ex!
Aug 27 01:19:10 huir-0827-mnfkm-control-plane-0 configure-ovs.sh[1573]: + exit 0
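For reference, the detection logic visible at the top of both logs boils down to polling the default route for its interface. A simplified reconstruction (not the actual configure-ovs.sh; the real script parses `ip -j route show default` with jq, while this sketch uses bash string operations so the parsing helper can be exercised without jq):

```shell
#!/bin/bash
# Simplified sketch of the gateway-interface detection seen in the logs above.

extract_default_dev() {
  # Pull the first "dev" value out of `ip -j route show default` JSON output.
  local json=$1
  json=${json#*\"dev\":\"}        # drop everything up to the dev value
  [ "$json" = "$1" ] && return 1  # no "dev" key found (e.g. no default route)
  printf '%s\n' "${json%%\"*}"
}

find_gateway_iface() {
  # Retry, because at boot the default route may not exist yet.
  local iface="" counter=0
  while [ "$counter" -lt 12 ]; do
    iface=$(extract_default_dev "$(ip -j route show default)")
    if [ -n "$iface" ] && [ "$iface" != null ]; then
      echo "IPv4 Default gateway interface found: $iface" >&2
      break
    fi
    counter=$((counter + 1))
    sleep 5
  done
  # If this already returns br-ex (as after the reboot in the logs), the real
  # script skips bridge creation and only reconfigures driver options.
  printf '%s\n' "$iface"
}
```

On the first boot this would return ens3 (or ens4 on GCP); after the reboot it returns br-ex, which is why the second run takes the "Networking already configured" path.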
Going further, the logs from NetworkManager seem to point to DHCP being done differently between the providers.

OpenStack

Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 systemd[1]: Starting Network Manager...
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.2051] NetworkManager (version 1.22.8-6.el8_2) is starting... (for the first time)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.2055] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-disable-default-plugins.conf, 20-client-id-from-mac.conf) (etc: sdn.conf)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 systemd[1]: Started Network Manager.
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.2084] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.2204] manager[0x563b86bcb090]: monitoring kernel firmware directory '/lib/firmware'.
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.4862] hostname: hostname: using hostnamed
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.4865] hostname: hostname changed from (none) to "huir-0827-mnfkm-control-plane-0"
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.4874] dns-mgr[0x563b86baf250]: init: dns=default,systemd-resolved rc-manager=symlink
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.4989] Loaded device plugin: NMOvsFactory (/usr/lib64/NetworkManager/1.22.8-6.el8_2/libnm-device-plugin-ovs.so)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5031] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/1.22.8-6.el8_2/libnm-device-plugin-team.so)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5032] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5033] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5034] manager: Networking is enabled by state file
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5035] dhcp-init: Using DHCP client 'internal'
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5036] settings: Loaded settings plugin: keyfile (internal)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5099] settings: Loaded settings plugin: ifcfg-rh ("/usr/lib64/NetworkManager/1.22.8-6.el8_2/libnm-settings-plugin-ifcfg-rh.so")
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5185] device (lo): carrier: link connected
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5189] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5202] manager: (ens3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5210] device (ens3): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5248] device (ens3): carrier: link connected
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5312] device (ens3): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5321] policy: auto-activating connection 'ens3' (21d47e65-8523-1a06-af22-6f121086f085)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5328] device (ens3): Activation: starting connection 'ens3' (21d47e65-8523-1a06-af22-6f121086f085)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5330] device (ens3): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5333] manager: NetworkManager state is now CONNECTING
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5336] device (ens3): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5342] device (ens3): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5346] dhcp4 (ens3): activation: beginning transaction (timeout in 45 seconds)
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5419] dhcp4 (ens3): option dhcp_lease_time => '86400'
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5419] dhcp4 (ens3): option domain_name => 'openstacklocal'
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5419] dhcp4 (ens3): option domain_name_servers => '10.11.5.19 10.5.30.45'
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5419] dhcp4 (ens3): option expiry => '1598575748'
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5419] dhcp4 (ens3): option host_name => 'host-10-0-97-89'
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5420] dhcp4 (ens3): option interface_mtu => '1500'
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5420] dhcp4 (ens3): option ip_address => '10.0.97.89'
Aug 27 00:49:08 huir-0827-mnfkm-control-plane-0 NetworkManager[1706]: <info> [1598489348.5420] dhcp4 (ens3): option next_server => '10.0.96.161'

GCP

Aug 27 12:05:12 localhost systemd[1]: Starting Network Manager...
Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.3449] NetworkManager (version 1.22.8-6.el8_2) is starting... (for the first time)
Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.3453] Read config: /etc/NetworkManager/NetworkManager.conf (lib: 10-disable-default-plugins.conf, 20-client-id-from-mac.conf) (etc: sdn.conf)
Aug 27 12:05:12 localhost systemd[1]: Started Network Manager.
Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.3491] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager" Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.3552] manager[0x56205677d090]: monitoring kernel firmware directory '/lib/firmware'. Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6767] hostname: hostname: using hostnamed Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6771] dns-mgr[0x562056761250]: init: dns=default,systemd-resolved rc-manager=symlink Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6833] Loaded device plugin: NMOvsFactory (/usr/lib64/NetworkManager/1.22.8-6.el8_2/libnm-device-plugin-ovs.so) Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6862] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/1.22.8-6.el8_2/libnm-device-plugin-team.so) Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6863] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6864] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6866] manager: Networking is enabled by state file Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6867] dhcp-init: Using DHCP client 'internal' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6869] settings: Loaded settings plugin: keyfile (internal) Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6905] settings: Loaded settings plugin: ifcfg-rh ("/usr/lib64/NetworkManager/1.22.8-6.el8_2/libnm-settings-plugin-ifcfg-rh.so") Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6932] device (lo): carrier: link connected Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6935] manager: (lo): new Generic device 
(/org/freedesktop/NetworkManager/Devices/1) Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6949] manager: (ens4): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2) Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.6963] device (ens4): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external') Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7036] device (ens4): carrier: link connected Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7090] device (ens4): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed') Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7135] policy: auto-activating connection 'Wired Connection' (67de8da7-74d3-4af6-b30d-659ed36212d0) Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7145] device (ens4): Activation: starting connection 'Wired Connection' (67de8da7-74d3-4af6-b30d-659ed36212d0) Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7147] device (ens4): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7152] manager: NetworkManager state is now CONNECTING Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7155] device (ens4): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7163] device (ens4): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7168] dhcp4 (ens4): activation: beginning transaction (timeout in 45 seconds) Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option dhcp_lease_time => '86400' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option domain_name => 
'c.openshift-gce-devel-ci.internal' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option domain_name_servers => '169.254.169.254' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option domain_search => 'c.openshift-gce-devel-ci.internal google.internal' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option expiry => '1598616312' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option host_name => 'ci-ln-crxcz5b-f76d1-wmk6z-master-2.c.openshift-gce-devel-ci.internal' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option interface_mtu => '1460' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option ip_address => '10.0.0.3' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option next_server => '10.0.0.1' Aug 27 12:05:12 localhost NetworkManager[1652]: <info> [1598529912.7201] dhcp4 (ens4): option ntp_servers => '169.254.169.254' But I am not sure if that's normal or not.
So I get the feeling here that the problem is not in how our ovs-configuration script does things. I think the problem is how `node.status.addresses` gets set on the Node API object on Openstack: the InternalIP does not correspond to the primary NIC address received from DHCP, as it does on GCP/AWS - which I *think* it should...?
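To make that expectation concrete, here is a minimal, hypothetical sketch (not kubelet code) of what the "right" selection would look like: prefer the candidate that falls inside the machine network (10.0.96.0/22 in this cluster) over pod-network addresses. The `pickInternalIP` helper and the CIDR literal are illustrative assumptions, not something the kubelet does today.

```go
package main

import (
	"fmt"
	"net"
)

// pickInternalIP returns the first candidate address inside machineCIDR.
// Hypothetical helper for illustration only.
func pickInternalIP(machineCIDR string, candidates []string) (string, error) {
	_, ipnet, err := net.ParseCIDR(machineCIDR)
	if err != nil {
		return "", err
	}
	for _, c := range candidates {
		if ip := net.ParseIP(c); ip != nil && ipnet.Contains(ip) {
			return c, nil
		}
	}
	return "", fmt.Errorf("no candidate inside %s", machineCIDR)
}

func main() {
	// Candidates in the order the kubelet sees them on compute-1:
	// ovn-k8s-mp0 first, br-ex second.
	ip, err := pickInternalIP("10.0.96.0/22", []string{"10.128.2.2", "10.0.96.111"})
	fmt.Println(ip, err) // 10.0.96.111 <nil>
}
```

With a machine-network filter the br-ex address would win regardless of interface ordering.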
FYI, follow up on #comment 4 Here's the output from master node: huir-0827-mnfkm-control-plane-0 [root@huir-0827-mnfkm-control-plane-0 core]# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000 link/ether fa:16:3e:5e:5a:d7 brd ff:ff:ff:ff:ff:ff 3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 5e:28:ab:21:c2:56 brd ff:ff:ff:ff:ff:ff 4: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000 link/ether 6a:ea:c5:40:c2:f7 brd ff:ff:ff:ff:ff:ff inet6 fe80::68ea:c5ff:fe40:c2f7/64 scope link valid_lft forever preferred_lft forever 5: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether ce:fe:f3:93:76:22 brd ff:ff:ff:ff:ff:ff inet 10.129.0.2/23 brd 10.129.1.255 scope global ovn-k8s-mp0 valid_lft forever preferred_lft forever 6: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000 link/ether 6e:ca:4c:9d:55:43 brd ff:ff:ff:ff:ff:ff 7: ovn-k8s-gw0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0a:58:a9:fe:00:01 brd ff:ff:ff:ff:ff:ff inet 169.254.0.1/20 brd 169.254.15.255 scope global ovn-k8s-gw0 valid_lft forever preferred_lft forever 8: br-local: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 72:e6:39:92:e6:4b brd ff:ff:ff:ff:ff:ff inet6 fe80::70e6:39ff:fe92:e64b/64 scope link valid_lft forever preferred_lft forever 9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state 
UNKNOWN group default qlen 1000 link/ether fa:16:3e:5e:5a:d7 brd ff:ff:ff:ff:ff:ff inet 10.0.97.89/22 brd 10.0.99.255 scope global dynamic noprefixroute br-ex valid_lft 80618sec preferred_lft 80618sec inet6 2620:52:0:60:1932:bc68:2020:af5d/64 scope global dynamic noprefixroute valid_lft 2591897sec preferred_lft 604697sec inet6 fe80::5064:6d88:5776:2c54/64 scope link noprefixroute valid_lft forever preferred_lft forever So the node object's InternalIP (10.129.0.2) has been wired to ovn-k8s-mp0, not br-ex. So when the CNO bootstraps OVN it passes 10.129.0.2 to ovnkube-master / ovn-controller etc., and thus no database connection happens (because the node is actually reachable on 10.0.97.89).
> So the node object's InternalIP: 10.129.0.2 has been wired to ovn-k8s-mp0 not br-ex. err... no, 10.129.0.2 is the correct IP for ovn-k8s-mp0; it's the gateway IP address of the pod network subnet assigned to that node. The question is why kubelet is taking the IP from ovn-k8s-mp0 and thinking it should declare that as its InternalIP...
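That .2 convention is visible in every output above (10.128.2.2/23, 10.129.0.2/23, 10.131.0.2/23, ...). As a small sketch, assuming OVN-Kubernetes assigns ovn-k8s-mp0 the second host address of the node's pod subnet (the `mgmtPortIP` helper is hypothetical, for illustration only):

```go
package main

import (
	"fmt"
	"net"
)

// mgmtPortIP derives the ovn-k8s-mp0 address from a node's pod subnet,
// assuming the convention seen in the outputs above: the .2 host address
// goes to the management port. Hypothetical helper.
func mgmtPortIP(cidr string) (string, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	ip := ipnet.IP.To4()
	if ip == nil {
		return "", fmt.Errorf("only IPv4 subnets handled in this sketch")
	}
	out := make(net.IP, len(ip))
	copy(out, ip) // copy so we don't mutate the parsed network address
	out[3] += 2   // network address + 2 = management port IP
	return out.String(), nil
}

func main() {
	ip, _ := mgmtPortIP("10.129.0.0/23")
	fmt.Println(ip) // 10.129.0.2, matching ovn-k8s-mp0 on control-plane-0
}
```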
OK, I think I've narrowed the problem down. On Openstack we run the kubelet with the flag `--cloud-provider=`, which means it's up to the kubelet to set the IP address without looking up the node's IP address from the external cloud provider. This is done here: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/nodestatus/setters.go#L205 I've written a small program replicating that part (as the kubelet has no logging of this):
```
package main

import (
	"fmt"
	"net"
	"os"
)

func validateNodeIP(nodeIP net.IP) error {
	// Honor IP limitations set in setNodeStatus()
	if nodeIP.To4() == nil && nodeIP.To16() == nil {
		return fmt.Errorf("nodeIP must be a valid IP address")
	}
	if nodeIP.IsLoopback() {
		return fmt.Errorf("nodeIP can't be loopback address")
	}
	if nodeIP.IsMulticast() {
		return fmt.Errorf("nodeIP can't be a multicast address")
	}
	if nodeIP.IsLinkLocalUnicast() {
		return fmt.Errorf("nodeIP can't be a link-local unicast address")
	}
	if nodeIP.IsUnspecified() {
		return fmt.Errorf("nodeIP can't be an all zeros address")
	}

	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return err
	}
	for _, addr := range addrs {
		var ip net.IP
		switch v := addr.(type) {
		case *net.IPNet:
			ip = v.IP
		case *net.IPAddr:
			ip = v.IP
		}
		if ip != nil && ip.Equal(nodeIP) {
			return nil
		}
	}
	return fmt.Errorf("node IP: %q not found in the host's network interfaces", nodeIP.String())
}

func main() {
	hostname, err := os.Hostname()
	if err != nil {
		fmt.Printf("unable to get hostname, err: %v", err)
		return
	}
	ips, err := net.LookupIP(hostname)
	if err != nil {
		fmt.Printf("An error occurred, err: %v\n", err)
	}
	for _, ip := range ips {
		if err := validateNodeIP(ip); err == nil {
			fmt.Printf("IP is: %s\n", ip.String())
		} else {
			fmt.Printf("IP: %s is skipped because: %v\n", ip.String(), err)
		}
	}
}
```
On Openstack that program returns:
$ ./tmp
IP: fe80::c06d:abff:fe70:9a09 is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::58c3:b9ff:fe32:3348 is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::f64e:de02:c198:b6db is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::f4d3:1aff:fe0b:5765 is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::78a9:45ff:fe87:487 is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::4839:9dff:fe04:25d3 is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::7cb5:37ff:fe42:b1e5 is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::e81d:e3ff:fe9e:f894 is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::9087:99ff:fe3c:3d3 is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::dcfa:a5ff:feb1:3fdf is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::c4d5:f2ff:fec1:1acf is skipped because: nodeIP can't be a link-local unicast address
IP: fe80::9828:d0ff:feca:7068 is skipped because: nodeIP can't be a link-local unicast address
IP is: 2620:52:0:60:946a:c6c1:950f:c7aa
IP: 169.254.0.1 is skipped because: nodeIP can't be a link-local unicast address
IP is: 10.128.2.2
IP is: 10.0.97.10
As seen in the code referenced just before: the kubelet takes the first IPv4 address it finds and assigns it as the InternalIP address - thus 10.128.2.2 in this example, which is the ovn-k8s-mp0 address. This is presumably because the interface index of ovn-k8s-mp0 is lower than that of br-ex. 
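Replaying that "first valid IPv4 wins" rule over the candidate list from the output above reproduces the bad pick without needing a live node. `firstUsableIPv4` is an illustrative distillation of the filtering, not the kubelet's exact code:

```go
package main

import (
	"fmt"
	"net"
)

// firstUsableIPv4 walks candidates in order and returns the first IPv4 that
// survives the same checks as validateNodeIP (loopback, multicast,
// link-local, unspecified). Illustrative sketch only.
func firstUsableIPv4(candidates []string) string {
	for _, c := range candidates {
		ip := net.ParseIP(c)
		if ip == nil || ip.To4() == nil {
			continue // not IPv4
		}
		if ip.IsLoopback() || ip.IsMulticast() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			continue
		}
		return c
	}
	return ""
}

func main() {
	// The IPv4-relevant candidates from the OpenStack node output above.
	got := firstUsableIPv4([]string{"169.254.0.1", "10.128.2.2", "10.0.97.10"})
	fmt.Println(got) // 10.128.2.2 - the ovn-k8s-mp0 address wins over br-ex
}
```

Because the selection depends purely on enumeration order, any interface that happens to sort before br-ex and carries a routable IPv4 address will be picked.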
$ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000 link/ether fa:16:3e:f9:2a:63 brd ff:ff:ff:ff:ff:ff 3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 6e:46:b8:7e:14:ab brd ff:ff:ff:ff:ff:ff 4: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000 link/ether c2:6d:ab:70:9a:09 brd ff:ff:ff:ff:ff:ff inet6 fe80::c06d:abff:fe70:9a09/64 scope link valid_lft forever preferred_lft forever 5: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 3a:3d:46:4b:79:3e brd ff:ff:ff:ff:ff:ff inet 10.128.2.2/23 brd 10.128.3.255 scope global ovn-k8s-mp0 valid_lft forever preferred_lft forever 6: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000 link/ether 12:87:98:62:3f:42 brd ff:ff:ff:ff:ff:ff 7: ovn-k8s-gw0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0a:58:a9:fe:00:01 brd ff:ff:ff:ff:ff:ff inet 169.254.0.1/20 brd 169.254.15.255 scope global ovn-k8s-gw0 valid_lft forever preferred_lft forever 8: br-local: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 5a:c3:b9:32:33:48 brd ff:ff:ff:ff:ff:ff inet6 fe80::58c3:b9ff:fe32:3348/64 scope link valid_lft forever preferred_lft forever 9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether fa:16:3e:f9:2a:63 brd ff:ff:ff:ff:ff:ff inet 10.0.97.10/22 brd 10.0.99.255 scope global dynamic 
noprefixroute br-ex valid_lft 81365sec preferred_lft 81365sec inet6 2620:52:0:60:946a:c6c1:950f:c7aa/64 scope global dynamic noprefixroute valid_lft 2592000sec preferred_lft 604800sec inet6 fe80::f64e:de02:c198:b6db/64 scope link noprefixroute valid_lft forever preferred_lft forever 10: 13746d52afb719d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether f6:d3:1a:0b:57:65 brd ff:ff:ff:ff:ff:ff link-netns 2a7a011b-a946-466b-9db5-b09c024804ad inet6 fe80::f4d3:1aff:fe0b:5765/64 scope link valid_lft forever preferred_lft forever 11: e6bcf62aaec7a73@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether 7a:a9:45:87:04:87 brd ff:ff:ff:ff:ff:ff link-netns a6aba99b-d33f-47e8-8aca-8a44dd170bde inet6 fe80::78a9:45ff:fe87:487/64 scope link valid_lft forever preferred_lft forever 19: 80a85b68987e3c0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether 4a:39:9d:04:25:d3 brd ff:ff:ff:ff:ff:ff link-netns 0687789c-38f1-48aa-b3cf-fb889943f620 inet6 fe80::4839:9dff:fe04:25d3/64 scope link valid_lft forever preferred_lft forever 20: 447a23cff658fb1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether 7e:b5:37:42:b1:e5 brd ff:ff:ff:ff:ff:ff link-netns c60591ef-1255-441d-8b61-927ced05baf4 inet6 fe80::7cb5:37ff:fe42:b1e5/64 scope link valid_lft forever preferred_lft forever 21: 3bcfb45b7e5b54b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether ea:1d:e3:9e:f8:94 brd ff:ff:ff:ff:ff:ff link-netns 1687712b-b961-4811-a930-2a55e8e8a0d1 inet6 fe80::e81d:e3ff:fe9e:f894/64 scope link valid_lft forever preferred_lft forever 22: 3f4708a4fbcd6fc@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether 92:87:99:3c:03:d3 brd ff:ff:ff:ff:ff:ff 
link-netns c6a1907f-8dc6-4eec-9fc3-e5db3ffce3d0 inet6 fe80::9087:99ff:fe3c:3d3/64 scope link valid_lft forever preferred_lft forever 23: c76d2f65086ccba@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether de:fa:a5:b1:3f:df brd ff:ff:ff:ff:ff:ff link-netns 82450605-64fa-4d0d-875f-5baeb9c53ac9 inet6 fe80::dcfa:a5ff:feb1:3fdf/64 scope link valid_lft forever preferred_lft forever 24: 7b1302aa21edfa4@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether c6:d5:f2:c1:1a:cf brd ff:ff:ff:ff:ff:ff link-netns 265e51ac-b4e0-4e8c-9d90-43e53a6d753c inet6 fe80::c4d5:f2ff:fec1:1acf/64 scope link valid_lft forever preferred_lft forever 25: 94579c8c8debc8c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether 9a:28:d0:ca:70:68 brd ff:ff:ff:ff:ff:ff link-netns 889bb976-d6ca-4d3e-8f8b-f4c75dca4e28 inet6 fe80::9828:d0ff:feca:7068/64 scope link valid_lft forever preferred_lft forever The question is however why that net.LookupIP(hostname) returns ALL IP addresses on the host. 
On GCP we have the following: $ ./tmpok IP is: 10.0.0.5 sh-4.4# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc mq master ovs-system state UP group default qlen 1000 link/ether 42:01:0a:00:00:05 brd ff:ff:ff:ff:ff:ff 3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether 2e:b7:61:1c:26:2d brd ff:ff:ff:ff:ff:ff 4: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 42:01:0a:00:00:05 brd ff:ff:ff:ff:ff:ff inet 10.0.0.5/32 scope global dynamic noprefixroute br-ex valid_lft 79526sec preferred_lft 79526sec inet6 fe80::ea51:ba1f:982a:a7ea/64 scope link noprefixroute valid_lft forever preferred_lft forever 5: br-int: <BROADCAST,MULTICAST> mtu 1360 qdisc noop state DOWN group default qlen 1000 link/ether 9e:9c:c0:e9:9d:49 brd ff:ff:ff:ff:ff:ff 6: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000 link/ether da:68:16:3b:14:6f brd ff:ff:ff:ff:ff:ff inet6 fe80::d868:16ff:fe3b:146f/64 scope link valid_lft forever preferred_lft forever 7: br-local: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 96:a8:02:8f:9d:48 brd ff:ff:ff:ff:ff:ff inet6 fe80::94a8:2ff:fe8f:9d48/64 scope link valid_lft forever preferred_lft forever 8: ovn-k8s-gw0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0a:58:a9:fe:00:01 brd ff:ff:ff:ff:ff:ff inet 169.254.0.1/20 brd 169.254.15.255 scope global ovn-k8s-gw0 valid_lft forever preferred_lft forever 9: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc 
noqueue state UNKNOWN group default qlen 1000 link/ether 42:cd:38:c8:aa:6f brd ff:ff:ff:ff:ff:ff inet 10.128.4.2/23 brd 10.128.5.255 scope global ovn-k8s-mp0 valid_lft forever preferred_lft forever 10: aa23b699c337113@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether 1e:a3:71:a5:5a:06 brd ff:ff:ff:ff:ff:ff link-netns 3f0d30d7-a923-4432-9ae1-054b5c05fc5b inet6 fe80::1ca3:71ff:fea5:5a06/64 scope link valid_lft forever preferred_lft forever 11: d0d67f8fb285180@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether 76:f4:d9:6e:02:da brd ff:ff:ff:ff:ff:ff link-netns e061ecad-dbcc-4219-bcc3-6cb0c5f0cf00 inet6 fe80::74f4:d9ff:fe6e:2da/64 scope link valid_lft forever preferred_lft forever 12: f38aafbace1a8dd@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether ce:5f:f1:e8:a7:9a brd ff:ff:ff:ff:ff:ff link-netns 1f2b9538-1dde-4dea-81e9-6fda3765b5a3 inet6 fe80::cc5f:f1ff:fee8:a79a/64 scope link valid_lft forever preferred_lft forever 13: f3558d80694bda7@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether ee:96:09:9a:14:f0 brd ff:ff:ff:ff:ff:ff link-netns 0f6ff284-b395-4733-9070-d8eda907e852 inet6 fe80::ec96:9ff:fe9a:14f0/64 scope link valid_lft forever preferred_lft forever 14: 8d42766cc828067@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether d6:3f:1c:2f:22:88 brd ff:ff:ff:ff:ff:ff link-netns 79dc4dca-b32d-439d-9ec6-e3f2e98160a5 inet6 fe80::d43f:1cff:fe2f:2288/64 scope link valid_lft forever preferred_lft forever 15: 1f32cf96774d45d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether 66:42:d7:66:35:ca brd ff:ff:ff:ff:ff:ff link-netns 11b2dd97-f4ed-44b0-b8c2-b4797a111284 inet6 fe80::6442:d7ff:fe66:35ca/64 scope 
link valid_lft forever preferred_lft forever 16: 533e310648dc506@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether 9e:cf:08:b9:30:3b brd ff:ff:ff:ff:ff:ff link-netns 4b74495f-5acd-4a00-b7c9-29d542cdfd0d inet6 fe80::9ccf:8ff:feb9:303b/64 scope link valid_lft forever preferred_lft forever 18: d59ee44a51524f9@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 66:8c:2b:6d:18:ed brd ff:ff:ff:ff:ff:ff link-netns d14f65b8-5910-4d76-b518-758da0409bff inet6 fe80::648c:2bff:fe6d:18ed/64 scope link valid_lft forever preferred_lft forever 19: 2c5a37360eff43b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether 6e:5f:f8:b2:5c:47 brd ff:ff:ff:ff:ff:ff link-netns cd6d1582-b97f-4c35-b38f-6e1ee54ab89c inet6 fe80::6c5f:f8ff:feb2:5c47/64 scope link valid_lft forever preferred_lft forever 20: 896c515ba98661e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether ae:be:3b:07:26:5d brd ff:ff:ff:ff:ff:ff link-netns cdade3b6-5250-483f-8c85-a97f1a96604d inet6 fe80::acbe:3bff:fe07:265d/64 scope link valid_lft forever preferred_lft forever 21: c97564e5f2c9c1c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 2a:79:0f:60:f0:2e brd ff:ff:ff:ff:ff:ff link-netns 1cc96cee-b56b-4c50-8745-f99442961c23 inet6 fe80::2879:fff:fe60:f02e/64 scope link valid_lft forever preferred_lft forever 22: ca9b67cf20f9d46@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 8e:43:5b:ac:a9:33 brd ff:ff:ff:ff:ff:ff link-netns 897e930f-3ab8-47fd-8a6f-8bc04cafdf03 inet6 fe80::8c43:5bff:feac:a933/64 scope link valid_lft forever preferred_lft forever 23: 5e59097cbb9f962@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether 2a:92:a0:e7:b1:a1 brd ff:ff:ff:ff:ff:ff 
link-netns 90dcd4e2-40a7-453f-9ba4-5ee9fb73ed22 inet6 fe80::2892:a0ff:fee7:b1a1/64 scope link valid_lft forever preferred_lft forever 25: 94fd7748b53a0be@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether c2:02:51:9e:a4:bb brd ff:ff:ff:ff:ff:ff link-netns 5b123311-03db-4c8c-b0c3-49e4435e1724 inet6 fe80::c002:51ff:fe9e:a4bb/64 scope link valid_lft forever preferred_lft forever 26: e302b091c47156a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 8e:2f:d5:ad:dd:6f brd ff:ff:ff:ff:ff:ff link-netns 964fba77-5d9c-4599-84ce-1e6640c0c09d inet6 fe80::8c2f:d5ff:fead:dd6f/64 scope link valid_lft forever preferred_lft forever 27: 2121f4f3184ad97@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 2e:b9:5c:93:54:54 brd ff:ff:ff:ff:ff:ff link-netns 59fe64f4-39d0-407d-a653-47eafa67ca06 inet6 fe80::2cb9:5cff:fe93:5454/64 scope link valid_lft forever preferred_lft forever 28: 991de499c7f0e90@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether fa:51:72:bb:2d:1d brd ff:ff:ff:ff:ff:ff link-netns d9815926-571d-4b30-baac-8726635d25b5 inet6 fe80::f851:72ff:febb:2d1d/64 scope link valid_lft forever preferred_lft forever 29: db6e32a83ad3361@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 36:8c:9e:53:6f:e6 brd ff:ff:ff:ff:ff:ff link-netns 21b22dbf-289c-4339-8a9e-80a04c43c208 inet6 fe80::348c:9eff:fe53:6fe6/64 scope link valid_lft forever preferred_lft forever 38: 02fcf25e68d242b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 2e:f3:df:12:3e:9b brd ff:ff:ff:ff:ff:ff link-netns 6bedfea4-2cc7-4973-8418-db3a4a496fc3 inet6 fe80::2cf3:dfff:fe12:3e9b/64 scope link valid_lft forever preferred_lft forever 39: 87de57854192a25@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default 
link/ether 66:a5:1a:5c:ad:f1 brd ff:ff:ff:ff:ff:ff link-netns bce20730-3a3c-467e-921d-a03e6509a6d2 inet6 fe80::64a5:1aff:fe5c:adf1/64 scope link valid_lft forever preferred_lft forever 40: 29509152075b4ca@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 92:f6:99:1c:44:c9 brd ff:ff:ff:ff:ff:ff link-netns a0ec0d4a-bdfb-439c-ba6b-b3048749f76b inet6 fe80::90f6:99ff:fe1c:44c9/64 scope link valid_lft forever preferred_lft forever 41: 87f66223949b7ab@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether f2:ba:0b:26:62:cd brd ff:ff:ff:ff:ff:ff link-netns ce5daba5-9c48-46ba-98cd-721f578862b2 inet6 fe80::f0ba:bff:fe26:62cd/64 scope link valid_lft forever preferred_lft forever 42: 2a6c94645da32d8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether ce:46:e3:3b:89:57 brd ff:ff:ff:ff:ff:ff link-netns 7bed194a-332f-4753-a548-9c0aabfecb0d inet6 fe80::cc46:e3ff:fe3b:8957/64 scope link valid_lft forever preferred_lft forever 44: 4f2b3e2d36fd42c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether fa:28:90:60:cf:ee brd ff:ff:ff:ff:ff:ff link-netns 99aa630d-6741-495a-88dd-38276cf2e6a4 inet6 fe80::f828:90ff:fe60:cfee/64 scope link valid_lft forever preferred_lft forever 45: 23a9306132b5a77@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 6e:4c:b4:28:99:b8 brd ff:ff:ff:ff:ff:ff link-netns 92d67cea-87ac-4e71-b14c-7c48629f0f7d inet6 fe80::6c4c:b4ff:fe28:99b8/64 scope link valid_lft forever preferred_lft forever 46: 76ef5a1cdc02ec9@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 72:60:02:ef:c5:3e brd ff:ff:ff:ff:ff:ff link-netns 5ab2bd94-f5d0-41e2-9686-710843f7b798 inet6 fe80::7060:2ff:feef:c53e/64 scope link valid_lft forever preferred_lft forever 47: b288995804c5fd5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> 
mtu 1360 qdisc noqueue state UP group default link/ether ee:38:41:fe:ba:55 brd ff:ff:ff:ff:ff:ff link-netns dace91c9-4c34-4480-b6eb-cc548d98a755 inet6 fe80::ec38:41ff:fefe:ba55/64 scope link valid_lft forever preferred_lft forever 48: 3d2df86362b4ef7@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 06:1e:89:5f:df:f3 brd ff:ff:ff:ff:ff:ff link-netns 4b9dba63-12e1-456e-8f48-9776d9108880 inet6 fe80::41e:89ff:fe5f:dff3/64 scope link valid_lft forever preferred_lft forever 49: e487ec1aa953704@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 22:d0:5a:bb:f0:6c brd ff:ff:ff:ff:ff:ff link-netns 015edebb-b16b-4bc1-9ab4-ca3a8843e0dd inet6 fe80::20d0:5aff:febb:f06c/64 scope link valid_lft forever preferred_lft forever 50: ee0e00a84e719a5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether f2:3b:c0:12:8d:84 brd ff:ff:ff:ff:ff:ff link-netns f4ed6666-d714-4571-9bd4-a14bf0133bb1 inet6 fe80::f03b:c0ff:fe12:8d84/64 scope link valid_lft forever preferred_lft forever 51: 4613fd5c6234eed@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 92:64:38:bd:21:df brd ff:ff:ff:ff:ff:ff link-netns c28f35d5-7d36-4408-83ff-1dd98b045892 inet6 fe80::9064:38ff:febd:21df/64 scope link valid_lft forever preferred_lft forever 52: c1a27a9b6235c7a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 6a:2a:23:7d:d6:42 brd ff:ff:ff:ff:ff:ff link-netns f5dbe6db-1cca-46a4-8a37-01f7ca266f8d inet6 fe80::682a:23ff:fe7d:d642/64 scope link valid_lft forever preferred_lft forever 53: 44a44a06dcca16e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 02:b1:5a:cb:80:91 brd ff:ff:ff:ff:ff:ff link-netns 555a7550-f8ee-47e0-bb79-ca15df343ec5 inet6 fe80::b1:5aff:fecb:8091/64 scope link valid_lft forever preferred_lft forever 54: 40a4c03bf129ad8@if3: 
<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue master ovs-system state UP group default link/ether da:13:7c:b0:55:39 brd ff:ff:ff:ff:ff:ff link-netns 58027774-aeb3-46c8-a950-f4191b576ee1 inet6 fe80::d813:7cff:feb0:5539/64 scope link valid_lft forever preferred_lft forever 55: 1c395cddee58390@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 8e:4c:56:cc:a9:5d brd ff:ff:ff:ff:ff:ff link-netns f16ec94d-0da9-47d6-bcb0-543def0d4af9 inet6 fe80::8c4c:56ff:fecc:a95d/64 scope link valid_lft forever preferred_lft forever 56: 256eb51f3699d73@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether e2:5f:2b:b3:26:8c brd ff:ff:ff:ff:ff:ff link-netns ba1b6923-cca7-41d2-ab96-a94763111aa6 inet6 fe80::e05f:2bff:feb3:268c/64 scope link valid_lft forever preferred_lft forever 57: 039dffb3684cb8e@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 9e:29:cb:4e:d8:f6 brd ff:ff:ff:ff:ff:ff link-netns b7fd9b7d-4fa5-446c-a81f-36d214b46b83 inet6 fe80::9c29:cbff:fe4e:d8f6/64 scope link valid_lft forever preferred_lft forever 58: 7dc52a9a2fef44a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 76:ad:7d:2c:54:e1 brd ff:ff:ff:ff:ff:ff link-netns 27e9e3fb-ff2b-4968-b31f-f665d8e60e5a inet6 fe80::74ad:7dff:fe2c:54e1/64 scope link valid_lft forever preferred_lft forever 59: 004c6d102a681cb@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 6a:47:c5:f4:eb:0e brd ff:ff:ff:ff:ff:ff link-netns 5eaa270f-b791-4b10-b284-57ed19b17a74 inet6 fe80::6847:c5ff:fef4:eb0e/64 scope link valid_lft forever preferred_lft forever 60: 54f239882489211@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1360 qdisc noqueue state UP group default link/ether 06:6c:09:11:c7:44 brd ff:ff:ff:ff:ff:ff link-netns ab8bca34-284a-4672-ad21-92ed414d504f inet6 fe80::46c:9ff:fe11:c744/64 scope link valid_lft forever 
preferred_lft forever The answer can be found in /etc/nsswitch.conf $ cat /etc/nsswitch.conf ... hosts: files dns myhostname And specifically myhostname myhostname is a GNU plugin, see: http://0pointer.de/lennart/projects/nss-myhostname/ The important part in that link is: > nss-myhostname simply returns all locally configure public IP addresses Which means that any IP address configured on an Openstack machine is defined as "public". I.e doing: $ getent ahosts `hostname` fe80::c06d:abff:fe70:9a09 STREAM huir-0828-cf5hd-compute-0 fe80::c06d:abff:fe70:9a09 DGRAM fe80::c06d:abff:fe70:9a09 RAW fe80::58c3:b9ff:fe32:3348 STREAM fe80::58c3:b9ff:fe32:3348 DGRAM fe80::58c3:b9ff:fe32:3348 RAW fe80::f64e:de02:c198:b6db STREAM fe80::f64e:de02:c198:b6db DGRAM fe80::f64e:de02:c198:b6db RAW fe80::f4d3:1aff:fe0b:5765 STREAM fe80::f4d3:1aff:fe0b:5765 DGRAM fe80::f4d3:1aff:fe0b:5765 RAW fe80::78a9:45ff:fe87:487 STREAM fe80::78a9:45ff:fe87:487 DGRAM fe80::78a9:45ff:fe87:487 RAW fe80::4839:9dff:fe04:25d3 STREAM fe80::4839:9dff:fe04:25d3 DGRAM fe80::4839:9dff:fe04:25d3 RAW fe80::7cb5:37ff:fe42:b1e5 STREAM fe80::7cb5:37ff:fe42:b1e5 DGRAM fe80::7cb5:37ff:fe42:b1e5 RAW fe80::e81d:e3ff:fe9e:f894 STREAM fe80::e81d:e3ff:fe9e:f894 DGRAM fe80::e81d:e3ff:fe9e:f894 RAW fe80::9087:99ff:fe3c:3d3 STREAM fe80::9087:99ff:fe3c:3d3 DGRAM fe80::9087:99ff:fe3c:3d3 RAW fe80::dcfa:a5ff:feb1:3fdf STREAM fe80::dcfa:a5ff:feb1:3fdf DGRAM fe80::dcfa:a5ff:feb1:3fdf RAW fe80::c4d5:f2ff:fec1:1acf STREAM fe80::c4d5:f2ff:fec1:1acf DGRAM fe80::c4d5:f2ff:fec1:1acf RAW fe80::9828:d0ff:feca:7068 STREAM fe80::9828:d0ff:feca:7068 DGRAM fe80::9828:d0ff:feca:7068 RAW 2620:52:0:60:946a:c6c1:950f:c7aa STREAM 2620:52:0:60:946a:c6c1:950f:c7aa DGRAM 2620:52:0:60:946a:c6c1:950f:c7aa RAW 169.254.0.1 STREAM 169.254.0.1 DGRAM 169.254.0.1 RAW 10.128.2.2 STREAM 10.128.2.2 DGRAM 10.128.2.2 RAW 10.0.97.10 STREAM 10.0.97.10 DGRAM 10.0.97.10 RAW I need to find out why that is though and what configures that.
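That lookup order can be illustrated mechanically; a minimal sketch below parses the `hosts:` line the same way glibc's NSS does, using a sample file as a stand-in for the node's real /etc/nsswitch.conf:

```shell
#!/bin/sh
# Sketch: show the "hosts:" resolution source order from an
# nsswitch.conf. The sample file here stands in for the real
# /etc/nsswitch.conf on a node.
cat > /tmp/nsswitch.sample <<'EOF'
passwd: files sss
hosts:  files dns myhostname
EOF

# Print the source list for "hosts:". "myhostname" being last means it
# only answers when both "files" and "dns" fail to resolve the name.
awk '$1 == "hosts:" { $1 = ""; print }' /tmp/nsswitch.sample
```

So on a node where neither /etc/hosts nor DNS can resolve the hostname, nss-myhostname ends up answering, with the behavior described above.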
> The question is however why that net.LookupIP(hostname) returns ALL IP addresses on the host. On GCP we have the following:
>
> $ ./tmpok
> IP is: 10.0.0.5

So you're implying that we don't have "myhostname" in /etc/nsswitch.conf on GCP but we do on OpenStack?

Either way, it sounds like kubelet's autodetection behavior and "hosts myhostname" are not compatible... In particular, it seems like kubelet is more to blame here, since it's not even checking that the IP it picks is routable off the host. It needs to intersect the return value of `LookupIP` with the set of IPs that could theoretically have been returned from `utilnet.ChooseHostInterface` or something...

> This is presumably because the interface index of ovn-k8s-mp0 is lower than br-ex.

That doesn't make sense though... configure-ovs-network runs well before ovnkube-node starts, so br-ex should have a lower interface index than any of the other ovn-kube-related interfaces... Did it get deleted and recreated? Can you check dmesg and/or the NetworkManager journals to see how/when the various interfaces were created?

So:

1. If we can fix the index ordering, the bug will probably go away; this may be the easiest fix.
2. kubelet's baremetal IP-finding code is wrong and we should fix it upstream, and that would fix the problem if we can't fix the index ordering. (There are other problems with that code too, like the fact that it uses the node *name* where it should be using the node *hostname*, so we have to figure out how much fixing we want to do...)
3. If the OCP default is to *not* use "myhostname" and that's something that's being added for OpenStack, then removing that might fix the problem, but presumably that would have other side effects and is probably not an option.
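The "intersect" idea can be sketched in shell. Here the two inputs are illustrative stand-ins for what a node would produce from `getent ahosts $(hostname)` and from the default route's source address (e.g. `ip -j route get 1.1.1.1 | jq -r '.[0].prefsrc'`); the values are the ones from this report:

```shell
#!/bin/sh
# Sketch of the "intersect" idea: keep only the resolved address that is
# also the source address of the default route, i.e. routable off-host.
# Sample data stands in for `getent ahosts $(hostname)` output:
resolved="169.254.0.1
10.128.2.2
10.0.97.10"

# ...and for the default route's source IP on a real node:
default_src="10.0.97.10"

# Intersect: the node IP must appear in both sets.
node_ip=$(printf '%s\n' "$resolved" | grep -Fx "$default_src")
echo "node IP: $node_ip"
# -> node IP: 10.0.97.10
```

With that filter in place, the ovn-k8s-mp0 and link-local addresses drop out regardless of what order nss-myhostname returns them in.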
> So you're implying that we don't have "myhostname" in /etc/nsswitch.conf on GCP but we do on OpenStack?

No, I am saying the output of that DNS lookup (which is equal to getent ahosts `hostname`) is different. And specifically, on GCP the output only contains the eth0 IPv4, while on OpenStack it contains all IPs for all interfaces on the node - which is strange given that http://0pointer.de/lennart/projects/nss-myhostname/ specifies that it should return "all locally configure public IP addresses", which the ovn-k8s-mp0 address, for example, should not be.

> Did it get deleted and recreated? Can you check dmesg and/or NetworkManager journals to see how/when the various interfaces were created?

I will check the interface ordering and update this BZ with my findings.
(In reply to Alexander Constantinescu from comment #14)
> > So you're implying that we don't have "myhostname" in /etc/nsswitch.conf on GCP but we do on OpenStack?
>
> No, I am saying the output of that DNS lookup (which is equal to getent ahosts `hostname`) is different.

Ah, ok; so (presumably) nsswitch.conf says "hosts: files dns myhostname" on both, but on GCP systems, either "files" or "dns" succeeds, so it never gets to "myhostname". Which makes sense; in GCP/AWS, the cloud makes sure there are DNS records for the nodes, whereas in OpenStack that wouldn't happen. (Perhaps this could also be fixed in OpenStack by adding an appropriate alias to /etc/hosts.)
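The /etc/hosts alias idea above would make the "files" source answer before nss-myhostname is ever consulted. A sketch, writing to a scratch file rather than the real /etc/hosts, with the hostname and br-ex IP taken from this report:

```shell
#!/bin/sh
# Sketch of the /etc/hosts alias workaround: pin the node's hostname to
# its br-ex address so the "files" NSS source resolves it first.
# A real node would append this line to /etc/hosts itself.
hosts_file=/tmp/hosts.sketch
printf '%s %s\n' "10.0.96.111" "huir-0826-6q5g9-compute-1" > "$hosts_file"

# With this entry in place, `getent ahosts huir-0826-6q5g9-compute-1`
# would return 10.0.96.111 from "files" and never reach myhostname.
cat "$hosts_file"
```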
> either "files" or "dns" succeeds so it never gets to "myhostname". It's true but it doesn't matter, because as I mentioned: the DNS resolution of myhostname (equivalent to "getent ahosts `hostname`") returns "only the eth0 IPv4" on GCP. Only Openstack returns that screwy list of all IPs. Even my local libvirt cluster returns the same result as on GCP: From my libvirt cluster: $ getent ahosts `hostname` 192.168.126.12 STREAM test-gz9pv-master-1.test.alexander 192.168.126.12 DGRAM 192.168.126.12 RAW $ ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000 link/ether 52:54:00:56:f9:70 brd ff:ff:ff:ff:ff:ff 3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether fa:14:4b:fb:b1:a1 brd ff:ff:ff:ff:ff:ff 4: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether a6:85:9f:a1:9d:2c brd ff:ff:ff:ff:ff:ff inet 10.128.0.2/23 brd 10.128.1.255 scope global ovn-k8s-mp0 valid_lft forever preferred_lft forever 5: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000 link/ether 16:30:f2:b5:0b:47 brd ff:ff:ff:ff:ff:ff 6: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000 link/ether e2:41:79:73:b0:a2 brd ff:ff:ff:ff:ff:ff inet6 fe80::e041:79ff:fe73:b0a2/64 scope link valid_lft forever preferred_lft forever 7: ovn-k8s-gw0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 0a:58:a9:fe:00:01 brd ff:ff:ff:ff:ff:ff inet 169.254.0.1/20 brd 169.254.15.255 scope global 
ovn-k8s-gw0 valid_lft forever preferred_lft forever 8: br-local: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether ea:3b:2d:a2:a1:4f brd ff:ff:ff:ff:ff:ff inet6 fe80::e83b:2dff:fea2:a14f/64 scope link valid_lft forever preferred_lft forever 9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether 52:54:00:56:f9:70 brd ff:ff:ff:ff:ff:ff inet 192.168.126.12/24 brd 192.168.126.255 scope global dynamic noprefixroute br-ex valid_lft 3257sec preferred_lft 3257sec inet6 fe80::bc88:75ad:4faa:b92/64 scope link noprefixroute valid_lft forever preferred_lft forever 10: a95c973292b9d65@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether c6:71:2a:2b:e5:2a brd ff:ff:ff:ff:ff:ff link-netns bd441c49-dc3e-4995-a844-45e0198b8e44 inet6 fe80::c471:2aff:fe2b:e52a/64 scope link valid_lft forever preferred_lft forever 11: 2fa27ccad01c4ba@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether 1a:66:cf:f9:a3:4c brd ff:ff:ff:ff:ff:ff link-netns a2994b34-c922-4a45-a394-388a4553f041 inet6 fe80::1866:cfff:fef9:a34c/64 scope link valid_lft forever preferred_lft forever 13: d2b1fde3a4ccc64@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether ba:30:82:40:67:d6 brd ff:ff:ff:ff:ff:ff link-netns 68b57816-d380-4243-8ec8-a6207d254caa inet6 fe80::b830:82ff:fe40:67d6/64 scope link valid_lft forever preferred_lft forever 14: b039a8d6bc12d71@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether 52:69:92:ca:f7:77 brd ff:ff:ff:ff:ff:ff link-netns 03f6dc24-5a54-4df7-8e0a-fdf630e4a92d inet6 fe80::5069:92ff:feca:f777/64 scope link valid_lft forever preferred_lft forever 15: c8d6b4d871e6a9c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master 
ovs-system state UP group default link/ether 6e:6a:49:a8:93:7c brd ff:ff:ff:ff:ff:ff link-netns e11d2197-2705-4755-b4ad-7753defeafa4 inet6 fe80::6c6a:49ff:fea8:937c/64 scope link valid_lft forever preferred_lft forever 16: a71a7a45cac93ea@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether 0a:ef:b5:8e:61:e1 brd ff:ff:ff:ff:ff:ff link-netns 85a4b12f-a490-419e-a4e0-9e2badd0cf5e inet6 fe80::8ef:b5ff:fe8e:61e1/64 scope link valid_lft forever preferred_lft forever 18: c5a270a46c08812@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue master ovs-system state UP group default link/ether 02:e4:71:c9:05:d6 brd ff:ff:ff:ff:ff:ff link-netns 99015917-8450-4b86-9b34-2c69a36d23e9 inet6 fe80::e4:71ff:fec9:5d6/64 scope link valid_lft forever preferred_lft forever Anyways, progress has been made again: I am able to reproduce this on my libvirt cluster locally on my computer. And I've modified the ovn-configuration.sh script to output `ip a` at its exit. There are several problems here, as you mentioned: > 2. kubelet's baremetal IP-finding code is wrong and we should fix it upstream, and that would fix the problem if we can't fix the index ordering. (There are other problems with that code too, like the fact that uses the node *name* where it should be using the node *hostname*, so we have to figure out how much fixing we want to do...) Then 3. Whatever VM configuration/cloud setting/kernel setting/fairy dust or unicorn has Openstack return ALL IPs across all interfaces for "getent ahosts `hostname`" should stop. Only publicly exposed IPs should be returned by that DNS resolution. Now, concerning 1. i.e: the re-ordering of interfaces: I've reproduced this on libvirt, and again: I don't know what causes it but once the node reboots the br-ex interface is placed last in the list. This is BEFORE ovnkube-node reboots and it is not caused by the ovs-configuration script. 
I know this because when the node reboots and br-ex is properly configured, the script does nothing but exit; see the output below:

-- Reboot --
Aug 28 15:51:31 test-gz9pv-master-1 systemd[1]: Starting Configures OVS with proper host networking configuration...
Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: + iface=
Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: + counter=0
Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: + '[' 0 -lt 12 ']'
Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: ++ jq -r '.[0].dev'
Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: ++ ip -j route show default
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + iface=br-ex
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + [[ -n br-ex ]]
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + [[ br-ex != \n\u\l\l ]]
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + echo 'IPv4 Default gateway interface found: br-ex'
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: IPv4 Default gateway interface found: br-ex
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + break
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + '[' br-ex = br-ex ']'
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + echo 'Networking already configured and up for br-ex!'
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: Networking already configured and up for br-ex!
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + ip a
Aug 28 15:51:32 test-gz9pv-master-1 systemd[1]: Started Configures OVS with proper host networking configuration.
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet 127.0.0.1/8 scope host lo
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft forever preferred_lft forever
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet6 ::1/128 scope host
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft forever preferred_lft forever
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether 52:54:00:56:f9:70 brd ff:ff:ff:ff:ff:ff
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether fa:14:4b:fb:b1:a1 brd ff:ff:ff:ff:ff:ff
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 4: ovn-k8s-mp0: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether a6:85:9f:a1:9d:2c brd ff:ff:ff:ff:ff:ff
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 5: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether 16:30:f2:b5:0b:47 brd ff:ff:ff:ff:ff:ff
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 6: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether e2:41:79:73:b0:a2 brd ff:ff:ff:ff:ff:ff
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet6 fe80::e041:79ff:fe73:b0a2/64 scope link
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft forever preferred_lft forever
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 7: ovn-k8s-gw0: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether 0a:58:a9:fe:00:01 brd ff:ff:ff:ff:ff:ff
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 8: br-local: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether ea:3b:2d:a2:a1:4f brd ff:ff:ff:ff:ff:ff
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether 52:54:00:56:f9:70 brd ff:ff:ff:ff:ff:ff
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet 192.168.126.12/24 brd 192.168.126.255 scope global dynamic noprefixroute br-ex
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft 3600sec preferred_lft 3600sec
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet6 fe80::bc88:75ad:4faa:b92/64 scope link tentative noprefixroute
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft forever preferred_lft forever
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + exit 0
Aug 28 15:51:32 test-gz9pv-master-1 systemd[1]: ovs-configuration.service: Consumed 57ms CPU time

br-ex is ninth, ovnkube-node has not started, and the configuration script did nothing except echo some output.
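The ordering claim is easy to check mechanically. A sketch below sorts link output numerically by ifindex; the sample lines mirror the libvirt node above, where after the reboot ovn-k8s-mp0 (index 4) sorts before br-ex (index 9), which is exactly why an index-ordered interface scan finds ovn-k8s-mp0's address first:

```shell
#!/bin/sh
# Sketch: check interface creation order by sorting on the numeric
# ifindex. The sample file stands in for real `ip -o link` output.
cat > /tmp/links.sample <<'EOF'
9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP>
4: ovn-k8s-mp0: <BROADCAST,MULTICAST>
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP>
EOF

# On a real node this would be: ip -o link | sort -t: -k1,1n
sort -t: -k1,1n /tmp/links.sample
```

If the index ordering were fixed so that br-ex sorted before ovn-k8s-mp0, a first-match-by-index scan would pick the br-ex address instead.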
So either this is something left over from the previous ovnkube-node configuration of br-ex which triggers the re-ordering on restart, or NetworkManager does this (but I am unable to understand from reading its logs, see below) Here are NetworkManager logs from the reboot: Aug 28 15:51:31 localhost systemd[1]: Started Hostname Service. Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0169] hostname: hostname: using hostnamed Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0172] dns-mgr[0x563d62234250]: init: dns=default,systemd-resolved rc-manager=symlink Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0231] Loaded device plugin: NMOvsFactory (/usr/lib64/NetworkManager/1.22.8-6.el8_2/libnm-device-plugin-ovs.so) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0249] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/1.22.8-6.el8_2/libnm-device-plugin-team.so) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0250] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0251] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file Aug 28 15:51:31 localhost dbus-daemon[1146]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.6' (uid=0 pid=1420 comm="/usr/sbin/NetworkManag> Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0252] manager: Networking is enabled by state file Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0253] dhcp-init: Using DHCP client 'internal' Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0256] settings: Loaded settings plugin: keyfile (internal) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0274] settings: Loaded settings plugin: ifcfg-rh 
("/usr/lib64/NetworkManager/1.22.8-6.el8_2/libnm-settings-plugin-ifcfg-rh.so") Aug 28 15:51:31 localhost systemd[1]: Starting Network Manager Script Dispatcher Service... Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0334] device (lo): carrier: link connected Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0336] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0345] manager: (br-int): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/2) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0351] manager: (br-local): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/3) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0358] manager: (ovn-k8s-gw0): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/4) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0364] manager: (ovn-k8s-mp0): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/5) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0372] manager: (ens3): new Ethernet device (/org/freedesktop/NetworkManager/Devices/6) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0376] device (ens3): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external') Aug 28 15:51:31 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): ens3: link is not ready Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0435] device (ens3): carrier: link connected Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0453] device (genev_sys_6081): carrier: link connected Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0456] manager: (genev_sys_6081): new Generic device (/org/freedesktop/NetworkManager/Devices/7) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> 
[1598629891.0464] manager: (br-ex): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/8) Aug 28 15:51:31 localhost dbus-daemon[1146]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher' Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0469] manager: (br-ex): new Open vSwitch Bridge device (/org/freedesktop/NetworkManager/Devices/9) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0473] device (br-ex): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0485] manager: (br-ex): new Open vSwitch Interface device (/org/freedesktop/NetworkManager/Devices/10) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0489] device (br-ex): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0502] manager: (ens3): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/11) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0507] device (ens3): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external') Aug 28 15:51:31 localhost systemd[1]: Started Network Manager Script Dispatcher Service. 
Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0572] ovs: ovs interface "92714f253c5cba9" ((null)) failed: could not open network device 92714f253c5cba9 (No such device) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0572] ovs: ovs interface "6eb64ae03e9a18f" ((null)) failed: could not open network device 6eb64ae03e9a18f (No such device) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0572] ovs: ovs interface "70f3adba4eb5107" ((null)) failed: could not open network device 70f3adba4eb5107 (No such device) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0573] ovs: ovs interface "18fb30d42c5147b" ((null)) failed: could not open network device 18fb30d42c5147b (No such device) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0573] ovs: ovs interface "bf59011c624959f" ((null)) failed: could not open network device bf59011c624959f (No such device) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0573] ovs: ovs interface "ddec5a12e243f08" ((null)) failed: could not open network device ddec5a12e243f08 (No such device) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0573] ovs: ovs interface "c488d2b84d35146" ((null)) failed: could not open network device c488d2b84d35146 (No such device) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0577] manager: (6eb64ae03e9a18f): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/12) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0584] manager: (92714f253c5cba9): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/13) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0591] manager: (patch-lnet-node_local_switch-to-br-int): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/14) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0599] manager: 
(patch-br-int-to-lnet-node_local_switch): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/15) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0611] manager: (patch-br-ex_test-gz9pv-master-1-to-br-int): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/16) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0619] manager: (br-int): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/17) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0626] manager: (patch-br-int-to-br-ex_test-gz9pv-master-1): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/18) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0633] manager: (ovn-k8s-mp0): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/19) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0641] manager: (bf59011c624959f): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/20) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0653] manager: (ovn-c6e3dd-0): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/21) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0679] manager: (ovn-k8s-gw0): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/22) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0689] manager: (ddec5a12e243f08): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/23) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0697] manager: (18fb30d42c5147b): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/24) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0705] manager: (c488d2b84d35146): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/25) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0713] manager: (70f3adba4eb5107): 
new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/26) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0721] manager: (br-local): new Open vSwitch Port device (/org/freedesktop/NetworkManager/Devices/27) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0729] manager: (br-int): new Open vSwitch Bridge device (/org/freedesktop/NetworkManager/Devices/28) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0737] manager: (br-local): new Open vSwitch Bridge device (/org/freedesktop/NetworkManager/Devices/29) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0813] policy: auto-activating connection 'ovs-port-br-ex' (66688bb6-77a3-4e43-8bd4-ed620a470aca) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0816] device (ens3): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0823] device (br-ex): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0825] device (br-ex): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0827] device (ens3): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0833] device (br-ex): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0841] device (br-ex): state change: unavailable -> disconnected (reason 'user-requested', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0848] device (br-ex): Activation: starting connection 'ovs-port-br-ex' (66688bb6-77a3-4e43-8bd4-ed620a470aca) Aug 28 15:51:31 
localhost NetworkManager[1420]: <info> [1598629891.0849] policy: auto-activating connection 'ovs-if-phys0' (215ef9cd-4337-4d6f-ab55-74f234d132ff) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0850] policy: auto-activating connection 'br-ex' (7f199444-421a-436e-b14d-44b8e1a11c98) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0851] policy: auto-activating connection 'ovs-if-br-ex' (810c8081-134d-435a-ad50-c3df0143a7ea) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0853] policy: auto-activating connection 'ovs-port-phys0' (820dd895-36d0-4664-a3ea-670375e85743) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0856] device (br-ex): Activation: starting connection 'br-ex' (7f199444-421a-436e-b14d-44b8e1a11c98) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0857] device (br-ex): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0860] manager: NetworkManager state is now CONNECTING Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0862] device (ens3): Activation: starting connection 'ovs-if-phys0' (215ef9cd-4337-4d6f-ab55-74f234d132ff) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0864] device (br-ex): Activation: starting connection 'ovs-if-br-ex' (810c8081-134d-435a-ad50-c3df0143a7ea) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0867] device (ens3): Activation: starting connection 'ovs-port-phys0' (820dd895-36d0-4664-a3ea-670375e85743) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0868] device (br-ex): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0872] device (br-ex): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: 
<info> [1598629891.0875] device (ens3): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0879] device (br-ex): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0883] device (br-ex): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0888] device (ens3): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0894] device (ens3): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0897] device (br-ex): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0900] device (ens3): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0905] device (br-ex): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0906] device (br-ex): state change: ip-config -> secondaries (reason 'ip-config-unavailable', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0921] device (br-ex): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0928] device (br-ex): Activation: connection 'ovs-if-br-ex' enslaved, continuing activation Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0930] device (ens3): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost 
NetworkManager[1420]: <info> [1598629891.0931] device (ens3): Activation: connection 'ovs-port-phys0' enslaved, continuing activation Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0932] device (ens3): state change: ip-config -> secondaries (reason 'ip-config-unavailable', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0938] device (br-ex): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0941] device (br-ex): Activation: connection 'ovs-port-br-ex' enslaved, continuing activation Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0944] device (br-ex): state change: ip-config -> secondaries (reason 'ip-config-unavailable', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0946] device (br-ex): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0956] policy: set-hostname: set hostname to 'localhost.localdomain' (no default device) Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0957] device (br-ex): Activation: successful, device activated. 
Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0964] device (ens3): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0971] device (ens3): Activation: connection 'ovs-if-phys0' enslaved, continuing activation Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0973] device (ens3): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0978] device (ens3): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0987] device (ens3): Activation: successful, device activated. Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.0994] device (br-ex): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost NetworkManager[1420]: <info> [1598629891.1001] device (br-ex): Activation: successful, device activated. Aug 28 15:51:31 localhost.localdomain systemd-hostnamed[1432]: Changed host name to 'localhost.localdomain' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1098] device (ens3): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1100] device (ens3): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1109] device (ens3): Activation: successful, device activated. Aug 28 15:51:31 localhost.localdomain kernel: device br-ex entered promiscuous mode Aug 28 15:51:31 localhost.localdomain systemd-udevd[1485]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. 
Aug 28 15:51:31 localhost.localdomain systemd-udevd[1485]: Could not generate persistent MAC address for br-ex: No such file or directory Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1359] device (br-ex): set-hw-addr: set-cloned MAC address to 52:54:00:56:F9:70 (52:54:00:56:F9:70) Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1366] device (br-ex): carrier: link connected Aug 28 15:51:31 localhost.localdomain ovs-vswitchd[1336]: ovs|00109|bridge|ERR|interface br-ex: ignoring mac in Interface record (use Bridge record to set local port's mac) Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1378] dhcp4 (br-ex): activation: beginning transaction (timeout in 45 seconds) Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1508] dhcp4 (br-ex): option dhcp_lease_time => '3600' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1509] dhcp4 (br-ex): option domain_name => 'test.alexander' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1509] dhcp4 (br-ex): option domain_name_servers => '192.168.126.1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1509] dhcp4 (br-ex): option expiry => '1598633491' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1509] dhcp4 (br-ex): option host_name => 'test-gz9pv-master-1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1509] dhcp4 (br-ex): option ip_address => '192.168.126.12' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1509] dhcp4 (br-ex): option next_server => '192.168.126.1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1509] dhcp4 (br-ex): option requested_broadcast_address => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1509] dhcp4 (br-ex): option 
requested_domain_name => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1510] dhcp4 (br-ex): option requested_domain_name_servers => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1510] dhcp4 (br-ex): option requested_domain_search => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1510] dhcp4 (br-ex): option requested_host_name => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1510] dhcp4 (br-ex): option requested_interface_mtu => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1510] dhcp4 (br-ex): option requested_ms_classless_static_routes => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1510] dhcp4 (br-ex): option requested_nis_domain => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1510] dhcp4 (br-ex): option requested_nis_servers => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1510] dhcp4 (br-ex): option requested_ntp_servers => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1510] dhcp4 (br-ex): option requested_rfc3442_classless_static_routes => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1511] dhcp4 (br-ex): option requested_root_path => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1511] dhcp4 (br-ex): option requested_routers => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1511] dhcp4 (br-ex): option requested_static_routes => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1511] dhcp4 (br-ex): option requested_subnet_mask => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1511] dhcp4 (br-ex): option requested_time_offset => '1' Aug 28 15:51:31 
localhost.localdomain NetworkManager[1420]: <info> [1598629891.1511] dhcp4 (br-ex): option requested_wpad => '1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1511] dhcp4 (br-ex): option routers => '192.168.126.1' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1511] dhcp4 (br-ex): option subnet_mask => '255.255.255.0' Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1512] dhcp4 (br-ex): state changed unknown -> bound Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1552] device (br-ex): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1597] device (br-ex): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1600] device (br-ex): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed') Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1606] manager: NetworkManager state is now CONNECTED_LOCAL Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1631] manager: NetworkManager state is now CONNECTED_SITE Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1632] policy: set 'ovs-if-br-ex' (br-ex) as default for IPv4 routing and DNS Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1634] policy: set-hostname: set hostname to 'test-gz9pv-master-1' (from DHCPv4) Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1679] device (br-ex): Activation: successful, device activated. 
Aug 28 15:51:31 localhost.localdomain ovs-vswitchd[1336]: ovs|00117|bridge|ERR|interface br-ex: ignoring mac in Interface record (use Bridge record to set local port's mac) Aug 28 15:51:31 localhost.localdomain dbus-daemon[1146]: [system] Activating via systemd: service name='org.freedesktop.resolve1' unit='dbus-org.freedesktop.resolve1.service' requested by ':1.6' (uid=0 pid=1420 comm="/usr/sbin/NetworkMan> Aug 28 15:51:31 localhost.localdomain NetworkManager[1420]: <info> [1598629891.1697] manager: NetworkManager state is now CONNECTED_GLOBAL Aug 28 15:51:31 test-gz9pv-master-1 dbus-daemon[1146]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.resolve1.service': Unit dbus-org.freedesktop.resolve1.service not found. Aug 28 15:51:31 test-gz9pv-master-1 systemd-hostnamed[1432]: Changed host name to 'test-gz9pv-master-1' Aug 28 15:51:31 test-gz9pv-master-1 NetworkManager[1420]: <info> [1598629891.1712] manager: startup complete Aug 28 15:51:31 test-gz9pv-master-1 systemd[1]: Started Network Manager Wait Online. Aug 28 15:51:31 test-gz9pv-master-1 network-manager/90-long-hostname[1506]: hostname is already set Aug 28 15:51:31 test-gz9pv-master-1 network-manager/90-long-hostname[1525]: hostname is already set Aug 28 15:51:31 test-gz9pv-master-1 network-manager/90-long-hostname[1538]: hostname is already set Aug 28 15:51:31 test-gz9pv-master-1 network-manager/90-long-hostname[1546]: hostname is already set Aug 28 15:51:31 test-gz9pv-master-1 network-manager/90-long-hostname[1559]: hostname is already set Aug 28 15:51:31 test-gz9pv-master-1 network-manager/90-long-hostname[1572]: hostname is already set Aug 28 15:51:31 test-gz9pv-master-1 network-manager/90-long-hostname[1585]: hostname is already set Aug 28 15:51:31 test-gz9pv-master-1 bash[1590]: node identified as test-gz9pv-master-1 Aug 28 15:51:31 test-gz9pv-master-1 systemd[1]: Started Ensure the node hostname is valid for the cluster. 
Aug 28 15:51:31 test-gz9pv-master-1 systemd[1]: Starting Configures OVS with proper host networking configuration... Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: + iface= Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: + counter=0 Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: + '[' 0 -lt 12 ']' Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: ++ jq -r '.[0].dev' Aug 28 15:51:31 test-gz9pv-master-1 configure-ovs.sh[1592]: ++ ip -j route show default Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + iface=br-ex Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + [[ -n br-ex ]] Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + [[ br-ex != \n\u\l\l ]] Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + echo 'IPv4 Default gateway interface found: br-ex' Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: IPv4 Default gateway interface found: br-ex Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + break Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + '[' br-ex = br-ex ']' Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + echo 'Networking already configured and up for br-ex!' Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: Networking already configured and up for br-ex! Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + ip a Aug 28 15:51:32 test-gz9pv-master-1 systemd[1]: Started Configures OVS with proper host networking configuration. 
Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet 127.0.0.1/8 scope host lo Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft forever preferred_lft forever Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet6 ::1/128 scope host Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft forever preferred_lft forever Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP group default qlen 1000 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether 52:54:00:56:f9:70 brd ff:ff:ff:ff:ff:ff Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether fa:14:4b:fb:b1:a1 brd ff:ff:ff:ff:ff:ff Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 4: ovn-k8s-mp0: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether a6:85:9f:a1:9d:2c brd ff:ff:ff:ff:ff:ff Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 5: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether 16:30:f2:b5:0b:47 brd ff:ff:ff:ff:ff:ff Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 6: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN group default qlen 1000 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether e2:41:79:73:b0:a2 
brd ff:ff:ff:ff:ff:ff Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet6 fe80::e041:79ff:fe73:b0a2/64 scope link Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft forever preferred_lft forever Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 7: ovn-k8s-gw0: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether 0a:58:a9:fe:00:01 brd ff:ff:ff:ff:ff:ff Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 8: br-local: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether ea:3b:2d:a2:a1:4f brd ff:ff:ff:ff:ff:ff Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: 9: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: link/ether 52:54:00:56:f9:70 brd ff:ff:ff:ff:ff:ff Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet 192.168.126.12/24 brd 192.168.126.255 scope global dynamic noprefixroute br-ex Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft 3600sec preferred_lft 3600sec Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: inet6 fe80::bc88:75ad:4faa:b92/64 scope link tentative noprefixroute Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: valid_lft forever preferred_lft forever Aug 28 15:51:32 test-gz9pv-master-1 configure-ovs.sh[1592]: + exit 0 Aug 28 15:51:32 test-gz9pv-master-1 systemd[1]: ovs-configuration.service: Consumed 57ms C
OK, it's certainly NetworkManager on restart. I've inserted a custom service to run before NetworkManager-wait-online.service Here's the output: Aug 28 16:32:17 localhost systemd[1]: Starting A small hello world from Alex... Aug 28 16:32:17 localhost alex.sh[1138]: + echo Hello world from Alex Aug 28 16:32:17 localhost alex.sh[1138]: Hello world from Alex Aug 28 16:32:17 localhost alex.sh[1138]: + ip a Aug 28 16:32:17 localhost systemd[1]: Starting NTP client/server... Aug 28 16:32:17 localhost bash[1139]: waiting for non-localhost hostname to be assigned Aug 28 16:32:17 localhost systemd[1]: Starting Open vSwitch Database Unit... Aug 28 16:32:17 localhost systemd[1]: Started irqbalance daemon. Aug 28 16:32:17 localhost systemd[1]: Starting System Security Services Daemon... Aug 28 16:32:17 localhost systemd[1]: Reached target sshd-keygen.target. Aug 28 16:32:17 localhost systemd[1]: Starting Generate /run/issue.d/console-login-helper-messages.issue... Aug 28 16:32:17 localhost alex.sh[1138]: 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 Aug 28 16:32:17 localhost alex.sh[1138]: link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 Aug 28 16:32:17 localhost alex.sh[1138]: inet 127.0.0.1/8 scope host lo Aug 28 16:32:17 localhost alex.sh[1138]: valid_lft forever preferred_lft forever Aug 28 16:32:17 localhost alex.sh[1138]: inet6 ::1/128 scope host Aug 28 16:32:17 localhost alex.sh[1138]: valid_lft forever preferred_lft forever Aug 28 16:32:17 localhost alex.sh[1138]: 2: ens3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 Aug 28 16:32:17 localhost alex.sh[1138]: link/ether 52:54:00:56:f9:70 brd ff:ff:ff:ff:ff:ff Aug 28 16:32:17 localhost chronyd[1157]: chronyd version 3.5 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG) Aug 28 16:32:17 localhost chown[1143]: /usr/bin/chown: cannot access '/var/run/openvswitch': No such file or directory Aug 28 
16:32:17 localhost chronyd[1157]: Frequency -19.709 +/- 5.979 ppm read from /var/lib/chrony/drift Aug 28 16:32:17 localhost chronyd[1157]: Using right/UTC timezone to obtain leap second data Aug 28 16:32:17 localhost systemd[1]: Started A small hello world from Alex. Aug 28 16:32:17 localhost systemd[1]: alex.service: Consumed 5ms CPU time
So at this point I am not sure what's faster...fixing the kubelet or fixing NetworkManager or fixing Openstack?
Anything that relies on interface index ordering is fundamentally broken. We should not be working around dumb kubelet bugs by manipulating interface indexes at all.
And actually the more I think about it, I tell myself that the kubelet *cannot* be fixed. Go has no way of retrieving the default route without performing OS-specific syscalls (which is why netlink can do that, but only compiles for Linux). The kubelet cannot be built like that... and I think they were really counting on `net.LookupIP(node.Name)` not returning crazy stuff and thus just picking the first item in the list (which would have equated to the IP of the default route). So this should probably just be sent over to the Openstack team / NetworkManager
(In reply to Alexander Constantinescu from comment #16) > > either "files" or "dns" succeeds so it never gets to "myhostname". > > It's true but it doesn't matter, because as I mentioned: the DNS resolution > of myhostname (equivalent to "getent ahosts `hostname`") returns "only the > eth0 IPv4" on GCP. Only Openstack returns that screwy list of all IPs. The way nsswitch.conf works is that if it says "hosts files dns myhostname", then that means when someone tries to resolve a hostname, first use "files" (ie, /etc/hosts), and if there's an answer there, return that answer. If there's no answer, then use "dns" and if there's an answer there, return that answer. If there's still no answer, then use "myhostname". So what's happening is that on GCP, either "files" or "dns" has a match for the system hostname, so we don't have to try "myhostname". While on OpenStack, the node's name does not appear in either /etc/hosts or in DNS, so it falls back to myhostname. If the host's name did appear in /etc/hosts or DNS on OpenStack, then we would not fall back to "myhostname" on OpenStack either. > Even my local libvirt cluster returns the same result as on GCP: Presumably your hostname appears in /etc/hosts, and so "myhostname" does not get used. > or NetworkManager does this (but I am unable to understand from reading its logs, see below) The NetworkManager logs show that NetworkManager is _observing_ the network configuration, not that it's creating it. I'm guessing OVS must be the one creating those interfaces? Are we persisting our ovsdb across reboots? We don't want that, since ovn-kubernetes is not written to deal with the possibility that there might be leftover interfaces from the last time it was run (which might have been with an older version of ovn-kubernetes that had a slightly different internal architecture). > And actually the more I think about it, I tell myself that the kubelet > *cannot* be fixed. 
> > Go has no way of retrieving the default route without performing OS specific > syscalls (which why netlink can do that, but only compiles for linux). kubelet *already* has a function that finds the IP corresponding to the default route; it's just that it only uses it if `LookupIP()` fails, so if you have "myhostname" configured in nsswitch.conf, it will never get used. (Go uses netlink in the standard library, it just doesn't expose it via any public APIs)
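To make the lookup-order point concrete: with a hosts line like the following in /etc/nsswitch.conf (a minimal illustration, not necessarily the exact RHCOS default), "myhostname" is only ever consulted when both "files" and "dns" come up empty for the queried name:

```
# /etc/nsswitch.conf (illustrative)
# Sources are tried left to right; the first one that yields
# an answer for the queried hostname wins.
hosts: files dns myhostname
```

This is why GCP (where the hostname resolves via DNS) and OpenStack (where it falls through to myhostname) behave differently with identical nsswitch.conf contents.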
If a cloud provider is non-functional or no cloud provider is used, and for some reason looking up the node's hostname does not provide a usable result (in this case because DNS does not provide a result, but myhostname appears to provide one that is not useful), then the standard practice is to set the --node-ip or --bind-addr of components run on that node to the address the system administrator expects the node to have. Actual bare-metal (with dev-scripts) does this correctly; OpenStack must also do this if it does not already. Over to Node since I'm not sure where to put bare-metal/cloud-provider/etc issues.
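As a sketch of that standard practice: the node IP can be pinned via a kubelet systemd drop-in. The path and variable name below match the 20-nodenet.conf file mentioned in later comments, but the exact contents here are illustrative, not authoritative:

```
# /etc/systemd/system/kubelet.service.d/20-nodenet.conf (illustrative)
[Service]
Environment="KUBELET_NODE_IP=10.0.96.111"
```

The kubelet unit can then pass `--node-ip=${KUBELET_NODE_IP}`, so address selection never falls back to hostname resolution at all.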
(In reply to Alexander Constantinescu from comment #12) > OK, I think I've narrowed the problem down. > > On Openstack we run the kubelet with the flag `--cloud-provider=`, that means > it's up to the kubelet to set the IP address without looking up the node's > IP address from the external cloud provider. This is done here: > > https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/nodestatus/ > setters.go#L205 For OpenStack we set the --node-ip option [1] so I'm not sure the above code is used at all. It's reading its value from a /etc/systemd/system/kubelet.service.d/20-nodenet.conf file dropped by the nodeip-finder script [2] from baremetal-runtimecfg. Would it be enough to discard the ovn-k8s-* interfaces? [1] https://github.com/openshift/machine-config-operator/blob/36f37f2d6009affe8174854f5ef5538e0cc49034/templates/master/01-master-kubelet/openstack/units/kubelet.service.yaml#L27 [2] https://github.com/openshift/baremetal-runtimecfg/blob/b2b74d7c6a5c02811f7d8262ee2e0c00e73f8b68/scripts/nodeip-finder
(In reply to Martin André from comment #23) > (In reply to Alexander Constantinescu from comment #12) > > OK, I think I've narrowed the problem down. > > > > On Openstack we run the kubelet with the flag `--cloud-provider=`, that means > > it's up to the kubelet to set the IP address without looking up the node's > > IP address from the external cloud provider. This is done here: > > > > https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/nodestatus/ > > setters.go#L205 > > For OpenStack we set the --node-ip option [1] so I'm not sure the above code > is used at all. It's reading its value from a > /etc/systemd/system/kubelet.service.d/20-nodenet.conf file dropped by the > nodeip-finder script [2] from baremetal-runtimecfg. Would it be enough to > discard the ovn-k8s-* interfaces? > > [1] > https://github.com/openshift/machine-config-operator/blob/ > 36f37f2d6009affe8174854f5ef5538e0cc49034/templates/master/01-master-kubelet/ > openstack/units/kubelet.service.yaml#L27 > [2] > https://github.com/openshift/baremetal-runtimecfg/blob/ > b2b74d7c6a5c02811f7d8262ee2e0c00e73f8b68/scripts/nodeip-finder That would be insufficient. The information cannot come from scraping a bunch of random interfaces; it needs to come from the actual network configuration of the node. Something knows what that configuration is.
To be clear, it needs to come from the actual network configuration *as seen from outside the node*. Like, whatever is provisioning the node knows exactly what the IP is. And that's the IP(s) that kubelet needs to report.
> That would be insufficient. The information cannot come from scraping a bunch of random interfaces, > it needs to come from the actual network configuration of the node. Something knows what that configuration is. The nodeip-finder script is the thing that is supposed to understand OCP node configuration. In particular, it exists because kubelet isn't smart enough to know to ignore IPs added by ipfailoverd, so nodeip-finder is. However, nodeip-finder should already have been smart enough to not use the ovn-k8s-mp0 interface, because that interface doesn't have a route to the apiserver... so it seems like something went wrong there? It should not need to specifically know to ignore ovn-k8s-mp0.
This bug may be fixed with: https://github.com/openshift/machine-config-operator/pull/2031 It makes sure that we set the correct hostname before starting networking configuration. We didn't have issues with pure OVS, but with OVN there were some weird race conditions, and I expect this bug can be one of the manifestations of them.
Possibly a duplicate of bug 1851540; could you verify if this is still happening with the latest nightly please?
I got confirmation from Gaoyun Pei that this isn't a deployment on OpenStack, but rather simulated BM nodes using OpenStack VMs. Re-assigning to Node component.
Ok, this bug has had a ton of noise. Just to level set: this is the bare metal platform using OVN, installed on OpenStack VMs (no cloud provider integration configured). It is the combination of bare-metal + OVN that leads to this situation. The issue is the kubelet doesn't select the expected interface address for the internal IP. The kubelet is designed around a host with a single candidate address (per address family) for the internal IP. comment #12 shows how the kubelet would do the selection: > IP: fe80::c06d:abff:fe70:9a09 is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::58c3:b9ff:fe32:3348 is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::f64e:de02:c198:b6db is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::f4d3:1aff:fe0b:5765 is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::78a9:45ff:fe87:487 is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::4839:9dff:fe04:25d3 is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::7cb5:37ff:fe42:b1e5 is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::e81d:e3ff:fe9e:f894 is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::9087:99ff:fe3c:3d3 is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::dcfa:a5ff:feb1:3fdf is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::c4d5:f2ff:fec1:1acf is skipped because: nodeIP can't be a link-local unicast address > IP: fe80::9828:d0ff:feca:7068 is skipped because: nodeIP can't be a link-local unicast address > IP is: 2620:52:0:60:946a:c6c1:950f:c7aa > IP: 169.254.0.1 is skipped because: nodeIP can't be a link-local unicast address > IP is: 10.128.2.2 > IP is: 10.0.97.10 comment #20 is correct > And actually the more I think about it, I tell myself that the kubelet *cannot* be fixed. 
> > Go has no way of retrieving the default route without performing OS specific syscalls (which is why netlink can do that, but only compiles for linux). The kubelet cannot be built like that...and I think they were really counting on > `net.LookupIP(node.Name)` not returning crazy stuff and thus just picking the first item in the list (which would have equated to the IP of the default route). > > As seen in the code referenced just before: the kubelet takes the first IPv4 address it finds and assigns that IP to the InternalIP address. Thus 10.128.2.2 (in this example), which is the ovn-k8s-mp0 address. This is presumably because the interface index of ovn-k8s-mp0 is lower than that of br-ex. comment #21 > kubelet *already* has a function that finds the IP corresponding to the default route; it's just that it only uses it if `LookupIP()` fails, so if you have "myhostname" configured in nsswitch.conf, it will never get used. (Go uses netlink in the standard library, it just doesn't expose it via any public APIs) If OVN is going to create another interface on the host that has another candidate address for the internal IP, the --node-ip flag will have to be provided to the kubelet to disambiguate.
OVN is creating this new source of ambiguity. OVN depended on cloud provider integration to resolve this ambiguity, but that is not always present, e.g. on bare-metal. Routing to them.
> Ok, this bug has had a ton of noise. Yeah... > OVN is creating this new source of ambiguity. OVN depended on cloud > provider integration to resolve this ambiguity, but it is not always present > i.e. bare-metal. Routing to them. Per comment #23, this type of openstack install uses the baremetal-cfg nodeip-finder to generate a `--node-ip` to pass to kubelet. The nodeip-finder code ought to be doing the right thing here and it is not clear why it is not (comment #26). It works fine on actual-bare-metal-via-dev-scripts. eg, rerunning nodeip-configuration.service after ovn-kubernetes is up shows: Parsed Virtual IP 192.168.111.5 Checking whether address 192.168.111.23/24 br-ex contains VIP 192.168.111.5 Address 192.168.111.23/24 br-ex contains VIP 192.168.111.5 Checking whether address 169.254.0.1/20 ovn-k8s-gw0 contains VIP 192.168.111.5 Checking whether address 172.22.0.35/24 enp1s0 contains VIP 192.168.111.5 Checking whether address 10.131.0.2/23 ovn-k8s-mp0 contains VIP 192.168.111.5 Checking whether address 127.0.0.1/8 lo contains VIP 192.168.111.5 Chosen Node IP 192.168.111.23 (deleted all the IPv6 link-local address lines to make the output shorter). I'm not sure what order it's checking in, but it doesn't matter, since it looks at both br-ex and ovn-k8s-mp0, and sees that br-ex is correct and ovn-k8s-mp0 is not. @huirwang, we need to see what's happening with the nodeip-configuration.service on the reboot in your cluster that breaks things; what arguments it is being passed, and what it outputs. Also, what does "ip a" and "ip r" show on the node when it runs? If must-gather isn't functional you can log in via openstack console or something to get logs...
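The VIP check in the nodeip-finder output above amounts to a subnet-containment test: the chosen address is the one whose subnet contains the virtual IP. A sketch in Go (the real script is shell; `containsVIP` is a made-up name for illustration):

```go
package main

import (
	"fmt"
	"net"
)

// containsVIP reports whether an interface address (in CIDR form)
// is on the same subnet as the virtual IP -- the test the
// nodeip-finder log above performs for each candidate address.
func containsVIP(cidr, vip string) bool {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false
	}
	return subnet.Contains(net.ParseIP(vip))
}

func main() {
	vip := "192.168.111.5"
	// Candidate addresses from the log; check order doesn't matter,
	// only which subnet actually contains the VIP.
	for _, cidr := range []string{
		"169.254.0.1/20",    // ovn-k8s-gw0
		"10.131.0.2/23",     // ovn-k8s-mp0
		"192.168.111.23/24", // br-ex
		"127.0.0.1/8",       // lo
	} {
		if containsVIP(cidr, vip) {
			ip, _, _ := net.ParseCIDR(cidr)
			fmt.Println("Chosen Node IP", ip) // prints 192.168.111.23
			return
		}
	}
}
```

Unlike the kubelet's index-order fallback, this check is order-independent, which is why ovn-k8s-mp0 can never win here.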
(In reply to Dan Winship from comment #36) > Per comment #23, this type of openstack install uses the baremetal-cfg > nodeip-finder to generate a `--node-ip` to pass to kubelet. The > nodeip-finder code ought to be doing the right thing here and it is not > clear why it is not (comment #26). It works fine on > actual-bare-metal-via-dev-scripts. comment #23 is no longer relevant because it's not deploying on openstack platform. From what I could tell, none of the static pods you'd normally find in OpenStack or BM deployments (keepalived, haproxy, coredns, ...) where running on those nodes. It didn't run the NM dispatcher scripts either.
(In reply to Seth Jennings from comment #35) > OVN is creating this new source of ambiguity. OVN depended on cloud > provider integration to resolve this ambiguity, but it is not always present > i.e. bare-metal. Routing to them. Seth, this is not an OVN/ovnkube issue, and ovnkube has been around since OCP 4.1 anyway. The logic of nodeip-finder is fundamentally wrong. OVN does not depend on cloud provider integration for anything. But it does require (like anything else, even openshift-sdn) that the platform provider is doing the right thing when it determines the node IP to pass to kubelet. I don't know whether this is a Node bug or a Cloud provider bug or what, but it is *not* networking.
ah, you are not running the nodeip-configuration service. Kubelet is being started without `--node-ip`, so it's running the default kubelet node-ip-detecting code, which is not able to deal with this.

nodeip-configuration gets run if MCO thinks the node is "baremetal", and that definitely happens if you do the "official" install procedure using dev-scripts. I'm not sure why you are getting the "base" kubelet config (https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/_base/units/kubelet.service.yaml) rather than the "baremetal" one (https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/baremetal/units/kubelet.service.yaml) in this cluster. Either you are configuring things wrong, or MCO is detecting things wrong... So I guess over to MCO...
Assigning Baremetal to take a look here
This is being deployed as a None platform:

  ...
  spec:
    cloudConfig:
      name: ""
    platformSpec:
      type: None
  status:
    apiServerInternalURI: https://api-int.huirwang0911.qe.devcluster.openshift.com:6443
    apiServerURL: https://api.huirwang0911.qe.devcluster.openshift.com:6443
    etcdDiscoveryDomain: huirwang0911.qe.devcluster.openshift.com
    infrastructureName: huirwang0911-srh9h
    platform: None
    platformStatus:
      type: None

That's why our services aren't running. I don't believe our nodeip-configuration code will work without a platform, so if the None platform was intentional then another method would have to be used. If the None platform was not intentional then whatever caused the infrastructure object to be populated this way needs to be fixed.
I noticed that the install-config in this cluster said platform "none", but that's what the docs say to do for bare metal: https://docs.openshift.com/container-platform/4.5/installing/installing_bare_metal/installing-bare-metal.html#installation-bare-metal-config-yaml_installing-bare-metal Is that wrong or is there something else that triggers a baremetal platform in the infrastructure object?
Okay, I missed that this was UPI because of all the discussion of IPI components. My team only does IPI so I don't really know anything about UPI. I do know that if you install with UPI then you don't get any of our stuff because it's all dependent on baremetal (IPI) platform configuration options. It's possible it could be adapted for use with UPI, but I believe you'd need to talk to the installer team about that. They maintain baremetal UPI.
Moving to networking team, as OVN net-interfaces are not something the installer team can triage or fix.
(In reply to Abhinav Dahiya from comment #45)
> Moving to networking team, as OVN net-interfaces are not something the
> installer team can triage or fix.

Seriously, people. There is literally nothing that is networking-specific about this. The *PLATFORM* in use must pass --node-ip to kubelet, and it must do something intelligent to figure out what the node's actual NIC is. The networking team does not do, nor is it involved with, host-level networking configuration. It is not a networking team problem.

I don't know if it's an installer problem, or what. But whatever platform is being used, even if it is "none", MUST PASS A VALID --node-ip TO KUBELET. And it's that platform's job to figure out which IP addresses are useful, because hey, it's the platform, and it knows how networking is configured for that deployment. That has nothing to do with the SDN/OVN/whatever.

What kind of meeting do we need to call with everyone so that I may clearly explain this?
OK, so.

On bare metal IPI, we run the nodeip-configuration service, which figures out the right node IP to use and passes it to kubelet via --node-ip. However, this depends on having an apiserver VIP, which doesn't exist in the UPI case. We cannot just pass an arbitrary IP in the UPI case because the helper binary used by nodeip-configuration expects to find an interface that has a _direct_ route to the provided IP, not just a default route.

On bare metal UPI, kubelet tries to find the node IP like so:

1. Was --node-ip passed? No.
2. Is the node name actually an IP address? No.
3. Does the node name resolve to an IP? YES!
4. Find an IP on an interface with a default route. (Not reached because step 3 succeeded.)

The problem with step 3, as diagnosed earlier, is that the node has the nss-myhostname plugin installed (this is standard in Fedora; presumably also RHEL/RHCOS?), and so when we try to resolve the hostname in step 3, it returns all of the node's IPs in interface number order, which includes the ovn-k8s-mp0 IP before the br-ex IP, and kubelet does not do any sanity checking of routes in this case, so it uses that bad IP. (If ovn-kubernetes had not fiddled with the node's network configuration then there would not be any other network interfaces with IPs on them at startup, and so kubelet would have picked the correct IP.)

So, some fixes:

1. We could add a new mode to runtimecfg node-ip to just figure out the best node IP in a global sense (i.e. not relative to the apiserver VIP), and make MCO call it in that mode on bare metal UPI and pass that --node-ip to kubelet. There is a small chance that this could result in the default node IP changing on UPI nodes where people are doing strange things with multiple default routes.

2. We could try to fix ovs-configure.sh to set things up in such a way that br-ex ends up being created before ovn-k8s-mp0 on reboot, so that naively iterating interfaces will find the correct one first. This depends on internal details of OVS and NetworkManager and may not be possible.

2a. We could make br-ex be transient so it didn't get automatically recreated by OVS on restart, but IIRC there's some reason not to do that which I don't remember.

3. We could try uninstalling nss-myhostname, but I suspect we'd run into other problems on the system if we tried to remove it.

4. We could have ovs-configuration.sh add a node-name-to-IP mapping in /etc/hosts so that nss-myhostname would be bypassed and kubelet would get the right IP when it looked up the node name. (And other processes? It is possible that kubelet is not the only piece of software on the system that is being thwarted by the unexpected weird interaction of nss-myhostname and ovs-configuration.sh.)

5. We could fix kubelet to look more carefully at the results of `net.LookupIP(node.Name)` and require/prefer an IP on the interface with the default route. _Theoretically_ this is an incompatible behavior change and people might object to it, but it seems unlikely. This may be worth doing even if we also do one of the other options.

6. We could say that users need to manually configure the node IP (somehow) when doing bare metal UPI + ovn-kubernetes.
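For concreteness, fix idea 4 could look something like this sketch. The hostname and br-ex address are the ones from this bug; the target file is parameterized (defaulting to a demo path here) so the snippet can be exercised outside a real node, where it would be /etc/hosts:

```shell
# Sketch of fix idea 4: pin the node name to the br-ex address in the hosts
# file so the "files" NSS module answers before nss-myhostname does.
HOSTS_FILE="${HOSTS_FILE:-/tmp/hosts.demo}"   # on a real node: /etc/hosts
NODE_NAME="huir-0826-6q5g9-compute-1"
NODE_IP="10.0.96.111"                         # the br-ex IP, not ovn-k8s-mp0's

# Append only if no mapping for this hostname exists yet (idempotent).
if ! grep -q "[[:space:]]${NODE_NAME}\$" "$HOSTS_FILE" 2>/dev/null; then
    echo "${NODE_IP} ${NODE_NAME}" >> "$HOSTS_FILE"
fi
```

With that entry in place, glibc's hosts lookup would return the br-ex address before nss-myhostname is ever consulted, so kubelet's step-3 resolution would get the right IP.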
I didn't actually mean to click "undo all of dcbw's change" but at any rate, if it's not Networking it's MCO, not Installer anyway
(In reply to Dan Winship from comment #47)
> On bare metal IPI, we run the nodeip-configuration service, which figures
> out the right node IP to use and passes it to kubelet via --node-ip.
> However, this depends on having an apiserver VIP, which doesn't exist in the
> UPI case. We cannot just pass an arbitrary IP in the UPI case because the
> helper binary used by nodeip-configuration expects to find an interface that
> has a _direct_ route to the provided IP, not just a default route.
>
> On bare metal UPI, kubelet tries to find the node IP like so:
>
> 1. Was --node-ip passed? No
> 2. Is the node name actually an IP address? No
> 3. Does the node name resolve to an IP? YES!
> 4. Find an IP on an interface with a default route. (Not reached because
>    step 3 succeeded)
>
> The problem with step 3, as diagnosed earlier, is that the node has the
> nss-myhostname plugin installed ... and kubelet does not do any sanity
> checking of routes in this case, so it uses that bad IP. (If ovn-kubernetes
> had not fiddled with the node's network configuration then there would not
> be any other network interfaces with IPs on them at startup, and so kubelet
> would have picked the correct IP.)

Nothing can expect interfaces to be in any specific order. Ever. It doesn't matter if ovn-kubernetes fiddles with them, or if some other magic VPN the customer wants creates VPN tunnels as part of startup, or if they run IPsec for whatever reason and that creates a magic interface too.

> So, some fixes
>
> 1. We could add a new mode to runtimecfg node-ip to just figure out the
>    best node IP in a global sense (ie not relative to the apiserver VIP),
>    and make MCO call it in that mode on bare metal UPI and pass that
>    --node-ip to kubelet. There is a small chance that this could result in
>    the default node IP changing on UPI nodes where people are doing strange
>    things with multiple default routes.

In all cases that don't have a cloud provider or external heavily-managed DNS, the correct auto-detect approach is "the IP of whatever interface has the default route". That's where kubelet falls down, because UPI simply doesn't have the heavily-managed DNS infrastructure. For cloud providers that have a more nuanced idea of internal/external/DNS/etc it makes sense to do a DNS lookup, because the cloud controls DNS and you need to take what the cloud wants as the node's identity/IP.

> 2. We could try to fix ovs-configure.sh to set things up in such a way that
>    br-ex ends up being created before ovn-k8s-mp0 on reboot, so that naively
>    iterating interfaces will find the correct one first. This depends on
>    internal details of OVS and NetworkManager and may not be possible.

Nope. This has nothing to do with ovs-configure.sh and is not its responsibility.

> 2a. We could make br-ex be transient so it didn't get automatically
>     recreated by OVS on restart, but IIRC there's some reason to not do
>     that which I don't remember.

Still not the problem. Interfaces come and go and nothing can rely on their ordering, ever.

> 3. We could try uninstalling nss-myhostname, but I suspect we'd run into
>    other problems on the system if we tried to remove it.
>
> 4. We could have ovs-configuration.sh add a node name to IP mapping in
>    /etc/hosts so that nss-myhostname would be bypassed and kubelet would
>    get the right IP when it looked up the node name. (And other processes?
>    It is possible that kubelet is not the only piece of software on the
>    system that is being thwarted by the unexpected weird interaction of
>    nss-myhostname and ovs-configuration.sh)

Nope, still not ovs-configure.sh's problem. If the platform (even if it's "none") doesn't set the machine up correctly, we should not be working around that.

> 5. We could fix kubelet to look more carefully at the results of
>    `net.LookupIP(node.Name)` and require/prefer an IP on the interface with
>    the default route. _Theoretically_ this is an incompatible behavior
>    change and people might object to it, but it seems unlikely. This may be
>    worth doing even if we also do one of the other options.

Possibly, yes, because kubelet is the thing that actually needs the right information. But as you say, I'm pretty sure upstream kubelet won't care much about non-cloud-provider cases at large scale and will just punt back to making the machine itself be set up correctly. We can try though.

> 6. We could say that users need to manually configure the node IP (somehow)
>    when doing bare metal UPI + ovn-kubernetes.

Seems like scripting should really be doing this for us, e.g. #1: a quasi-cloud-provider thing that does half of what a cloud provider does, but assumes no external intelligence.
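The "IP of whatever interface has the default route" heuristic argued for here can be sketched by parsing routing-table output. The example below runs against a captured dump (modeled on the node in this bug) so it is self-contained; on a real node you would feed it live `ip route` / `ip -o -4 addr` output instead:

```shell
# Captured routing table and address list (sample data from this bug's node).
ROUTES='default via 10.0.96.1 dev br-ex proto dhcp metric 100
10.128.2.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.128.2.2'
ADDRS='br-ex 10.0.96.111/22
ovn-k8s-mp0 10.128.2.2/23'

# 1. Find the device carrying the default route ...
dev=$(printf '%s\n' "$ROUTES" | \
      awk '$1 == "default" { for (i = 1; i < NF; i++) if ($i == "dev") print $(i+1); exit }')

# 2. ... then pick the address configured on that device.
node_ip=$(printf '%s\n' "$ADDRS" | \
          awk -v d="$dev" '$1 == d { sub("/.*", "", $2); print $2; exit }')

echo "node IP: $node_ip"
```

Note this deliberately ignores ovn-k8s-mp0: it has no default route through it, so the heuristic never considers its address, which is exactly the property kubelet's hostname-resolution path lacks.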
To be clear, my vote is Winship's #1 option; runtimecfg node-ip.
> Nothing can expect interfaces to be in any specific order. Ever.

Unless there's only one of them. Until we landed shared gateway mode, it was guaranteed that a node that only had one interface could just run kubelet without needing to override --node-ip, so...

> This has nothing to do with ovs-configure.sh

...I 100% disagree with that.

All that said, it turns out that nss-myhostname *doesn't* simply return the addresses in interface order:

  · The local, configured hostname is resolved to all locally configured
    IP addresses ordered by their scope

We are claiming that the IP address on ovn-k8s-mp0 has global scope, so nss-myhostname considers it valid to return as the node's primary IP, which seems not implausible. Though OTOH it seems like nobody uses scope correctly... on my laptop the libvirt bridge and the VPN tunnel both also have "scope global".
(In reply to Dan Winship from comment #54)
> > Nothing can expect interfaces to be in any specific order. Ever.
>
> Unless there's only one of them. Until we landed shared gateway mode, it was
> guaranteed that a node that only had one interface could just run kubelet
> without needing to override --node-ip, so...

"Guaranteed" because it mostly worked this way before, even though something was doing the wrong thing, is not really guaranteed. It's "by accident it worked this way in the past".

> > This has nothing to do with ovs-configure.sh
>
> ...I 100% disagree with that.

It may be exposed by ovs-configure.sh, but the problem is 100% not the fault of the network configuration done there, precisely because ovs-configure.sh is not the only thing that touches host networking. Anything else can also do that, especially in UPI where the customer is free to run whatever they want on the host, including VMs, tunnels, whatever. Any of those things may also confuse nss-myhostname.

> All that said, it turns out that nss-myhostname *doesn't* simply return the
> addresses in interface order:
>
>   · The local, configured hostname is resolved to all locally configured
>     IP addresses ordered by their scope
>
> We are claiming that the IP address on ovn-k8s-mp0 has global scope, so
> nss-myhostname considers it valid to return as the node's primary IP, which
> seems not implausible.

Sure, we should fix that. But...

> Though OTOH it seems like nobody uses scope correctly... on my laptop the
> libvirt bridge and the VPN tunnel both also have "scope global"

I can guarantee that nobody uses it correctly, as you've found. I appreciate the attempt by nss-myhostname to bring some order to the chaos, but it's a long road and stuff gets updated incrementally. Even if we update ovs-configure.sh and ovn-kubernetes *and* openshift-sdn to set the right scope on tun0/mp0/gw0/etc, everything else in the world sets the wrong scope and we'll *still* have this problem on UPI whenever anyone does anything custom to the machine's networking. This can all be avoided by just *doing the right thing* on UPI: use the IP address of the NIC that has the default route, and let the administrator override it explicitly if there are multiple NICs in the machine.
Everyone agrees that on UPI machines with "complicated" networking configurations, the administrator is going to have to configure stuff. I am arguing specifically that if you have a machine with a _trivial_ network configuration, it used to work without needing any further manual tweaking, and now we have broken that. Saying "it will break if anyone else does anything custom" is irrelevant because I am specifically only talking about the case where no one did anything custom.
(In reply to Dan Winship from comment #57)
> Everyone agrees that on UPI machines with "complicated" networking
> configurations, the administrator is going to have to configure stuff. I am
> arguing specifically that if you have a machine with a _trivial_ network
> configuration, it used to work without needing any further manual tweaking,
> and now we have broken that. Saying "it will break if anyone else does
> anything custom" is irrelevant because I am specifically only talking about
> the case where no one did anything custom.

Disagree. The whole point of UPI is that you want to do custom things and heavily manage the OS. Otherwise you would use RHCOS.
"Otherwise you would use RHCOS and metal. Not 'none' platform." Or is that not what we expect?
The merged PR is not a complete fix (though it may unblock testing by letting you manually override the node IP by setting KUBELET_NODE_IP=... in /etc/kubernetes/kubelet-env).
Hello Dan,

After setting KUBELET_NODE_IP in /etc/kubernetes/kubelet-env and restarting kubelet, do we need any other steps? I tried this but kubelet is not picking up this IP.

  # cat /etc/kubernetes/kubelet-env
  KUBELET_NODE_IP="10.0.98.97"

Restarted the kubelet service:

  F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
  4 0 375383 1 20 0 2545984 157636 - Ssl ? 0:29 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --runtime-cgroups=/system.slice/crio.service --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=rhcos --node-ip=${KUBELET_NODE_IP:-} --minimum-container-ttl-duration=6m0s --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --cloud-provider= --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70a88050d9c755137378b8a8618bb42b35e78397b9d044986aa3ba72aae97077 --v=4

  # journalctl -u kubelet | grep -i "node_ip"
  Sep 18 02:56:49 wsun09181ci-6ncwv-compute-0 hyperkube[1785]: I0918 02:56:49.649436 1785 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 03:12:44 wsun09181ci-6ncwv-compute-0 hyperkube[1742]: I0918 03:12:44.846087 1742 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 03:22:08 wsun09181ci-6ncwv-compute-0 hyperkube[1737]: I0918 03:22:08.735134 1737 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 06:57:05 wsun09181ci-6ncwv-compute-0 hyperkube[338775]: I0918 06:57:05.590494 338775 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 06:58:07 wsun09181ci-6ncwv-compute-0 hyperkube[340344]: I0918 06:58:07.113976 340344 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 06:58:23 wsun09181ci-6ncwv-compute-0 hyperkube[340748]: I0918 06:58:23.770677 340748 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 07:00:20 wsun09181ci-6ncwv-compute-0 hyperkube[344096]: I0918 07:00:20.339392 344096 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 07:00:29 wsun09181ci-6ncwv-compute-0 hyperkube[344221]: I0918 07:00:29.352124 344221 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 07:06:11 wsun09181ci-6ncwv-compute-0 hyperkube[353937]: I0918 07:06:11.828694 353937 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 07:17:48 wsun09181ci-6ncwv-compute-0 hyperkube[373719]: I0918 07:17:48.370699 373719 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
  Sep 18 07:18:51 wsun09181ci-6ncwv-compute-0 hyperkube[375383]: I0918 07:18:51.079080 375383 flags.go:59] FLAG: --node-ip="${KUBELET_NODE_IP:-}"
oops, I'm an idiot. It doesn't currently work.
(In reply to Sunil Choudhary from comment #62)
> After setting KUBELET_NODE_IP in /etc/kubernetes/kubelet-env and restarting
> kubelet, do we need any other steps? I tried this but kubelet is not picking
> up this IP.

For reasons I don't understand, `/etc/kubernetes/kubelet-env` does not actually seem to get read by the service, but it works (with the latest code, not the earlier version) if you create an additional file. E.g.:

  cat > /etc/systemd/system/kubelet.service.d/80-nodeip.conf <<EOF
  [Service]
  Environment=KUBELET_NODE_IP="10.0.98.97"
  EOF
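Put together, the workaround is a small script. The drop-in directory is parameterized below (defaulting to a demo path) so the sketch can run outside a real host; on a node it would be /etc/systemd/system/kubelet.service.d, followed by a daemon-reload and a kubelet restart:

```shell
# Write a systemd drop-in that sets KUBELET_NODE_IP for kubelet.service.
# On a real node DROPIN_DIR is /etc/systemd/system/kubelet.service.d.
DROPIN_DIR="${DROPIN_DIR:-/tmp/kubelet.service.d}"
NODE_IP="10.0.98.97"    # the address from the comment above

mkdir -p "$DROPIN_DIR"
cat > "$DROPIN_DIR/80-nodeip.conf" <<EOF
[Service]
Environment=KUBELET_NODE_IP=${NODE_IP}
EOF

# On a real node, then apply it:
#   systemctl daemon-reload && systemctl restart kubelet
```

systemd merges drop-in files into the unit, so the `--node-ip=${KUBELET_NODE_IP}` reference in the kubelet ExecStart line picks the value up on the next restart.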
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196