This might be an infra issue with PSI. We'll look into it.
Still NEW (not ASSIGNED) and 4.6 is closing in. Punting to 4.7, because the "infra issue with PSI" theory does not sound like a product-side issue that would block 4.6 GA.
Could you please describe how you add the secondary network to your instances? You should normally have an existing default route, with a lower metric than the new default of metric 101.

Here, we have a default gateway to 10.0.128.1, coming from the node subnet:

[core@mfedosin-6vvnj-bootstrap ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.128.1      0.0.0.0         UG    100    0        0 ens3
10.0.128.0      0.0.0.0         255.255.128.0   U     100    0        0 ens3
169.254.169.254 10.0.128.11     255.255.255.255 UGH   100    0        0 ens3

Now we attach the instance to the manila subnet:

❯ openstack subnet show manila_subnet -f yaml -c gateway_ip -c network_id
gateway_ip: 172.16.32.1
network_id: 27671b90-c2bc-483f-b783-cc856f20ee5d
❯ openstack server add network 40ba2453-99a1-4b1e-bcfc-f75a9cb20030 27671b90-c2bc-483f-b783-cc856f20ee5d

We get a new default gateway of 172.16.32.1, but with higher metric:

[core@mfedosin-6vvnj-bootstrap ~]$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.128.1      0.0.0.0         UG    100    0        0 ens3
0.0.0.0         172.16.32.1     0.0.0.0         UG    101    0        0 ens6
10.0.128.0      0.0.0.0         255.255.128.0   U     100    0        0 ens3
169.254.169.254 10.0.128.11     255.255.255.255 UGH   100    0        0 ens3
169.254.169.254 172.16.34.2     255.255.255.255 UGH   101    0        0 ens6
172.16.32.0     0.0.0.0         255.255.240.0   U     101    0        0 ens6

We're also still able to talk to api endpoints.
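To make the metric comparison concrete, here is a rough Python sketch (not part of the original report). It parses `route -n` text and picks the default route with the lowest metric, which is the one the kernel prefers among otherwise equal routes. The names `Route`, `parse_route_n` and `preferred_default` are hypothetical; the sample rows are copied from the `route -n` output in this comment.

```python
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    destination: str
    gateway: str
    genmask: str
    metric: int
    iface: str


# Sample rows copied from the `route -n` output taken after attaching ens6.
ROUTE_N_OUTPUT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.128.1      0.0.0.0         UG    100    0        0 ens3
0.0.0.0         172.16.32.1     0.0.0.0         UG    101    0        0 ens6
10.0.128.0      0.0.0.0         255.255.128.0   U     100    0        0 ens3
172.16.32.0     0.0.0.0         255.255.240.0   U     101    0        0 ens6
"""


def parse_route_n(text: str) -> List[Route]:
    """Parse the 8-column table printed by `route -n`."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly 8 columns and a numeric Metric column;
        # this skips the title line and the column header.
        if len(fields) != 8 or not fields[4].isdigit():
            continue
        routes.append(
            Route(fields[0], fields[1], fields[2], int(fields[4]), fields[7])
        )
    return routes


def preferred_default(routes: List[Route]) -> Optional[Route]:
    """Return the default route (0.0.0.0/0) with the lowest metric, if any."""
    defaults = [
        r for r in routes
        if r.destination == "0.0.0.0" and r.genmask == "0.0.0.0"
    ]
    return min(defaults, key=lambda r: r.metric, default=None)


best = preferred_default(parse_route_n(ROUTE_N_OUTPUT))
print(best)
```

With the sample above, the route through ens3 (metric 100) is selected over the one through ens6 (metric 101), so traffic keeps using the node subnet gateway.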
Hello Martin,

I attached the second network interface from the PSI web console. I did not reproduce this issue on OCP with an OpenShift SDN network cluster. The partial route information of the node with this issue is:

# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    101    0        0 ens6
default         host-192-168-0- 0.0.0.0         UG    800    0        0 br-ex

I realize it might be an issue specific to OCP with the OVN network; let me try to reproduce it with the OVN network.
This is the route info for OVN before attaching the second network interface:

sh-4.4# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         host-192-168-0- 0.0.0.0         UG    800    0        0 br-ex
10.128.0.0      10.128.2.1      255.252.0.0     UG    0      0        0 ovn-k8s-mp0
10.128.2.0      0.0.0.0         255.255.254.0   U     0      0        0 ovn-k8s-mp0
link-local      0.0.0.0         255.255.240.0   U     0      0        0 ovn-k8s-gw0
169.254.169.254 host-192-168-0- 255.255.255.255 UGH   800    0        0 br-ex
172.30.0.0      10.128.2.1      255.255.0.0     UG    0      0        0 ovn-k8s-mp0
192.168.0.0     0.0.0.0         255.255.192.0   U     800    0        0 br-ex
I tried to install an OCP cluster with an additional network plus the OVN network, and the installation failed. The root cause appears to be that when a master has both OVN and an additional network configured, it cannot communicate with the internet.

The route info of the master node:

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    101    0        0 ens4
default         _gateway        0.0.0.0         UG    800    0        0 br-ex
169.254.169.254 host-172-16-34- 255.255.255.255 UGH   101    0        0 ens4
169.254.169.254 192.168.0.11    255.255.255.255 UGH   800    0        0 br-ex
172.16.32.0     0.0.0.0         255.255.240.0   U     101    0        0 ens4
192.168.0.0     0.0.0.0         255.255.192.0   U     800    0        0 br-ex

Kubelet logs:

Sep 28 05:33:45 piqin-9281-wx96r-master-1 hyperkube[3292]: E0928 05:33:45.895590 3292 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: Get "https://api-int.piqin-9281.0928-ew6.qe.rhcloud.com:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/piqin-9281-wx96r-master-1?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

So I'll update the bug title and increase the Severity and Priority. This blocks testing of the additional network + OVN profile.
Moving this over to Networking -> ovn-kubernetes as this doesn't seem like an issue with the installer.
Why does ens4 get the default route? It's overriding the normal node default route, and that means all your internet traffic is going out ens4, not br-ex (which is where it should be going). Whatever is adding ens4, and setting the default route, is the likely culprit. Is that the PSI web console?
Adding an interface post install with a default route with a lower metric than br-ex is going to cause this to happen. We can investigate using the same metric as the original interface when we install instead of getting 800, but this isn't a blocker for 4.6. Moving to 4.6z.
Created attachment 1739244: logs from the bootstrap node.
Created attachment 1739245: ovn-configuration.service log.
It looks like the fix is not right. The metric needs to be placed on the interface instead of the bridge:

sh-4.4# nmcli conn modify ovs-if-br-ex ipv4.route-metric 100
sh-4.4# nmcli conn modify br-ex ipv4.route-metric -1
sh-4.4# ip route
default via 10.0.32.1 dev br-ex proto dhcp metric 800
10.0.32.1 dev br-ex proto dhcp scope link metric 800
10.128.0.0/16 via 10.128.8.1 dev ovn-k8s-mp0
10.128.8.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.128.8.2
169.254.0.0/20 dev ovn-k8s-gw0 proto kernel scope link src 169.254.0.1
172.30.0.0/16 via 10.128.8.1 dev ovn-k8s-mp0
sh-4.4# systemctl restart NetworkManager
sh-4.4# ip route
default via 10.0.32.1 dev br-ex proto dhcp metric 100
10.0.32.1 dev br-ex proto dhcp scope link metric 100
10.128.0.0/16 via 10.128.8.1 dev ovn-k8s-mp0
10.128.8.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.128.8.2
169.254.0.0/20 dev ovn-k8s-gw0 proto kernel scope link src 169.254.0.1
172.30.0.0/16 via 10.128.8.1 dev ovn-k8s-mp0
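For anyone who wants to script the manual steps from this comment, a minimal Python sketch follows. It only wraps the same `nmcli` and `systemctl` invocations shown in the comment; the function names `metric_fix_commands` and `apply` and the `dry_run` flag are hypothetical, and the connection names `ovs-if-br-ex` and `br-ex` are assumed to match the node being changed. It defaults to a dry run that just prints the commands.

```python
import subprocess
from typing import List


def metric_fix_commands(iface_conn: str = "ovs-if-br-ex",
                        bridge_conn: str = "br-ex",
                        metric: int = 100) -> List[List[str]]:
    """Build the commands from the comment as argv lists (nothing is run)."""
    return [
        # Put the explicit metric on the interface connection.
        ["nmcli", "conn", "modify", iface_conn, "ipv4.route-metric", str(metric)],
        # -1 resets the bridge connection to NetworkManager's default metric.
        ["nmcli", "conn", "modify", bridge_conn, "ipv4.route-metric", "-1"],
        # The new metric showed up in `ip route` only after this restart.
        ["systemctl", "restart", "NetworkManager"],
    ]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # Requires root on a host that has nmcli and systemd.
            subprocess.run(argv, check=True)


# Dry run: prints the three commands without touching the system.
apply(metric_fix_commands())
```

Restarting NetworkManager on a live node can briefly interrupt connectivity, so the dry-run default is deliberate.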
Verified with: 4.7.0-fc.1

# ip route
default via 192.168.0.1 dev br-ex proto dhcp metric 100
default via 172.16.32.1 dev ens4 proto dhcp metric 101
10.128.0.0/14 via 10.128.2.1 dev ovn-k8s-mp0
10.128.2.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.128.2.2
169.254.0.0/20 dev ovn-k8s-gw0 proto kernel scope link src 169.254.0.1
169.254.169.254 via 192.168.0.10 dev br-ex proto dhcp metric 100
169.254.169.254 via 172.16.34.1 dev ens4 proto dhcp metric 101
172.16.32.0/20 dev ens4 proto kernel scope link src 172.16.35.179 metric 101
172.30.0.0/16 via 10.128.2.1 dev ovn-k8s-mp0
192.168.0.0/18 dev br-ex proto kernel scope link src 192.168.1.16 metric 100
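The same verification could be automated along these lines. This is only a sketch: `default_route_metrics` and `bridge_is_preferred` are hypothetical helper names, the sample lines are copied from the `ip route` output in this comment, and the parser assumes each default route prints as plain key/value pairs (no bare flags such as `onlink` or `linkdown`).

```python
from typing import Dict

# Default and subnet routes copied from the verification output.
IP_ROUTE_OUTPUT = """\
default via 192.168.0.1 dev br-ex proto dhcp metric 100
default via 172.16.32.1 dev ens4 proto dhcp metric 101
172.16.32.0/20 dev ens4 proto kernel scope link src 172.16.35.179 metric 101
192.168.0.0/18 dev br-ex proto kernel scope link src 192.168.1.16 metric 100
"""


def default_route_metrics(text: str) -> Dict[str, int]:
    """Map each device that has a default route to that route's metric."""
    metrics = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "default":
            continue
        # After the leading "default", the rest is read as key/value pairs,
        # e.g. "via 192.168.0.1", "dev br-ex", "metric 100".
        attrs = dict(zip(tokens[1::2], tokens[2::2]))
        # A route printed without a metric is treated as metric 0.
        metrics[attrs["dev"]] = int(attrs.get("metric", 0))
    return metrics


def bridge_is_preferred(text: str, bridge: str = "br-ex") -> bool:
    """True if no other default route has a lower metric than the bridge's."""
    metrics = default_route_metrics(text)
    return bridge in metrics and metrics[bridge] == min(metrics.values())


print(default_route_metrics(IP_ROUTE_OUTPUT))
print(bridge_is_preferred(IP_ROUTE_OUTPUT))
```

On the 4.7.0-fc.1 output, br-ex (metric 100) wins over ens4 (metric 101), which is the expected result of the fix.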
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633