Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2011502

Summary: Update Docs to confirm ovn-k only supports the use of a single default gateway
Product: OpenShift Container Platform
Reporter: Mat Kowalski <mko>
Component: Documentation
Assignee: Mike McKiernan <mmckiern>
Status: CLOSED CURRENTRELEASE
QA Contact: Ross Brattain <rbrattai>
Severity: unspecified
Docs Contact: Vikram Goyal <vigoyal>
Priority: unspecified
Version: 4.8
CC: anusaxen, aos-bugs, astoycos, bpickard, jokerman, mmckiern, rbrattai
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-11-24 23:43:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2011747

Description Mat Kowalski 2021-10-06 17:17:57 UTC
+++ Problem

When installing 4.8.12, the following configuration causes pod/ovnkube-node to enter CrashLoopBackOff during cluster installation:

* 3 nodes
* dual-stack, with the IPv4 subnet listed first
* 2 separate NICs, one per IP stack

This completely blocks the installation of the cluster.

+++ OVN-K8s status

```
# oc -n openshift-ovn-kubernetes get pod/ovnkube-node-fjjx5 -o yaml | less
[...]
          I1006 16:09:50.985852   60651 helper_linux.go:73] Found default gateway interface br-ex 192.168.127.1
          I1006 16:09:50.985923   60651 helper_linux.go:73] Found default gateway interface ens4 fe80::5054:ff:febe:bcd4
          F1006 16:09:50.985939   60651 ovnkube.go:130] multiple gateway interfaces detected: br-ex ens4
```

```
[root@rdu-infra-edge-01 tmp]# oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
network             False       True          False      30m
[root@rdu-infra-edge-01 tmp]# oc -n openshift-ovn-kubernetes get pods
NAME                   READY   STATUS             RESTARTS   AGE
ovnkube-master-ksz6s   6/6     Running            6          30m
ovnkube-master-lmmhx   6/6     Running            3          30m
ovnkube-node-fjjx5     3/4     CrashLoopBackOff   10         30m
ovnkube-node-kqppf     3/4     CrashLoopBackOff   10         30m
```

+++ Cluster status

```
[root@rdu-infra-edge-01 tmp]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          99m     Unable to apply 4.8.12: an unknown error has occurred: MultipleErrors

[root@rdu-infra-edge-01 tmp]# oc get nodes
NAME                                   STATUS     ROLES    AGE   VERSION
test-infra-cluster-9736b6ab-master-1   NotReady   master   98m   v1.21.1+d8043e1
test-infra-cluster-9736b6ab-master-2   NotReady   master   98m   v1.21.1+d8043e1
```

+++ Node network configuration

```
[root@test-infra-cluster-9736b6ab-master-1 ~]# ip -4 r
default via 192.168.127.1 dev br-ex proto dhcp metric 100 
172.30.0.0/16 via 172.30.2.1 dev ovn-k8s-mp0 
172.30.2.0/23 dev ovn-k8s-mp0 proto kernel scope link src 172.30.2.2 
192.168.127.0/24 dev br-ex proto kernel scope link src 192.168.127.128 metric 100 

[root@test-infra-cluster-9736b6ab-master-1 ~]# ip -6 r
::1 dev lo proto kernel metric 256 pref medium
2002:db8:0:1::/64 dev ovn-k8s-mp0 proto kernel metric 256 pref medium
2002:db8::/53 via 2002:db8:0:1::1 dev ovn-k8s-mp0 metric 1024 pref medium
3001:db9::7 dev ens4 proto kernel metric 101 pref medium
3001:db9::/120 dev ens4 proto ra metric 101 pref medium
fe80::/64 dev br-ex proto kernel metric 100 pref medium
fe80::/64 dev ens4 proto kernel metric 101 pref medium
fe80::/64 dev api proto kernel metric 256 pref medium
fe80::/64 dev ingress proto kernel metric 256 pref medium
fe80::/64 dev genev_sys_6081 proto kernel metric 256 pref medium
default via fe80::5054:ff:febe:bcd4 dev ens4 proto ra metric 101 pref medium

[root@test-infra-cluster-9736b6ab-master-1 ~]# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 192.168.127.128/24 brd 192.168.127.255 scope global dynamic noprefixroute br-ex
       valid_lft 3045sec preferred_lft 3045sec
10: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 172.30.2.2/23 brd 172.30.3.255 scope global ovn-k8s-mp0
       valid_lft forever preferred_lft forever

[root@test-infra-cluster-9736b6ab-master-1 ~]# ip -6 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 3001:db9::7/128 scope global dynamic noprefixroute 
       valid_lft 2887sec preferred_lft 2887sec
    inet6 fe80::e55c:e28:2ee1:87dc/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UNKNOWN qlen 1000
    inet6 fe80::269c:dfb7:60de:a1bb/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
6: api@br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP 
    inet6 fe80::21a:4aff:febb:312d/64 scope link 
       valid_lft forever preferred_lft forever
7: ingress@br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP 
    inet6 fe80::21a:4aff:feac:2144/64 scope link 
       valid_lft forever preferred_lft forever
9: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 state UNKNOWN qlen 1000
    inet6 fe80::bcab:a1ff:fe42:c4f0/64 scope link 
       valid_lft forever preferred_lft forever
10: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 state UNKNOWN qlen 1000
    inet6 2002:db8:0:1::2/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::a826:3aff:fef2:862b/64 scope link 
       valid_lft forever preferred_lft forever

[root@test-infra-cluster-9736b6ab-master-1 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 02:00:00:72:09:ed brd ff:ff:ff:ff:ff:ff
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 02:00:00:79:8b:3e brd ff:ff:ff:ff:ff:ff
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 82:75:a1:c7:de:4c brd ff:ff:ff:ff:ff:ff
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 02:00:00:72:09:ed brd ff:ff:ff:ff:ff:ff
6: api@br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 00:1a:4a:bb:31:2d brd ff:ff:ff:ff:ff:ff
7: ingress@br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 00:1a:4a:ac:21:44 brd ff:ff:ff:ff:ff:ff
8: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d2:1d:f3:18:b6:6e brd ff:ff:ff:ff:ff:ff
9: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether be:ab:a1:42:c4:f0 brd ff:ff:ff:ff:ff:ff
10: ovn-k8s-mp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether aa:26:3a:f2:86:2b brd ff:ff:ff:ff:ff:ff

```

+++ OCP installation configuration

```
# cat /data/test/631f8ae1-bb6f-4c7e-acf2-6010d75c7200/install-config.yaml
[...]
networking:
  networkType: OVNKubernetes
  clusterNetwork:
  - cidr: 172.30.0.0/16
    hostPrefix: 23
  - cidr: 2002:db8::/53
    hostPrefix: 64
  machineNetwork:
  - cidr: 192.168.127.0/24
  - cidr: 3001:db9::/120
  serviceNetwork:
  - 10.128.0.0/14
  - 2003:db8::/112
[...]
Platform:
  baremetal:
    provisioningNetwork: Disabled
    apiVIP: 192.168.127.121
    ingressVIP: 192.168.127.118
    hosts:
    - name: test-infra-cluster-9736b6ab-master-0
      role: master
      bootMACAddress: 02:00:00:1a:75:16
      bootMode: legacy
    - name: test-infra-cluster-9736b6ab-master-1
      role: master
      bootMACAddress: 02:00:00:72:09:ed
      bootMode: legacy
    - name: test-infra-cluster-9736b6ab-master-2
      role: master
      bootMACAddress: 02:00:00:b3:e8:46
      bootMode: legacy
  vsphere: null
```

Comment 2 Mat Kowalski 2021-10-06 17:24:12 UTC
Output from configure-ovs.sh on the affected node - http://pastebin.test.redhat.com/999292

Comment 5 Mat Kowalski 2021-10-13 11:43:40 UTC
*** Bug 2011747 has been marked as a duplicate of this bug. ***

Comment 10 Ross Brattain 2021-10-15 12:59:33 UTC
Ah, maybe it's clearer in the code.

		// validate that both IP Families use the same interface for the gateway
		if intfName == "" {
			intfName = intfIPv6Name
		} else if intfName != intfIPv6Name {
			return "", nil, fmt.Errorf("multiple gateway interfaces detected: %s %s", intfName, intfIPv6Name)
		}


Maybe add "Both IP families must use the same interface for the default gateway" to a dual-stack related section.
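
The Go check above can be approximated on a node before installation. The following is only a rough shell sketch, not code from ovn-kubernetes: the function name `check_gateway_ifaces` is hypothetical, and it assumes the two interface names have already been extracted from `ip route show default` and `ip -6 route show default`. It also assumes that a missing IPv6 default route is not a conflict, which may differ from what ovnkube does.

```shell
# Hypothetical pre-flight check, loosely mirroring the ovnkube validation quoted above.
# $1: interface carrying the IPv4 default route (may be empty)
# $2: interface carrying the IPv6 default route (may be empty)
check_gateway_ifaces() {
  if [ -z "$1" ]; then
    # No IPv4 default route: the IPv6 interface (if any) is the only candidate.
    echo "$2"
  elif [ -n "$2" ] && [ "$1" != "$2" ]; then
    # Assumption: only report a conflict when both families have a default route.
    echo "multiple gateway interfaces detected: $1 $2" >&2
    return 1
  else
    echo "$1"
  fi
}
```

With the interfaces from this report, `check_gateway_ifaces br-ex ens4` returns non-zero with the same message as the ovnkube-node log, while `check_gateway_ifaces br-ex br-ex` prints `br-ex`.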

Also, for configure-ovs.sh and `br-ex` when used with OVN dual-stack, something like: "OVN uses the interface with the IPv4 default route first."


  # find default interface
  while [ $counter -lt 12 ]; do
    # check ipv4
    iface=$(ip route show default | awk '{ if ($4 == "dev") { print $5; exit } }')
    if [[ -n "$iface" ]]; then
      echo "IPv4 Default gateway interface found: ${iface}"
      break
    fi
    # check ipv6
    iface=$(ip -6 route show default | awk '{ if ($4 == "dev") { print $5; exit } }')
    if [[ -n "$iface" ]]; then
      echo "IPv6 Default gateway interface found: ${iface}"
      break
    fi
    counter=$((counter+1))
    echo "No default route found on attempt: ${counter}"
    sleep 5
  done
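
To see which interface this loop would pick on the affected node, the same awk filter can be run against the default routes captured in the description. This is an illustrative sketch only: the helper names `default_iface` and `pick_iface` are hypothetical and not part of configure-ovs.sh, and the routes are passed in as text rather than read from `ip`.

```shell
# Hypothetical helper: print the device of the first default route read from stdin,
# using the same awk filter as the configure-ovs.sh excerpt above.
default_iface() {
  awk '{ if ($4 == "dev") { print $5; exit } }'
}

# Hypothetical helper: prefer the IPv4 default route, fall back to IPv6.
# $1: output of `ip route show default`; $2: output of `ip -6 route show default`
pick_iface() {
  iface=$(printf '%s\n' "$1" | default_iface)
  if [ -z "$iface" ]; then
    iface=$(printf '%s\n' "$2" | default_iface)
  fi
  echo "$iface"
}

# Default routes copied from the node output in the description.
v4_routes='default via 192.168.127.1 dev br-ex proto dhcp metric 100'
v6_routes='default via fe80::5054:ff:febe:bcd4 dev ens4 proto ra metric 101 pref medium'

pick_iface "$v4_routes" "$v6_routes"   # prints br-ex
pick_iface "" "$v6_routes"             # prints ens4
```

On the affected node this selects `br-ex` for the IPv4 side while the IPv6 default route stays on `ens4`, which is the combination that ovnkube-node rejects.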

Comment 11 Mike McKiernan 2021-10-27 14:56:53 UTC
I took a stab at a doc update: https://github.com/openshift/openshift-docs/pull/38035

* Please confirm or correct my assertion in the "OVN-K limitations" section that the only remedy is to reconfigure host networking so that both IP families use the same iface for the default gateway.

* I'm not familiar with installation (and not too familiar with dual-stack either). From an installation angle, is this UPI only? Bare-metal only?