Bug 1980135

Summary: On an IPv6 single stack cluster traffic between master nodes is sent via default gw instead of local subnet
Product: OpenShift Container Platform Reporter: Marius Cornea <mcornea>
Component: Networking Assignee: Jaime Caamaño Ruiz <jcaamano>
Networking sub component: ovn-kubernetes QA Contact: Marius Cornea <mcornea>
Status: CLOSED ERRATA Docs Contact: Padraig O'Grady <pogrady>
Severity: urgent    
Priority: unspecified CC: achernet, agurenko, astoycos, danw, dphillip, ealcaniz, jcaamano, jdee, mifiedle, pogrady, rolove, yboaron, yprokule, yroblamo
Version: 4.8   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: When using IPv6 DHCP, node interface addresses might be leased with a /128 prefix. Consequence: OVN-Kubernetes uses the same prefix to infer the node's network and thus routes traffic to any other address, including other cluster nodes, through the gateway. Fix: OVN-Kubernetes inspects the node's routing table, looks for a wider routing entry covering the node's interface address, and uses that prefix to infer the node's network. Result: Traffic to other cluster nodes is no longer routed through the gateway.
Story Points: ---
Clone Of:
: 1994624 Environment:
Last Closed: 2021-10-18 17:38:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1994624    

Description Marius Cornea 2021-07-07 21:14:04 UTC
Description of problem:

On an IPv6 single-stack cluster, traffic between master nodes is sent via the network's router. Since all master nodes are on the same subnet, communication between them should not reach the router. As a result, cluster functionality can be impacted if one of the master nodes cannot reach the router.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-07-04-112043

How reproducible:
100%

Steps to Reproduce:

1. Deploy a cluster with 3 masters and IPv6 single-stack networking via the baremetal IPI flow

[kni ~]$ oc get nodes -o wide

NAME                                                  STATUS   ROLES           AGE     VERSION           INTERNAL-IP         EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
openshift-master-0.kni-qe-0.lab.eng.rdu2.redhat.com   Ready    master,worker   2d11h   v1.21.1+f36aa36   2620:52:0:11c::20   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107040900-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64   cri-o://1.21.1-12.rhaos4.8.git30ca719.el8
openshift-master-1.kni-qe-0.lab.eng.rdu2.redhat.com   Ready    master,worker   2d11h   v1.21.1+f36aa36   2620:52:0:11c::21   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107040900-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64   cri-o://1.21.1-12.rhaos4.8.git30ca719.el8
openshift-master-2.kni-qe-0.lab.eng.rdu2.redhat.com   Ready    master,worker   2d11h   v1.21.1+f36aa36   2620:52:0:11c::22   <none>        Red Hat Enterprise Linux CoreOS 48.84.202107040900-0 (Ootpa)   4.18.0-305.7.1.el8_4.x86_64   cri-o://1.21.1-12.rhaos4.8.git30ca719.el8


2. Run a packet capture on the router (external to the cluster)

[root kni]$ tcpdump -i baremetal not tcp port 22 -ennn
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on baremetal, link-type EN10MB (Ethernet), capture size 262144 bytes
16:53:02.641786 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 1782: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [P.], seq 350251707:350253403, ack 61527403, win 258, options [nop,nop,TS val 2888958758 ecr 3796529115], length 1696
16:53:02.641792 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 1782: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [P.], seq 0:1696, ack 1, win 258, options [nop,nop,TS val 2888958758 ecr 3796529115], length 1696
16:53:02.642231 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [.], ack 611, win 300, options [nop,nop,TS val 2888958758 ecr 3796529115], length 0
16:53:02.642237 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [.], ack 611, win 300, options [nop,nop,TS val 2888958758 ecr 3796529115], length 0
16:53:02.642309 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 110: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [P.], seq 1696:1720, ack 636, win 300, options [nop,nop,TS val 2888958758 ecr 3796529116], length 24
16:53:02.642314 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 110: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [P.], seq 1696:1720, ack 636, win 300, options [nop,nop,TS val 2888958758 ecr 3796529116], length 24
16:53:02.642345 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [F.], seq 1720, ack 636, win 300, options [nop,nop,TS val 2888958758 ecr 3796529116], length 0
16:53:02.642349 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [F.], seq 1720, ack 636, win 300, options [nop,nop,TS val 2888958758 ecr 3796529116], length 0
16:53:02.642351 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [R.], seq 1721, ack 636, win 300, options [nop,nop,TS val 2888958758 ecr 3796529116], length 0
16:53:02.642355 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.39343 > 2620:52:0:11c::21.6443: Flags [R.], seq 1721, ack 636, win 300, options [nop,nop,TS val 2888958758 ecr 3796529116], length 0
16:53:02.642373 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 125: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [P.], seq 2060208521:2060208560, ack 2866995814, win 9734, options [nop,nop,TS val 4109381108 ecr 3796529035], length 39
16:53:02.642407 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 125: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [P.], seq 0:39, ack 1, win 9734, options [nop,nop,TS val 4109381108 ecr 3796529035], length 39
16:53:02.644072 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 1782: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [P.], seq 3467523297:3467524993, ack 3528094138, win 258, options [nop,nop,TS val 3477795710 ecr 1716754515], length 1696
16:53:02.644081 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 1782: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [P.], seq 0:1696, ack 1, win 258, options [nop,nop,TS val 3477795710 ecr 1716754515], length 1696
16:53:02.644399 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [.], ack 67, win 9734, options [nop,nop,TS val 4109381110 ecr 3796529118], length 0
16:53:02.644416 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [.], ack 67, win 9734, options [nop,nop,TS val 4109381110 ecr 3796529118], length 0
16:53:02.644470 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [.], ack 9166, win 9734, options [nop,nop,TS val 4109381110 ecr 3796529118], length 0
16:53:02.644478 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [.], ack 9166, win 9734, options [nop,nop,TS val 4109381110 ecr 3796529118], length 0
16:53:02.644502 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [.], ack 611, win 300, options [nop,nop,TS val 3477795710 ecr 1716754516], length 0
16:53:02.644509 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [.], ack 611, win 300, options [nop,nop,TS val 3477795710 ecr 1716754516], length 0
16:53:02.644527 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [.], ack 9197, win 9734, options [nop,nop,TS val 4109381110 ecr 3796529118], length 0
16:53:02.644534 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [.], ack 9197, win 9734, options [nop,nop,TS val 4109381110 ecr 3796529118], length 0
16:53:02.644536 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 121: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [P.], seq 39:74, ack 9197, win 9734, options [nop,nop,TS val 4109381110 ecr 3796529118], length 35
16:53:02.644542 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 121: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [P.], seq 39:74, ack 9197, win 9734, options [nop,nop,TS val 4109381110 ecr 3796529118], length 35
16:53:02.644585 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 110: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [P.], seq 1696:1720, ack 636, win 300, options [nop,nop,TS val 3477795710 ecr 1716754516], length 24
16:53:02.644592 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 110: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [P.], seq 1696:1720, ack 636, win 300, options [nop,nop,TS val 3477795710 ecr 1716754516], length 24
16:53:02.644599 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [F.], seq 1720, ack 636, win 300, options [nop,nop,TS val 3477795710 ecr 1716754516], length 0
16:53:02.644605 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [F.], seq 1720, ack 636, win 300, options [nop,nop,TS val 3477795710 ecr 1716754516], length 0
16:53:02.644621 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [R.], seq 1721, ack 636, win 300, options [nop,nop,TS val 3477795710 ecr 1716754516], length 0
16:53:02.644627 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33728 > 2620:52:0:11c::20.6443: Flags [R.], seq 1721, ack 636, win 300, options [nop,nop,TS val 3477795710 ecr 1716754516], length 0
16:53:02.645036 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 125: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [P.], seq 74:113, ack 9197, win 9734, options [nop,nop,TS val 4109381111 ecr 3796529118], length 39
16:53:02.645043 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 125: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [P.], seq 74:113, ack 9197, win 9734, options [nop,nop,TS val 4109381111 ecr 3796529118], length 39
16:53:02.645765 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 94: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [S], seq 303809659, win 26800, options [mss 1340,sackOK,TS val 3477795712 ecr 0,nop,wscale 7], length 0
16:53:02.645788 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 94: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [S], seq 303809659, win 26800, options [mss 1340,sackOK,TS val 3477795712 ecr 0,nop,wscale 7], length 0
16:53:02.645870 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [.], ack 1931895112, win 210, options [nop,nop,TS val 3477795712 ecr 1716754518], length 0
16:53:02.645879 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [.], ack 1, win 210, options [nop,nop,TS val 3477795712 ecr 1716754518], length 0
16:53:02.646061 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 628: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [P.], seq 0:542, ack 1, win 210, options [nop,nop,TS val 3477795712 ecr 1716754518], length 542
16:53:02.646068 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 628: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [P.], seq 0:542, ack 1, win 210, options [nop,nop,TS val 3477795712 ecr 1716754518], length 542
16:53:02.646251 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [.], ack 9462, win 9734, options [nop,nop,TS val 4109381112 ecr 3796529120], length 0
16:53:02.646259 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7e:1a, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.37722 > 2620:52:0:11c::21.6443: Flags [.], ack 9462, win 9734, options [nop,nop,TS val 4109381112 ecr 3796529120], length 0
16:53:02.646326 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [.], ack 410, win 218, options [nop,nop,TS val 3477795712 ecr 1716754518], length 0
16:53:02.646333 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [.], ack 410, win 218, options [nop,nop,TS val 3477795712 ecr 1716754518], length 0
16:53:02.646573 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 1752: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [P.], seq 542:2208, ack 410, win 218, options [nop,nop,TS val 3477795712 ecr 1716754518], length 1666
16:53:02.646593 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 1752: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [P.], seq 542:2208, ack 410, win 218, options [nop,nop,TS val 3477795712 ecr 1716754518], length 1666
16:53:02.647058 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 110: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [P.], seq 2208:2232, ack 877, win 227, options [nop,nop,TS val 3477795713 ecr 1716754519], length 24
16:53:02.647078 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 110: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [P.], seq 2208:2232, ack 877, win 227, options [nop,nop,TS val 3477795713 ecr 1716754519], length 24
16:53:02.647082 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [F.], seq 2232, ack 877, win 227, options [nop,nop,TS val 3477795713 ecr 1716754519], length 0
16:53:02.647102 0c:42:a1:ee:86:72 > 0c:42:a1:ee:7f:12, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [F.], seq 2232, ack 877, win 227, options [nop,nop,TS val 3477795713 ecr 1716754519], length 0
16:53:02.647110 0c:42:a1:ee:7e:12 > 0c:42:a1:ee:86:72, ethertype IPv6 (0x86dd), length 86: 2620:52:0:11c::22.33738 > 2620:52:0:11c::20.6443: Flags [R.], seq 2233, ack 877, win 227, options [nop,nop,TS val 3477795713 ecr 1716754519], length

Router interface:

[root kni]$ ip a s dev baremetal
374: baremetal: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc noqueue state UP group default qlen 1000
    link/ether 0c:42:a1:ee:86:72 brd ff:ff:ff:ff:ff:ff
    inet 10.1.28.1/24 brd 10.1.28.255 scope global noprefixroute baremetal
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:11c::1/64 scope global noprefixroute 
       valid_lft forever preferred_lft forever
    inet6 fe80::4315:47f1:33d8:3815/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
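
The capture above shows each packet twice: once arriving with the router's MAC address (0c:42:a1:ee:86:72) as the Ethernet destination, and once leaving with it as the Ethernet source. A minimal Python sketch for classifying such lines is shown below; the `classify` helper and its labels are hypothetical names introduced only for illustration, and the pattern assumes the `tcpdump -e` line layout seen in this report.

```python
import re

ROUTER_MAC = "0c:42:a1:ee:86:72"  # link/ether of the router's baremetal interface

# Matches the Ethernet header and the IPv6 endpoints of one `tcpdump -e` line.
FRAME_RE = re.compile(
    r"^\S+ (?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r"ethertype IPv6 \(0x86dd\), length \d+: "
    r"(?P<src>\S+) > (?P<dst>\S+):"
)


def classify(line):
    """Return (direction, src, dst) for a capture line, or None if it does not match."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    if m["dst_mac"] == ROUTER_MAC:
        direction = "to-router"      # a node handed the frame to the router
    elif m["src_mac"] == ROUTER_MAC:
        direction = "from-router"    # the router forwarded it back onto the subnet
    else:
        direction = "direct"         # node-to-node, the expected path
    return direction, m["src"], m["dst"]
```

Frames labelled `to-router` whose source and destination are both in 2620:52:0:11c::/64 are the ones that should not exist on a healthy cluster.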


master-0 interface and routing table:

[core@openshift-master-0 ~]$ ip a s dev br-ex
133: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0c:42:a1:ee:7f:12 brd ff:ff:ff:ff:ff:ff
    inet6 2620:52:0:11c::20/128 scope global dynamic noprefixroute 
       valid_lft 3350sec preferred_lft 3350sec
    inet6 fe80::4600:7018:f0ae:c7f5/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

[core@openshift-master-0 ~]$ ip -6 r 
::1 dev lo proto kernel metric 256 pref medium
2620:52:0:11c::20 dev br-ex proto kernel metric 100 pref medium
2620:52:0:11c::/64 dev br-ex proto ra metric 100 pref medium
fd01:0:0:1::/64 dev ovn-k8s-mp0 proto kernel metric 256 pref medium
fd01::/48 via fd01:0:0:1::1 dev ovn-k8s-mp0 metric 1024 pref medium
fd02::/112 via fe80::4315:47f1:33d8:3815 dev br-ex metric 1024 mtu 1400 pref medium
fe80::/64 dev br-ex proto kernel metric 100 pref medium
fe80::/64 dev genev_sys_6081 proto kernel metric 256 pref medium
default via fe80::4315:47f1:33d8:3815 dev br-ex proto ra metric 100 pref medium

master-1 interface and routing table:

[core@openshift-master-1 ~]$ ip a s dev br-ex
7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0c:42:a1:ee:7e:1a brd ff:ff:ff:ff:ff:ff
    inet6 2620:52:0:11c::21/128 scope global dynamic noprefixroute 
       valid_lft 3083sec preferred_lft 3083sec
    inet6 fe80::922b:8255:e760:33a1/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

[core@openshift-master-1 ~]$ ip -6 r
::1 dev lo proto kernel metric 256 pref medium
2620:52:0:11c::11 dev br-ex proto kernel metric 256 pref medium
2620:52:0:11c::21 dev br-ex proto kernel metric 100 pref medium
2620:52:0:11c::/64 dev br-ex proto ra metric 100 pref medium
fd01:0:0:3::/64 dev ovn-k8s-mp0 proto kernel metric 256 pref medium
fd01::/48 via fd01:0:0:3::1 dev ovn-k8s-mp0 metric 1024 pref medium
fd02::/112 via fe80::4315:47f1:33d8:3815 dev br-ex metric 1024 mtu 1400 pref medium
fe80::/64 dev br-ex proto kernel metric 100 pref medium
default via fe80::4315:47f1:33d8:3815 dev br-ex proto ra metric 100 pref medium

master-2 interface and routing table:

[core@openshift-master-2 ~]$ ip a s dev br-ex
7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 0c:42:a1:ee:7e:12 brd ff:ff:ff:ff:ff:ff
    inet6 2620:52:0:11c::11/128 scope global nodad deprecated noprefixroute 
       valid_lft forever preferred_lft 0sec
    inet6 2620:52:0:11c::10/128 scope global nodad deprecated noprefixroute 
       valid_lft forever preferred_lft 0sec
    inet6 2620:52:0:11c::22/128 scope global dynamic noprefixroute 
       valid_lft 2364sec preferred_lft 2364sec
    inet6 fe80::3def:8ee0:1d96:d1e7/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

[core@openshift-master-2 ~]$ ip -6 r
::1 dev lo proto kernel metric 256 pref medium
2620:52:0:11c::10 dev br-ex proto kernel metric 256 pref medium
2620:52:0:11c::11 dev br-ex proto kernel metric 256 pref medium
2620:52:0:11c::22 dev br-ex proto kernel metric 100 pref medium
2620:52:0:11c::/64 dev br-ex proto ra metric 100 pref medium
fd01:0:0:2::/64 dev ovn-k8s-mp0 proto kernel metric 256 pref medium
fd01::/48 via fd01:0:0:2::1 dev ovn-k8s-mp0 metric 1024 pref medium
fd02::/112 via fe80::4315:47f1:33d8:3815 dev br-ex metric 1024 mtu 1400 pref medium
fe80::/64 dev br-ex proto kernel metric 100 pref medium
default via fe80::4315:47f1:33d8:3815 dev br-ex proto ra metric 100 pref medium
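
For reference, the host routing tables above already contain an on-link /64 route learned through RA, so a plain longest-prefix-match lookup on the host would keep master-to-master traffic off the default route. The Python sketch below illustrates this with a subset of master-2's routes; `lookup` is a hypothetical helper, route metrics are ignored, and this is a simplification of what the kernel does.

```python
import ipaddress

# Subset of master-2's `ip -6 r` output: (destination, next hop or None if on-link).
ROUTES = [
    ("2620:52:0:11c::22/128", None),
    ("2620:52:0:11c::/64", None),
    ("fd02::/112", "fe80::4315:47f1:33d8:3815"),
    ("::/0", "fe80::4315:47f1:33d8:3815"),  # the `default` entry
]


def lookup(dst, routes=ROUTES):
    """Return (prefix, next_hop) of the longest matching prefix for dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    net, hop = max(matches, key=lambda match: match[0].prefixlen)
    return str(net), hop


print(lookup("2620:52:0:11c::21"))  # ('2620:52:0:11c::/64', None)
```

A next hop of `None` here means the destination is treated as directly reachable on br-ex.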


Actual results:

A large amount of traffic between the master nodes reaches the network router, which is external to the cluster.

Expected results:

Traffic between the master nodes should stay local to the subnet and not reach the router.

Additional info:

On one of the nodes, I removed the interface from the br-ex bridge and assigned the address to the interface outside the bridge; the router then stopped receiving packets sourced from this node and destined to the other masters. The problem resumed after the ovnkube-node pod running on this node recovered, so I suspect this may be a problem with OVN flows.

Comment 1 Marius Cornea 2021-07-07 21:53:46 UTC
To validate, I tried blocking the address of one of the master nodes in the forwarding chain on the router:

ip6tables -I FORWARD -s 2620:52:0:11c::20 -j DROP

and as a result the authentication operator became degraded:


I0707 21:47:51.882426       1 request.go:668] Waited for 1.194851493s due to client-side throttling, not priority and fairness, request: GET:https://[fd02::1]:443/api/v1/namespaces/openshift-oauth-apiserver/services/api
I0707 21:47:53.150944       1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-07-07T21:45:26Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-07-07T21:10:41Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-07-07T21:47:53Z","message":"WellKnownAvailable: The well-known endpoint is not yet available: failed to GET kube-apiserver oauth endpoint https://[2620:52:0:11c::20]:6443/.well-known/oauth-authorization-server: dial tcp [2620:52:0:11c::20]:6443: i/o timeout","reason":"WellKnown_NotReady","status":"False","type":"Available"},{"lastTransitionTime":"2021-07-05T09:21:37Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0707 21:47:53.155857       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"f7c40d64-aa1e-4560-920f-16c361819931", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Available changed from True to False ("WellKnownAvailable: The well-known endpoint is not yet available: failed to GET kube-apiserver oauth endpoint https://[2620:52:0:11c::20]:6443/.well-known/oauth-authorization-server: dial tcp [2620:52:0:11c::20]:6443: i/o timeout")
E0707 21:47:53.173068       1 base_controller.go:266] WellKnownReadyController reconciliation failed: failed to GET kube-apiserver oauth endpoint https://[2620:52:0:11c::20]:6443/.well-known/oauth-authorization-server: dial tcp [2620:52:0:11c::20]:6443: i/o timeout
I0707 21:47:53.173877       1 status_controller.go:211] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2021-07-07T21:45:26Z","message":"WellKnownReadyControllerDegraded: failed to GET kube-apiserver oauth endpoint https://[2620:52:0:11c::20]:6443/.well-known/oauth-authorization-server: dial tcp [2620:52:0:11c::20]:6443: i/o timeout","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2021-07-07T21:10:41Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2021-07-07T21:47:53Z","message":"WellKnownAvailable: The well-known endpoint is not yet available: failed to GET kube-apiserver oauth endpoint https://[2620:52:0:11c::20]:6443/.well-known/oauth-authorization-server: dial tcp [2620:52:0:11c::20]:6443: i/o timeout","reason":"WellKnown_NotReady","status":"False","type":"Available"},{"lastTransitionTime":"2021-07-05T09:21:37Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"}]}}
I0707 21:47:53.178825       1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"f7c40d64-aa1e-4560-920f-16c361819931", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'OperatorStatusChanged' Status for clusteroperator/authentication changed: Degraded message changed from "All is well" to "WellKnownReadyControllerDegraded: failed to GET kube-apiserver oauth endpoint https://[2620:52:0:11c::20]:6443/.well-known/oauth-authorization-server: dial tcp [2620:52:0:11c::20]:6443: i/o timeout"

Comment 6 Jaime Caamaño Ruiz 2021-07-09 17:15:20 UTC
This is a consequence of using stateful DHCPv6 IA_NA address allocation for the host interfaces, which assigns them a /128 prefix. OVN-Kubernetes uses these same addresses to configure the internal gateway routers, and as they do not provide any subnet information, pod-to-node traffic is routed through the host's gateway instead. Unfortunately, OVN does not support RA, so it is not aware of other routing information.
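
The effect of the /128 prefix can be illustrated with Python's standard `ipaddress` module. This is only a sketch of the reasoning, not OVN-Kubernetes code; `is_on_link` is a hypothetical helper, and the /64 variant is what the address would look like if it carried the subnet prefix.

```python
import ipaddress

peer = ipaddress.ip_address("2620:52:0:11c::21")  # master-1

leased = ipaddress.ip_interface("2620:52:0:11c::20/128")      # as leased to master-0
with_prefix = ipaddress.ip_interface("2620:52:0:11c::20/64")  # same address, subnet prefix


def is_on_link(interface, destination):
    """True if destination falls inside the network implied by the interface address."""
    return destination in interface.network


print(is_on_link(leased, peer))       # False: only the gateway route is left
print(is_on_link(with_prefix, peer))  # True: the peer is a direct neighbour
```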

An alternative configuration that should work is to use static or SLAAC addressing instead of stateful DHCPv6. That way, the host interface address would have, or would acquire, the on-link prefix advertised through RA, and OVN-Kubernetes would be aware of it.

If stateful DHCPv6 IA_NA allocation is required and traversing the gateway is not acceptable, then we might have to add support for passing the CNO machineNetwork configuration to OVN-Kubernetes so that we can add it as a static route on the internal gateway routers.

Comment 10 Yolanda Robla 2021-07-16 06:33:19 UTC
As a reference, please see https://bugzilla.redhat.com/show_bug.cgi?id=1973704. It is the same problem, but with IPv6 DHCP served by an IT router (Juniper). Juniper sends DHCP addresses with both /128 and /64 masks; OVN takes the /128 mask and creates routes with it instead of the /64 one. As a consequence, the nodes cannot communicate with each other, and the deployment fails.

Comment 12 Dan Winship 2021-07-30 13:58:15 UTC
(In reply to Jaime Caamaño Ruiz from comment #6)
> If stateful DHCPv6 IA_NA allocation is required and traversing the gateway
> is not acceptable, then we might have to add support to pass on the CNO
> machineNetwork configuration to OVN kubernetes so that we can add it as a
> static route on the internal gateway routers.

This won't work because the machineNetwork isn't guaranteed to be a single subnet.

But anyway, https://github.com/ovn-org/ovn-kubernetes/pull/2338 looks like the right fix to me
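
The idea summarized in this bug's Doc Text (look in the node's routing table for a wider entry covering the interface address and use that prefix) could be sketched roughly as follows. This is not the code from the linked pull request; `infer_node_network` is a hypothetical name, the input is assumed to be the gateway-less route destinations of the same interface, and preferring the most specific of the wider routes is an assumption made here.

```python
import ipaddress


def infer_node_network(address, link_routes):
    """Return the address re-expressed with the prefix length of a wider on-link route.

    address:     interface address in CIDR form, e.g. "2620:52:0:11c::20/128"
    link_routes: destinations of routes on the same interface that have no gateway
    """
    iface = ipaddress.ip_interface(address)
    candidates = [
        net
        for net in map(ipaddress.ip_network, link_routes)
        if net.version == iface.version
        and iface.ip in net
        and net.prefixlen < iface.network.prefixlen
    ]
    if not candidates:
        return iface  # nothing wider is known; keep the leased prefix
    # Assumption: prefer the most specific of the wider routes.
    best = max(candidates, key=lambda net: net.prefixlen)
    return ipaddress.ip_interface(f"{iface.ip}/{best.prefixlen}")


routes = ["2620:52:0:11c::20/128", "2620:52:0:11c::/64", "fe80::/64"]
print(infer_node_network("2620:52:0:11c::20/128", routes))  # 2620:52:0:11c::20/64
```

Because the prefix is taken from a route that actually exists on the node, this does not rely on machineNetwork being a single subnet.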

Comment 13 Dan Winship 2021-07-30 13:58:54 UTC
*** Bug 1973704 has been marked as a duplicate of this bug. ***

Comment 15 Marius Cornea 2021-08-26 10:53:50 UTC
[kni@sealusa2 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-26-013855   True        False         105m    Cluster version is 4.9.0-0.nightly-2021-08-26-013855


[kni@sealusa2 ~]$ oc -n  openshift-ovn-kubernetes exec -it ovnkube-master-85nhm  -c ovnkube-master -- ovn-nbctl find  Logical_Router_Port | grep -A1 rtoe-GR
name                : rtoe-GR_worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
networks            : ["fd2e:6f44:5dd8::47/64"]
--
  name                : rtoe-GR_master-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
networks            : ["fd2e:6f44:5dd8::58/64"]
--
  name                : rtoe-GR_worker-0-1.ocp-edge-cluster-0.qe.lab.redhat.com
networks            : ["fd2e:6f44:5dd8::34/64"]
--
  name                : rtoe-GR_master-0-1.ocp-edge-cluster-0.qe.lab.redhat.com
networks            : ["fd2e:6f44:5dd8::5b/64"]
--
  name                : rtoe-GR_master-0-2.ocp-edge-cluster-0.qe.lab.redhat.com
networks            : ["fd2e:6f44:5dd8::8f/64"]
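
The output above shows every rtoe-GR port configured with a /64 network rather than a /128. A check like this could be automated with a small Python sketch such as the one below; both helper names are hypothetical, and the parser assumes the `name` / `networks` line layout shown in this comment.

```python
import ipaddress
import re


def gateway_port_networks(nbctl_output):
    """Map each port name to its list of networks from `ovn-nbctl find` output."""
    ports = {}
    name = None
    for line in nbctl_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines such as "--"
        key, value = key.strip(), value.strip()
        if key == "name":
            name = value
        elif key == "networks" and name is not None:
            ports[name] = re.findall(r'"([^"]+)"', value)
    return ports


def ports_with_host_prefix(ports):
    """Return names of ports that have a network covering a single address only."""
    bad = []
    for name, networks in ports.items():
        if any(ipaddress.ip_interface(cidr).network.num_addresses == 1 for cidr in networks):
            bad.append(name)
    return bad
```

An empty list from `ports_with_host_prefix` corresponds to the fixed behaviour seen here.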


ip link show baremetal-0
70: baremetal-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:1e:8e:e3 brd ff:ff:ff:ff:ff:ff


[kni@sealusa2 ~]$ sudo tcpdump -i baremetal-0 -ennn ether host 52:54:00:1e:8e:e3 and tcp port 6443
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on baremetal-0, link-type EN10MB (Ethernet), capture size 262144 bytes

Comment 18 errata-xmlrpc 2021-10-18 17:38:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759