Bug 1959798 - DNAT rules for external IP services wrong in ovn-kubernetes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Andrew Stoycos
QA Contact: Weibin Liang
URL:
Whiteboard:
Depends On:
Blocks: 1955192 1988487
Reported: 2021-05-12 11:58 UTC by Pablo Alonso Rodriguez
Modified: 2022-06-16 13:44 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1988487
Environment:
Last Closed: 2021-10-18 17:31:04 UTC
Target Upstream Version:
Embargoed:
astoycos: needinfo-


Links
System                             ID                                 Private  Priority  Status  Summary           Last Updated
Github                             openshift ovn-kubernetes pull 609  0        None      closed  Merge 2021-07-18  2021-07-29 13:29:42 UTC
Red Hat Knowledge Base (Solution)  6039191                            0        None      None    None              2021-05-12 15:33:55 UTC
Red Hat Product Errata             RHSA-2021:3759                     0        None      None    None              2021-10-18 17:31:56 UTC

Description Pablo Alonso Rodriguez 2021-05-12 11:58:53 UTC
Description of problem:

DNAT rules are not added to the OVN-KUBE-EXTERNALIP iptables chain for secondary node IPs that were not present on the br-ex interface when the ovnkube-node-XXXXX pod started. The pod has to be deleted, so that it restarts, before the rules are added properly.

Version-Release number of selected component (if applicable):

4.6.21
4.7.9

How reproducible:

Always, as long as the external IP had not yet been added to the node interface when the ovnkube-node-XXXXX pod started.

Steps to Reproduce:
1. Add an IP to br-ex in one of the nodes, like `ip addr add 192.168.194.250/24 dev br-ex`
2. Create an external IP service that uses that IP
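
For readers who want to script the steps above, here is a minimal sketch. The function name, service name, selector label, and ports are illustrative assumptions, not values from this report; only the `ip addr add` command and the chain name come from the steps and results described here.

```shell
#!/bin/sh
# Print a minimal Service manifest that exposes a port on an external IP.
# Assumptions (hypothetical): service name, selector label, port numbers.
externalip_service_manifest() {
    ip="$1"
    port="${2:-27018}"
    cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: externalip-svc
spec:
  selector:
    name: externalip-pod
  ports:
  - name: http
    protocol: TCP
    port: ${port}
    targetPort: 8080
  externalIPs:
  - ${ip}
EOF
}

# Usage on a live cluster (not executed by this sketch):
#   ip addr add 192.168.194.250/24 dev br-ex                      # on the node
#   externalip_service_manifest 192.168.194.250 | oc create -f -
#   iptables -n -v -t nat -L OVN-KUBE-EXTERNALIP                  # on the node
```

On an affected version, the last command would be expected to show no DNAT rule for the new IP until the ovnkube-node pod is restarted.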

Actual results:

The external IP is not reachable. The expected DNAT rules are missing from the OVN-KUBE-EXTERNALIP iptables chain.

Expected results:

The external IP is reachable. The expected DNAT rules are present in the OVN-KUBE-EXTERNALIP iptables chain.

Additional info:

Reproduced on both 4.6.21 and latest 4.7

Comment 1 Andrew Stoycos 2021-05-18 14:52:31 UTC
Hi Pablo, 

I was able to reproduce this and I'm working on an upstream patch that should recalculate the in-memory list of node IPs on each service event (add/update/delete). With the patch, if you follow the steps above:

1. Add an IP to br-ex in one of the nodes, like `ip addr add 192.168.194.250/24 dev br-ex`
2. Create an external IP service with that IP 

The correct rules should be calculated without having to restart the ovnkube-node pod. 

I will link the upstream patch when it's complete. 

Thanks, 
Andrew
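
The idea described in this comment, checking the node's current addresses at event time instead of relying on a list captured at pod startup, can be sketched roughly as follows. This is an illustrative shell sketch only, not the ovn-kubernetes code (which is written in Go); the function name `has_node_ip` is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit 0) if the address given as $1 appears in `ip -o addr show`
# output supplied on stdin; fails (exit 1) otherwise.
# Hypothetical helper for illustration; not part of ovn-kubernetes.
has_node_ip() {
    want="$1"
    # In `ip -o addr show` output, field 4 is ADDRESS/PREFIXLEN.
    awk -v want="$want" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }
    '
}

# Usage on a node, evaluated fresh for every service event:
#   if ip -o addr show dev br-ex | has_node_ip 10.73.116.64; then
#       echo "IP is local to this node"
#   fi
```

Because the address list is read each time the check runs, an IP added to br-ex after startup is seen on the next service event without restarting anything.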

Comment 2 Pablo Alonso Rodriguez 2021-05-19 08:30:01 UTC
Thanks!

Comment 3 Andrew Stoycos 2021-05-20 14:24:51 UTC
Upstream PR -> https://github.com/ovn-org/ovn-kubernetes/pull/2226 

Once that merges, we will pull it downstream and backport accordingly.

Comment 13 Weibin Liang 2021-08-09 17:33:49 UTC
Tested and verified in 4.9.0-0.nightly-2021-08-07-175228: without restarting the ovnkube-node pod, the correct externalIP service rule is added for the node's secondary IP on br-ex.

[root@dell-per740-36 ~]# oc debug node/dell-per740-14.rhts.eng.pek2.redhat.com
sh-4.4# ip a show br-ex
13: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether e4:43:4b:5b:6c:28 brd ff:ff:ff:ff:ff:ff
    inet 10.73.116.62/23 brd 10.73.117.255 scope global dynamic noprefixroute br-ex
       valid_lft 32748sec preferred_lft 32748sec
    inet6 2620:52:0:4974:d94e:e1d5:fcfc:fdc7/64 scope global dynamic noprefixroute 
       valid_lft 2591925sec preferred_lft 604725sec
    inet6 fe80::c13e:c5ff:e5d3:8193/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
sh-4.4# iptables -n -v -t nat -L OVN-KUBE-EXTERNALIP
Chain OVN-KUBE-EXTERNALIP (2 references)
 pkts bytes target     prot opt in     out     source               destination         
sh-4.4# ip addr add 10.73.116.64/23 dev br-ex
sh-4.4# ip a s br-ex 
13: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether e4:43:4b:5b:6c:28 brd ff:ff:ff:ff:ff:ff
    inet 10.73.116.62/23 brd 10.73.117.255 scope global dynamic noprefixroute br-ex
       valid_lft 32419sec preferred_lft 32419sec
    inet 10.73.116.64/23 scope global secondary br-ex
       valid_lft forever preferred_lft forever
    inet6 2620:52:0:4974:d94e:e1d5:fcfc:fdc7/64 scope global dynamic noprefixroute 
       valid_lft 2591973sec preferred_lft 604773sec
    inet6 fe80::c13e:c5ff:e5d3:8193/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
sh-4.4# 
[root@dell-per740-36 ~]# curl -s https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/FC/externalip-svc.yaml | sed s/10.0.76.163/10.73.116.64/g | oc create -f -
service/externalip-svc created
[root@dell-per740-36 ~]# oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/FC/externalip-pod.yaml
deployment.apps/externalip-pod created
[root@dell-per740-36 ~]# oc rsh externalip-pod-57f9dd7cfb-967pw
error: unable to upgrade connection: container not found ("externalip-pod")
[root@dell-per740-36 ~]# oc rsh externalip-pod-57f9dd7cfb-967pw
~ $ curl 10.73.116.64:27018
Customer-Blue Test ExternalIP
[root@dell-per740-36 ~]# oc project openshift-ingress
Now using project "openshift-ingress" on server "https://api.bm2-zzhao.qe.devcluster.openshift.com:6443".
[root@dell-per740-36 ~]# oc get all
NAME                                  READY   STATUS    RESTARTS   AGE
pod/router-default-696c499cdf-85w9g   1/1     Running   0          3h26m
pod/router-default-696c499cdf-jtfgp   1/1     Running   0          3h26m

NAME                              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                   AGE
service/router-internal-default   ClusterIP   172.30.92.96   <none>        80/TCP,443/TCP,1936/TCP   3h26m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/router-default   2/2     2            2           3h26m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/router-default-696c499cdf   2         2         2       3h26m
[root@dell-per740-36 ~]# oc rsh router-default-696c499cdf-85w9g
sh-4.4$ curl 10.73.116.64:27018
Customer-Blue Test ExternalIP
[root@dell-per740-36 ~]# curl 10.73.116.64:27018
Customer-Blue Test ExternalIP
[root@dell-per740-36 ~]# oc debug node/dell-per740-14.rhts.eng.pek2.redhat.com
Starting pod/dell-per740-14rhtsengpek2redhatcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.73.116.62
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# iptables -n -v -t nat -L OVN-KUBE-EXTERNALIP
Chain OVN-KUBE-EXTERNALIP (2 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            10.73.116.64         tcp dpt:27018 to:172.30.33.188:27018
sh-4.4# 
[root@dell-per740-36 ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-07-175228   True        False         173m    Cluster version is 4.9.0-0.nightly-2021-08-07-175228
[root@dell-per740-36 ~]# 
[root@dell-per740-36 ~]# ping -c 2 10.73.116.63
PING 10.73.116.63 (10.73.116.63) 56(84) bytes of data.
64 bytes from 10.73.116.63: icmp_seq=1 ttl=64 time=0.572 ms
64 bytes from 10.73.116.63: icmp_seq=2 ttl=64 time=0.525 ms

--- 10.73.116.63 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 0.525/0.548/0.572/0.033 ms
[root@dell-per740-36 ~]#

Comment 16 errata-xmlrpc 2021-10-18 17:31:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
