Bug 1527602

Summary:	egress-router pod does not reply to ARP requests if coming from members of a VIP
Product:	OpenShift Container Platform	Reporter:	François Cami <fcami>
Component:	Networking	Assignee:	Dan Winship <danw>
Status:	CLOSED ERRATA	QA Contact:	Meng Bo <bmeng>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.6.0	CC:	aos-bugs, atragler, bbennett, danw, jokerman, mmccomas
Target Milestone:	---
Target Release:	3.9.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Enhancement
Doc Text:	Feature: It is now possible to specify a subnet length as part of the EGRESS_SOURCE variable passed to an egress router (e.g., "192.168.1.100/24" rather than "192.168.1.100").
Reason: In some network configurations (for example, when the gateway address is a virtual IP that may be backed by one of several physical IPs at different times), ARP traffic between the egress router and its gateway might not function correctly if the egress router cannot send traffic to other hosts on its local subnet.
Result: When EGRESS_SOURCE is specified with a subnet length, the egress router setup script configures the egress pod in a way that works with these network setups.
Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-03-28 14:16:17 UTC	Type:	Bug

Description François Cami 2017-12-19 15:49:38 UTC

Description of problem:
If the egress router targets an IP behind a gateway (EGRESS_GATEWAY), and that gateway is a VIP served by multiple cluster members, egress router traffic is blocked at that gateway after a while.

The egress router only replies to ARP requests coming from the EGRESS_GATEWAY address, but in some cases these ARP requests come from the individual IPs of the members of the cluster managing this gateway.
Because the router does not reply to these requests, traffic works at first (an arping is sent when the pod starts), but once the corresponding entry in the cluster members' ARP tables times out, traffic is blocked at the gateway.


Additional info:
Two workarounds:
* adding a route to each member of the VIP cluster after identifying them with tcpdump
* adding a sidecar container to launch an arping periodically like this:

~~~
      - env:
        - name: EGRESS_SOURCE
          value: <egress-router-ip>   # use the same IP as EGRESS_SOURCE in the egress router
        - name: TZ
          value: CET-1CEST
        name: arping
        image: library/busybox
        command: ["sh", "-c", "while true; do echo \"$(date): BROADCASTING ARP REQUEST\"; arping -q -U -c 5 -I macvlan0 ${EGRESS_SOURCE}; sleep 200; done"]
~~~

Comment 2 Dan Winship 2018-01-05 17:21:00 UTC

I guess the problem is more specifically that the egress-router can't talk to other hosts on its local subnet besides the gateway, because it doesn't have a route to its local subnet; it only has a route to ${EGRESS_GATEWAY}/32 and a route to default (via ${EGRESS_GATEWAY}).

This could be fixed by letting you specify EGRESS_SOURCE as a CIDR rather than an IP. E.g., "EGRESS_SOURCE=192.168.1.10/24" would tell it to set up a route to all of 192.168.1.0/24 rather than just a /32 to the EGRESS_GATEWAY.
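
As a rough illustration of the proposal (not the actual egress-router setup script), the following Python sketch derives which network the pod would treat as directly reachable for each form of EGRESS_SOURCE. The helper name `on_link_network` and the gateway address 192.168.1.1 are assumptions made up for this example.

```python
import ipaddress


def on_link_network(egress_source: str, egress_gateway: str) -> ipaddress.IPv4Network:
    """Return the network reachable without going through the gateway.

    Hypothetical helper: it only mirrors the routing idea described above.
    """
    if "/" in egress_source:
        # CIDR form: the whole local subnet is treated as on-link.
        return ipaddress.ip_interface(egress_source).network
    # IP-only form: only the gateway itself is on-link (a /32 route).
    return ipaddress.ip_network(f"{egress_gateway}/32")


# 192.168.1.1 is an assumed gateway address, used only for illustration.
print(on_link_network("192.168.1.10/24", "192.168.1.1"))  # 192.168.1.0/24
print(on_link_network("192.168.1.10", "192.168.1.1"))     # 192.168.1.1/32
```

With the IP-only form, every other host on the local subnet falls outside the on-link network and is therefore reached only through the default route via the gateway.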


Note also that you can work around this yourself by just building an alternate version of the egress-router image (https://github.com/openshift/origin/tree/master/images/egress/router) and then running that image rather than the stock egress-router.

Comment 4 Dan Winship 2018-01-18 13:53:33 UTC

https://github.com/openshift/origin/pull/18143

Comment 5 Dan Winship 2018-01-19 20:23:29 UTC

Merged.

Note that the openshift/origin-egress-router:latest image has already been updated so the customer can try this out by using that image in their egress router pod specs (rather than "registry.access.redhat.com/openshift3/ose-egress-router"). The new egress-router image allows you to specify an EGRESS_SOURCE value like "192.168.1.10/24" rather than just "192.168.1.10".

Comment 7 Meng Bo 2018-02-07 09:50:24 UTC

@Dan

>I tested with the old egress router image.
The egress router pod can also ping IPs in the same subnet; the traffic is just redirected by the gateway.

sh-4.2# ping 10.66.141.175
PING 10.66.141.175 (10.66.141.175) 56(84) bytes of data.
64 bytes from 10.66.141.175: icmp_seq=1 ttl=64 time=0.430 ms
From 10.66.141.254 icmp_seq=1 Redirect Host(New nexthop: 10.66.141.175)
From 10.66.141.254: icmp_seq=1 Redirect Host(New nexthop: 10.66.141.175)
64 bytes from 10.66.141.175: icmp_seq=2 ttl=64 time=0.516 ms
From 10.66.141.254 icmp_seq=2 Redirect Host(New nexthop: 10.66.141.175)
From 10.66.141.254: icmp_seq=2 Redirect Host(New nexthop: 10.66.141.175)
64 bytes from 10.66.141.175: icmp_seq=3 ttl=64 time=0.436 ms
From 10.66.141.254 icmp_seq=3 Redirect Host(New nexthop: 10.66.141.175)
From 10.66.141.254: icmp_seq=3 Redirect Host(New nexthop: 10.66.141.175)
64 bytes from 10.66.141.175: icmp_seq=4 ttl=64 time=0.465 ms


>On the target machine, the ICMP requests from the egress router are received:
$ sudo tcpdump -nnv -i enp0s25 icmp
tcpdump: listening on enp0s25, link-type EN10MB (Ethernet), capture size 262144 bytes
17:29:23.225624 IP (tos 0x0, ttl 63, id 23425, offset 0, flags [DF], proto ICMP (1), length 84)
    10.66.140.100 > 10.66.141.175: ICMP echo request, id 27, seq 1, length 64
17:29:23.225716 IP (tos 0x0, ttl 64, id 64830, offset 0, flags [none], proto ICMP (1), length 84)
    10.66.141.175 > 10.66.140.100: ICMP echo reply, id 27, seq 1, length 64
17:29:24.225777 IP (tos 0x0, ttl 63, id 23884, offset 0, flags [DF], proto ICMP (1), length 84)
    10.66.140.100 > 10.66.141.175: ICMP echo request, id 27, seq 2, length 64
17:29:24.225860 IP (tos 0x0, ttl 64, id 59, offset 0, flags [none], proto ICMP (1), length 84)
    10.66.141.175 > 10.66.140.100: ICMP echo reply, id 27, seq 2, length 64
17:29:25.226975 IP (tos 0x0, ttl 63, id 24640, offset 0, flags [DF], proto ICMP (1), length 84)
    10.66.140.100 > 10.66.141.175: ICMP echo request, id 27, seq 3, length 64

>The routing table in the egress router:
sh-4.2# ip route
default via 10.66.141.254 dev macvlan0
10.66.141.254 dev macvlan0 scope link
10.128.0.0/14 dev eth0
10.128.2.0/24 dev eth0 proto kernel scope link src 10.128.2.167
224.0.0.0/4 dev eth0


>I did not see much difference with the new CIDR-format EGRESS_SOURCE, except that pings to the destination host are no longer redirected. For example:
sh-4.2# ping 10.66.141.175
PING 10.66.141.175 (10.66.141.175) 56(84) bytes of data.
64 bytes from 10.66.141.175: icmp_seq=1 ttl=64 time=0.721 ms
64 bytes from 10.66.141.175: icmp_seq=2 ttl=64 time=0.347 ms
64 bytes from 10.66.141.175: icmp_seq=3 ttl=64 time=0.520 ms
64 bytes from 10.66.141.175: icmp_seq=4 ttl=64 time=0.470 ms


>Can you confirm whether this is the expected fix for the problem? Thanks.

Comment 8 François Cami 2018-02-07 09:59:46 UTC

@Meng 

Before that change, only the egress router's destination IP would receive answers to arping requests targeting the egress router's IP.

After that change and when the /24 CIDR is used, all IPs in that subnet should receive answers to arping requests targeting the egress router's IP.

You can test from an IP within the /24 subnet of the egress router - which is not the destination IP - like this:

FAIL:
# arping -I <interface> -c1 -w1 -f 192.168.3.254
ARPING 192.168.3.254 from 192.168.3.20 enp0s31f6
Sent 2 probes (2 broadcast(s))
Received 0 response(s)

SUCCESS:
[root@dan ~]# arping -I <interface> -c1 -w1 -f 192.168.3.254
ARPING 192.168.3.254 from 192.168.3.20 enp0s31f6
Unicast reply from 192.168.3.254 [52:54:00:6D:93:12]  0.715ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)

@Dan please review :)

Comment 9 Dan Winship 2018-02-07 14:33:13 UTC

Huh. I guess the exact behavior depends on the router; some routers may be willing to do a redirect, and others might not.

In any case, the results you got do demonstrate that the new egress-router is changing the pod's behavior in the expected way: you get a redirect when using the IP-only form of EGRESS_SOURCE, but not when you use the CIDR form of EGRESS_SOURCE, so that shows that when using the CIDR form, the pod is contacting local IPs directly rather than sending everything to the router.
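
To make the difference concrete, here is a small Python sketch of a longest-prefix route lookup against the macvlan0 routes shown in Comment 7. The CIDR-form table is an assumption: the thread does not show it, and the /23 prefix is only a guess based on the source 10.66.140.100 and the target 10.66.141.175 sharing a subnet. The `lookup` and `table` helpers are hypothetical names for this example.

```python
import ipaddress


def table(entries):
    """Build a routing table as (network, next_hop) pairs; next_hop None means on-link."""
    return [(ipaddress.ip_network(prefix), hop) for prefix, hop in entries]


def lookup(routes, dest):
    """Longest-prefix match: return the most specific (network, next_hop) for dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, hop) for net, hop in routes if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)


# macvlan0 routes from Comment 7 (IP-only EGRESS_SOURCE).
ip_only = table([
    ("0.0.0.0/0", "10.66.141.254"),   # default via the gateway
    ("10.66.141.254/32", None),       # gateway itself is on-link
])

# ASSUMPTION: the CIDR form adds an on-link route for the local subnet.
# The /23 length is a guess, not taken from the bug report.
cidr = ip_only + table([("10.66.140.0/23", None)])

print(lookup(ip_only, "10.66.141.175"))  # next hop is the gateway
print(lookup(cidr, "10.66.141.175"))     # next hop is None, i.e. sent directly
```

In the IP-only table the local host matches only the default route, so the packet goes to the gateway, which may answer with an ICMP redirect. In the assumed CIDR table the more specific on-link route wins and the gateway is not involved.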

Comment 10 Meng Bo 2018-02-13 03:31:16 UTC

When EGRESS_SOURCE is set with a CIDR range, requests are sent to the local subnet directly, without being redirected by the gateway.

Moving the bug to verified.

Comment 13 errata-xmlrpc 2018-03-28 14:16:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489