Red Hat Bugzilla – Bug 1527602
egress-router pod does not reply to ARP requests if coming from members of a VIP
Last modified: 2018-03-28 10:16:58 EDT
Description of problem:
If the egress router targets an IP behind a gateway (EGRESS_GATEWAY), and that gateway is provided by a VIP with multiple members, egress router traffic is blocked at that gateway after a while. The egress router only replies to ARP requests coming from the EGRESS_GATEWAY, but in some cases these ARP requests come from the IPs of the members of the cluster managing this gateway. Because the router does not reply to these requests, traffic works at first (an arping is sent when the pod is started), but once the ARP entry times out in the cluster members' tables, traffic is blocked at the gateway.

Additional info:
Two workarounds:
* adding a route to each member of the VIP cluster after identifying them with tcpdump
* adding a sidecar container to launch an arping periodically, like this:
~~~
- env:
  - name: EGRESS_SOURCE
    value: <= add the same IP as in the egress router
  - name: TZ
    value: CET-1CEST
  name: arping
  image: library/busybox
  command: ["sh", "-c", "while true; do echo \"$(date): BROADCASTING ARP REQUEST\"; arping -q -U -c 5 -I macvlan0 ${EGRESS_SOURCE}; sleep 200; done"]
~~~
I guess the problem is more specifically that the egress-router can't talk to other hosts on its local subnet besides the gateway, because it doesn't have a route to its local subnet; it only has a route to ${EGRESS_GATEWAY}/32 and a default route (via ${EGRESS_GATEWAY}). This could be fixed by letting you specify EGRESS_SOURCE as a CIDR rather than an IP. E.g., "EGRESS_SOURCE=192.168.1.10/24" would tell it to set up a route to all of 192.168.1.0/24 rather than just a /32 to the EGRESS_GATEWAY.

Note also that you can work around this yourself by building an alternate version of the egress-router image (https://github.com/openshift/origin/tree/master/images/egress/router) and then running that image rather than the stock egress-router.
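To illustrate the proposal above, here is a minimal Python sketch of how a CIDR-form EGRESS_SOURCE could be turned into a subnet route instead of a /32 route to the gateway. It is not the egress-router's actual setup script: the function names, the gateway address 192.168.1.1, and the route strings are assumptions chosen for illustration; only the macvlan0 interface name and the 192.168.1.10/24 example come from this report.

```python
import ipaddress


def parse_egress_source(value):
    """Split an EGRESS_SOURCE value into (source IP, local subnet or None).

    "192.168.1.10"    -> (192.168.1.10, None)
    "192.168.1.10/24" -> (192.168.1.10, 192.168.1.0/24)
    """
    if "/" in value:
        iface = ipaddress.ip_interface(value)
        return iface.ip, iface.network
    return ipaddress.ip_address(value), None


def planned_routes(egress_source, egress_gateway, dev="macvlan0"):
    """Return the routes (as descriptive strings) the pod would need.

    Hypothetical helper: it only describes the routes, it does not
    configure anything.
    """
    _, subnet = parse_egress_source(egress_source)
    gateway = ipaddress.ip_address(egress_gateway)
    if subnet is not None:
        # CIDR form: the whole local subnet is reachable on-link.
        local = f"{subnet} dev {dev}"
    else:
        # IP-only form: only the gateway itself is reachable on-link.
        local = f"{gateway}/32 dev {dev}"
    return [local, f"default via {gateway} dev {dev}"]


if __name__ == "__main__":
    # 192.168.1.1 is an assumed gateway address, used only for this example.
    for source in ("192.168.1.10", "192.168.1.10/24"):
        print(source, "->", planned_routes(source, "192.168.1.1"))
```

With the IP-only form, hosts on the local subnet other than the gateway have no on-link route, which matches the behavior described in this report.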
https://github.com/openshift/origin/pull/18143
Merged. Note that the openshift/origin-egress-router:latest image has already been updated so the customer can try this out by using that image in their egress router pod specs (rather than "registry.access.redhat.com/openshift3/ose-egress-router"). The new egress-router image allows you to specify an EGRESS_SOURCE value like "192.168.1.10/24" rather than just "192.168.1.10".
@Dan
> I have tested with the old egress router image. The egress router pod can also ping the IPs in the same subnet, just being redirected once.

sh-4.2# ping 10.66.141.175
PING 10.66.141.175 (10.66.141.175) 56(84) bytes of data.
64 bytes from 10.66.141.175: icmp_seq=1 ttl=64 time=0.430 ms
From 10.66.141.254 icmp_seq=1 Redirect Host(New nexthop: 10.66.141.175)
From 10.66.141.254: icmp_seq=1 Redirect Host(New nexthop: 10.66.141.175)
64 bytes from 10.66.141.175: icmp_seq=2 ttl=64 time=0.516 ms
From 10.66.141.254 icmp_seq=2 Redirect Host(New nexthop: 10.66.141.175)
From 10.66.141.254: icmp_seq=2 Redirect Host(New nexthop: 10.66.141.175)
64 bytes from 10.66.141.175: icmp_seq=3 ttl=64 time=0.436 ms
From 10.66.141.254 icmp_seq=3 Redirect Host(New nexthop: 10.66.141.175)
From 10.66.141.254: icmp_seq=3 Redirect Host(New nexthop: 10.66.141.175)
64 bytes from 10.66.141.175: icmp_seq=4 ttl=64 time=0.465 ms

> And from the target machine, it can receive the ICMP request from the egress router:

$ sudo tcpdump -nnv -i enp0s25 icmp
tcpdump: listening on enp0s25, link-type EN10MB (Ethernet), capture size 262144 bytes
17:29:23.225624 IP (tos 0x0, ttl 63, id 23425, offset 0, flags [DF], proto ICMP (1), length 84)
    10.66.140.100 > 10.66.141.175: ICMP echo request, id 27, seq 1, length 64
17:29:23.225716 IP (tos 0x0, ttl 64, id 64830, offset 0, flags [none], proto ICMP (1), length 84)
    10.66.141.175 > 10.66.140.100: ICMP echo reply, id 27, seq 1, length 64
17:29:24.225777 IP (tos 0x0, ttl 63, id 23884, offset 0, flags [DF], proto ICMP (1), length 84)
    10.66.140.100 > 10.66.141.175: ICMP echo request, id 27, seq 2, length 64
17:29:24.225860 IP (tos 0x0, ttl 64, id 59, offset 0, flags [none], proto ICMP (1), length 84)
    10.66.141.175 > 10.66.140.100: ICMP echo reply, id 27, seq 2, length 64
17:29:25.226975 IP (tos 0x0, ttl 63, id 24640, offset 0, flags [DF], proto ICMP (1), length 84)
    10.66.140.100 > 10.66.141.175: ICMP echo request, id 27, seq 3, length 64

> The routing table in the egress router:

sh-4.2# ip route
default via 10.66.141.254 dev macvlan0
10.66.141.254 dev macvlan0 scope link
10.128.0.0/14 dev eth0
10.128.2.0/24 dev eth0 proto kernel scope link src 10.128.2.167
224.0.0.0/4 dev eth0

> I did not see much difference with the new CIDR format EGRESS_SOURCE, just the ping to destination host will not be redirected. Like:

sh-4.2# ping 10.66.141.175
PING 10.66.141.175 (10.66.141.175) 56(84) bytes of data.
64 bytes from 10.66.141.175: icmp_seq=1 ttl=64 time=0.721 ms
64 bytes from 10.66.141.175: icmp_seq=2 ttl=64 time=0.347 ms
64 bytes from 10.66.141.175: icmp_seq=3 ttl=64 time=0.520 ms
64 bytes from 10.66.141.175: icmp_seq=4 ttl=64 time=0.470 ms

> Can you help check if this is the fix of the problem? Thanks.
@Meng
Before that change, only the egress router's destination IP would receive answers to arping requests targeting the egress router's IP. After that change and when the /24 CIDR is used, all IPs in that subnet should receive answers to arping requests targeting the egress router's IP.

You can test from an IP within the /24 subnet of the egress router - which is not the destination IP - like this:

FAIL:
# arping -I <interface> -c1 -w1 -f 192.168.3.254
ARPING 192.168.3.254 from 192.168.3.20 enp0s31f6
Sent 2 probes (2 broadcast(s))
Received 0 response(s)

SUCCESS:
[root@dan ~]# arping -I <interface> -c1 -w1 -f 192.168.3.254
ARPING 192.168.3.254 from 192.168.3.20 enp0s31f6
Unicast reply from 192.168.3.254 [52:54:00:6D:93:12] 0.715ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)

@Dan please review :)
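If this check needs to be repeated from several hosts, the pass/fail decision can be scripted. The following Python sketch only inspects the "Received N response(s)" summary line that appears in the arping output quoted in this comment; the function name is hypothetical, and other arping implementations may print a different summary, so treat the pattern as an assumption.

```python
import re


def arping_got_reply(output):
    """Return True if the arping summary reports at least one response.

    Assumes the "Received N response(s)" summary line shown in the
    outputs quoted in this bug; raises ValueError if it is missing.
    """
    match = re.search(r"Received (\d+) response", output)
    if match is None:
        raise ValueError("no 'Received N response(s)' summary found")
    return int(match.group(1)) > 0


FAIL_OUTPUT = """ARPING 192.168.3.254 from 192.168.3.20 enp0s31f6
Sent 2 probes (2 broadcast(s))
Received 0 response(s)
"""

SUCCESS_OUTPUT = """ARPING 192.168.3.254 from 192.168.3.20 enp0s31f6
Unicast reply from 192.168.3.254 [52:54:00:6D:93:12] 0.715ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)
"""

if __name__ == "__main__":
    print("FAIL sample replied:", arping_got_reply(FAIL_OUTPUT))
    print("SUCCESS sample replied:", arping_got_reply(SUCCESS_OUTPUT))
```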
Huh. I guess the exact behavior depends on the router; some routers may be willing to do a redirect, and others might not. In any case, the results you got do demonstrate that the new egress-router is changing the pod's behavior in the expected way: you get a redirect when using the IP-only form of EGRESS_SOURCE, but not when you use the CIDR form of EGRESS_SOURCE, so that shows that when using the CIDR form, the pod is contacting local IPs directly rather than sending everything to the router.
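The difference between the two forms can be sketched as a longest-prefix-match lookup over the pod's routes. This is a simplified Python model, not how the kernel or the egress-router is implemented; the route tables reuse the 192.168.1.0/24 example from earlier in this bug, and the gateway 192.168.1.1 and neighbor 192.168.1.20 are assumed addresses.

```python
import ipaddress


def next_hop(routes, destination):
    """Pick the next hop for destination using longest-prefix match.

    routes is a list of (network, via) pairs; via is None for an
    on-link route, meaning the destination is contacted directly.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for network, via in routes:
        net = ipaddress.ip_network(network)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, via)
    if best is None:
        raise ValueError(f"no route to {destination}")
    return str(dest) if best[1] is None else best[1]


# IP-only EGRESS_SOURCE: only the gateway is on-link (assumed addresses).
IP_ONLY_ROUTES = [("0.0.0.0/0", "192.168.1.1"), ("192.168.1.1/32", None)]
# CIDR EGRESS_SOURCE=192.168.1.10/24: the whole subnet is on-link.
CIDR_ROUTES = [("0.0.0.0/0", "192.168.1.1"), ("192.168.1.0/24", None)]

if __name__ == "__main__":
    neighbor = "192.168.1.20"  # a local host that is not the gateway
    print("IP-only form, next hop:", next_hop(IP_ONLY_ROUTES, neighbor))
    print("CIDR form, next hop:", next_hop(CIDR_ROUTES, neighbor))
```

In the IP-only case the neighbor is reached through the gateway, which is where a router may answer with an ICMP redirect; in the CIDR case the neighbor itself is the next hop.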
When EGRESS_SOURCE is set with a CIDR range, requests are sent out to the subnet directly without being redirected by the gateway. Moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489