Bug 1595291

Summary: [Backport 3.7] Egress Router HTTP Proxy cannot reach the node on which the router pod runs
Product: OpenShift Container Platform Reporter: Birol Bilgin <bbilgin>
Component: Networking Assignee: Dan Winship <danw>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.7.0 CC: aos-bugs, bbennett, bmeng, cdc, dmoessne, pasik, zzhao
Target Milestone: ---   
Target Release: 3.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The way that egress routers are set up made it impossible for an egress router pod to connect to the public IP address of the node it was hosted on.
Consequence: If an egress pod was configured to use its node as a name server via /etc/resolv.conf, it would be unable to do DNS resolution.
Fix: Traffic from an egress router pod to its node is now routed via the SDN tunnel instead of trying to send it via the egress interface.
Result: Egress routers can now connect to their node's IP, and egress router DNS should always work, regardless of configuration.
Story Points: ---
Clone Of:
: 1698136 (view as bug list) Environment:
Last Closed: 2019-06-11 19:50:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
iptables_and_openflow none

Description Birol Bilgin 2018-06-26 14:00:54 UTC
Description of problem:

Backport request for https://bugzilla.redhat.com/show_bug.cgi?id=1552738 to 3.7

Comment 7 Dan Winship 2019-01-07 15:49:05 UTC
https://github.com/openshift/ose/pull/1484

Comment 9 Meng Bo 2019-04-01 07:47:31 UTC
Tested on ocp v3.7.108 and egress router image ose-egress-router:v3.7.108 d5b8e14f9ec6

The issue is not fixed.

The egress router pod cannot reach the host's eth0 IP and cannot reach the local dnsmasq service.

The route of the egress router pod:
# ip r
default via 10.66.141.254 dev macvlan0 
10.66.140.97 via 10.128.0.1 dev eth0 
10.66.141.254 dev macvlan0 scope link 
10.128.0.0/23 dev eth0 proto kernel scope link src 10.128.0.152 
10.128.0.0/14 dev eth0 
172.30.0.0/16 via 10.128.0.1 dev eth0 
224.0.0.0/4 dev eth0

# ping 10.66.140.97
PING 10.66.140.97 (10.66.140.97) 56(84) bytes of data.
From 10.66.140.200 icmp_seq=1 Destination Host Unreachable
From 10.66.140.200 icmp_seq=2 Destination Host Unreachable
From 10.66.140.200 icmp_seq=3 Destination Host Unreachable
From 10.66.140.200 icmp_seq=4 Destination Host Unreachable

# iptables-save 
# Generated by iptables-save v1.4.21 on Mon Apr  1 15:45:53 2019
*filter
:INPUT ACCEPT [24:1937]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [20:1878]
COMMIT
# Completed on Mon Apr  1 15:45:53 2019
# Generated by iptables-save v1.4.21 on Mon Apr  1 15:45:53 2019
*nat
:PREROUTING ACCEPT [15:1639]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [3:252]
:POSTROUTING ACCEPT [3:252]
-A PREROUTING -i eth0 -j DNAT --to-destination 61.135.218.25
-A POSTROUTING -j SNAT --to-source 10.66.140.200
COMMIT
# Completed on Mon Apr  1 15:45:53 2019


* 10.66.140.97 is the node IP
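
One visible difference between the NAT table above and the one from the later verified build in this report is that the SNAT rule here has no `-o macvlan0` match, so it applies to packets leaving on any interface, including packets routed to the node over `eth0`. The sketch below is a simplified model of that matching, not iptables itself. `SnatRule` and `source_after_postrouting` are hypothetical names, the restricted rule for these addresses is an assumption made for comparison, and the report does not state that this is the whole of the egress-router fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnatRule:
    """Simplified stand-in for one '-A POSTROUTING ... -j SNAT' rule."""
    to_source: str
    out_iface: Optional[str] = None  # None models a rule with no '-o' match


def source_after_postrouting(rule: SnatRule, out_iface: str, src: str) -> str:
    """Return the packet's source address after the single rule is evaluated."""
    if rule.out_iface is None or rule.out_iface == out_iface:
        return rule.to_source  # rule matches, source is rewritten
    return src                 # rule does not match, source is unchanged


pod_ip = "10.128.0.152"  # pod's SDN address, from the route table above

# Rule as dumped above: no interface match.
unrestricted = SnatRule("10.66.140.200")
# Assumed variant for comparison: same rule limited to macvlan0.
restricted = SnatRule("10.66.140.200", out_iface="macvlan0")

print(source_after_postrouting(unrestricted, "eth0", pod_ip))    # 10.66.140.200
print(source_after_postrouting(restricted, "eth0", pod_ip))      # 10.128.0.152
print(source_after_postrouting(restricted, "macvlan0", pod_ip))  # 10.66.140.200
```

In this model, only the unrestricted rule rewrites the source of traffic that the route table sends to the node via `eth0`.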

Comment 11 Dan Winship 2019-04-02 14:34:45 UTC
It works for me... can I get access to this cluster? (Or another cluster demonstrating the bug)

Alternatively, can you get "iptables-save -c" (at the node level, not inside the router pod) and OVS dump-flows output, both before and after trying the ping test in the pod?
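
Comparing the two `iptables-save -c` dumps by hand is tedious, so a small helper can list the rules whose packet counters changed between them. This is a minimal sketch, assuming the usual `[packets:bytes] -A ...` prefix that `iptables-save -c` puts on each rule line; the function names and the sample rules and counters are made up for illustration and do not come from the cluster in this report.

```python
import re

# Rule lines in 'iptables-save -c' output look like: [pkts:bytes] -A CHAIN ...
RULE_RE = re.compile(r"^\[(\d+):(\d+)\]\s+(-A .+)$")


def parse_counters(dump: str) -> dict:
    """Map each rule's text to its (packets, bytes) counters.

    Identical rule text in two tables would collide; good enough for a sketch.
    """
    counters = {}
    for line in dump.splitlines():
        m = RULE_RE.match(line.strip())
        if m:
            counters[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return counters


def changed_rules(before: str, after: str) -> dict:
    """Return {rule: packet delta} for rules whose packet count changed."""
    b, a = parse_counters(before), parse_counters(after)
    return {
        rule: a[rule][0] - b.get(rule, (0, 0))[0]
        for rule in a
        if a[rule][0] != b.get(rule, (0, 0))[0]
    }


# Illustrative dumps only (hypothetical rules and counters).
BEFORE = """*filter
:INPUT ACCEPT [0:0]
[10:840] -A FORWARD -i tun0 -j ACCEPT
[3:252] -A FORWARD -o tun0 -j ACCEPT
COMMIT
"""
AFTER = """*filter
:INPUT ACCEPT [0:0]
[14:1176] -A FORWARD -i tun0 -j ACCEPT
[3:252] -A FORWARD -o tun0 -j ACCEPT
COMMIT
"""

print(changed_rules(BEFORE, AFTER))  # {'-A FORWARD -i tun0 -j ACCEPT': 4}
```

Rules that show a nonzero delta after the ping test are the ones the test traffic actually hit.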

Comment 12 Meng Bo 2019-04-09 07:09:17 UTC
Created attachment 1553794
iptables_and_openflow

The attachment contains the requested dumps. I will also send you a separate mail with the cluster info if you'd like to take a look at it.

Comment 13 Dan Winship 2019-04-09 16:17:00 UTC
> Tested on ocp v3.7.108 and egress router image ose-egress-router:v3.7.108 d5b8e14f9ec6

Oh, right; it's not fixed for you because you're using the 3.7 egress-router image whereas I was using the :latest one. We need to backport a fix to the egress-router itself.

Comment 14 Dan Winship 2019-04-09 16:26:19 UTC
So FTR, note that RHBA-2019:0617 does contain the openshift-sdn side of this bugfix; it's just missing the fixed egress-router image. But if your egress routers use the "openshift/origin-egress-router" (:latest) image rather than "ose-egress-router", then it will work.

Comment 15 Dan Winship 2019-04-09 16:45:28 UTC
OK, https://github.com/openshift/ose/pull/1520 contains the rest of the fix

Comment 17 zhaozhanqi 2019-05-24 10:02:34 UTC
Verified this bug on v3.7.118

The egress router pod can reach the node it is located on.

sh-4.2# ip route
default via 10.0.77.254 dev macvlan0 
10.0.76.163 via 10.129.0.1 dev eth0 
10.0.77.254 dev macvlan0 scope link

sh-4.2# ping 10.0.76.163
PING 10.0.76.163 (10.0.76.163) 56(84) bytes of data.
64 bytes from 10.0.76.163: icmp_seq=1 ttl=64 time=0.402 ms
64 bytes from 10.0.76.163: icmp_seq=2 ttl=64 time=0.113 ms

sh-4.2# iptables-save | grep POSTROUTING
:POSTROUTING ACCEPT [2:168]
-A POSTROUTING -o macvlan0 -j SNAT --to-source 10.0.76.100    ## 10.0.76.100 is the egress IP.
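
The route table in this comment sends traffic for the node IP over the SDN interface because the most specific matching route wins. The sketch below models that longest-prefix selection using only the three routes shown above. It is a simplification of kernel routing (no metrics, scopes, or policy rules), `lookup` is a hypothetical helper, and `10.0.76.1` is an arbitrary example destination that does not appear in the report.

```python
import ipaddress

# (prefix, gateway, device) for the three routes shown in this comment.
ROUTES = [
    ("0.0.0.0/0", "10.0.77.254", "macvlan0"),    # default route, egress side
    ("10.0.76.163/32", "10.129.0.1", "eth0"),    # node IP via the SDN gateway
    ("10.0.77.254/32", None, "macvlan0"),        # on-link route to the gateway
]


def lookup(dest: str):
    """Return (gateway, device) of the longest-prefix route matching dest."""
    addr = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), gw, dev)
        for prefix, gw, dev in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    _, gw, dev = max(matches, key=lambda m: m[0].prefixlen)
    return gw, dev


print(lookup("10.0.76.163"))  # ('10.129.0.1', 'eth0')
print(lookup("10.0.76.1"))    # ('10.0.77.254', 'macvlan0')
```

With the host route in place, only traffic for the node itself leaves through `eth0`; everything else still follows the default route out of `macvlan0`, where the SNAT rule applies.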

Comment 19 errata-xmlrpc 2019-06-11 19:50:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1302