Bug 1595291 - [Backport 3.7] Egress Router HTTP Proxy cannot reach the node which router pod runs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.7.z
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-26 14:00 UTC by Birol Bilgin
Modified: 2019-06-11 19:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The way that egress routers are set up made it impossible for an egress router pod to connect to the public IP address of the node it was hosted on.
Consequence: If an egress pod was configured to use its node as a name server via /etc/resolv.conf, it would be unable to do DNS resolution.
Fix: Traffic from an egress router pod to its node is now routed via the SDN tunnel instead of trying to send it via the egress interface.
Result: Egress routers can now connect to their node's IP, and egress router DNS should always work, regardless of configuration.
Clone Of:
: 1698136 (view as bug list)
Environment:
Last Closed: 2019-06-11 19:50:27 UTC
Target Upstream Version:


Attachments
iptables_and_openflow (60.00 KB, application/x-tar)
2019-04-09 07:09 UTC, Meng Bo


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:1302 None None None 2019-06-11 19:50:29 UTC
Github openshift/ose/pull/1520 None None None 2019-04-09 16:45:27 UTC
Github ose/pull/1484 None None None 2019-01-07 15:49:05 UTC
Red Hat Bugzilla 1552738 None CLOSED Egress Router HTTP Proxy cannot reach the node which router pod runs 2019-06-03 11:44:02 UTC

Description Birol Bilgin 2018-06-26 14:00:54 UTC
Description of problem:

Backport request for https://bugzilla.redhat.com/show_bug.cgi?id=1552738 to 3.7

Comment 7 Dan Winship 2019-01-07 15:49:05 UTC
https://github.com/openshift/ose/pull/1484

Comment 9 Meng Bo 2019-04-01 07:47:31 UTC
Tested on ocp v3.7.108 and egress router image ose-egress-router:v3.7.108 d5b8e14f9ec6

The issue is not fixed.

The egress router pod cannot reach the host's eth0 IP and cannot reach the local dnsmasq service.

The routing table of the egress router pod:
# ip r
default via 10.66.141.254 dev macvlan0 
10.66.140.97 via 10.128.0.1 dev eth0 
10.66.141.254 dev macvlan0 scope link 
10.128.0.0/23 dev eth0 proto kernel scope link src 10.128.0.152 
10.128.0.0/14 dev eth0 
172.30.0.0/16 via 10.128.0.1 dev eth0 
224.0.0.0/4 dev eth0

# ping 10.66.140.97
PING 10.66.140.97 (10.66.140.97) 56(84) bytes of data.
From 10.66.140.200 icmp_seq=1 Destination Host Unreachable
From 10.66.140.200 icmp_seq=2 Destination Host Unreachable
From 10.66.140.200 icmp_seq=3 Destination Host Unreachable
From 10.66.140.200 icmp_seq=4 Destination Host Unreachable

# iptables-save 
# Generated by iptables-save v1.4.21 on Mon Apr  1 15:45:53 2019
*filter
:INPUT ACCEPT [24:1937]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [20:1878]
COMMIT
# Completed on Mon Apr  1 15:45:53 2019
# Generated by iptables-save v1.4.21 on Mon Apr  1 15:45:53 2019
*nat
:PREROUTING ACCEPT [15:1639]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [3:252]
:POSTROUTING ACCEPT [3:252]
-A PREROUTING -i eth0 -j DNAT --to-destination 61.135.218.25
-A POSTROUTING -j SNAT --to-source 10.66.140.200
COMMIT
# Completed on Mon Apr  1 15:45:53 2019


* 10.66.140.97 is the node IP
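
For reference, the host route for the node IP can be picked out of the `ip r` output mechanically. The helper below is a minimal sketch: the name `route_dev_for` is hypothetical and not part of any OpenShift tooling. It reads a routing table on stdin and prints the device of the route whose destination exactly matches the given IP.

```shell
# Hypothetical helper (illustration only).
# $1: destination IP to look up; stdin: output of "ip route".
# Prints the device named after "dev" on the route whose first field is $1.
route_dev_for() {
    awk -v ip="$1" '
        $1 == ip {
            for (i = 2; i < NF; i++)
                if ($i == "dev") print $(i + 1)
        }
    '
}

# Example use inside the pod (not executed here):
#   ip route | route_dev_for 10.66.140.97
```

Applied to the table above with 10.66.140.97, it prints `eth0`, so the host route was already present in the failing pod and the route alone does not account for the failure.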

Comment 11 Dan Winship 2019-04-02 14:34:45 UTC
It works for me... can I get access to this cluster? (Or another cluster demonstrating the bug)

Alternatively, can you get "iptables-save -c" (at the node level, not inside the router pod) and OVS dump-flows output, both before and after trying the ping test in the pod?
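
One way to compare the two node-level dumps is to look only at the lines whose packet counters changed. The sketch below assumes both dumps were taken with `iptables-save -c`; the helper name `changed_counters` and the file names are hypothetical.

```shell
# Hypothetical helper (illustration only).
# $1: "iptables-save -c" dump taken before the ping test
# $2: "iptables-save -c" dump taken after the ping test
# Prints every non-comment line of the second dump that does not appear
# verbatim in the first, i.e. rules and chain policies whose counters moved.
changed_counters() {
    awk '
        /^#/ { next }                      # skip "# Generated ..." timestamps
        NR == FNR { seen[$0] = 1; next }   # first file: remember each line
        !($0 in seen)                      # second file: print unseen lines
    ' "$1" "$2"
}

# Example use on the node (not executed here; file names are placeholders):
#   iptables-save -c > before.ipt
#   ... run the ping test in the pod ...
#   iptables-save -c > after.ipt
#   changed_counters before.ipt after.ipt
```

The helper assumes the first dump is not empty. OVS flow dumps are left out because they have a different line format.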

Comment 12 Meng Bo 2019-04-09 07:09:17 UTC
Created attachment 1553794 [details]
iptables_and_openflow

The attachment contains the requested dumps. I will also send you a separate mail with the cluster info if you'd like to take a look at it.

Comment 13 Dan Winship 2019-04-09 16:17:00 UTC
> Tested on ocp v3.7.108 and egress router image ose-egress-router:v3.7.108 d5b8e14f9ec6

Oh, right; it's not fixed for you because you're using the 3.7 egress-router image whereas I was using the :latest one. We need to backport a fix to the egress-router itself.

Comment 14 Dan Winship 2019-04-09 16:26:19 UTC
So FTR, note that RHBA-2019:0617 does contain the openshift-sdn side of this bugfix; it's just missing the fixed egress-router image. But if your egress routers use the "openshift/origin-egress-router" (:latest) image rather than "ose-egress-router", then it will work.

Comment 15 Dan Winship 2019-04-09 16:45:28 UTC
OK, https://github.com/openshift/ose/pull/1520 contains the rest of the fix

Comment 17 zhaozhanqi 2019-05-24 10:02:34 UTC
Verified this bug on v3.7.118

The egress router pod can access the node on which it is located.

sh-4.2# ip route
default via 10.0.77.254 dev macvlan0 
10.0.76.163 via 10.129.0.1 dev eth0 
10.0.77.254 dev macvlan0 scope link

sh-4.2# ping 10.0.76.163
PING 10.0.76.163 (10.0.76.163) 56(84) bytes of data.
64 bytes from 10.0.76.163: icmp_seq=1 ttl=64 time=0.402 ms
64 bytes from 10.0.76.163: icmp_seq=2 ttl=64 time=0.113 ms

sh-4.2# iptables-save | grep POSTROUTING
:POSTROUTING ACCEPT [2:168]
-A POSTROUTING -o macvlan0 -j SNAT --to-source 10.0.76.100    ##10.0.76.100 is the egress ip.
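
Compared with the dump in Comment 9, the SNAT rule here is restricted to `-o macvlan0`. The following is a minimal sketch for flagging the unrestricted form in a pod's NAT table. The name `unscoped_snat_rules` is hypothetical, and treating macvlan0 as the egress interface is an assumption taken from the outputs in this report.

```shell
# Hypothetical helper (illustration only).
# stdin: output of "iptables-save" from inside the egress router pod.
# Prints each POSTROUTING SNAT rule that is NOT limited to "-o macvlan0";
# prints nothing when every SNAT rule is scoped to that interface.
unscoped_snat_rules() {
    grep -- '-A POSTROUTING' \
        | grep -- '-j SNAT' \
        | grep -v -- '-o macvlan0' \
        || true   # an empty result is not an error
}

# Example use inside the pod (not executed here):
#   iptables-save -t nat | unscoped_snat_rules
```

Fed the rule from Comment 9 (`-A POSTROUTING -j SNAT --to-source 10.66.140.200`), the helper prints it back. Fed the rule above, it prints nothing.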

Comment 19 errata-xmlrpc 2019-06-11 19:50:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1302

