Bug 1552738 - Egress Router HTTP Proxy cannot reach the node on which the router pod runs
Summary: Egress Router HTTP Proxy cannot reach the node on which the router pod runs
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.10.0
Assignee: Dan Winship
QA Contact: Meng Bo
Depends On:
Reported: 2018-03-07 16:15 UTC by Birol Bilgin
Modified: 2018-12-29 07:35 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The way that egress routers are set up made it impossible for an egress router pod to connect to the public IP address of the node it was hosted on. Consequence: If an egress pod was configured to use its node as a name server via /etc/resolv.conf, it would be unable to do DNS resolution. Fix: Traffic from an egress router pod to its node is now routed via the SDN tunnel instead of trying to send it via the egress interface. Result: Egress routers can now connect to their node's IP, and egress router DNS should always work, regardless of configuration.
Clone Of:
Last Closed: 2018-07-30 19:10:04 UTC
Target Upstream Version:

Attachments (Terms of Use)
iptables_filter (9.90 KB, text/plain)
2018-04-04 11:07 UTC, Birol Bilgin
no flags Details
iptables_nat (89.54 KB, text/plain)
2018-04-04 11:08 UTC, Birol Bilgin
no flags Details

System ID Priority Status Summary Last Updated
Origin (Github) 19885 None None None 2018-05-30 16:04:05 UTC
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:10:30 UTC

Internal Links: 1595291

Description Birol Bilgin 2018-03-07 16:15:35 UTC
Description of problem: 

The Egress Router HTTP Proxy cannot reach the node on which the router
pod runs, so DNS name resolution does not work.

Version-Release number of selected component (if applicable):

OCP 3.7; it probably applies to all versions.

How reproducible:

Created a network namespace in a VM or on a host and
replicated the macvlan interface creation.

Used steps from the current snapshot of github.com/openshift/origin:

./images/egress/router/egress-router.sh 30:1 
function setup_network()

./pkg/network/node/pod.go 433:8-15 
                LinkAttrs: netlink.LinkAttrs{
                        MTU:         iface.Attrs().MTU,
                        Name:        "macvlan0",
                        ParentIndex: iface.Attrs().Index,
                        Namespace:   netlink.NsFd(podNs.Fd()),
                },
                Mode: netlink.MACVLAN_MODE_PRIVATE,

Steps to Reproduce:
1. ip netns add test
2. ip link add macvlan0 link eth0 type macvlan mode private
3. ip link set dev macvlan0 netns test
4. ip netns exec test bash
# from here on, the commands are run inside the namespace
5. ip addr add <ip_belongs_the_same_subnet_as_host_ip> dev macvlan0
6. ip link set up dev macvlan0
7. ip route add <host_gateway>/32 dev macvlan0
8. ip route add default via <host_gateway> dev macvlan0
9. I ran a dnsmasq service to test dns, but I would imagine 
   any open port should work as well.
10. ping <host_ip>
11. dig @<host_ip> redhat.com

Actual results:

PING ( 56(84) bytes of data.
From icmp_seq=2 Redirect Host(New nexthop:
From icmp_seq=2 Redirect Host(New nexthop:
From icmp_seq=7 Redirect Host(New nexthop:
From icmp_seq=7 Redirect Host(New nexthop:
--- ping statistics ---
7 packets transmitted, 0 received, +2 errors, 100% packet loss, time 5999ms

dig @ redhat.com

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.2 <<>> @ redhat.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Expected results:

10. Not sure ping should work.

11. Since the pod's nameserver is the node IP, the pod should be
able to reach the node and DNS resolution should work; instead,
DNS resolution fails.

Additional info:

Comment 2 Meng Bo 2018-03-08 10:33:13 UTC
I can reproduce the issue with v3.9.3

The egress-http-proxy and egress-router pods cannot talk to the host IP when dnsmasq is enabled on the node.

# ip neigh
dev macvlan0 FAILED
dev macvlan0 lladdr 52:54:00:7e:86:4e STALE

# ip route
default via dev macvlan0
dev macvlan0 proto kernel scope link src
dev macvlan0 scope link
dev eth0 proto kernel scope link src
dev eth0

is the other node. is the node where the egress pod landed.

Comment 7 Dan Winship 2018-03-13 16:56:57 UTC
OK, right. This is inherent to the way macvlans work: even if you set them to "bridge" mode (which we don't), they can't send packets directly to their parent device, so even if you set up proper subnet routing, it would only be able to connect to the node's primary IP if the node's upstream router was willing to "hairpin" packets (which it probably isn't).

One possible fix would be to masquerade the packets to the node's internal SDN IP address instead. E.g., if the node has primary IP and tun0 IP, then you'd run (in the pod's network namespace):

  iptables -t nat -A OUTPUT -d \
      -j DNAT --to-destination
  iptables -t nat -I POSTROUTING -d \

(Note "-I" not "-A" on the second rule, to get it inserted before the default SNAT rule.)

This could be partially automated:

  egress_pod_eth0_address=$(ip addr show dev eth0 | \
      sed -ne 's/.*inet \([0-9.]*\)\/.*/\1/p')
  node_tun0_address=$(echo $(egress_pod_eth0_address) | sed -e 's/[0-9]*$/1/')
  iptables -t nat -A OUTPUT -d $(node_eth0_address)/32 \
      -j DNAT --to-destination $(node_tun0_address)
  iptables -t nat -I POSTROUTING -d $(node_tun0_address)/32 \

"node_eth0_address" needs to be filled in here by hand, but the tun0 address can be figured out from the egress-router's eth0 configuration.

Right now the egress-router.sh script doesn't know the node's primary IP so it wouldn't be able to set this up automatically. I need to think about the best way to do this.

Comment 8 Birol Bilgin 2018-03-14 14:34:19 UTC
When I ran these commands there were some errors: values like $(egress_pod_eth0_address) should be written as $egress_pod_eth0_address, otherwise bash interprets them as commands to be run.

After running this I could ping the node IP;
however, I could not make any DNS queries.

These probes were taken from the pod's namespace:

$ nmap -p 53 -Pn

Starting Nmap 6.40 ( http://nmap.org ) at 2018-03-14 10:27 EDT
Nmap scan report for
Host is up.
53/tcp filtered domain

$ nmap -p 53 -Pn -sU

Starting Nmap 6.40 ( http://nmap.org ) at 2018-03-14 10:27 EDT
Nmap scan report for
Host is up.
53/udp open|filtered domain

It seems we need modifications on the host networking as well.
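
With the variable-syntax correction applied, the tun0-address derivation from comment 7 can be sketched as follows. The pod IP used here is a hypothetical example, and the derivation rests on comment 7's assumption that the node's tun0 address is the .1 of the pod's subnet:

```shell
# Hypothetical pod eth0 address; in a real pod this would be read from
# "ip addr show dev eth0" as in comment 7. The value is made up for
# illustration only.
egress_pod_eth0_address=10.128.2.15

# Derive the node's tun0 address by replacing the last octet with 1,
# per the assumption that tun0 is the .1 gateway of the pod subnet.
node_tun0_address=$(echo $egress_pod_eth0_address | sed -e 's/[0-9]*$/1/')

echo $node_tun0_address   # prints 10.128.2.1
```

Note the plain $egress_pod_eth0_address expansion, not $(egress_pod_eth0_address), which bash would try to run as a command.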

Comment 9 Meng Bo 2018-03-15 06:00:27 UTC
The script in comment#7 works for me, though there are some shell syntax issues.

After fixing the script and running it inside the container's net namespace, the egress pod can access the host's IP address and can resolve domain names normally.

sh-4.2# cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local par.redhat.com bmeng.local
options ndots:5

sh-4.2# ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.459 ms
--- ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.459/0.459/0.459/0.000 ms

sh-4.2# curl -I www.youdao.com
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 15 Mar 2018 05:58:04 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
Cache-Control: private
Content-Language: en-US
Set-Cookie: DICT_UGC=be3af0da19b5c5e6aa4e17bd8d90b28a|; domain=.youdao.com
Set-Cookie: OUTFOX_SEARCH_USER_ID=542782096@; domain=.youdao.com; expires=Sat, 07-Mar-2048 05:58:03 GMT
Set-Cookie: JSESSIONID=abc5b_DX9j7OB_70jpOiw; domain=youdao.com; path=/

The iptables rules in the pod will look like:
[root@ose-node2 ~]# nsenter -n -t 4291
[root@ose-node2 ~]# iptables -S -t nat 
-A OUTPUT -d -j DNAT --to-destination

Comment 10 Dan Winship 2018-03-15 12:58:35 UTC
> The iptables rules in the pod will look like:

Did you miss a line in the cut+paste? It should end with


If you don't see that there, then the egress router isn't set up correctly. (Did you accidentally flush the egress router's own rules at some point?)

Comment 11 Birol Bilgin 2018-03-15 13:17:10 UTC
> For me the iptables rules are the same

# iptables -S -t nat
-A OUTPUT -d -j DNAT --to-destination

I think this is because of the router's EGRESS_ROUTER_MODE=http-proxy;
for http-proxy, setup_iptables does not run, only setup_network runs.
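
A hypothetical sketch of that dispatch (not the actual egress-router.sh code, whose structure differs, but the effect described above is the same: http-proxy mode skips the iptables setup):

```shell
# Stand-in setup functions; the real script configures the macvlan
# interface and NAT rules here.
setup_network()  { echo "network configured"; }
setup_iptables() { echo "iptables configured"; }

EGRESS_ROUTER_MODE=http-proxy

# Network setup always runs; iptables setup is skipped in http-proxy
# mode, so no DNAT/SNAT rules are installed in the pod.
setup_network
if [ "$EGRESS_ROUTER_MODE" != "http-proxy" ]; then
    setup_iptables
fi
# With EGRESS_ROUTER_MODE=http-proxy this prints only "network configured".
```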


Comment 12 Birol Bilgin 2018-03-19 12:10:50 UTC
> correction https://bugzilla.redhat.com/show_bug.cgi?id=1552738#c8 

I had tested the workaround on 3.6 before; it did not work.

I have now tested the workaround on 3.7, and DNS resolution works.

Comment 15 Birol Bilgin 2018-04-04 11:07:52 UTC
Created attachment 1417214 [details]

Comment 16 Birol Bilgin 2018-04-04 11:08:36 UTC
Created attachment 1417215 [details]

Comment 20 Meng Bo 2018-04-27 08:44:20 UTC
Any update on this? We still face this problem in 3.10 testing.

Comment 22 Dan Winship 2018-05-30 16:04:06 UTC

Comment 23 Dan Winship 2018-05-30 16:07:55 UTC
Note to QE: the fix requires both a new origin binary and a new egress-router image. I'm not sure where your images come from when you're testing, but make sure you do get the new one. You can tell by looking at the iptables rules; if you do "oc exec my-egress-router-pod -- iptables-save", it should have:

  -A POSTROUTING -o macvlan0 -j SNAT --to-source ${EGRESS_SOURCE}



Comment 24 Weibin Liang 2018-06-01 20:24:05 UTC
@Dan, Testing on v3.10.0-0.56.0:

[root@ip-172-18-6-68 ~]# docker ps | grep egress
9d583f4b7e24        e32428b2269e                                                                                                                                       "/usr/bin/pod"           2 minutes ago        Up About a minute                           k8s_egressrouter-redirect_egress-redirect_p1_93d2644f-65d7-11e8-90ea-0e53bd2bbf32_0
55fa51e2bc48        registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.56.0                                                                               "/usr/bin/pod"           2 minutes ago        Up 2 minutes                                k8s_POD_egress-redirect_p1_93d2644f-65d7-11e8-90ea-0e53bd2bbf32_1
[root@ip-172-18-6-68 ~]# docker inspect 9d583f4b7e24  | grep Pid
            "Pid": 10927,
            "PidMode": "",
            "PidsLimit": 0,
[root@ip-172-18-6-68 ~]# nsenter -n -t 10927 bash
[root@ip-172-18-6-68 ~]# iptables-save
# Generated by iptables-save v1.4.21 on Fri Jun  1 16:12:47 2018
-A PREROUTING -i eth0 -j DNAT --to-destination
-A POSTROUTING -o macvlan0 -j SNAT --to-source
# Completed on Fri Jun  1 16:12:47 2018
# Generated by iptables-save v1.4.21 on Fri Jun  1 16:12:47 2018
# Completed on Fri Jun  1 16:12:47 2018
[root@ip-172-18-6-68 ~]# 

For the new fixed iptables rule, should it be "-A POSTROUTING -o macvlan0 -j SNAT --to-source" (the tunnel address) and not "-A POSTROUTING -o macvlan0 -j SNAT --to-source" (the interface address)?

Comment 25 Dan Winship 2018-06-01 20:39:29 UTC
The rule is correct; it's not really a "new" rule, it's just a fix to the old rule. It used to be that *all* outgoing traffic got NATted to the EGRESS_SOURCE, but that ended up meaning that the egress-router couldn't send to addresses on the SDN. The fix was to add "-o macvlan0" to the NAT rule, so that it only applies to traffic that is going out the macvlan interface, which is what we'd intended all along.
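
In other words, a sketch of the before/after, using the ${EGRESS_SOURCE} variable from the egress-router script (addresses not shown, as throughout this report):

```shell
# Old rule: NATted ALL outgoing traffic from the pod, which broke
# traffic to addresses on the SDN:
#   iptables -t nat -A POSTROUTING -j SNAT --to-source ${EGRESS_SOURCE}

# Fixed rule: only traffic leaving via the macvlan interface is NATted,
# which is what was intended all along:
#   iptables -t nat -A POSTROUTING -o macvlan0 -j SNAT --to-source ${EGRESS_SOURCE}
```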

Comment 27 Meng Bo 2018-06-05 10:45:38 UTC
Tested on v3.10.0-0.58.0 with the egress router images.

The egress router and egress http proxy features work well with dnsmasq enabled.

And the following iptables rule is found in the egress router pod:
-A POSTROUTING -o macvlan0 -j SNAT --to-source

Comment 33 errata-xmlrpc 2018-07-30 19:10:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

