Bug 1552738 - Egress Router HTTP Proxy cannot reach the node on which the router pod runs
Summary: Egress Router HTTP Proxy cannot reach the node on which the router pod runs
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.10.0
Assignee: Dan Winship
QA Contact: Meng Bo
Depends On:
Reported: 2018-03-07 16:15 UTC by Birol Bilgin
Modified: 2018-12-29 07:35 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The way that egress routers are set up made it impossible for an egress router pod to connect to the public IP address of the node it was hosted on. Consequence: If an egress pod was configured to use its node as a name server via /etc/resolv.conf, it would be unable to do DNS resolution. Fix: Traffic from an egress router pod to its node is now routed via the SDN tunnel instead of trying to send it via the egress interface. Result: Egress routers can now connect to their node's IP, and egress router DNS should always work, regardless of configuration.
Clone Of:
Last Closed: 2018-07-30 19:10:04 UTC
Target Upstream Version:

Attachments (Terms of Use)
iptables_filter (9.90 KB, text/plain)
2018-04-04 11:07 UTC, Birol Bilgin
no flags Details
iptables_nat (89.54 KB, text/plain)
2018-04-04 11:08 UTC, Birol Bilgin
no flags Details

System ID Priority Status Summary Last Updated
Origin (Github) 19885 None None None 2018-05-30 16:04:05 UTC
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:10:30 UTC

Internal Links: 1595291

Description Birol Bilgin 2018-03-07 16:15:35 UTC
Description of problem: 

The Egress Router HTTP Proxy cannot reach the node on which the router
pod runs, so DNS name resolution does not work.

Version-Release number of selected component (if applicable):

OCP 3.7; it probably applies to all versions.

How reproducible:

Created a network namespace in a VM or on a host and
replicated the macvlan interface creation.

Used steps from the current snapshot of github.com/openshift/origin:

./images/egress/router/egress-router.sh 30:1 
function setup_network()

./pkg/network/node/pod.go 433:8-15 
                LinkAttrs: netlink.LinkAttrs{
                        MTU:         iface.Attrs().MTU,
                        Name:        "macvlan0",
                        ParentIndex: iface.Attrs().Index,
                        Namespace:   netlink.NsFd(podNs.Fd()),
                },
                Mode: netlink.MACVLAN_MODE_PRIVATE,

Steps to Reproduce:
1. ip netns add test
2. ip link add macvlan0 link eth0 type macvlan mode private
3. ip link set dev macvlan0 netns test
4. ip netns exec test bash
# from here on, the commands are run inside the namespace
5. ip addr add <ip_belongs_the_same_subnet_as_host_ip> dev macvlan0
6. ip link set up dev macvlan0
7. ip route add <host_gateway>/32 dev macvlan0
8. ip route add default via <host_gateway> dev macvlan0
9. I ran a dnsmasq service to test dns, but I would imagine 
   any open port should work as well.
10. ping <host_ip>
11. dig @<host_ip> redhat.com

Actual results:

PING ( 56(84) bytes of data.
From icmp_seq=2 Redirect Host(New nexthop:
From icmp_seq=2 Redirect Host(New nexthop:
From icmp_seq=7 Redirect Host(New nexthop:
From icmp_seq=7 Redirect Host(New nexthop:
--- ping statistics ---
7 packets transmitted, 0 received, +2 errors, 100% packet loss, time 5999ms

dig @ redhat.com

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.2 <<>> @ redhat.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Expected results:

10. Not sure ping should work.

11. Since the pod's nameserver is the node IP, the pod should be
able to reach the node and DNS resolution should work; instead,
DNS resolution fails.

Additional info:

Comment 2 Meng Bo 2018-03-08 10:33:13 UTC
I can reproduce the issue with v3.9.3

The egress-http-proxy and egress-router pods cannot talk to the host IP when dnsmasq is enabled on the node.

# ip neigh
dev macvlan0 FAILED
dev macvlan0 lladdr 52:54:00:7e:86:4e STALE

# ip route
default via dev macvlan0
dev macvlan0 proto kernel scope link src
dev macvlan0 scope link
dev eth0 proto kernel scope link src
dev eth0

is the other node. is the node where the egress pod landed.

Comment 7 Dan Winship 2018-03-13 16:56:57 UTC
OK, right. This is inherent to the way macvlans work: even if you set them to "bridge" mode (which we don't), they can't send packets directly to their parent device, so even if you set up proper subnet routing, it would only be able to connect to the node's primary IP if the node's upstream router was willing to "hairpin" packets (which it probably isn't).

One possible fix would be to masquerade the packets to the node's internal SDN IP address instead. E.g., if the node has primary IP and tun0 IP, then you'd run (in the pod's network namespace):

  iptables -t nat -A OUTPUT -d \
      -j DNAT --to-destination
  iptables -t nat -I POSTROUTING -d \

(Note "-I" not "-A" on the second rule, to get it inserted before the default SNAT rule.)

This could be partially automated:

  egress_pod_eth0_address=$(ip addr show dev eth0 | \
      sed -ne 's/.*inet \([0-9.]*\)\/.*/\1/p')
  node_tun0_address=$(echo $(egress_pod_eth0_address) | sed -e 's/[0-9]*$/1/')
  iptables -t nat -A OUTPUT -d $(node_eth0_address)/32 \
      -j DNAT --to-destination $(node_tun0_address)
  iptables -t nat -I POSTROUTING -d $(node_tun0_address)/32 \

"node_eth0_address" needs to be filled in here by hand, but the tun0 address can be figured out from the egress-router's eth0 configuration.

Right now the egress-router.sh script doesn't know the node's primary IP so it wouldn't be able to set this up automatically. I need to think about the best way to do this.

Comment 8 Birol Bilgin 2018-03-14 14:34:19 UTC
When I ran these commands there were some errors: values like $(egress_pod_eth0_address) should be written as $egress_pod_eth0_address, otherwise bash interprets them as commands to be run.

After running this I could ping the node IP;
however, I could not make any DNS queries.

These probes were taken from the pod's namespace:

$ nmap -p 53 -Pn

Starting Nmap 6.40 ( http://nmap.org ) at 2018-03-14 10:27 EDT
Nmap scan report for
Host is up.
53/tcp filtered domain

$ nmap -p 53 -Pn -sU

Starting Nmap 6.40 ( http://nmap.org ) at 2018-03-14 10:27 EDT
Nmap scan report for
Host is up.
53/udp open|filtered domain

It seems we need modifications on the host networking as well.
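
With the variable-syntax correction applied, the tun0-address derivation from comment 7 can be sketched as follows. The pod IP used here is a hypothetical example, and the derivation rests on comment 7's assumption that the node's tun0 address is the .1 of the pod's subnet:

```shell
# Hypothetical pod eth0 address; in a real pod this would be read from
# "ip addr show dev eth0" as in comment 7. The value is made up for
# illustration only.
egress_pod_eth0_address=10.128.2.15

# Derive the node's tun0 address by replacing the last octet with 1,
# per the assumption that tun0 is the .1 gateway of the pod subnet.
node_tun0_address=$(echo $egress_pod_eth0_address | sed -e 's/[0-9]*$/1/')

echo $node_tun0_address   # prints 10.128.2.1
```

Note the plain $egress_pod_eth0_address expansion, not $(egress_pod_eth0_address), which bash would try to run as a command.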

Comment 9 Meng Bo 2018-03-15 06:00:27 UTC
The script in comment#7 works for me, though there are some shell syntax issues.

After fixing the script and running it inside the container's net namespace, the egress pod can access the host's IP address and can resolve domain names normally.

sh-4.2# cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local par.redhat.com bmeng.local
options ndots:5

sh-4.2# ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.459 ms
--- ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.459/0.459/0.459/0.000 ms

sh-4.2# curl -I www.youdao.com
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 15 Mar 2018 05:58:04 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
Cache-Control: private
Content-Language: en-US
Set-Cookie: DICT_UGC=be3af0da19b5c5e6aa4e17bd8d90b28a|; domain=.youdao.com
Set-Cookie: OUTFOX_SEARCH_USER_ID=542782096@; domain=.youdao.com; expires=Sat, 07-Mar-2048 05:58:03 GMT
Set-Cookie: JSESSIONID=abc5b_DX9j7OB_70jpOiw; domain=youdao.com; path=/

The iptables rules in the pod will look like:
[root@ose-node2 ~]# nsenter -n -t 4291
[root@ose-node2 ~]# iptables -S -t nat 
-A OUTPUT -d -j DNAT --to-destination

Comment 10 Dan Winship 2018-03-15 12:58:35 UTC
> The iptables rules in the pod will look like:

Did you miss a line in the cut+paste? It should end with


If you don't see that there, then the egress router isn't set up correctly. (Did you accidentally flush the egress router's own rules at some point?)

Comment 11 Birol Bilgin 2018-03-15 13:17:10 UTC
> For me the iptables rules are the same

# iptables -S -t nat
-A OUTPUT -d -j DNAT --to-destination

I think this is because of the router's EGRESS_ROUTER_MODE=http-proxy;
for http-proxy, setup_iptables does not run, only setup_network runs.
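
A hypothetical sketch of that dispatch (not the actual egress-router.sh code, whose structure differs, but the effect described above is the same: http-proxy mode skips the iptables setup):

```shell
# Stand-in setup functions; the real script configures the macvlan
# interface and NAT rules here.
setup_network()  { echo "network configured"; }
setup_iptables() { echo "iptables configured"; }

EGRESS_ROUTER_MODE=http-proxy

# Network setup always runs; iptables setup is skipped in http-proxy
# mode, so no DNAT/SNAT rules are installed in the pod.
setup_network
if [ "$EGRESS_ROUTER_MODE" != "http-proxy" ]; then
    setup_iptables
fi
# With EGRESS_ROUTER_MODE=http-proxy this prints only "network configured".
```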


Comment 12 Birol Bilgin 2018-03-19 12:10:50 UTC
> correction https://bugzilla.redhat.com/show_bug.cgi?id=1552738#c8 

I had tested the workaround on 3.6 before; it did not work.

I have now tested the workaround on 3.7, and DNS resolution works.

Comment 15 Birol Bilgin 2018-04-04 11:07:52 UTC
Created attachment 1417214 [details]

Comment 16 Birol Bilgin 2018-04-04 11:08:36 UTC
Created attachment 1417215 [details]

Comment 20 Meng Bo 2018-04-27 08:44:20 UTC
Any update on this? We still face this problem in 3.10 testing.

Comment 22 Dan Winship 2018-05-30 16:04:06 UTC

Comment 23 Dan Winship 2018-05-30 16:07:55 UTC
Note to QE: the fix requires both a new origin binary and a new egress-router image. I'm not sure where your images come from when you're testing, but make sure you do get the new one. You can tell by looking at the iptables rules; if you do "oc exec my-egress-router-pod -- iptables-save", it should have:

  -A POSTROUTING -o macvlan0 -j SNAT --to-source ${EGRESS_SOURCE}



Comment 24 Weibin Liang 2018-06-01 20:24:05 UTC
@Dan, Testing on v3.10.0-0.56.0:

[root@ip-172-18-6-68 ~]# docker ps | grep egress
9d583f4b7e24        e32428b2269e                                                                                                                                       "/usr/bin/pod"           2 minutes ago        Up About a minute                           k8s_egressrouter-redirect_egress-redirect_p1_93d2644f-65d7-11e8-90ea-0e53bd2bbf32_0
55fa51e2bc48        registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.56.0                                                                               "/usr/bin/pod"           2 minutes ago        Up 2 minutes                                k8s_POD_egress-redirect_p1_93d2644f-65d7-11e8-90ea-0e53bd2bbf32_1
[root@ip-172-18-6-68 ~]# docker inspect 9d583f4b7e24  | grep Pid
            "Pid": 10927,
            "PidMode": "",
            "PidsLimit": 0,
[root@ip-172-18-6-68 ~]# nsenter -n -t 10927 bash
[root@ip-172-18-6-68 ~]# iptables-save
# Generated by iptables-save v1.4.21 on Fri Jun  1 16:12:47 2018
-A PREROUTING -i eth0 -j DNAT --to-destination
-A POSTROUTING -o macvlan0 -j SNAT --to-source
# Completed on Fri Jun  1 16:12:47 2018
# Generated by iptables-save v1.4.21 on Fri Jun  1 16:12:47 2018
# Completed on Fri Jun  1 16:12:47 2018
[root@ip-172-18-6-68 ~]# 

For the new fixed iptables rule, should it be "-A POSTROUTING -o macvlan0 -j SNAT --to-source" (the tunnel address) and not "-A POSTROUTING -o macvlan0 -j SNAT --to-source" (the interface address)?

Comment 25 Dan Winship 2018-06-01 20:39:29 UTC
The rule is correct; it's not really a "new" rule, it's just a fix to the old rule. It used to be that *all* outgoing traffic got NATted to the EGRESS_SOURCE, but that ended up meaning that the egress-router couldn't send to addresses on the SDN. The fix was to add "-o macvlan0" to the NAT rule, so that it only applies to traffic that is going out the macvlan interface, which is what we'd intended all along.
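
In other words, a sketch of the before/after, using the ${EGRESS_SOURCE} variable from the egress-router script (addresses not shown, as throughout this report):

```shell
# Old rule: NATted ALL outgoing traffic from the pod, which broke
# traffic to addresses on the SDN:
#   iptables -t nat -A POSTROUTING -j SNAT --to-source ${EGRESS_SOURCE}

# Fixed rule: only traffic leaving via the macvlan interface is NATted,
# which is what was intended all along:
#   iptables -t nat -A POSTROUTING -o macvlan0 -j SNAT --to-source ${EGRESS_SOURCE}
```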

Comment 27 Meng Bo 2018-06-05 10:45:38 UTC
Tested on v3.10.0-0.58.0 with the egress router images.

The egress router and egress http proxy features work well with dnsmasq enabled.

And the following iptables rule is found in the egress router pod:
-A POSTROUTING -o macvlan0 -j SNAT --to-source

Comment 33 errata-xmlrpc 2018-07-30 19:10:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

