Bug 1458849
Summary: Deny 0.0.0.0/0 blocks all DNS resolution to local nameserver

Product: OpenShift Container Platform
Component: Networking
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 3.6.0
Target Release: 3.7.0
Hardware: Unspecified
OS: Linux
Reporter: Weibin Liang <weliang>
Assignee: Jacob Tanenbaum <jtanenba>
QA Contact: Meng Bo <bmeng>
CC: aloughla, aos-bugs, bbennett, danw, eparis, gpei, weliang, xtian, yadu
Doc Type: Bug Fix
Doc Text:
Cause: The node's local IP address is not part of the OVS rules.
Consequence: If an EgressNetworkPolicy denies 0.0.0.0/0 and allows a DNS name, pods cannot reach the allowed address, because DNS resolution against the node-local nameserver is blocked.
Fix: Add the local node IP to the OVS allow rules so that name resolution is not blocked. A note was also added to the docs for the case where DNS resolution does not happen on the node.
Result: 0.0.0.0/0 can be denied as a cidrSelector while specific DNS names are still allowed through.
Last Closed: 2017-11-28 21:56:17 UTC
Type: Bug
Attachments: openflow log (attachment 1295159)
Description
Weibin Liang
2017-06-05 15:56:36 UTC
Test failed because "Deny 0.0.0.0/0" blocks all DNS resolution to the local nameserver:

[root@ip-172-18-3-73 ~]# oc get pods
NAME                      READY     STATUS    RESTARTS   AGE
hello-openshift-4-322jq   1/1       Running   0          1m
hello-openshift-4-4r7dg   1/1       Running   0          1m
hello-openshift-4-ht0kh   1/1       Running   0          1m
hello-openshift-4-s2rm7   1/1       Running   0          1m
hello-openshift-4-wn7dg   1/1       Running   0          1m
hello-pod-4-39stw         1/1       Running   0          1m
hello-pod-4-760sj         1/1       Running   0          1m
hello-pod-4-g478k         1/1       Running   0          1m
hello-pod-4-rhq8n         1/1       Running   0          1m
hello-pod-4-sg3ld         1/1       Running   0          1m
[root@ip-172-18-3-73 ~]# oc rsh hello-openshift-4-322jq
/ $ cat /etc/resolv.conf
nameserver 172.18.1.118
search test.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5

Suggest allowing traffic to the nameserver (172.18.1.118) by default even when "Deny 0.0.0.0/0" is in use; then the above test case will pass.

---

We need to work out why a node IP address is being used in resolv.conf rather than the node SDN address. Alternatively, it would not be unreasonable to allow all traffic to the local node IP addresses (since we already allow traffic to the node SDN address).

Replying to myself: we can't (easily) make the installer set up resolv.conf with the SDN address because we haven't got one until the node starts up and registers itself. So I think we should just add a rule into OVS when EgressNetworkPolicy is used that allows the local host's default IP address (or perhaps all addresses on that node) by default.

---

Commit pushed to master at https://github.com/openshift/origin
https://github.com/openshift/origin/commit/74f0bafa0351a08e45bdb735302032ecb2494c9d

add the nodes local IP address to OVS rules

This change adds the node's local IP address to the OVS rules when using EgressNetworkPolicies to limit egress from the cluster. Adding the node's local IP allows for DNS resolution when DNS is accessible on the node.
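The priority logic behind the commit above can be sketched roughly as follows. This is a hypothetical Python sketch, not the actual origin Go code; the flow syntax, the "tun0" output port, and the function name are illustrative assumptions. The point it demonstrates: the node's own IP gets an allow flow at a priority above every policy rule, so pod-to-nameserver traffic is never caught by "Deny 0.0.0.0/0".

```python
# Hypothetical sketch (not the origin code): generate OVS-style flow entries
# for an EgressNetworkPolicy, with the node's own IP allowed at the highest
# priority so DNS to the node-local resolver keeps working.
def egress_flows(node_ip, rules):
    # Assumed flow syntax and the "tun0" port name are illustrative only.
    flows = [f"priority={len(rules) + 1},ip,nw_dst={node_ip}/32,actions=output:tun0"]
    for i, (policy, cidr) in enumerate(rules):
        action = "output:tun0" if policy == "Allow" else "drop"
        # Earlier policy rules win, so they get higher OpenFlow priorities.
        flows.append(f"priority={len(rules) - i},ip,nw_dst={cidr},actions={action}")
    return flows

# The scenario from this bug: allow one resolved dnsName, deny everything else.
for flow in egress_flows("172.18.1.118", [("Allow", "103.235.46.39/32"),
                                          ("Deny", "0.0.0.0/0")]):
    print(flow)
```

With the node-IP flow at the top, a pod's DNS query to 172.18.1.118 matches before the catch-all deny, which is exactly the failure mode in the original report.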
bug 1458849 changelog:
- changed the rules creation to SetupOVS()
- made both udp and tcp rules the same priority

---

Tested on the latest OCP 3.6 env; the issue can still be reproduced.

openshift v3.6.135
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

[root@host-8-174-52 ~]# oc describe egressnetworkpolicy policy-test
Name:         policy-test
Namespace:    d2
Created:      9 minutes ago
Labels:       <none>
Annotations:  <none>
Rule:         Allow to www.baidu.com
Rule:         Deny to 0.0.0.0/0
[root@host-8-174-52 ~]# oc rsh hello-pod
/ # ping www.baidu.com
ping: bad address 'www.baidu.com'

---

What does /etc/resolv.conf contain inside hello-pod? And what is the output of "ovs-ofctl -O OpenFlow13 dump-flows br0" on the node?

---

openshift v3.6.135
kubernetes v1.6.1+5115d708d7
etcd 3.2.1

[root@host-8-174-52 ~]# oc get pod
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          14s
[root@host-8-174-52 ~]# oc rsh hello-pod
/ # ping www.baidu.com
PING www.baidu.com (103.235.46.39): 56 data bytes
64 bytes from 103.235.46.39: seq=0 ttl=36 time=252.119 ms
64 bytes from 103.235.46.39: seq=1 ttl=36 time=251.419 ms
[root@host-8-174-52 ~]# oc create -f policy.json
egressnetworkpolicy "policy-test" created
[root@host-8-174-52 ~]# oc describe egressnetworkpolicy policy-test
Name:         policy-test
Namespace:    d3
Created:      11 seconds ago
Labels:       <none>
Annotations:  <none>
Rule:         Allow to www.baidu.com
Rule:         Deny to 0.0.0.0/0
[root@host-8-174-52 ~]# oc rsh hello-pod
/ # ping www.baidu.com
ping: bad address 'www.baidu.com'
/ # cat /etc/resolv.conf
nameserver 172.16.120.15
search d3.svc.cluster.local svc.cluster.local cluster.local openstacklocal host.centralci.eng.rdu2.redhat.com
options ndots:5

Created attachment 1295159 [details]
openflow log
Could you verify whether this works if you use the installer to set up the cluster? That should set up some dnsmasq instances that allow this to work.

---

Actually, the env in comment 9 was set up by the installer. The installer package version: openshift-ansible-3.6.138-1.git.0.2c647a9.el7.noarch.rpm

---

Can we get the output from "ip a" on the node where that openflow dump came from? Or, if the env doesn't exist any more, can you get the following from a new node that exhibits the bug:
- ip a
- cat /etc/resolv.conf
- ovs-ofctl -O OpenFlow13 dump-flows br0
Thanks.

---

Interestingly, 172.16.120.15 is the address of a different node... it looks like the installer didn't set up a local dnsmasq, or perhaps the pod had moved to a different node than the one the OVS flow dump came from? Anyway, I will look at the installer to see what it does. Can you post the hosts file that you used so I can see what options were set (if any)? Also, please post the info from the previous comment too. Thanks.

---

Tested on latest OCP v3.6.140 and it works fine.
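The diagnosis above hinges on whether the pod's configured nameserver is an address owned by the node the pod runs on (172.16.120.15 turned out to belong to a different node, so a node-local allow rule could not help). That check can be sketched like this; the function name and inputs are hypothetical:

```python
# Hypothetical diagnostic sketch: does the pod's /etc/resolv.conf point at an
# address owned by the local node? If not (as with 172.16.120.15 above), the
# fix of allowing the node's own IP in OVS will not cover this pod's DNS.
def nameserver_is_local(resolv_conf_text, node_ips):
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            return parts[1] in node_ips
    return False  # no nameserver line found

resolv = "nameserver 172.16.120.15\nsearch d3.svc.cluster.local\noptions ndots:5\n"
print(nameserver_is_local(resolv, {"172.16.120.20", "10.128.0.1"}))  # prints False
```

Here node_ips would be the set of addresses shown by "ip a" on the node, which is why that output was requested alongside the flow dump.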
[root@ip-172-18-3-187 ~]# oc create -f policy.json
egressnetworkpolicy "policy-test" created
[root@ip-172-18-3-187 ~]# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/pod-for-ping.json
pod "hello-pod" created
[root@ip-172-18-3-187 ~]# cat policy.json
{
    "kind": "EgressNetworkPolicy",
    "apiVersion": "v1",
    "metadata": {
        "name": "policy-test"
    },
    "spec": {
        "egress": [
            {
                "type": "Allow",
                "to": {
                    "dnsName": "www.baidu.com"
                }
            },
            {
                "type": "Deny",
                "to": {
                    "cidrSelector": "0.0.0.0/0"
                }
            }
        ]
    }
}
[root@ip-172-18-3-187 ~]# oc get pod
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          15s
[root@ip-172-18-3-187 ~]# oc describe egressnetworkpolicy policy-test
Name:         policy-test
Namespace:    p1
Created:      About a minute ago
Labels:       <none>
Annotations:  <none>
Rule:         Allow to www.baidu.com
Rule:         Deny to 0.0.0.0/0
[root@ip-172-18-3-187 ~]# oc rsh hello-pod
/ # ping www.baidu.com
PING www.baidu.com (103.235.46.39): 56 data bytes
64 bytes from 103.235.46.39: seq=0 ttl=37 time=240.932 ms
64 bytes from 103.235.46.39: seq=1 ttl=37 time=240.692 ms
^C
--- www.baidu.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 240.692/240.812/240.932 ms
/ # ping www.cisco.com
PING www.cisco.com (23.196.96.28): 56 data bytes
^C
--- www.cisco.com ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
/ # exit
command terminated with exit code 1
[root@ip-172-18-3-187 ~]# oc version
oc v3.6.140
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://ip-172-18-3-187.ec2.internal:8443
openshift v3.6.140
kubernetes v1.6.1+5115d708d7

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2017:3188