Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1626387

Summary: [OVN] container cannot access the dns server
Product: OpenShift Container Platform
Component: Networking
Version: 3.11.0
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
Reporter: zhaozhanqi <zzhao>
Assignee: Casey Callendrello <cdc>
QA Contact: zhaozhanqi <zzhao>
CC: aos-bugs, bbennett, weliang, wmeng
Target Milestone: ---
Target Release: 4.2.0
Hardware: All
OS: All
Last Closed: 2019-06-18 15:24:43 UTC
Type: Bug
Attachments:
  Testing logs
  iptables rules from OVS and OVN

Description zhaozhanqi 2018-09-07 08:12:43 UTC
Description of problem:
Set up a cluster with the OVN plugin; the DNS server cannot be accessed from inside a container.

Version-Release number of selected component (if applicable):
openshift3/ose-ovn-kubernetes:v3.11.0-0.28.0(id=a9bcd525611e)

How reproducible:
always

Steps to Reproduce:
1. Set up OCP with the OVN plugin.
2. Create a pod in a project:
   oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/list_for_pods.json
3. Ping internal and external hostnames in the pod:
  oc rsh test-pod
  $ ping kubernetes.default.svc
ping: unknown host kubernetes.default.svc
  $ ping www.google.com
ping: unknown host www.google.com

Actual results:

DNS names cannot be resolved from the pod (ping: unknown host).

Expected results:

The container should be able to reach the DNS server and resolve names.

Additional info:
A public IP can still be reached from the container:
# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=111 time=2.30 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=111 time=1.40 ms

Comment 1 Phil Cameron 2018-09-21 17:47:13 UTC
Recreated test on my 3 node cluster.

# oc get no
NAME                                        STATUS    ROLES          AGE       VERSION
wsfd-netdev22.ntdv.lab.eng.bos.redhat.com   Ready     infra,master   1d        v1.11.0+d4cacc0
wsfd-netdev28.ntdv.lab.eng.bos.redhat.com   Ready     compute        1d        v1.11.0+d4cacc0
wsfd-netdev35.ntdv.lab.eng.bos.redhat.com   Ready     compute        1d        v1.11.0+d4cacc0

# oc new-project test1

# oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/list_for_pods.json
replicationcontroller/test-rc created
service/test-service created
# oc get all
NAME                READY     STATUS    RESTARTS   AGE
pod/test-rc-992wn   1/1       Running   0          29s
pod/test-rc-cg4t9   1/1       Running   0          29s

NAME                            DESIRED   CURRENT   READY     AGE
replicationcontroller/test-rc   2         2         2         29s

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/test-service   ClusterIP   172.30.219.98   <none>        27017/TCP   29s
# oc get po -o wide
NAME            READY     STATUS    RESTARTS   AGE       IP           NODE                                        NOMINATED NODE
test-rc-992wn   1/1       Running   0          1m        10.128.2.5   wsfd-netdev35.ntdv.lab.eng.bos.redhat.com   <none>
test-rc-cg4t9   1/1       Running   0          1m        10.128.1.4   wsfd-netdev28.ntdv.lab.eng.bos.redhat.com   <none>

# curl 10.128.2.5:8080
Hello OpenShift!
# ping 10.128.2.5
PING 10.128.2.5 (10.128.2.5) 56(84) bytes of data.
64 bytes from 10.128.2.5: icmp_seq=1 ttl=63 time=2.18 ms
...
# oc rsh test-rc-992wn
/ $ curl 10.128.1.4:8080
Hello OpenShift!
/ $ ping 10.128.1.4
PING 10.128.1.4 (10.128.1.4) 56(84) bytes of data.
64 bytes from 10.128.1.4: icmp_seq=1 ttl=63 time=2.48 ms
...
$ ping www.google.com
PING www.google.com (172.217.15.68) 56(84) bytes of data.
64 bytes from iad23s63-in-f4.1e100.net (172.217.15.68): icmp_seq=1 ttl=41 time=38.6 ms
....
$ ping  kubernetes.default.svc
ping: unknown host kubernetes.default.svc
$ exit

# curl 172.30.219.98:27017
Hello OpenShift!

kubernetes.default.svc does not resolve; there is no route/record for it.

Apart from that, this is working on my cluster.
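The symptom above (the short service name failing while pod IPs, and here even external names, work) comes down to the pod's resolv.conf: the search path must expand kubernetes.default.svc to a cluster-local FQDN, and the listed nameserver (the cluster DNS service IP) must actually be reachable. A minimal sketch of that search-domain expansion, using a typical OpenShift pod resolv.conf as an assumed example (the contents below are illustrative, not taken from the attached logs):

```python
# Sketch: how glibc-style search-domain expansion turns the short name
# "kubernetes.default.svc" into the fully qualified names actually queried.
# RESOLV_CONF below is a typical OpenShift pod default, assumed for
# illustration -- it is not copied from this bug's attachments.

RESOLV_CONF = """\
nameserver 172.30.0.1
search test1.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

def parse_resolv_conf(text):
    """Extract the nameserver list and search domains from resolv.conf text."""
    nameservers, search = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "nameserver":
            nameservers.append(parts[1])
        elif parts[0] == "search":
            search = parts[1:]
    return nameservers, search

def candidate_fqdns(name, search, ndots=5):
    """Names with fewer than `ndots` dots are tried with each search
    suffix first, then as-is -- the order the resolver queries them."""
    candidates = []
    if name.count(".") < ndots:
        candidates += [f"{name}.{suffix}" for suffix in search]
    candidates.append(name)
    return candidates

nameservers, search = parse_resolv_conf(RESOLV_CONF)
print(nameservers)   # ['172.30.0.1']
print(candidate_fqdns("kubernetes.default.svc", search))
```

The first candidate, kubernetes.default.svc.test1.svc.cluster.local, only resolves if the query reaches the cluster DNS behind the nameserver IP, which is exactly what the missing DNAT rules discussed later prevent.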

Comment 2 Phil Cameron 2018-09-21 21:45:42 UTC
Spent some time with Weibin. There are significant differences between his clusters on AWS and my lab cluster. The problem above occurs on his lab cluster when he installs with OVN, but not when he installs with SDN. There is a route certificate problem on his cluster that is not present on mine. He occasionally sees an ovnkube panic; it is not clear whether it has the same cause each time, or matches what I have seen. Analysis is in progress; more investigation is needed.

Comment 3 Weibin Liang 2018-09-27 19:38:16 UTC
Attached a detailed test log covering both clusters: DNS passes on the SDN cluster and fails on the OVN cluster.

Comment 4 Weibin Liang 2018-09-27 19:38:48 UTC
Created attachment 1487896 [details]
Testing logs

Comment 5 Weibin Liang 2018-10-16 18:35:58 UTC
Please ignore above attachment 1487896 [details].

See new test logs in new attachment.

With openshift-sdn (OVS), after the new pod/service is created, the node adds iptables rules for port 53:

[root@ip-172-18-10-166 ec2-user]# iptables-save | grep 53
-A KUBE-SEP-DKYVUOI2CXZAJVDR -p tcp -m comment --comment "default/kubernetes:dns-tcp" -m tcp -j DNAT --to-destination 172.18.7.59:8053
-A KUBE-SEP-KDT7ZLRJZTMDLVLE -p udp -m comment --comment "default/kubernetes:dns" -m udp -j DNAT --to-destination 172.18.7.59:8053
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.0.1/32 -p tcp -m comment --comment "default/kubernetes:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.0.1/32 -p tcp -m comment --comment "default/kubernetes:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-BA6I5HTZKAAAJT56
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.126.53/32 -p tcp -m comment --comment "default/router:1936-tcp cluster IP" -m tcp --dport 1936 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.126.53/32 -p tcp -m comment --comment "default/router:1936-tcp cluster IP" -m tcp --dport 1936 -j KUBE-SVC-4JCRTMMYZAAYMIJ2
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.0.1/32 -p udp -m comment --comment "default/kubernetes:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.0.1/32 -p udp -m comment --comment "default/kubernetes:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-3VQ6B3MLH7E2SZT4
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.126.53/32 -p tcp -m comment --comment "default/router:80-tcp cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.126.53/32 -p tcp -m comment --comment "default/router:80-tcp cluster IP" -m tcp --dport 80 -j KUBE-SVC-GQKZAHCS5DTMHUQ6
-A KUBE-SERVICES ! -s 10.128.0.0/14 -d 172.30.126.53/32 -p tcp -m comment --comment "default/router:443-tcp cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 172.30.126.53/32 -p tcp -m comment --comment "default/router:443-tcp cluster IP" -m tcp --dport 443 -j KUBE-SVC-IKV43KYNCXS2W7KZ

With OVN, the new rules for port 53 are never added:
[root@ip-172-18-7-211 ec2-user]#  iptables-save | grep 53
# Generated by iptables-save v1.4.21 on Tue Oct 16 13:53:49 2018
# Completed on Tue Oct 16 13:53:49 2018
# Generated by iptables-save v1.4.21 on Tue Oct 16 13:53:49 2018
:OUTPUT ACCEPT [5313:492235]
# Completed on Tue Oct 16 13:53:49 2018
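The comparison above reduces to a mechanical check: scan the iptables-save output for DNAT rules tagged with the default/kubernetes:dns service. A sketch of that check (the sample rules are copied verbatim from the SDN listing above; the helper name dns_dnat_targets is ours):

```python
import re

# Sketch: detect whether kube-proxy has installed the port-53 DNAT rules
# for the cluster DNS service. SDN_RULES is copied from the SDN node's
# iptables-save output shown above; the OVN node had no matching rules.

SDN_RULES = """\
-A KUBE-SEP-DKYVUOI2CXZAJVDR -p tcp -m comment --comment "default/kubernetes:dns-tcp" -m tcp -j DNAT --to-destination 172.18.7.59:8053
-A KUBE-SEP-KDT7ZLRJZTMDLVLE -p udp -m comment --comment "default/kubernetes:dns" -m udp -j DNAT --to-destination 172.18.7.59:8053
"""

OVN_RULES = ""  # grep on the OVN node returned only timestamp comments

def dns_dnat_targets(iptables_save_text):
    """Return the DNAT destinations of rules commented as the
    default/kubernetes dns or dns-tcp service ports."""
    pattern = re.compile(
        r'"default/kubernetes:dns(?:-tcp)?".*--to-destination (\S+)')
    return [m.group(1) for m in pattern.finditer(iptables_save_text)]

print(dns_dnat_targets(SDN_RULES))   # ['172.18.7.59:8053', '172.18.7.59:8053']
print(dns_dnat_targets(OVN_RULES))   # []
```

An empty result on a node, as on the OVN cluster here, means DNS traffic to the cluster service IP is never redirected to SkyDNS on the master, matching the unknown-host failures reported above.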

Comment 6 Weibin Liang 2018-10-16 18:37:00 UTC
Created attachment 1494514 [details]
iptables rules from OVS and OVN

Comment 7 zhaozhanqi 2018-10-17 06:15:40 UTC
Yes, for openshift-sdn the node DNATs all DNS requests to SkyDNS on the master via:

-A KUBE-SEP-DKYVUOI2CXZAJVDR -p tcp -m comment --comment "default/kubernetes:dns-tcp" -m tcp -j DNAT --to-destination 172.18.7.59:8053
-A KUBE-SEP-KDT7ZLRJZTMDLVLE -p udp -m comment --comment "default/kubernetes:dns" -m udp -j DNAT --to-destination 172.18.7.59:8053

For OVN, I'm not sure whether it uses the same mechanism?

Comment 8 Phil Cameron 2018-10-19 15:42:29 UTC
https://github.com/openvswitch/ovn-kubernetes/pull/456

Added iptables rule to permit traffic from pod to external network.

Comment 9 Weibin Liang 2018-10-19 18:29:19 UTC
Tested the PR above; the container can now ping an external hostname (yahoo.com).

Comment 10 zhaozhanqi 2018-10-22 05:32:15 UTC
It seems PR 456 only fixes the public DNS issue. What about internal DNS, e.g.:

# ping kubernetes.default.svc

Comment 11 Phil Cameron 2018-11-12 14:26:40 UTC
The PR only fixes the public DNS issue.
There is ongoing discussion about whether to support internal cluster DNS in the dev preview and, if we do, which DNS solution to use. OpenShift 4.0 will use CoreDNS.

Comment 12 Phil Cameron 2019-01-17 19:50:24 UTC
This is a network edge issue. Please reassign.

Comment 13 Casey Callendrello 2019-06-18 15:24:43 UTC
This is no longer an issue in 4.x.