Bug 1261923 - Cannot connect to the pods in default project from other projects
Summary: Cannot connect to the pods in default project from other projects
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Networking
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.x
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-09-10 12:47 UTC by Meng Bo
Modified: 2015-11-23 21:14 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-23 21:14:42 UTC
Target Upstream Version:
Embargoed:


Attachments:
debug_logs (3.29 MB, application/x-gzip), 2015-09-21 07:28 UTC, Meng Bo

Description Meng Bo 2015-09-10 12:47:43 UTC
Description of problem:
Set up an OSE environment with the redhat/openshift-ovs-multitenant network plugin via Ansible. Create the router/registry in the default project, then create pods in a user-owned project.
Pods in the default project and pods in the user-owned project cannot connect to each other.


Version-Release number of selected component (if applicable):
openshift v3.0.1.900-185-g2f7757a
kubernetes v1.1.0-alpha.0-1605-g44c91b1


How reproducible:
always


Steps to Reproduce:
1. Set up a multi-node environment with the above OSE build
2. Create the registry/router in the default project
* host-network should be set to false for the router
3. Create a pod in the user-owned project
$ oc new-project u1p1
$ oc create -f https://raw.githubusercontent.com/bmeng/mytestfiles/master/pod_bmenghelloopenshift.json
4. Try to ping the registry/router pod from inside the user's pod


Actual results:
The pod cannot connect to the router/registry pod.

Expected results:
Pods in default project can be reached from all the pods in other projects.

Additional info:
Flow dumps from the two nodes:

node1:
# ovs-ofctl dump-flows br0 -O OpenFlow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=92867.238s, table=0, n_packets=2905, n_bytes=1262282, actions=learn(table=8,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1
 cookie=0x0, duration=92867.223s, table=1, n_packets=1173, n_bytes=275677, actions=goto_table:3
 cookie=0x0, duration=92867.232s, table=1, n_packets=8, n_bytes=784, in_port=1 actions=goto_table:2
 cookie=0x0, duration=92867.226s, table=1, n_packets=107, n_bytes=12759, in_port=9 actions=goto_table:5
 cookie=0x0, duration=92867.229s, table=1, n_packets=1543, n_bytes=969954, in_port=2 actions=goto_table:5
 cookie=0x0, duration=92867.235s, table=1, n_packets=74, n_bytes=3108, arp actions=goto_table:8
 cookie=0x0, duration=92867.215s, table=2, n_packets=0, n_bytes=0, tun_id=0 actions=goto_table:5
 cookie=0x0, duration=92867.218s, table=2, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.1.1 actions=output:2
 cookie=0x0, duration=92867.212s, table=2, n_packets=8, n_bytes=784, priority=100,ip,nw_dst=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:6
 cookie=0x0, duration=92867.220s, table=2, n_packets=0, n_bytes=0, arp actions=goto_table:8
 cookie=0x4, duration=92816.773s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=4,nw_src=10.1.1.3 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x15, duration=824.244s, table=3, n_packets=414, n_bytes=53768, priority=100,ip,in_port=15,nw_src=10.1.1.14 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=92867.206s, table=4, n_packets=1098, n_bytes=269767, priority=0 actions=goto_table:5
 cookie=0x0, duration=6729.105s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.99.48,tp_dst=5432 actions=output:2
 cookie=0x0, duration=6725.106s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.3.121,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6392.114s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.41.199,tp_dst=5434 actions=output:2
 cookie=0x0, duration=1398.598s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.180.205,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6395.610s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.164.253,tp_dst=5432 actions=output:2
 cookie=0x0, duration=10462.128s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.155.141,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6311.598s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.161.148,tp_dst=5434 actions=output:2
 cookie=0x0, duration=1398.631s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=6315.602s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.36.179,tp_dst=5432 actions=output:2
 cookie=0x0, duration=1398.615s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.238.161,tp_dst=80 actions=output:2
 cookie=0x0, duration=1398.647s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.211.182,tp_dst=5000 actions=output:2
 cookie=0x0, duration=1398.576s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.113.237,tp_dst=5432 actions=output:2
 cookie=0x0, duration=10466.137s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.36.251,tp_dst=5432 actions=output:2
 cookie=0x0, duration=92867.209s, table=4, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.0.0/16 actions=drop
 cookie=0x0, duration=92867.204s, table=5, n_packets=203, n_bytes=29396, priority=200,ip,nw_dst=10.1.1.1 actions=output:2
 cookie=0x0, duration=92867.198s, table=5, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/16 actions=goto_table:7
 cookie=0x0, duration=92867.195s, table=5, n_packets=939, n_bytes=247600, priority=0,ip actions=output:2
 cookie=0x0, duration=92867.201s, table=5, n_packets=1546, n_bytes=970384, priority=150,ip,nw_dst=10.1.1.0/24 actions=goto_table:6
 cookie=0x0, duration=92867.192s, table=6, n_packets=1535, n_bytes=969306, priority=200,ip,reg0=0 actions=goto_table:8
 cookie=0x4, duration=92816.770s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xa,nw_dst=10.1.1.3 actions=output:4
 cookie=0x15, duration=824.238s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xa,nw_dst=10.1.1.14 actions=output:15
 cookie=0xa424f8c, duration=193.469s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.140->tun_dst,output:1
 cookie=0xa424f8a, duration=193.498s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.138->tun_dst,output:1
 cookie=0x0, duration=824.575s, table=8, n_packets=519, n_bytes=62811, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:0e actions=load:0->NXM_NX_TUN_IPV4_DST[],output:15
 cookie=0x0, duration=1152.116s, table=8, n_packets=3, n_bytes=126, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:0c actions=load:0->NXM_NX_TUN_IPV4_DST[],output:13
 cookie=0x0, duration=622.705s, table=8, n_packets=1, n_bytes=42, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:17 actions=load:0xa424f8a->NXM_NX_TUN_IPV4_DST[],output:1
 cookie=0x0, duration=841.830s, table=8, n_packets=2, n_bytes=84, hard_timeout=900, priority=200,dl_dst=be:fb:17:83:4b:93 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:2
 cookie=0x0, duration=842.634s, table=8, n_packets=367, n_bytes=450763, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:0d actions=load:0->NXM_NX_TUN_IPV4_DST[],output:14
 cookie=0xa424f8c, duration=193.457s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.140->tun_dst,output:1
 cookie=0xa424f8a, duration=193.479s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.138->tun_dst,output:1
 cookie=0x0, duration=92867.190s, table=8, n_packets=20, n_bytes=840, priority=0,arp actions=FLOOD


node2:
# ovs-ofctl dump-flows br0 -O OpenFlow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=92893.426s, table=0, n_packets=3105, n_bytes=1208996, actions=learn(table=8,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1
 cookie=0x0, duration=92893.411s, table=1, n_packets=1196, n_bytes=226685, actions=goto_table:3
 cookie=0x0, duration=92893.419s, table=1, n_packets=0, n_bytes=0, in_port=1 actions=goto_table:2
 cookie=0x0, duration=92893.414s, table=1, n_packets=245, n_bytes=33129, in_port=9 actions=goto_table:5
 cookie=0x0, duration=92893.416s, table=1, n_packets=1593, n_bytes=946200, in_port=2 actions=goto_table:5
 cookie=0x0, duration=92893.423s, table=1, n_packets=71, n_bytes=2982, arp actions=goto_table:8
 cookie=0x0, duration=92893.403s, table=2, n_packets=0, n_bytes=0, tun_id=0 actions=goto_table:5
 cookie=0x0, duration=92893.406s, table=2, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.0.1 actions=output:2
 cookie=0x0, duration=92893.401s, table=2, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:6
 cookie=0x0, duration=92893.409s, table=2, n_packets=0, n_bytes=0, arp actions=goto_table:8
 cookie=0x23, duration=865.018s, table=3, n_packets=389, n_bytes=51400, priority=100,ip,in_port=23,nw_src=10.1.0.24 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x20, duration=5347.861s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=20,nw_src=10.1.0.21 actions=load:0xd->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=92893.396s, table=4, n_packets=1067, n_bytes=216395, priority=0 actions=goto_table:5
 cookie=0x0, duration=6755.416s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.99.48,tp_dst=5432 actions=output:2
 cookie=0x0, duration=6751.414s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.3.121,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6418.423s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.41.199,tp_dst=5434 actions=output:2
 cookie=0x0, duration=1420.068s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.180.205,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6421.923s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.164.253,tp_dst=5432 actions=output:2
 cookie=0x0, duration=10488.437s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.155.141,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6337.916s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.161.148,tp_dst=5434 actions=output:2
 cookie=0x0, duration=1420.105s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=6341.909s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.36.179,tp_dst=5432 actions=output:2
 cookie=0x0, duration=1420.080s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.238.161,tp_dst=80 actions=output:2
 cookie=0x0, duration=1420.132s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.211.182,tp_dst=5000 actions=output:2
 cookie=0x0, duration=1420.048s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.113.237,tp_dst=5432 actions=output:2
 cookie=0x0, duration=10492.440s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.36.251,tp_dst=5432 actions=output:2
 cookie=0x0, duration=92893.398s, table=4, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.0.0/16 actions=drop
 cookie=0x0, duration=92893.393s, table=5, n_packets=201, n_bytes=30036, priority=200,ip,nw_dst=10.1.0.1 actions=output:2
 cookie=0x0, duration=92893.388s, table=5, n_packets=8, n_bytes=784, priority=100,ip,nw_dst=10.1.0.0/16 actions=goto_table:7
 cookie=0x0, duration=92893.386s, table=5, n_packets=1021, n_bytes=211712, priority=0,ip actions=output:2
 cookie=0x0, duration=92893.391s, table=5, n_packets=1589, n_bytes=945944, priority=150,ip,nw_dst=10.1.0.0/24 actions=goto_table:6
 cookie=0x0, duration=92893.383s, table=6, n_packets=1585, n_bytes=945552, priority=200,ip,reg0=0 actions=goto_table:8
 cookie=0x20, duration=5347.858s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd,nw_dst=10.1.0.21 actions=output:20
 cookie=0x23, duration=865.016s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xa,nw_dst=10.1.0.24 actions=output:23
 cookie=0xa424f73, duration=214.995s, table=7, n_packets=8, n_bytes=784, priority=100,ip,nw_dst=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.115->tun_dst,output:1
 cookie=0xa424f8c, duration=214.959s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.140->tun_dst,output:1
 cookie=0x0, duration=649.014s, table=8, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:0e actions=load:0xa424f73->NXM_NX_TUN_IPV4_DST[],output:1
 cookie=0x0, duration=863.157s, table=8, n_packets=2, n_bytes=84, hard_timeout=900, priority=200,dl_dst=8e:25:c2:77:c2:b0 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:2
 cookie=0x0, duration=865.354s, table=8, n_packets=513, n_bytes=60491, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:18 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:23
 cookie=0x0, duration=1165.776s, table=8, n_packets=11, n_bytes=1051, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:17 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:22
 cookie=0xa424f8c, duration=214.953s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.140->tun_dst,output:1
 cookie=0xa424f73, duration=214.967s, table=8, n_packets=1, n_bytes=42, priority=100,arp,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.115->tun_dst,output:1
 cookie=0x0, duration=92893.381s, table=8, n_packets=15, n_bytes=630, priority=0,arp actions=FLOOD
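
One reading of the dumps above: the table=3 flows tag each local pod's traffic by loading a value into NXM_NX_REG0, and the table=7 flows copy that register into the tunnel ID, so the value appears to be the per-project VNID. The following is a minimal Python sketch, not part of openshift-sdn, that lists the pod-IP-to-VNID mapping from such a dump. The function name `pod_vnids` and the regular expression are illustrative assumptions; the three sample lines are copied from the dumps above.

```python
import re

# Sample lines copied from the node1/node2 dumps above.
SAMPLE_FLOWS = """\
 cookie=0x15, duration=824.244s, table=3, n_packets=414, n_bytes=53768, priority=100,ip,in_port=15,nw_src=10.1.1.14 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x20, duration=5347.861s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=20,nw_src=10.1.0.21 actions=load:0xd->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=92893.383s, table=6, n_packets=1585, n_bytes=945552, priority=200,ip,reg0=0 actions=goto_table:8
"""

# Matches table=3 flows of the form "nw_src=<ip> actions=load:<value>->NXM_NX_REG0[]".
TABLE3 = re.compile(
    r"table=3,.*?nw_src=(?P<ip>[\d.]+) "
    r"actions=load:(?P<vnid>0x[0-9a-f]+|\d+)->NXM_NX_REG0\[\]"
)


def pod_vnids(dump):
    """Return {pod_ip: vnid} for every table=3 flow found in the dump text."""
    result = {}
    for line in dump.splitlines():
        match = TABLE3.search(line)
        if match:
            # Base 0 lets int() accept both "0xa" and plain decimal values.
            result[match.group("ip")] = int(match.group("vnid"), 0)
    return result


vnids = pod_vnids(SAMPLE_FLOWS)
for ip, vnid in sorted(vnids.items()):
    print(f"{ip} -> VNID {vnid:#x}")
```

On these sample lines every pod carries a nonzero value (0xa or 0xd), while the table=6 flow with reg0=0 is the only one that forwards traffic regardless of destination.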

Comment 2 Dan Winship 2015-09-16 13:14:17 UTC
So is this the same problem as bug 1257864 then?

Comment 3 Meng Bo 2015-09-17 09:53:51 UTC
No, this still can be reproduced when using the correct openshift-sdn configuration. 
And this bug can be reproduced in all my testing environments. (Origin on KVM, OSE on OpenStack, AEP on OpenStack, Origin on Vagrant)

Comment 4 Dan Winship 2015-09-18 14:17:38 UTC
(In reply to Meng Bo from comment #0)
> 2. Create registry/router in the default project
> * should set host-network to false for router

what are the exact steps here?

> 4. Try to ping the registry/router pod from inside the user's pod

and here?


Also, if you want to beta-test our exciting new magic-debug-info-gathering-tool, download https://raw.githubusercontent.com/danwinship/openshift-sdn/debug/hack/debug.sh and try running it on the master as root. (You'll need to set things up so root@master can ssh to root@nodes; https://github.com/openshift/openshift-sdn/pull/154 gives quick-and-dirty commands for doing that in the default vagrant setup; you should be able to do something similar if the ansible setup doesn't already allow this.)

Comment 5 Meng Bo 2015-09-21 07:26:59 UTC
@Dan,

1. To create registry and router, via system-admin under default namespace:
# oadm registry --create --credentials=/root/openshift.local.config/master/openshift-registry.kubeconfig
# oadm router --create --credentials=/root/openshift.local.config/master/openshift-router.kubeconfig --latest-images --service-account=default --replicas=3 --host-network=false

# oc get po -o json |grep podIP



2. To access the router/registry from the user's pod, create a pod as the user in the user's own namespace

$ oc create -f https://raw.githubusercontent.com/bmeng/mytestfiles/master/pod_bmenghelloopenshift.json
$ oc rsh hello-pod
-$ ping <IP_of_router_pod>

The last step does not succeed because the network is unreachable.

bash-4.3$ ping 10.1.2.9 -c 10
PING 10.1.2.9 (10.1.2.9) 56(84) bytes of data.
--- 10.1.2.9 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 8999ms
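
For anyone scripting this reproduction, a ping summary like the one above can be checked automatically. This is a minimal Python sketch; the helper name `packet_loss` is hypothetical, and the input is the summary line copied from the output above.

```python
import re

# Summary line copied from the ping output above.
PING_SUMMARY = "10 packets transmitted, 0 received, 100% packet loss, time 8999ms"


def packet_loss(summary):
    """Return (sent, received, loss_fraction) parsed from a ping summary line."""
    match = re.search(r"(\d+) packets transmitted, (\d+) received", summary)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    # Guard against division by zero if nothing was sent.
    loss = (sent - received) / sent if sent else 0.0
    return sent, received, loss


sent, received, loss = packet_loss(PING_SUMMARY)
print(f"{sent} sent, {received} received, {loss:.0%} loss")
```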



This also causes the router not to work when using --host-network=false.

Comment 6 Meng Bo 2015-09-21 07:28:42 UTC
Created attachment 1075408 [details]
debug_logs

Attached the information collected by the debug tool.

This tool is awesome.

Comment 7 Meng Bo 2015-09-21 08:11:22 UTC
Services in the default namespace also cannot be accessed from a pod in the user's project.

# oc get svc -n default
NAME              CLUSTER_IP       EXTERNAL_IP   PORT(S)    SELECTOR                  AGE
docker-registry   172.30.206.101   <none>        5000/TCP   docker-registry=default   4m
kubernetes        172.30.0.1       <none>        443/TCP    <none>                    29m
router            172.30.3.176     <none>        80/TCP     router=router             28m

# oc get po -n u1p1
NAME                   READY     STATUS    RESTARTS   AGE
hello-nginx-docker     1/1       Running   0          7m
hello-nginx-docker-2   1/1       Running   0          7m

# oc project u1p1

# oc rsh hello-nginx-docker
Access in pod:
-$ curl 172.30.206.101:5000/v2/
curl: (7) Failed to connect to 172.30.206.101 port 5000: Connection timed out


Access on node:
[root@node1 ~]# curl 172.30.206.101:5000/v2/
{"errors":[{"code":"UNAUTHORIZED","message":"access to the requested resource is not authorized","detail":null}]}

Comment 8 Dan Winship 2015-09-21 22:01:32 UTC
OK, currently "default" is not actually being treated as an admin namespace...
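
The isolation rule this comment implies can be sketched as follows. This is a simplified Python model, not openshift-sdn code. It assumes VNID 0 is the admin value (suggested by the reg0=0 flow in table 6 of the dumps) and that "default" had been given an ordinary VNID. The value 0xa is only inferred from the table=4 rule for 172.30.0.1:443 and may not be correct.

```python
ADMIN_VNID = 0  # assumption: VNID 0 marks the privileged ("admin") network


def may_communicate(src_vnid, dst_vnid):
    """Simplified multitenant check: same VNID, or either side is admin."""
    return src_vnid == dst_vnid or ADMIN_VNID in (src_vnid, dst_vnid)


# Hypothetical values: 0xd for a user project, 0xa for "default" before the fix.
print(may_communicate(0xD, 0xA))         # user pod -> default pod, default not admin
print(may_communicate(0xD, ADMIN_VNID))  # user pod -> default pod, default as admin
```

Under this model the first call is refused and the second is allowed, which matches the reported behaviour and the expected results in the description.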

Comment 9 Dan Winship 2015-09-22 12:04:18 UTC
Should be fixed in master.

Comment 10 Meng Bo 2015-09-23 03:08:05 UTC
The pods in the default namespace can now be reached from pods in other namespaces.

But the services in the default namespace still cannot be accessed, as described in comment #7 above.

@Dan, do the two issues have the same root cause? Or do I need to open a separate issue for the service?

Comment 11 Dan Winship 2015-09-23 17:21:36 UTC
OK, that turns out to be a separate issue, but you can keep it as part of this bugzilla bug if you want. (https://github.com/openshift/openshift-sdn/issues/158)

Comment 12 Dan Winship 2015-10-02 14:30:20 UTC
This should now be fixed in openshift-sdn master, though it hasn't yet been merged from there into origin. However, if you check out origin and openshift-sdn, and then in the openshift-sdn checkout run "./hack/sync-to-origin.sh -r PATH_TO_ORIGIN_CHECKOUT", it will copy the current state of openshift-sdn over to origin, and then you can build origin to test it.

Comment 13 Meng Bo 2015-10-08 10:42:54 UTC
I tried again after syncing the latest openshift-sdn code to origin, and the issue is fixed.

Will move the bug to verified once the changes are merged into origin.

Comment 14 Meng Bo 2015-10-14 09:43:43 UTC
The change has been merged into origin; both the pods and the services in the default namespace can now be accessed from other projects.

Marking the bug as verified.

