Bug 1261923 - Cannot connect to the pods in default project from other projects
Status: CLOSED CURRENTRELEASE
Product: OpenShift Origin
Classification: Red Hat
Component: Networking
Version: 3.x
Hardware: Unspecified OS: Unspecified
Priority: medium Severity: medium
: ---
: 3.x
Assigned To: Dan Winship
Meng Bo
:
Depends On:
Blocks:
Reported: 2015-09-10 08:47 EDT by Meng Bo
Modified: 2015-11-23 16:14 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-23 16:14:42 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
debug_logs (3.29 MB, application/x-gzip)
2015-09-21 03:28 EDT, Meng Bo
Description Meng Bo 2015-09-10 08:47:43 EDT
Description of problem:
Set up an OSE environment with the redhat/openshift-ovs-multitenant network plugin via Ansible, create the router and registry in the default project, and create pods in a user-owned project.
Pods in the default project and pods in the user-owned project cannot connect to each other.


Version-Release number of selected component (if applicable):
openshift v3.0.1.900-185-g2f7757a
kubernetes v1.1.0-alpha.0-1605-g44c91b1


How reproducible:
always


Steps to Reproduce:
1. Set up a multi-node environment with the OSE build above.
2. Create the registry and router in the default project.
* host-network must be set to false for the router
3. Create a pod in the user-owned project:
$ oc new-project u1p1
$ oc create -f https://raw.githubusercontent.com/bmeng/mytestfiles/master/pod_bmenghelloopenshift.json
4. Try to ping the registry/router pod from inside the user's pod.


Actual results:
The pod cannot connect to the router/registry pod.

Expected results:
Pods in the default project should be reachable from pods in all other projects.

Additional info:
OVS flow dumps from the two nodes:

node1:
# ovs-ofctl dump-flows br0 -O OpenFlow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=92867.238s, table=0, n_packets=2905, n_bytes=1262282, actions=learn(table=8,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1
 cookie=0x0, duration=92867.223s, table=1, n_packets=1173, n_bytes=275677, actions=goto_table:3
 cookie=0x0, duration=92867.232s, table=1, n_packets=8, n_bytes=784, in_port=1 actions=goto_table:2
 cookie=0x0, duration=92867.226s, table=1, n_packets=107, n_bytes=12759, in_port=9 actions=goto_table:5
 cookie=0x0, duration=92867.229s, table=1, n_packets=1543, n_bytes=969954, in_port=2 actions=goto_table:5
 cookie=0x0, duration=92867.235s, table=1, n_packets=74, n_bytes=3108, arp actions=goto_table:8
 cookie=0x0, duration=92867.215s, table=2, n_packets=0, n_bytes=0, tun_id=0 actions=goto_table:5
 cookie=0x0, duration=92867.218s, table=2, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.1.1 actions=output:2
 cookie=0x0, duration=92867.212s, table=2, n_packets=8, n_bytes=784, priority=100,ip,nw_dst=10.1.1.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:6
 cookie=0x0, duration=92867.220s, table=2, n_packets=0, n_bytes=0, arp actions=goto_table:8
 cookie=0x4, duration=92816.773s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=4,nw_src=10.1.1.3 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x15, duration=824.244s, table=3, n_packets=414, n_bytes=53768, priority=100,ip,in_port=15,nw_src=10.1.1.14 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=92867.206s, table=4, n_packets=1098, n_bytes=269767, priority=0 actions=goto_table:5
 cookie=0x0, duration=6729.105s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.99.48,tp_dst=5432 actions=output:2
 cookie=0x0, duration=6725.106s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.3.121,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6392.114s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.41.199,tp_dst=5434 actions=output:2
 cookie=0x0, duration=1398.598s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.180.205,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6395.610s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.164.253,tp_dst=5432 actions=output:2
 cookie=0x0, duration=10462.128s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.155.141,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6311.598s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.161.148,tp_dst=5434 actions=output:2
 cookie=0x0, duration=1398.631s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=6315.602s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.36.179,tp_dst=5432 actions=output:2
 cookie=0x0, duration=1398.615s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.238.161,tp_dst=80 actions=output:2
 cookie=0x0, duration=1398.647s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.211.182,tp_dst=5000 actions=output:2
 cookie=0x0, duration=1398.576s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.113.237,tp_dst=5432 actions=output:2
 cookie=0x0, duration=10466.137s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.36.251,tp_dst=5432 actions=output:2
 cookie=0x0, duration=92867.209s, table=4, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.0.0/16 actions=drop
 cookie=0x0, duration=92867.204s, table=5, n_packets=203, n_bytes=29396, priority=200,ip,nw_dst=10.1.1.1 actions=output:2
 cookie=0x0, duration=92867.198s, table=5, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/16 actions=goto_table:7
 cookie=0x0, duration=92867.195s, table=5, n_packets=939, n_bytes=247600, priority=0,ip actions=output:2
 cookie=0x0, duration=92867.201s, table=5, n_packets=1546, n_bytes=970384, priority=150,ip,nw_dst=10.1.1.0/24 actions=goto_table:6
 cookie=0x0, duration=92867.192s, table=6, n_packets=1535, n_bytes=969306, priority=200,ip,reg0=0 actions=goto_table:8
 cookie=0x4, duration=92816.770s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xa,nw_dst=10.1.1.3 actions=output:4
 cookie=0x15, duration=824.238s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xa,nw_dst=10.1.1.14 actions=output:15
 cookie=0xa424f8c, duration=193.469s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.140->tun_dst,output:1
 cookie=0xa424f8a, duration=193.498s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.138->tun_dst,output:1
 cookie=0x0, duration=824.575s, table=8, n_packets=519, n_bytes=62811, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:0e actions=load:0->NXM_NX_TUN_IPV4_DST[],output:15
 cookie=0x0, duration=1152.116s, table=8, n_packets=3, n_bytes=126, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:0c actions=load:0->NXM_NX_TUN_IPV4_DST[],output:13
 cookie=0x0, duration=622.705s, table=8, n_packets=1, n_bytes=42, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:17 actions=load:0xa424f8a->NXM_NX_TUN_IPV4_DST[],output:1
 cookie=0x0, duration=841.830s, table=8, n_packets=2, n_bytes=84, hard_timeout=900, priority=200,dl_dst=be:fb:17:83:4b:93 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:2
 cookie=0x0, duration=842.634s, table=8, n_packets=367, n_bytes=450763, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:0d actions=load:0->NXM_NX_TUN_IPV4_DST[],output:14
 cookie=0xa424f8c, duration=193.457s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.140->tun_dst,output:1
 cookie=0xa424f8a, duration=193.479s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.0.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.138->tun_dst,output:1
 cookie=0x0, duration=92867.190s, table=8, n_packets=20, n_bytes=840, priority=0,arp actions=FLOOD


node2:
# ovs-ofctl dump-flows br0 -O OpenFlow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=92893.426s, table=0, n_packets=3105, n_bytes=1208996, actions=learn(table=8,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1
 cookie=0x0, duration=92893.411s, table=1, n_packets=1196, n_bytes=226685, actions=goto_table:3
 cookie=0x0, duration=92893.419s, table=1, n_packets=0, n_bytes=0, in_port=1 actions=goto_table:2
 cookie=0x0, duration=92893.414s, table=1, n_packets=245, n_bytes=33129, in_port=9 actions=goto_table:5
 cookie=0x0, duration=92893.416s, table=1, n_packets=1593, n_bytes=946200, in_port=2 actions=goto_table:5
 cookie=0x0, duration=92893.423s, table=1, n_packets=71, n_bytes=2982, arp actions=goto_table:8
 cookie=0x0, duration=92893.403s, table=2, n_packets=0, n_bytes=0, tun_id=0 actions=goto_table:5
 cookie=0x0, duration=92893.406s, table=2, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.0.1 actions=output:2
 cookie=0x0, duration=92893.401s, table=2, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:6
 cookie=0x0, duration=92893.409s, table=2, n_packets=0, n_bytes=0, arp actions=goto_table:8
 cookie=0x23, duration=865.018s, table=3, n_packets=389, n_bytes=51400, priority=100,ip,in_port=23,nw_src=10.1.0.24 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x20, duration=5347.861s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=20,nw_src=10.1.0.21 actions=load:0xd->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=92893.396s, table=4, n_packets=1067, n_bytes=216395, priority=0 actions=goto_table:5
 cookie=0x0, duration=6755.416s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.99.48,tp_dst=5432 actions=output:2
 cookie=0x0, duration=6751.414s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.3.121,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6418.423s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.41.199,tp_dst=5434 actions=output:2
 cookie=0x0, duration=1420.068s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.180.205,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6421.923s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.164.253,tp_dst=5432 actions=output:2
 cookie=0x0, duration=10488.437s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.155.141,tp_dst=5434 actions=output:2
 cookie=0x0, duration=6337.916s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.161.148,tp_dst=5434 actions=output:2
 cookie=0x0, duration=1420.105s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=6341.909s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.36.179,tp_dst=5432 actions=output:2
 cookie=0x0, duration=1420.080s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.238.161,tp_dst=80 actions=output:2
 cookie=0x0, duration=1420.132s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xa,nw_dst=172.30.211.182,tp_dst=5000 actions=output:2
 cookie=0x0, duration=1420.048s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.113.237,tp_dst=5432 actions=output:2
 cookie=0x0, duration=10492.440s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.36.251,tp_dst=5432 actions=output:2
 cookie=0x0, duration=92893.398s, table=4, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.0.0/16 actions=drop
 cookie=0x0, duration=92893.393s, table=5, n_packets=201, n_bytes=30036, priority=200,ip,nw_dst=10.1.0.1 actions=output:2
 cookie=0x0, duration=92893.388s, table=5, n_packets=8, n_bytes=784, priority=100,ip,nw_dst=10.1.0.0/16 actions=goto_table:7
 cookie=0x0, duration=92893.386s, table=5, n_packets=1021, n_bytes=211712, priority=0,ip actions=output:2
 cookie=0x0, duration=92893.391s, table=5, n_packets=1589, n_bytes=945944, priority=150,ip,nw_dst=10.1.0.0/24 actions=goto_table:6
 cookie=0x0, duration=92893.383s, table=6, n_packets=1585, n_bytes=945552, priority=200,ip,reg0=0 actions=goto_table:8
 cookie=0x20, duration=5347.858s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xd,nw_dst=10.1.0.21 actions=output:20
 cookie=0x23, duration=865.016s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xa,nw_dst=10.1.0.24 actions=output:23
 cookie=0xa424f73, duration=214.995s, table=7, n_packets=8, n_bytes=784, priority=100,ip,nw_dst=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.115->tun_dst,output:1
 cookie=0xa424f8c, duration=214.959s, table=7, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.140->tun_dst,output:1
 cookie=0x0, duration=649.014s, table=8, n_packets=0, n_bytes=0, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:01:0e actions=load:0xa424f73->NXM_NX_TUN_IPV4_DST[],output:1
 cookie=0x0, duration=863.157s, table=8, n_packets=2, n_bytes=84, hard_timeout=900, priority=200,dl_dst=8e:25:c2:77:c2:b0 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:2
 cookie=0x0, duration=865.354s, table=8, n_packets=513, n_bytes=60491, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:18 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:23
 cookie=0x0, duration=1165.776s, table=8, n_packets=11, n_bytes=1051, hard_timeout=900, priority=200,dl_dst=02:42:0a:01:00:17 actions=load:0->NXM_NX_TUN_IPV4_DST[],output:22
 cookie=0xa424f8c, duration=214.953s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.140->tun_dst,output:1
 cookie=0xa424f73, duration=214.967s, table=8, n_packets=1, n_bytes=42, priority=100,arp,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.79.115->tun_dst,output:1
 cookie=0x0, duration=92893.381s, table=8, n_packets=15, n_bytes=630, priority=0,arp actions=FLOOD
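
For anyone scanning the dumps above, the per-pod VNID assignments can be pulled out of the table=3 flows. The following is a minimal Python sketch and not part of the original report. The function name `pod_vnids` is hypothetical, and it assumes the `ovs-ofctl dump-flows` text format shown above, where each table=3 flow matches `nw_src=<pod IP>` and loads a value into `NXM_NX_REG0`.

```python
import re

# Matches table=3 flows of the form seen in the dumps above, e.g.
#   table=3, ... nw_src=10.1.0.24 actions=load:0xa->NXM_NX_REG0[],goto_table:4
FLOW_RE = re.compile(
    r"table=3,.*?nw_src=(?P<ip>\d+\.\d+\.\d+\.\d+)\s+"
    r"actions=load:(?P<vnid>0x[0-9a-fA-F]+|\d+)->NXM_NX_REG0\[\]"
)


def pod_vnids(dump_text):
    """Return {pod_ip: vnid} for every table=3 flow in an ovs-ofctl dump."""
    result = {}
    for line in dump_text.splitlines():
        match = FLOW_RE.search(line)
        if match:
            # int(..., 0) accepts both hex (0xa) and decimal (10) notation.
            result[match.group("ip")] = int(match.group("vnid"), 0)
    return result


if __name__ == "__main__":
    # Two table=3 lines copied from the node2 dump in this report.
    sample = """\
 cookie=0x23, duration=865.018s, table=3, n_packets=389, n_bytes=51400, priority=100,ip,in_port=23,nw_src=10.1.0.24 actions=load:0xa->NXM_NX_REG0[],goto_table:4
 cookie=0x20, duration=5347.861s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=20,nw_src=10.1.0.21 actions=load:0xd->NXM_NX_REG0[],goto_table:4
"""
    for ip, vnid in sorted(pod_vnids(sample).items()):
        print(ip, hex(vnid))
```

Applied to the two dumps above, the table=3 flows only ever load 0xa or 0xd into reg0. No pod on either node is assigned VNID 0.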
Comment 2 Dan Winship 2015-09-16 09:14:17 EDT
So is this the same problem as bug 1257864 then?
Comment 3 Meng Bo 2015-09-17 05:53:51 EDT
No, this can still be reproduced with the correct openshift-sdn configuration.
It is reproducible in all of my test environments (Origin on KVM, OSE on OpenStack, AEP on OpenStack, Origin on Vagrant).
Comment 4 Dan Winship 2015-09-18 10:17:38 EDT
(In reply to Meng Bo from comment #0)
> 2. Create registry/router in the default project
> * should set host-network to false for router

what are the exact steps here?

> 4. Try to ping the registry/router pod from inside the user's pod

and here?


Also, if you want to beta-test our exciting new magic-debug-info-gathering-tool, download https://raw.githubusercontent.com/danwinship/openshift-sdn/debug/hack/debug.sh and try running it on the master as root. (You'll need to set things up so root@master can ssh to root@nodes; https://github.com/openshift/openshift-sdn/pull/154 gives quick-and-dirty commands for doing that in the default vagrant setup; you should be able to do something similar if the ansible setup doesn't already allow this.)
Comment 5 Meng Bo 2015-09-21 03:26:59 EDT
@Dan,

1. To create the registry and router, run the following as system-admin in the default namespace:
# oadm registry --create --credentials=/root/openshift.local.config/master/openshift-registry.kubeconfig
# oadm router --create --credentials=/root/openshift.local.config/master/openshift-router.kubeconfig --latest-images --service-account=default --replicas=3 --host-network=false

# oc get po -o json |grep podIP



2. To access the router/registry from a user's pod, create a pod as a regular user in that user's own namespace:

$ oc create -f https://raw.githubusercontent.com/bmeng/mytestfiles/master/pod_bmenghelloopenshift.json
$ oc rsh hello-pod
-$ ping <IP_of_router_pod>

The last step fails because the router pod is unreachable over the network.

bash-4.3$ ping 10.1.2.9 -c 10
PING 10.1.2.9 (10.1.2.9) 56(84) bytes of data.
--- 10.1.2.9 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 8999ms



This also prevents the router from working when --host-network=false is used.
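
When repeating the ping check above across several pods, the summary line can be checked mechanically. The following is a minimal Python sketch and not part of the original report. The helper name `packet_loss_percent` is hypothetical, and it assumes the iputils-style summary line shown in the transcript above (`N% packet loss`).

```python
import re


def packet_loss_percent(ping_output):
    """Extract the packet-loss percentage from ping's summary line.

    Returns None when no summary line is found.
    """
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_output)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Output copied from the transcript in this comment.
    output = """\
PING 10.1.2.9 (10.1.2.9) 56(84) bytes of data.
--- 10.1.2.9 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 8999ms
"""
    loss = packet_loss_percent(output)
    if loss is None:
        print("no ping summary found")
    elif loss == 100.0:
        print("unreachable")
    else:
        print("reachable, loss %.1f%%" % loss)
```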
Comment 6 Meng Bo 2015-09-21 03:28 EDT
Created attachment 1075408 [details]
debug_logs

Attached the information collected by the debug tool.

This tool is awesome.
Comment 7 Meng Bo 2015-09-21 04:11:22 EDT
Services in the default namespace cannot be accessed from a pod in the user's project either.

# oc get svc -n default
NAME              CLUSTER_IP       EXTERNAL_IP   PORT(S)    SELECTOR                  AGE
docker-registry   172.30.206.101   <none>        5000/TCP   docker-registry=default   4m
kubernetes        172.30.0.1       <none>        443/TCP    <none>                    29m
router            172.30.3.176     <none>        80/TCP     router=router             28m

# oc get po -n u1p1
NAME                   READY     STATUS    RESTARTS   AGE
hello-nginx-docker     1/1       Running   0          7m
hello-nginx-docker-2   1/1       Running   0          7m

# oc project u1p1

# oc rsh hello-nginx-docker
Access in pod:
-$ curl 172.30.206.101:5000/v2/
curl: (7) Failed to connect to 172.30.206.101 port 5000: Connection timed out


Access on node:
[root@node1 ~]# curl 172.30.206.101:5000/v2/
{"errors":[{"code":"UNAUTHORIZED","message":"access to the requested resource is not authorized","detail":null}]}
Comment 8 Dan Winship 2015-09-21 18:01:32 EDT
OK, currently "default" is not actually being treated as an admin namespace...
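
To make the expected behavior concrete, here is a minimal Python sketch of the isolation rule the multitenant plugin is meant to enforce. It is a model of the policy only, not the openshift-sdn code or the OVS flows. The names `ADMIN_VNID` and `can_communicate` are hypothetical, and the sketch assumes that VNID 0 marks an admin namespace that is exempt from isolation in both directions.

```python
ADMIN_VNID = 0  # assumption: VNID 0 marks an admin (globally reachable) namespace


def can_communicate(src_vnid, dst_vnid):
    """Model of the intended multitenant rule, not the real OVS flow logic."""
    if src_vnid == ADMIN_VNID or dst_vnid == ADMIN_VNID:
        # Traffic to or from an admin namespace is not isolated.
        return True
    # Otherwise pods may only talk within the same VNID.
    return src_vnid == dst_vnid


if __name__ == "__main__":
    # VNIDs 0xa and 0xd both appear in the flow dumps in this report.
    print(can_communicate(0xd, 0xa))         # two distinct non-admin VNIDs
    print(can_communicate(0xd, ADMIN_VNID))  # destination treated as admin
```

Under this model, the behavior reported here is what one would see if the pods in "default" carried an ordinary non-zero VNID instead of the admin one.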
Comment 9 Dan Winship 2015-09-22 08:04:18 EDT
Should be fixed in master.
Comment 10 Meng Bo 2015-09-22 23:08:05 EDT
Pods in the default namespace can now be reached from pods in other namespaces.

However, services in the default namespace still cannot be accessed, as described in comment #7 above.

@Dan, do the two issues share the same root cause, or should I open a separate bug for the service issue?
Comment 11 Dan Winship 2015-09-23 13:21:36 EDT
OK, that turns out to be a separate issue, but you can keep it as part of this bugzilla bug if you want. (https://github.com/openshift/openshift-sdn/issues/158)
Comment 12 Dan Winship 2015-10-02 10:30:20 EDT
This should now be fixed in openshift-sdn master, though it hasn't yet been merged from there into origin. However, if you check out origin and openshift-sdn, and then in the openshift-sdn checkout run "./hack/sync-to-origin.sh -r PATH_TO_ORIGIN_CHECKOUT", it will copy the current state of openshift-sdn over to origin, and then you can build origin to test it.
Comment 13 Meng Bo 2015-10-08 06:42:54 EDT
I tested after syncing the latest openshift-sdn code to origin, and the issue is fixed.

I will move the bug to verified once the changes are merged into origin.
Comment 14 Meng Bo 2015-10-14 05:43:43 EDT
The change has been merged into origin. Both the pods and the services in the default namespace can now be accessed from other projects.

Marking the bug as verified.
