Bug 1389706 - [networking_public_157] Pods cannot connect to F5 server via vxlan
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
: ---
Assignee: Rajat Chopra
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-28 09:37 UTC by Hongan Li
Modified: 2022-08-04 22:20 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 12:47:27 UTC
Target Upstream Version:
Embargoed:

Links
System                  ID              Private  Priority  Status        Summary                                                        Last Updated
Origin (Github)         11742           0        None      None          None                                                           2016-11-03 13:00:50 UTC
Origin (Github)         11817           0        None      None          None                                                           2016-11-09 14:48:40 UTC
Red Hat Product Errata  RHBA-2017:0066  0        normal    SHIPPED_LIVE  Red Hat OpenShift Container Platform 3.4 RPM Release Advisory  2017-01-18 17:23:26 UTC

Description Hongan Li 2016-10-28 09:37:33 UTC
Description of problem:
Pods cannot connect to F5 server via vxlan

Version-Release number of selected component (if applicable):
openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
F5 BIG-IP: 12.1.1.0.0.184

How reproducible:
always

Steps to Reproduce:
1. Create a hostsubnet for the F5 server:
   [root@hongli-34-master ~]# oc get hostsubnet 
NAME              HOST              HOST IP           SUBNET
192.168.122.224   192.168.122.224   192.168.122.224   10.1.1.0/24
192.168.122.235   192.168.122.235   192.168.122.235   10.1.0.0/24
f5-server         f5-server         192.168.122.111   10.1.3.0/24

2. Create the OpenShift F5 router:
oadm router f5router --replicas=1 --type=f5-router --external-host=10.66.144.115 --external-host-username=admin --external-host-password=openshiftqe --external-host-http-vserver=ose-vserver --external-host-https-vserver=https-ose-vserver --external-host-private-key=/root/.ssh/id_rsa.pub  --service-account=router --external-host-insecure=true

3. Add these environment variables to dc/f5router to enable the VXLAN connection:
ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS=192.168.122.111
ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR=10.1.3.1/16

4. Create some pods and check connectivity between the pods and the F5 server.
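
Before testing connectivity, it may be worth confirming that the gateway address in ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR (10.1.3.1) falls inside the hostsubnet allocated to f5-server (10.1.3.0/24). The following is a minimal sketch of that check in POSIX shell. The helper names ip_to_int and ip_in_cidr are hypothetical, and the premise that the gateway should be taken from the F5 hostsubnet is an assumption, not something stated in this report.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The function body runs in a subshell so the IFS change does not leak.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if address $1 lies inside CIDR $2 (for example 10.1.3.0/24).
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Values taken from steps 1 and 3 above.
ip_in_cidr 10.1.3.1 10.1.3.0/24 && echo "gateway is inside the F5 hostsubnet"
```

The /16 suffix on the gateway value is the cluster network mask, so only the address part (10.1.3.1) is compared against the hostsubnet.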

Actual results:
Pods cannot connect to the F5 server; ping fails in both directions.
Also, the router performs no verification after installing the VXLAN setup (see the logs under Additional info).

Expected results:
Pods should be reachable from the F5 server and vice versa.
It would also be better for the F5 router to check the VXLAN connection periodically.
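
Until the router performs such a check itself, a manual stand-in can be run from a node or pod. This is only a sketch of the idea, not how the F5 router behaves: the function name check_reachable is hypothetical, and the probe command (ping against the gateway address from step 3) is an assumption about what a useful check would be.

```shell
# Run a probe command up to $1 times, sleeping $2 seconds between failed tries.
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
check_reachable() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (not run here): three attempts, five seconds apart.
#   check_reachable 3 5 ping -c 1 10.1.3.1 || echo "F5 VXLAN gateway unreachable"
```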

Additional info:
openshift-f5-router logs:
I1028 04:17:28.593512       1 f5.go:490] Checking and installing VxLAN setup
I1028 04:17:28.593630       1 f5.go:421] Request sent: &{POST https://10.66.144.115/mgmt/tm/net/tunnels/vxlan HTTP/1.1 1
1 map[Authorization:[Basic YWRtaW46b3BlbnNoaWZ0cWU=] Content-Type:[application/json] Accept:[application/json]]
{{"name":"vxlan-ose","partition":"/Common","floodingType":"multipoint","port":4789}} 82 [] false 10.66.144.115 map[]
map[] <nil> map[]   <nil> <nil> <nil> <nil>}
I1028 04:17:28.633234       1 f5.go:421] Request sent: &{POST https://10.66.144.115/mgmt/tm/net/tunnels/tunnel HTTP/1.1
1 1 map[Content-Type:[application/json] Accept:[application/json] Authorization:[Basic YWRtaW46b3BlbnNoaWZ0cWU=]]
{{"name":"vxlan5000","partition":"/Common","key":0,"localAddress":"192.168.122.111","mode":"bidirectional","mtu":"0","profile":"/Common/vxlan-ose","tos":"preserve","transparent":"disabled","usePmtu":"enabled"}}
208 [] false 10.66.144.115 map[] map[] <nil> map[]   <nil> <nil> <nil> <nil>}
I1028 04:17:28.663857       1 f5.go:421] Request sent: &{POST https://10.66.144.115/mgmt/tm/net/self HTTP/1.1 1 1
map[Content-Type:[application/json] Accept:[application/json] Authorization:[Basic YWRtaW46b3BlbnNoaWZ0cWU=]]
{{"name":"10.1.3.1/16","partition":"/Common","address":"10.1.3.1/16","addressSource":"from-user","floating":"disabled","inheritedTrafficGroup":"false","trafficGroup":"/Common/traffic-group-local-only","unit":0,"vlan":"/Common/vxlan5000","allowService":"all"}}
261 [] false 10.66.144.115 map[] map[] <nil> map[]   <nil> <nil> <nil> <nil>}
I1028 04:17:28.688945       1 f5.go:938] F5 initialization is complete.

Comment 3 Rajat Chopra 2016-10-31 20:08:02 UTC
Two possible causes (it works in my setup):

1. The router does not have watchNodes capability, so it will not add any nodes to the f5 vxlan FDB
2. The f5 instance does not have the required 'sdn_services' license

Comment 5 Hongan Li 2016-11-01 01:32:48 UTC
(In reply to Rajat Chopra from comment #3)
> Two issues possible (because it works with my setup):
> 
> 1. The router does not have watchNodes capability, so it will not add any
> nodes to the f5 vxlan FDB
> 2. The f5 instance does not have the required 'sdn_services' license

For #2, I checked the F5 license and confirmed that the SDN services module is listed among the active modules.
For #1, I'm not sure how to check or enable the router's watchNodes capability. Could you give more details? I also found many log entries like the following in the F5 router pod:

E1031 15:32:00.859667       1 reflector.go:203] github.com/openshift/origin/pkg/router/controller/factory/factory.go:76:
Failed to list *api.Node: User "system:serviceaccount:default:router" cannot list all nodes in the cluster

Maybe this means the router does not have the watchNodes capability?
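
The denial above can be detected mechanically in the router log. A minimal sketch, assuming the log is piped in on stdin; the function name router_cannot_list_nodes is hypothetical, and the match string is copied from the log entry quoted above.

```shell
# Read router log lines on stdin; succeed if any line reports that
# listing nodes was denied.
router_cannot_list_nodes() {
  grep -q 'Failed to list \*api.Node'
}

# Example (not run here; assumes the deployment config is named f5router
# as in this report):
#   oc logs dc/f5router | router_cannot_list_nodes \
#     && echo "router service account cannot list nodes"
```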

Comment 6 Rajat Chopra 2016-11-01 01:40:49 UTC
Correct. The router does not have the right role to list/watch nodes. This was removed from the default system:router role, and we plan to create another role for F5 router now.
Will mark this bug fixed when I create that PR.

Comment 7 Rajat Chopra 2016-11-02 22:18:26 UTC
PR https://github.com/openshift/origin/pull/11742
You also need to give the router more privileges, e.g.:
oadm policy add-cluster-role-to-user system:sdn-reader system:serviceaccount:default:router

Comment 8 Ben Bennett 2016-11-03 13:00:31 UTC
Rajat, I assume we are putting that in the F5 router docs?  Will you please put the link to the docs PR here too.

Comment 9 Xiaoli Tian 2016-11-04 03:23:32 UTC
Looks like PR 11742 is already merged in ocp-3.4.0.21, please give it a try.

Comment 12 Rajat Chopra 2016-11-04 23:33:26 UTC
PR https://github.com/openshift/origin/pull/11788 fixes the periodic error messages that you keep seeing on router re-launch.

Comment 14 Rajat Chopra 2016-11-08 03:34:49 UTC
PR for fixing the multitenancy issue: https://github.com/openshift/origin/pull/11817

Comment 15 Troy Dawson 2016-11-09 19:55:30 UTC
This has been merged into ose and is in OSE v3.4.0.24 or newer.

Comment 17 Hongan Li 2016-11-10 02:42:54 UTC
Verified in 3.4.0.24; the issue has been fixed.

Test steps:
1. oc annotate hostsubnet f5-server pod.network.openshift.io/fixed-vnid-host="true"
2. Restart the OpenShift node service on all nodes.
3. ovs-ofctl dump-flows -O openflow13 br0 | grep table=8
 cookie=0x0, duration=1569.500s, table=8, n_packets=133, n_bytes=5586, priority=100,arp,arp_tpa=10.1.5.0/24 actions=load:0->NXM_NX_TUN_ID[0..31],set_field:192.168.122.111->tun_dst,output:1
 cookie=0x0, duration=1569.459s, table=8, n_packets=14963, n_bytes=1091250, priority=100,ip,nw_dst=10.1.5.0/24 actions=load:0->NXM_NX_TUN_ID[0..31],set_field:192.168.122.111->tun_dst,output:1

4. Pods in a non-default namespace and the F5 server can ping each other.
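
The flow check in step 3 can be scripted. The following is a minimal sketch that counts table=8 flows whose tunnel destination is the F5 address; the function name count_f5_flows is hypothetical, and reading the flow dump from stdin is a choice made for this example.

```shell
# Read `ovs-ofctl dump-flows` output on stdin and print how many table=8
# flows set the tunnel destination to the address given as $1.
count_f5_flows() {
  grep 'table=8,' | grep -cF "set_field:$1->tun_dst"
}

# Example (not run here), using the F5 address from this report:
#   ovs-ofctl dump-flows -O openflow13 br0 | count_f5_flows 192.168.122.111
```

In the output of step 3, both matching flows also load 0 into NXM_NX_TUN_ID, so traffic toward the F5 subnet leaves with tunnel ID 0.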

Comment 19 errata-xmlrpc 2017-01-18 12:47:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066