Bug 1389706 - [networking_public_157] Pods cannot connect to F5 server via vxlan
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Rajat Chopra
QA Contact: zhaozhanqi
Keywords: TestBlocker
Depends On:
Blocks:
Reported: 2016-10-28 05:37 EDT by hongli
Modified: 2017-03-08 13 EST
CC: 8 users

Doc Type: No Doc Update
Last Closed: 2017-01-18 07:47:27 EST
Type: Bug




External Trackers:
Origin (GitHub) 11742 (last updated 2016-11-03 09:00 EDT)
Origin (GitHub) 11817 (last updated 2016-11-09 09:48 EST)
Red Hat Product Errata RHBA-2017:0066, normal, SHIPPED_LIVE: Red Hat OpenShift Container Platform 3.4 RPM Release Advisory (last updated 2017-01-18 12:23:26 EST)

Description hongli 2016-10-28 05:37:33 EDT
Description of problem:
Pods cannot connect to F5 server via vxlan

Version-Release number of selected component (if applicable):
openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
F5 BIG-IP: 12.1.1.0.0.184

How reproducible:
always

Steps to Reproduce:
1. Create hostsubnet for f5 server
   [root@hongli-34-master ~]# oc get hostsubnet 
NAME              HOST              HOST IP           SUBNET
192.168.122.224   192.168.122.224   192.168.122.224   10.1.1.0/24
192.168.122.235   192.168.122.235   192.168.122.235   10.1.0.0/24
f5-server         f5-server         192.168.122.111   10.1.3.0/24

2. Create the OpenShift F5 router:
oadm router f5router --replicas=1 --type=f5-router --external-host=10.66.144.115 --external-host-username=admin --external-host-password=openshiftqe --external-host-http-vserver=ose-vserver --external-host-https-vserver=https-ose-vserver --external-host-private-key=/root/.ssh/id_rsa.pub  --service-account=router --external-host-insecure=true

3. Add ENV to dc/f5router to enable vxlan connection
ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS=192.168.122.111
ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR=10.1.3.1/16

4. Create some pods and check the connection between pods and F5 server 
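Steps 1 and 3 above can be sketched as shell commands (a minimal sketch: the f5-server name, host IP, and subnet are taken from the report's `oc get hostsubnet` output, and `oc set env` is assumed to be available on this client version):

```shell
# Step 1: declare a hostsubnet for the F5 server so the SDN knows
# its tunnel endpoint (hostIP) and the subnet routed over VxLAN.
cat <<'EOF' | oc create -f -
apiVersion: v1
kind: HostSubnet
metadata:
  name: f5-server
host: f5-server
hostIP: 192.168.122.111
subnet: 10.1.3.0/24
EOF

# Step 3: enable the VxLAN connection on the F5 router deployment.
oc set env dc/f5router \
    ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS=192.168.122.111 \
    ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR=10.1.3.1/16
```

Setting the environment variables triggers a new deployment of the router pod.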

Actual results:
Pods cannot connect to the F5 server; ping fails between them.
Also, there are no verification steps after the VxLAN setup is installed (see logs in additional info).

Expected results:
Pods should be reachable from the F5 server and vice versa.
It would also be better for the F5 router to check the VxLAN connection periodically.

Additional info:
openshift-f5-router logs:
I1028 04:17:28.593512       1 f5.go:490] Checking and installing VxLAN setup
I1028 04:17:28.593630       1 f5.go:421] Request sent: &{POST https://10.66.144.115/mgmt/tm/net/tunnels/vxlan HTTP/1.1 1
1 map[Authorization:[Basic YWRtaW46b3BlbnNoaWZ0cWU=] Content-Type:[application/json] Accept:[application/json]]
{{"name":"vxlan-ose","partition":"/Common","floodingType":"multipoint","port":4789}} 82 [] false 10.66.144.115 map[]
map[] <nil> map[]   <nil> <nil> <nil> <nil>}
I1028 04:17:28.633234       1 f5.go:421] Request sent: &{POST https://10.66.144.115/mgmt/tm/net/tunnels/tunnel HTTP/1.1
1 1 map[Content-Type:[application/json] Accept:[application/json] Authorization:[Basic YWRtaW46b3BlbnNoaWZ0cWU=]]
{{"name":"vxlan5000","partition":"/Common","key":0,"localAddress":"192.168.122.111","mode":"bidirectional","mtu":"0","profile":"/Common/vxlan-ose","tos":"preserve","transparent":"disabled","usePmtu":"enabled"}}
208 [] false 10.66.144.115 map[] map[] <nil> map[]   <nil> <nil> <nil> <nil>}
I1028 04:17:28.663857       1 f5.go:421] Request sent: &{POST https://10.66.144.115/mgmt/tm/net/self HTTP/1.1 1 1
map[Content-Type:[application/json] Accept:[application/json] Authorization:[Basic YWRtaW46b3BlbnNoaWZ0cWU=]]
{{"name":"10.1.3.1/16","partition":"/Common","address":"10.1.3.1/16","addressSource":"from-user","floating":"disabled","inheritedTrafficGroup":"false","trafficGroup":"/Common/traffic-group-local-only","unit":0,"vlan":"/Common/vxlan5000","allowService":"all"}}
261 [] false 10.66.144.115 map[] map[] <nil> map[]   <nil> <nil> <nil> <nil>}
I1028 04:17:28.688945       1 f5.go:938] F5 initialization is complete.
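Note that the `Authorization: Basic ...` header in the request logs above is just the base64 encoding of the credentials passed via `--external-host-username`/`--external-host-password`, so the debug logs expose them in recoverable form. A quick check:

```python
import base64

# The Basic auth token captured in the f5.go request logs above.
token = "YWRtaW46b3BlbnNoaWZ0cWU="

# HTTP Basic auth encodes "user:password" in base64; decoding it
# recovers the credentials given on the oadm router command line.
user, password = base64.b64decode(token).decode("ascii").split(":", 1)
print(user, password)  # admin openshiftqe
```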
Comment 3 Rajat Chopra 2016-10-31 16:08:02 EDT
Two issues possible (because it works with my setup):

1. The router does not have watchNodes capability, so it will not add any nodes to the f5 vxlan FDB
2. The f5 instance does not have the required 'sdn_services' license
Comment 5 hongli 2016-10-31 21:32:48 EDT
(In reply to Rajat Chopra from comment #3)
> Two issues possible (because it works with my setup):
> 
> 1. The router does not have watchNodes capability, so it will not add any
> nodes to the f5 vxlan FDB
> 2. The f5 instance does not have the required 'sdn_services' license

For #2, I checked the F5 license and confirmed that the SDN service is among the active modules.
For #1, I'm not sure how to check or enable the router's watchNodes capability. Could you give more details? Also, I've found many log entries like the one below in the F5 router pod:

E1031 15:32:00.859667       1 reflector.go:203] github.com/openshift/origin/pkg/router/controller/factory/factory.go:76:
Failed to list *api.Node: User "system:serviceaccount:default:router" cannot list all nodes in the cluster

Maybe this means the router does not have the watchNodes capability?
Comment 6 Rajat Chopra 2016-10-31 21:40:49 EDT
Correct. The router does not have the right role to list/watch nodes. This was removed from the default system:router role, and we plan to create another role for F5 router now.
Will mark this bug fixed when I create that PR.
Comment 7 Rajat Chopra 2016-11-02 18:18:26 EDT
PR https://github.com/openshift/origin/pull/11742
Also, you need to start the router with more privileges, e.g.:
oadm policy add-cluster-role-to-user system:sdn-reader system:serviceaccount:default:router
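One way to confirm the grant took effect is to impersonate the router service account and try the node list that was failing (a sketch: it assumes cluster-admin access, the default `system:serviceaccount:default:router` account from the report, and a client that supports `--as` impersonation):

```shell
# Grant the node list/watch permissions the F5 router needs to
# populate the VxLAN FDB.
oadm policy add-cluster-role-to-user system:sdn-reader \
    system:serviceaccount:default:router

# Retry the operation that was being denied; this should now succeed
# instead of failing with "cannot list all nodes in the cluster".
oc get nodes --as=system:serviceaccount:default:router
```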
Comment 8 Ben Bennett 2016-11-03 09:00:31 EDT
Rajat, I assume we are putting that in the F5 router docs?  Will you please put the link to the docs PR here too.
Comment 9 Xiaoli Tian 2016-11-03 23:23:32 EDT
Looks like PR 11742 is already merged in ocp-3.4.0.21; please give it a try.
Comment 12 Rajat Chopra 2016-11-04 19:33:26 EDT
PR https://github.com/openshift/origin/pull/11788 fixes the periodic error messages that you keep seeing on router re-launch.
Comment 14 Rajat Chopra 2016-11-07 22:34:49 EST
PR for fixing the multitenancy issue: https://github.com/openshift/origin/pull/11817
Comment 15 Troy Dawson 2016-11-09 14:55:30 EST
This has been merged into ose and is in OSE v3.4.0.24 or newer.
Comment 17 hongli 2016-11-09 21:42:54 EST
Verified in 3.4.0.24; the issue has been fixed.

test steps:
1. oc annotate hostsubnet f5-server pod.network.openshift.io/fixed-vnid-host="true"
2. restart the OpenShift node service on all nodes
3. ovs-ofctl dump-flows -O openflow13 br0 | grep table=8
 cookie=0x0, duration=1569.500s, table=8, n_packets=133, n_bytes=5586, priority=100,arp,arp_tpa=10.1.5.0/24 actions=load:0->NXM_NX_TUN_ID[0..31],set_field:192.168.122.111->tun_dst,output:1
 cookie=0x0, duration=1569.459s, table=8, n_packets=14963, n_bytes=1091250, priority=100,ip,nw_dst=10.1.5.0/24 actions=load:0->NXM_NX_TUN_ID[0..31],set_field:192.168.122.111->tun_dst,output:1

4. Ping between pods in a non-default namespace and the F5 server succeeds; both sides are reachable.
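The step-4 reachability check can be run from inside any pod (a sketch: `test-pod` and `myproject` are hypothetical names, and 10.1.3.1 is the F5 VxLAN gateway self-IP from the report):

```shell
# From a pod in a non-default namespace, ping the F5 VxLAN self-IP.
# With the fix applied this should succeed.
oc exec test-pod -n myproject -- ping -c 3 10.1.3.1
```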
Comment 19 errata-xmlrpc 2017-01-18 07:47:27 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066
