Description of problem:
Pods cannot connect to the F5 server via VXLAN.

Version-Release number of selected component (if applicable):
openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
F5 BIG-IP: 12.1.1.0.0.184

How reproducible:
Always

Steps to Reproduce:
1. Create a hostsubnet for the F5 server:

[root@hongli-34-master ~]# oc get hostsubnet
NAME              HOST              HOST IP           SUBNET
192.168.122.224   192.168.122.224   192.168.122.224   10.1.1.0/24
192.168.122.235   192.168.122.235   192.168.122.235   10.1.0.0/24
f5-server         f5-server         192.168.122.111   10.1.3.0/24

2. Create the OpenShift F5 router:

oadm router f5router --replicas=1 --type=f5-router \
  --external-host=10.66.144.115 \
  --external-host-username=admin \
  --external-host-password=openshiftqe \
  --external-host-http-vserver=ose-vserver \
  --external-host-https-vserver=https-ose-vserver \
  --external-host-private-key=/root/.ssh/id_rsa.pub \
  --service-account=router \
  --external-host-insecure=true

3. Add environment variables to dc/f5router to enable the VXLAN connection:

ROUTER_EXTERNAL_HOST_INTERNAL_ADDRESS=192.168.122.111
ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR=10.1.3.1/16

4. Create some pods and check connectivity between the pods and the F5 server.

Actual results:
Pods cannot connect to the F5 server; ping fails between them. There are also no verification steps after the VXLAN setup is installed (see logs in additional info).

Expected results:
Pods should be reachable from the F5 server and vice versa. It would also be better to add periodic VXLAN connectivity checking to the F5 router.
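The relationship between the hostsubnet in step 1 and the gateway CIDR in step 3 can be sanity-checked offline: the address part of ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR must fall inside the subnet allocated to the F5 hostsubnet. A minimal sketch (the helper name is mine, not part of the router; values are taken from the report above):

```python
import ipaddress

def vxlan_gw_in_hostsubnet(gw_cidr, hostsubnet):
    """Check that the address part of ROUTER_EXTERNAL_HOST_VXLAN_GW_CIDR
    falls inside the subnet allocated to the F5 hostsubnet."""
    gw_ip = gw_cidr.split("/")[0]
    return ipaddress.ip_address(gw_ip) in ipaddress.ip_network(hostsubnet)

# Values from the report: gateway CIDR from step 3, f5-server subnet from step 1.
print(vxlan_gw_in_hostsubnet("10.1.3.1/16", "10.1.3.0/24"))  # True
```

The /16 mask on the gateway CIDR (rather than the hostsubnet's /24) is what gives the BIG-IP a route covering every node's pod subnet over the tunnel.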
Additional info:
openshift-f5-router logs:

I1028 04:17:28.593512 1 f5.go:490] Checking and installing VxLAN setup
I1028 04:17:28.593630 1 f5.go:421] Request sent: &{POST https://10.66.144.115/mgmt/tm/net/tunnels/vxlan HTTP/1.1 1 1 map[Authorization:[Basic YWRtaW46b3BlbnNoaWZ0cWU=] Content-Type:[application/json] Accept:[application/json]] {{"name":"vxlan-ose","partition":"/Common","floodingType":"multipoint","port":4789}} 82 [] false 10.66.144.115 map[] map[] <nil> map[] <nil> <nil> <nil> <nil>}
I1028 04:17:28.633234 1 f5.go:421] Request sent: &{POST https://10.66.144.115/mgmt/tm/net/tunnels/tunnel HTTP/1.1 1 1 map[Content-Type:[application/json] Accept:[application/json] Authorization:[Basic YWRtaW46b3BlbnNoaWZ0cWU=]] {{"name":"vxlan5000","partition":"/Common","key":0,"localAddress":"192.168.122.111","mode":"bidirectional","mtu":"0","profile":"/Common/vxlan-ose","tos":"preserve","transparent":"disabled","usePmtu":"enabled"}} 208 [] false 10.66.144.115 map[] map[] <nil> map[] <nil> <nil> <nil> <nil>}
I1028 04:17:28.663857 1 f5.go:421] Request sent: &{POST https://10.66.144.115/mgmt/tm/net/self HTTP/1.1 1 1 map[Content-Type:[application/json] Accept:[application/json] Authorization:[Basic YWRtaW46b3BlbnNoaWZ0cWU=]] {{"name":"10.1.3.1/16","partition":"/Common","address":"10.1.3.1/16","addressSource":"from-user","floating":"disabled","inheritedTrafficGroup":"false","trafficGroup":"/Common/traffic-group-local-only","unit":0,"vlan":"/Common/vxlan5000","allowService":"all"}} 261 [] false 10.66.144.115 map[] map[] <nil> map[] <nil> <nil> <nil> <nil>}
I1028 04:17:28.688945 1 f5.go:938] F5 initialization is complete.
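The three requests in the log are iControl REST POSTs that create the VXLAN profile, the tunnel, and the self IP on the BIG-IP. A minimal sketch of how such a request is assembled (the helper name is mine; the host, credentials, and payload fields are taken verbatim from the log above, and the credentials are the non-sensitive test values already printed in this report):

```python
import base64
import json

BIGIP = "https://10.66.144.115"

def icontrol_request(path, body):
    """Build (url, headers, payload) for a BIG-IP iControl REST POST,
    mirroring the requests shown in the router log."""
    auth = base64.b64encode(b"admin:openshiftqe").decode()
    headers = {
        "Authorization": "Basic " + auth,
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return BIGIP + path, headers, json.dumps(body)

# First call from the log: create the multipoint VXLAN profile.
vxlan_profile = {"name": "vxlan-ose", "partition": "/Common",
                 "floodingType": "multipoint", "port": 4789}
url, headers, payload = icontrol_request("/mgmt/tm/net/tunnels/vxlan",
                                         vxlan_profile)
print(headers["Authorization"])  # Basic YWRtaW46b3BlbnNoaWZ0cWU= (as in the log)
```

Note that the setup calls succeed here; as the log shows, initialization completes without any follow-up connectivity check, which is the gap the reporter points out.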
Two possible issues (it works with my setup):
1. The router does not have the watchNodes capability, so it will not add any nodes to the F5 VXLAN FDB.
2. The F5 instance does not have the required 'sdn_services' license.
(In reply to Rajat Chopra from comment #3)
> Two issues possible (because it works with my setup):
>
> 1. The router does not have watchNodes capability, so it will not add any
> nodes to the f5 vxlan FDB
> 2. The f5 instance does not have the required 'sdn_services' license

For #2, I checked the F5 license and confirmed that the SDN services module is among the active modules.
For #1, I'm not sure how to check or enable the router's watchNodes capability. Could you give more details?

I also found many log entries like the following in the F5 router pod:

E1031 15:32:00.859667 1 reflector.go:203] github.com/openshift/origin/pkg/router/controller/factory/factory.go:76: Failed to list *api.Node: User "system:serviceaccount:default:router" cannot list all nodes in the cluster

Maybe this means the router does not have the watchNodes capability?
Correct. The router does not have the right role to list/watch nodes. This was removed from the default system:router role, and we now plan to create a separate role for the F5 router. I will mark this bug fixed when I create that PR.
PR: https://github.com/openshift/origin/pull/11742

You also need to start the router with more privileges, e.g.:

oadm policy add-cluster-role-to-user system:sdn-reader system:serviceaccount:default:router
Rajat, I assume we are putting that in the F5 router docs? Will you please put the link to the docs PR here too.
Looks like PR 11742 has already been merged into ocp-3.4.0.21; please give it a try.
PR https://github.com/openshift/origin/pull/11788 fixes the periodic error messages that you keep seeing on router re-launch.
PR for fixing the multitenancy issue: https://github.com/openshift/origin/pull/11817
This has been merged into ose and is in OSE v3.4.0.24 or newer.
Verified in 3.4.0.24; the issue has been fixed.

Test steps:
1. oc annotate hostsubnet f5-server pod.network.openshift.io/fixed-vnid-host="true"
2. Restart the openshift-node service on all nodes.
3. ovs-ofctl dump-flows -O openflow13 br0 | grep table=8

cookie=0x0, duration=1569.500s, table=8, n_packets=133, n_bytes=5586, priority=100,arp,arp_tpa=10.1.5.0/24 actions=load:0->NXM_NX_TUN_ID[0..31],set_field:192.168.122.111->tun_dst,output:1
cookie=0x0, duration=1569.459s, table=8, n_packets=14963, n_bytes=1091250, priority=100,ip,nw_dst=10.1.5.0/24 actions=load:0->NXM_NX_TUN_ID[0..31],set_field:192.168.122.111->tun_dst,output:1

4. Ping between pods in a non-default namespace and the F5 server now succeeds in both directions.
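The key thing to look for in the flows above is that traffic for the F5's pod subnet (10.1.5.0/24 here) is sent over the tunnel with tun_dst set to the F5's host IP (192.168.122.111) and tun_id 0 (the fixed VNID). A small sketch of extracting that field from an ovs-ofctl flow line, for checking the output programmatically (the helper name is mine):

```python
import re

# One of the table=8 flow lines from the verification output above.
FLOW = ('cookie=0x0, duration=1569.459s, table=8, n_packets=14963, '
        'n_bytes=1091250, priority=100,ip,nw_dst=10.1.5.0/24 '
        'actions=load:0->NXM_NX_TUN_ID[0..31],'
        'set_field:192.168.122.111->tun_dst,output:1')

def tun_dst_for(flow):
    """Extract the VXLAN tunnel destination IP from an OVS flow line,
    or return None if the flow does not set tun_dst."""
    m = re.search(r'set_field:([\d.]+)->tun_dst', flow)
    return m.group(1) if m else None

print(tun_dst_for(FLOW))  # 192.168.122.111
```

If tun_dst does not match the F5 host IP from the hostsubnet, the FDB entries were not programmed and pod-to-F5 traffic will be dropped.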
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066