Assigning to @
Will use the same steps as in https://bugzilla.redhat.com/show_bug.cgi?id=2031141 to create an LGW cluster to verify this 4.9 bug. Waiting for https://github.com/openshift/ovn-kubernetes/pull/892 to let QE create an LGW cluster from QE Flexy.
Verification failed in 4.9.0-0.nightly-2022-01-19-045232

[weliang@weliang openshift-tests-private]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2022-01-19-045232   True        False         54m     Cluster version is 4.9.0-0.nightly-2022-01-19-045232

sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router |grep 501
   501 inport == "rtos-weliang-49-b2vbf-worker-xptmt" && ip4.src == 10.131.0.29 && ip4.dst != 10.128.0.0/14   reroute   100.64.0.5

#### Expected the output to look like:
   501 inport == "rtos-ovn-worker" && ip4.src == $a6757214591200017842 && ip4.dst != 10.244.0.0/16
(In reply to Weibin Liang from comment #8)
> Verifying failed in 4.9.0-0.nightly-2022-01-19-045232
>
> [weliang@weliang openshift-tests-private]$ oc get clusterversion
> NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
> version   4.9.0-0.nightly-2022-01-19-045232   True        False         54m    Cluster version is 4.9.0-0.nightly-2022-01-19-045232
>
> sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router |grep 501
>    501 inport == "rtos-weliang-49-b2vbf-worker-xptmt" && ip4.src == 10.131.0.29 && ip4.dst != 10.128.0.0/14   reroute   100.64.0.5
>
> #### Expect the output being like:
>    501 inport == "rtos-ovn-worker" && ip4.src == $a6757214591200017842 && ip4.dst != 10.244.0.0/16

#### The testing cluster is LGW:

[weliang@weliang openshift-tests-private]$ oc logs ovnkube-master-c97xr ovnkube-master | grep mode
+ gateway_mode_flags='--gateway-mode local --gateway-interface br-ex'
+ exec /usr/bin/ovnkube --init-master weliang-49-b2vbf-master-1 --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 --metrics-bind-address 127.0.0.1:29102 --metrics-enable-pprof --gateway-mode local --gateway-interface br-ex --sb-address ssl:172.31.249.122:9642,ssl:172.31.249.193:9642,ssl:172.31.249.233:9642 --sb-client-privkey /ovn-cert/tls.key --sb-client-cert /ovn-cert/tls.crt --sb-client-cacert /ovn-ca/ca-bundle.crt --sb-cert-common-name ovn --nb-address ssl:172.31.249.122:9641,ssl:172.31.249.193:9641,ssl:172.31.249.233:9641 --nb-client-privkey /ovn-cert/tls.key --nb-client-cert /ovn-cert/tls.crt --nb-client-cacert /ovn-ca/ca-bundle.crt --nbctl-daemon-mode --nb-cert-common-name ovn --enable-multicast --acl-logging-rate-limit 20
I0119 21:15:43.321608       1 master.go:112] Lost the election to weliang-49-b2vbf-master-0; in standby mode
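The gateway-mode check above can be scripted. A minimal sketch; the log line below is copied from this comment as sample input, since on a live cluster you would pull it with `oc logs <ovnkube-master-pod> ovnkube-master | grep gateway`:

```shell
# Sample line captured from the ovnkube-master container log above;
# stands in for the live `oc logs ... | grep gateway` output.
log_line="+ gateway_mode_flags='--gateway-mode local --gateway-interface br-ex'"

# Extract the value following --gateway-mode (expected: "local" or "shared").
# `--` stops grep from treating the pattern's leading dashes as options.
mode=$(printf '%s\n' "$log_line" | grep -o -- '--gateway-mode [a-z]*' | head -n1 | awk '{print $2}')
echo "gateway mode: $mode"
```

A cluster in shared gateway mode would print "shared" here, which is the signal that the 501-policy verification below does not apply.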
Still failing in 4.9.0-0.nightly-2022-01-20-052841

[weliang@weliang Test]$ oc get pod -n exgw2
NAME       READY   STATUS    RESTARTS   AGE
podexgw2   1/1     Running   0          10s
[weliang@weliang Test]$ oc rsh ovnkube-master-tv6bc
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl list logical_router_port rtos-weliang-491-ztnxh-worker-knmnz
_uuid               : af09ba13-31c2-4ee7-818c-b589be41105f
enabled             : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "0a:58:0a:83:00:01"
name                : rtos-weliang-491-ztnxh-worker-knmnz
networks            : ["10.131.0.1/23"]
options             : {}
peer                : []
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router |grep 501
   501 inport == "rtos-weliang-491-ztnxh-worker-knmnz" && ip4.src == 10.131.0.15 && ip4.dst != 10.128.0.0/14   reroute   100.64.0.5
sh-4.4#
Hi Weibin,

Using an address set for the ip4.src match criteria was only done in 4.10. Therefore in 4.9 you should expect to see only this format:
   501 inport == "rtos-weliang-49-b2vbf-worker-xptmt" && ip4.src == 10.131.0.29 && ip4.dst != 10.128.0.0/14   reroute   100.64.0.5

To verify this in 4.9 I think you can do:

1. Add a stale/dummy logical router policy to ovn_cluster_router, e.g.:
   [root@ovn-control-plane ~]# ovn-nbctl lr-policy-add ovn_cluster_router 501 "ip4.src==254.254.254.254" reroute 10.244.1.2

2. Ensure the dummy policy was added:
   [root@ovn-control-plane ~]# ovn-nbctl lr-policy-list ovn_cluster_router | grep 501
      501 ip4.src==254.254.254.254   reroute   10.244.1.2

3. Kill the ovnkube-master leader pod.

4. Verify that on restart/standby takeover it removes the dummy policy:
   [root@ovn-control-plane ~]# ovn-nbctl lr-policy-list ovn_cluster_router | grep 501
   [root@ovn-control-plane ~]#
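The checks in steps 2 and 4 can be scripted rather than eyeballed. A minimal sketch; since the real command needs a live NB database, the heredoc below is sample output copied from this bug standing in for `ovn-nbctl lr-policy-list ovn_cluster_router`:

```shell
# Sample output standing in for: ovn-nbctl lr-policy-list ovn_cluster_router
policies=$(cat <<'EOF'
   501 inport == "rtos-weliang-49-b2vbf-worker-xptmt" && ip4.src == 10.131.0.29 && ip4.dst != 10.128.0.0/14   reroute   100.64.0.5
   501 ip4.src==254.254.254.254   reroute   10.244.1.2
EOF
)

# Anchor on the first field so priorities such as 1501 cannot match by accident
p501=$(printf '%s\n' "$policies" | awk '$1 == 501')
count=$(printf '%s\n' "$p501" | grep -c .)

# The dummy policy is identified by its 254.254.254.254 source match
if printf '%s\n' "$p501" | grep -q '254\.254\.254\.254'; then
  echo "dummy policy still present ($count policies at priority 501)"
else
  echo "dummy policy removed"
fi
```

After step 4 succeeds, the same script run against fresh lr-policy-list output should print "dummy policy removed".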
Test failed: the new ovnkube-master leader pod does not remove the dummy policy.

Tested in OVN-LGW with 4.9.0-0.nightly-2022-02-02-073652

## ovnkube-master-29g5s is the ovnkube-master leader pod
[weliang@weliang ~]$ oc rsh ovnkube-master-29g5s
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router |grep 501
   501 inport == "rtos-weliang-221-s9j2s-worker-88hv6" && ip4.src == 10.128.2.31 && ip4.dst != 10.128.0.0/14   reroute   100.64.0.6
sh-4.4# ovn-nbctl lr-policy-add ovn_cluster_router 501 "ip4.src==254.254.254.254" reroute 10.244.1.2
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router | grep 501
   501 inport == "rtos-weliang-221-s9j2s-worker-88hv6" && ip4.src == 10.128.2.31 && ip4.dst != 10.128.0.0/14   reroute   100.64.0.6
   501 ip4.src==254.254.254.254   reroute   10.244.1.2
sh-4.4# exit

## Kill the ovnkube-master leader pod
[weliang@weliang ~]$ oc delete pod ovnkube-master-29g5s
pod "ovnkube-master-29g5s" deleted

## Now ovnkube-master-crvn6 is the ovnkube-master leader pod
[weliang@weliang ~]$ oc rsh ovnkube-master-crvn6
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router | grep 501
   501 inport == "rtos-weliang-221-s9j2s-worker-88hv6" && ip4.src == 10.128.2.31 && ip4.dst != 10.128.0.0/14   reroute   100.64.0.6
   501 ip4.src==254.254.254.254   reroute   10.244.1.2
sh-4.4#
Sorry Weibin, I gave you the wrong procedure. The right one is the one I originally listed.

To verify this, the cluster MUST be deployed with local gateway mode (aka "routing via host" in the new 4.10 API). Shared gateway mode (the default) does not need/use 501 logical router policies.

1. Create an ICNI-enabled namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: exgw2
  annotations:
    k8s.ovn.org/routing-external-gws: 172.18.0.4,172.18.0.5,172.18.0.6

2. Create an ICNI-enabled pod:

[trozet@fedora contrib]$ cat ~/exgw_basic1.yaml
---
apiVersion: v1
kind: Pod
metadata:
  namespace: exgw2
  name: podexgw2
  labels:
    role: webserver
    pod-name: client
spec:
  containers:
  - name: podexgw2
    image: docker.io/centos/tools:latest
    command:
    - /sbin/init
  nodeSelector:
    kubernetes.io/hostname: ovn-worker

3. Verify the 501 policy is created:

[root@ovn-control-plane ~]# kubectl get pod -n exgw2 -o wide
NAME       READY   STATUS    RESTARTS   AGE    IP           NODE         NOMINATED NODE   READINESS GATES
podexgw2   1/1     Running   0          2m4s   10.244.0.3   ovn-worker   <none>           <none>
[root@ovn-control-plane ~]# ovn-nbctl lr-policy-list ovn_cluster_router |grep 501
   501 inport == "rtos-ovn-worker" && ip4.src == $a6757214591200017842 && ip4.dst != 10.244.0.0/16   reroute   100.64.0.3
[root@ovn-control-plane ~]# ovn-nbctl list address_set a6757214591200017842
_uuid               : 9523e2b2-7351-4b0e-b733-2cdfc6553608
addresses           : ["10.244.0.3"]
external_ids        : {name=hybrid-route-pods-ovn-worker_v4}
name                : a6757214591200017842

4. Modify the route to point to some other dummy GW:

[root@ovn-control-plane ~]# ovn-nbctl find logical_router_policy priority=501
_uuid               : 34407b28-8539-4e0c-94e8-96f6e2029359
action              : reroute
external_ids        : {}
match               : "inport == \"rtos-ovn-worker\" && ip4.src == $a6757214591200017842 && ip4.dst != 10.244.0.0/16"
nexthop             : []
nexthops            : ["100.64.0.3"]
options             : {}
priority            : 501
[root@ovn-control-plane ~]# ovn-nbctl set logical_router_policy 34407b28-8539-4e0c-94e8-96f6e2029359 nexthops=254.254.254.254
[root@ovn-control-plane ~]# ovn-nbctl find logical_router_policy priority=501
_uuid               : 34407b28-8539-4e0c-94e8-96f6e2029359
action              : reroute
external_ids        : {}
match               : "inport == \"rtos-ovn-worker\" && ip4.src == $a6757214591200017842 && ip4.dst != 10.244.0.0/16"
nexthop             : []
nexthops            : ["254.254.254.254"]
options             : {}
priority            : 501

5. Now update the exgw2 namespace and add/remove a gateway (to cause an update):

[trozet@fedora contrib]$ kubectl edit ns exgw2
namespace/exgw2 edited
[trozet@fedora contrib]$ kubectl get ns exgw2 -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    k8s.ovn.org/routing-external-gws: 172.18.0.4,172.18.0.5

6. The logical router policy should have been modified to point back to 100.64.0.3:

[root@ovn-control-plane ~]# ovn-nbctl find logical_router_policy priority=501
_uuid               : 2ed85e26-e13a-434a-99ee-f57f8e429d3e
action              : reroute
external_ids        : {}
match               : "inport == \"rtos-ovn-worker\" && ip4.src == $a6757214591200017842 && ip4.dst != 10.244.0.0/16"
nexthop             : []
nexthops            : ["100.64.0.3"]
options             : {}
priority            : 501

I tested this on a 4.9 nightly:

[root@ip-10-0-195-128 ~]# ovn-nbctl find logical_router_policy priority=501
_uuid               : 8a8c9781-0d05-4e39-93ea-7886cd7d7ea3
action              : reroute
external_ids        : {}
match               : "inport == \"rtos-ip-10-0-188-42.us-west-2.compute.internal\" && ip4.src == 10.128.2.16 && ip4.dst != 10.128.0.0/14"
nexthop             : []
nexthops            : ["100.64.0.6"]   <------------------- correct nexthop
options             : {}
priority            : 501

Update to an invalid nexthop:

[root@ip-10-0-195-128 ~]# ovn-nbctl set logical_router_policy 8a8c9781-0d05-4e39-93ea-7886cd7d7ea3 nexthops=254.254.254.254
[root@ip-10-0-195-128 ~]# ovn-nbctl find logical_router_policy priority=501
_uuid               : 8a8c9781-0d05-4e39-93ea-7886cd7d7ea3
action              : reroute
external_ids        : {}
match               : "inport == \"rtos-ip-10-0-188-42.us-west-2.compute.internal\" && ip4.src == 10.128.2.16 && ip4.dst != 10.128.0.0/14"
nexthop             : []
nexthops            : ["254.254.254.254"]   <------------------ invalid nexthop
options             : {}
priority            : 501

Edit the exgw2 namespace, then recheck OVN:

[root@ip-10-0-195-128 ~]# ovn-nbctl find logical_router_policy priority=501
_uuid               : e163005d-a5f2-4536-82f1-9c9d71b1d9fc
action              : reroute
external_ids        : {}
match               : "inport == \"rtos-ip-10-0-188-42.us-west-2.compute.internal\" && ip4.src == 10.128.2.16 && ip4.dst != 10.128.0.0/14"
nexthop             : []
nexthops            : ["100.64.0.6"]   <------------------- nexthop is correct again
options             : {}
priority            : 501
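The before/after nexthop comparison in the steps above can be extracted mechanically instead of read off by eye. A sketch only; the heredoc is a trimmed sample record copied from this comment, standing in for a live `ovn-nbctl find logical_router_policy priority=501`:

```shell
# Sample record standing in for: ovn-nbctl find logical_router_policy priority=501
record=$(cat <<'EOF'
_uuid               : e163005d-a5f2-4536-82f1-9c9d71b1d9fc
action              : reroute
nexthop             : []
nexthops            : ["100.64.0.6"]
priority            : 501
EOF
)

# Pull the first address out of the nexthops field, e.g. ["100.64.0.6"] -> 100.64.0.6
nexthop=$(printf '%s\n' "$record" | sed -n 's/^nexthops.*\["\([^"]*\)"\].*/\1/p')
echo "501 policy nexthop: $nexthop"
```

Running this before and after the namespace edit makes the pass/fail condition a simple string comparison: the extracted value should return from 254.254.254.254 to the original 100.64.0.x management-port address.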
Thanks Tim for your detailed steps. Following your 4.9 nightly testing steps, testing passed in 4.9.0-0.nightly-2022-02-15-094518

[weliang@weliang networking]$ oc rsh ovnkube-master-tkhbz
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router |grep 501
   501 inport == "rtos-weliang-151-rb5vt-worker-rkl4d" && ip4.src == 10.131.0.21 && ip4.dst != 10.128.0.0/14   reroute   100.64.0.5
sh-4.4# ovn-nbctl find logical_router_policy priority=501
_uuid               : 4e7cf730-b134-4ab4-a7e0-e7fb371ac210
action              : reroute
external_ids        : {}
match               : "inport == \"rtos-weliang-151-rb5vt-worker-rkl4d\" && ip4.src == 10.131.0.21 && ip4.dst != 10.128.0.0/14"
nexthop             : []
nexthops            : ["100.64.0.5"]
options             : {}
priority            : 501
sh-4.4# ovn-nbctl set logical_router_policy 4e7cf730-b134-4ab4-a7e0-e7fb371ac210 nexthops=254.254.254.254
sh-4.4# ovn-nbctl find logical_router_policy priority=501
_uuid               : 4e7cf730-b134-4ab4-a7e0-e7fb371ac210
action              : reroute
external_ids        : {}
match               : "inport == \"rtos-weliang-151-rb5vt-worker-rkl4d\" && ip4.src == 10.131.0.21 && ip4.dst != 10.128.0.0/14"
nexthop             : []
nexthops            : ["254.254.254.254"]
options             : {}
priority            : 501
sh-4.4# kubectl edit ns exgw2
namespace/exgw2 edited
sh-4.4# ovn-nbctl find logical_router_policy priority=501
_uuid               : fb6fcd62-e9c3-4a56-8aa3-3d71e62685e6
action              : reroute
external_ids        : {}
match               : "inport == \"rtos-weliang-151-rb5vt-worker-rkl4d\" && ip4.src == 10.131.0.21 && ip4.dst != 10.128.0.0/14"
nexthop             : []
nexthops            : ["100.64.0.5"]
options             : {}
priority            : 501
sh-4.4#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.22 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0561