Description of problem:

With the move to shared gateway mode in OCP we changed the GR (gateway router) to use the host CT zone, to avoid conflicts from sharing the same addresses/CT entries with the host. However, due to https://mail.openvswitch.org/pipermail/ovs-dev/2016-September/323402.html, all entries in that zone are being flushed. ovn-controller should only flush entries for zones it dynamically allocates for ephemeral things like pod IPs, i.e. zones that are not in use by any other entities in the system.

Background on why the flush was added: https://bugs.launchpad.net/networking-ovn/+bug/1538696
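For context, a minimal sketch of how a CMS ends up sharing zone 0 with the host, mirroring the QE reproduction later in this BZ (the router name "GR_node1" is made up; ovn-kubernetes uses its own gateway-router naming):

# Pin the gateway router's SNAT CT zone to 0, i.e. the zone the host
# also uses for its own conntrack entries:
ovn-nbctl set logical_router GR_node1 options:snat-ct-zone=0

# From this point on, any zone flush issued by ovn-controller for zone 0
# also wipes conntrack state that belongs to the host, not just to OVN.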
Also, we should probably look further into why we need to flush CT zones at all. If the CT zone persists across ovn-controller restarts, then I don't see why ovn-controller would need to flush an entire zone. Perhaps on startup it should just reconcile port removals by flushing CT entries for those specific addresses. I'm not sure what the exact behavior is today, but it would be good if we can get some explanation in this BZ. Flushing all entries in a zone during an ovn-controller restart could lead to traffic disruption in the datapath.
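As a hedged illustration of the per-address reconciliation suggested above (not something ovn-controller does today): OVS exposes a conntrack flush command via ovs-appctl, and newer releases accept a zone and a conntrack tuple, so a targeted cleanup could look roughly like the following. The zone number 7 and pod IP 10.244.1.5 are made-up values, and the exact tuple syntax and support depend on the installed OVS version:

# Flush only the entries of a given zone that involve a removed pod IP,
# instead of wiping the whole zone (tuple syntax is version-dependent):
ovs-appctl dpctl/flush-conntrack zone=7 "ct_nw_src=10.244.1.5"
ovs-appctl dpctl/flush-conntrack zone=7 "ct_nw_dst=10.244.1.5"

# Even flushing just that dynamically allocated zone is far less
# disruptive than flushing zone 0, which is shared with the host:
ovs-appctl dpctl/flush-conntrack zone=7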
Zone 0 is indeed likely to be used by the host, but there's no way to tell whether any of the other zones used by ovn-controller are not also used by the host. I think it should be the responsibility of the CMS to indicate that such a "shared" zone is in use. Some options:

a. Assume that all zone numbers explicitly requested by the CMS via LRP.options:snat-ct-zone are shared and never flush these zones. As ovn-kubernetes is likely the only current user of snat-ct-zone, we could just document it and make the behavior change.
b. Add a new flag (ovn-controller cmdline/local ovsdb/LRP.option/etc.) to mark a zone as "do not flush". The drawback is that we add yet another variable to the already large OVN test matrix.

(In reply to Tim Rozet from comment #1)
> Also, we should probably look further into why we need to flush CT zones at
> all. If the CT zone persists across ovn-controller restarts, then I don't
> see why ovn-controller would need to flush an entire zone. Perhaps on
> startup it should just reconcile port removals by flushing CT entries for
> those specific addresses. I'm not sure what the exact behavior is today,
> but it would be good if we can get some explanation in this BZ. Flushing
> all entries in a zone during an ovn-controller restart could lead to
> traffic disruption in the datapath.

However, ovn-controller is supposed to flush zones only the first time it starts using them. In general, the flush on zone 0 that ovn-kubernetes is seeing should only happen the very first time an ovn-controller instance claims the local gateway router. The way ovn-controller operates is:

1. ovn-controller claims the gw router the very first time, so it "allocates" zone 0, flushes it, and stores this information in the local ovsdb (conf.db, Bridge record).
2. ovn-controller is restarted.
3. The new ovn-controller connects to the local ovsdb and to the SB.
4. The new ovn-controller reconciles the gw router name + requested zone id with what's in the local ovsdb. They match, so no flush happens.

The fact that we see zone 0 being flushed when ovn-kubernetes is upgraded seems to indicate that either:

a. the local ovsdb was deleted, OR
b. the local ovsdb br-int Bridge record was deleted.

Tim, can you please confirm what's going on with the local ovsdb in your scenario?

All of the above works fine with conditional monitoring disabled (ovn-monitor-all=true), which is how ovn-kubernetes deploys OVN. In general, in the non-ovn-kubernetes case, when conditional monitoring is enabled and ovn-controller restarts, its initial SB monitor condition is "empty", so no logical datapaths will be local. This makes ovn-controller assume that the zone mappings are stale, and they get removed from the local DB. When the local ports and routers are reclaimed, ovn-controller reassigns (potentially different) zone ids and flushes those. We should try to fix this, but it shouldn't affect ovn-kubernetes in any way.

I suggest we focus on figuring out why the old ct-zone-<gw-router>-snat=0 mapping disappears from the local ovsdb, or why the local ovsdb or the br-int Bridge record disappears altogether, and on lowering the priority/severity of this BZ.
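The zone mappings mentioned above are persisted in the integration bridge record of the local conf.db, so one way to check whether the ct-zone-<gw-router>-snat=0 mapping survived a restart is to dump the Bridge external_ids (a rough sketch; exact key names may differ between OVN versions):

# List the CT zone mappings ovn-controller stored on br-int; the
# gateway router's snat mapping should still be present after a restart:
ovs-vsctl get Bridge br-int external_ids | tr ',' '\n' | grep ct-zone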
Talked with Dumitru on Slack and showed him this problem happens with simply restarting ovn-controller and not touching anything else. He was able to reproduce and is working on root causing it. The bug should stay urgent. This problem does not happen for me on an older version of OVN:

=============== ovn-controller start_controller
2022-05-18T14:17:59Z|00001|vlog|INFO|opened log file /var/log/ovn/ovn-controller.log
Starting ovn-controller.
run as: /usr/share/ovn/scripts/ovn-ctl --no-monitor start_controller --ovn-controller-log=-vconsole:info
=============== ovn-controller ========== running
2022-05-18T14:17:59.015Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2022-05-18T14:17:59.015Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2022-05-18T14:17:59.017Z|00004|main|INFO|OVN internal version is : [21.03.1-20.16.1-56.0]
2022-05-18T14:17:59.017Z|00005|main|INFO|OVS IDL reconnected, force recompute.
2022-05-18T14:17:59.017Z|00006|reconnect|INFO|tcp:172.18.0.3:6642: connecting...
2022-05-18T14:17:59.017Z|00007|main|INFO|OVNSB IDL reconnected, force recompute.
2022-05-18T14:17:59.017Z|00008|reconnect|INFO|tcp:172.18.0.3:6642: connected
2022-05-18T14:17:59.027Z|00009|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2022-05-18T14:17:59.027Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2022-05-18T14:17:59.027Z|00011|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2022-05-18T14:17:59.035Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2022-05-18T14:17:59.036Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2022-05-18T14:17:59.036Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2022-05-18T14:18:14.048Z|00012|memory|INFO|8440 kB peak resident set size after 15.0 seconds
2022-05-18T14:18:14.048Z|00013|memory|INFO|lflow-cache-entries-cache-expr:5 lflow-cache-entries-cache-matches:409 lflow-cache-size-KB:2074
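For reference, the check for the unwanted flush (the same one used in the QE reproduction further down in this BZ) looks roughly like this; log paths and debug settings are environment-dependent:

# Make ovs-vswitchd log the OpenFlow messages it receives, restart
# ovn-controller, and look for a zone 0 flush request:
ovs-appctl vlog/set vconn:dbg
> /var/log/openvswitch/ovs-vswitchd.log
systemctl restart ovn-controller
grep -i flush /var/log/openvswitch/ovs-vswitchd.log | grep zone_id=0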
It turns out that even in the ovn-monitor-all=true case there was a bug causing ovn-controller to fail to restore the CT zone with ID 0. I posted a patch to fix that: https://patchwork.ozlabs.org/project/ovn/list/?series=300921&state=*
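One way to sanity-check a build that carries the patch (an assumed verification flow, not part of the patch itself) is to compare the zone assignments across a restart; the mapping with ID 0 should persist:

# Record the CT zone assignments, restart ovn-controller, and diff;
# the <gw-router>_snat 0 entry should not disappear or get reassigned:
ovn-appctl ct-zone-list > /tmp/ct-zones.before
systemctl restart ovn-controller
ovn-appctl ct-zone-list > /tmp/ct-zones.after
diff /tmp/ct-zones.before /tmp/ct-zones.after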
My initial testing didn't show this because I was using a zone-id different from 0.
To clarify (In reply to Tim Rozet from comment #3):
> Talked with Dumitru on Slack and showed him this problem happens with
> simply restarting ovn-controller and not touching anything else. He was
> able to reproduce and is working on root causing it. The bug should stay
> urgent. This problem does not happen for me on an older version of OVN:
> [...]
> [ovn-controller restart log snipped; see comment #3]

It does actually happen in the older OVN version. In kind back then we were defaulting to local gw mode, so I couldn't reproduce because we don't set CT zone 0 on local gw setups.
V2: https://patchwork.ozlabs.org/project/ovn/list/?series=300935&state=*
Reproduced on version: ovn-2021-21.03.0-21.el8fdp.x86_64

[root@dell-per740-53 ls-lr]# ovn-nbctl set logical_router rtr options:snat-ct-zone=0
[root@dell-per740-53 ls-lr]# ovn-appctl ct-zone-list
0307b0bd-19de-4366-a285-78a695e0a291_snat 1
vm1 2
0307b0bd-19de-4366-a285-78a695e0a291_dnat 3
[root@dell-per740-53 ls-lr]# ovn-sbctl show
Chassis hv1
    hostname: dell-per740-53.rhts.eng.pek2.redhat.com
    Encap geneve
        ip: "127.0.0.1"
        options: {csum="true"}
    Port_Binding vm1
[root@dell-per740-53 ls-lr]# ovn-nbctl lr-add rtr -- set logical_router rtr options:chassis=hv1
ovn-nbctl: rtr: a router with this name already exists
[root@dell-per740-53 ls-lr]# ovn-nbctl set logical_router rtr options:chassis=hv1
[root@dell-per740-53 ls-lr]# ovn-appctl ct-zone-list
0275178c-a738-4620-8d07-f162c580e2d7_snat 0
rtr-ls 4
0307b0bd-19de-4366-a285-78a695e0a291_snat 1
vm1 2
ls-rtr 5
0307b0bd-19de-4366-a285-78a695e0a291_dnat 3
0275178c-a738-4620-8d07-f162c580e2d7_dnat 6
[root@dell-per740-53 ls-lr]# ovn-appctl exit
[root@dell-per740-53 ls-lr]# ovs-appctl vlog/disable-rate-limit
[root@dell-per740-53 ls-lr]# ovs-appctl vlog/set vconn:dbg
[root@dell-per740-53 ls-lr]# > /var/log/openvswitch/ovs-vswitchd.log
[root@dell-per740-53 ls-lr]# systemctl start ovn-controller
[root@dell-per740-53 ls-lr]# grep -i flush /var/log/openvswitch/ovs-vswitchd.log | grep zone_id=0
2022-06-10T01:44:36.421Z|00056|vconn|DBG|unix#4: received: NXT_CT_FLUSH_ZONE (OF1.5) (xid=0x6): zone_id=0
------------------- should not flush this zone_id=0

Verified on versions:
ovn-2021-21.12.0-73.el8fdp.x86_64
ovn22.03-22.03.0-52.el8fdp.x86_64

:: [ 22:03:08 ] :: [ BEGIN ] :: Running 'ovn-appctl ct-zone-list'
e09c3b23-286d-40c0-a0b7-1d3463614a5f_snat 1
e09c3b23-286d-40c0-a0b7-1d3463614a5f_dnat 2
a53f9487-27e0-4d4c-884f-3bd3314bccfb_dnat 3
a53f9487-27e0-4d4c-884f-3bd3314bccfb_snat 0
:: [ 22:03:08 ] :: [ PASS ] :: Command 'ovn-appctl ct-zone-list' (Expected 0, got 0)
:: [ 22:03:13 ] :: [ BEGIN ] :: Running '> /var/log/openvswitch/ovs-vswitchd.log'
:: [ 22:03:13 ] :: [ PASS ] :: Command '> /var/log/openvswitch/ovs-vswitchd.log' (Expected 0, got 0)
:: [ 22:03:13 ] :: [ BEGIN ] :: Running 'systemctl start ovn-controller'
:: [ 22:03:13 ] :: [ PASS ] :: Command 'systemctl start ovn-controller' (Expected 0, got 0)
:: [ 22:03:18 ] :: [ BEGIN ] :: Running 'grep -i flush /var/log/openvswitch/ovs-vswitchd.log |grep zone_id=0'
:: [ 22:03:18 ] :: [ PASS ] :: Command 'grep -i flush /var/log/openvswitch/ovs-vswitchd.log |grep zone_id=0' (Expected 1, got 1)
------------ no flush now (grep finds no zone_id=0 flush, hence the expected exit code 1)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:5446