Bug 2087194

Summary: ovn-controller flushes all conntrack entries in ct zone 0 (host)
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Tim Rozet <trozet>
Component: OVN
Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA
QA Contact: ying xu <yinxu>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: FDP 19.03
CC: ctrautma, dceara, jiji, jishi, miabbott
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovn-2021-21.12.0-58.el8fdp ovn22.03-22.03.0-37.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-06-30 18:00:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2081069

Description Tim Rozet 2022-05-17 14:32:54 UTC
Description of problem:
With the move to shared gateway mode in OCP, we changed the GR to use the host ct zone to avoid conflicts from sharing the same addresses/ct entries with the host. However, due to
https://mail.openvswitch.org/pipermail/ovs-dev/2016-September/323402.html

all entries in that zone are being flushed. ovn-controller should only flush entries for zones that it dynamically allocates for ephemeral things like pod IPs and that are not in use by any other entity in the system. Background on why the flush was added:

https://bugs.launchpad.net/networking-ovn/+bug/1538696

Comment 1 Tim Rozet 2022-05-17 19:04:18 UTC
Also, we should probably look further into why we need to flush CT zones at all. If the CT zone persists across ovn-controller restarts, then I don't see why ovn-controller would need to flush an entire zone. Perhaps on startup it should just reconcile port removal by flushing ct for those specific addresses. I'm not sure what the exact behavior is today, but it would be good if we can get some explanation in this bz. Flushing all entries in a zone during ovn-controller startup could lead to traffic disruption in the datapath.

Comment 2 Dumitru Ceara 2022-05-18 10:09:03 UTC
Zone 0 is indeed likely to be used by the host but there's no way to
tell if any of the other zones used by ovn-controller are not also used
by the host. I think it should be the responsibility of the CMS to
indicate that such a "shared" zone is in use.  Some options:

a. assume that all zone numbers explicitly requested by the CMS via
   LRP.options:snat-ct-zone are shared and never flush these zones. As
   ovn-kubernetes is likely the only current user of snat-ct-zone we
   could just document it and make the behavior change.

b. add a new flag (ovn-controller cmdline/local ovsdb/LRP.option/etc)
   to mark a zone as "do not flush". The drawback is we add yet another
   variable to the already large OVN test matrix.
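
To make the two options concrete, here is a minimal sketch of the decision they imply. Everything in it is hypothetical: `FlushPolicy`, `cms_requested_zones` and `no_flush_zones` are illustrative names, not existing OVN code or configuration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlushPolicy:
    """Hypothetical policy deciding whether a conntrack zone may be flushed."""

    # Option (a): zone IDs the CMS explicitly requested, for example through
    # LRP.options:snat-ct-zone. These are assumed to be shared with the host.
    cms_requested_zones: set[int] = field(default_factory=set)
    # Option (b): zone IDs marked "do not flush" through a new knob that
    # does not exist today.
    no_flush_zones: set[int] = field(default_factory=set)

    def may_flush(self, zone_id: int) -> bool:
        """Return True only for zones that no policy marks as protected."""
        if zone_id in self.cms_requested_zones:
            return False
        if zone_id in self.no_flush_zones:
            return False
        return True


policy = FlushPolicy(cms_requested_zones={0})
print(policy.may_flush(0))  # zone requested by the CMS
print(policy.may_flush(7))  # zone allocated dynamically by ovn-controller
```

With option (a) the set is derived from configuration that already exists; with option (b) it has to be populated from a new source, which is the extra test-matrix variable mentioned above.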

(In reply to Tim Rozet from comment #1)
> Also, we should probably further look into why we need to flush CT zones at
> all. If the CT zone persists across ovn-controller restarts, then I don't
> see why ovn-controller would need to flush an entire zone. Perhaps on start
> up it should just reconcile port removal with flushing ct for those specific
> addresses. I'm not sure what the exact behavior is today, but it would be
> good if we can get some explanation in this bz. Flushing all entries on a
> zone during ovn-controller could lead to traffic disruption in the datapath.

However, ovn-controller is supposed to flush zones only the first time
it starts using them. In general, the flush on zone 0 ovn-kubernetes is
seeing should only happen the very first time an ovn-controller instance
claims the local gateway router. The way ovn-controller operates is:

1. ovn-controller claims the gw router the very first time, so it
   "allocates" zone 0, flushes it and stores this information in the
   local ovsdb (conf.db, Bridge record).
2. ovn-controller is restarted.
3. new ovn-controller connects to the local ovsdb and to the SB.
4. new ovn-controller reconciles the gw router name + requested zone id
   with what's in the local ovsdb.  They match so no flush happens.
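
The four steps above can be sketched roughly as follows. This is a simplified model, not the ovn-controller implementation: `reconcile_zones`, `first_free_zone` and the user names are made up for illustration, and the stored mapping is represented as a plain dict instead of the Bridge record in the local ovsdb.

```python
def first_free_zone(used):
    """Return the lowest zone ID >= 1 that is not in `used`."""
    zone = 1
    while zone in used:
        zone += 1
    return zone


def reconcile_zones(stored, requested):
    """Return (final_mapping, zones_to_flush).

    stored:    {user_name: zone_id} as read back from the local ovsdb.
    requested: {user_name: zone_id or None}; None means "any free zone".
    """
    final = {}
    to_flush = []
    # Zones already taken: stored mappings plus explicitly requested IDs.
    used = set(stored.values()) | {z for z in requested.values() if z is not None}
    for name, wanted in requested.items():
        if name in stored and wanted in (None, stored[name]):
            # Step 4: the stored mapping matches, reuse it without a flush.
            final[name] = stored[name]
            continue
        # Step 1: first use of the zone by this user, so allocate and flush.
        zone = wanted if wanted is not None else first_free_zone(used)
        used.add(zone)
        final[name] = zone
        to_flush.append(zone)
    return final, to_flush


requested = {"gw-router_snat": 0, "pod1": None}

# First start: nothing is stored yet, so both zones are flushed.
first, flushed = reconcile_zones({}, requested)

# After a restart: the stored mapping matches, so nothing is flushed.
again, flushed_again = reconcile_zones(first, requested)
```

In this model a flush of zone 0 after a restart can only happen if the stored entry for the gateway router is missing or is not read back correctly.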

The fact that when ovn-kubernetes is upgraded we see zone 0 being
flushed seems to indicate that either:
a. the local ovsdb was deleted
OR
b. the local ovsdb br-int Bridge record was deleted

Tim, can you please confirm what's going on with the local ovsdb in your
scenario?

All of the above is working fine with conditional monitoring disabled
(ovn-monitor-all=true), which is how ovn-kubernetes deploys ovn.

In general, in the non-ovn-kubernetes case, when we enable conditional
monitoring, if ovn-controller restarts, its initial SB monitor condition
is "empty" so no logical datapaths will be local. This will make
ovn-controller assume that the zone mappings are stale and they'll be
removed from the local DB.  When the local ports and routers are
reclaimed, ovn-controller reassigns (potentially different) zone ids and
flushes those.
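
A small sketch of the effect described above, under the assumption that stale-mapping cleanup simply drops every stored entry whose user is not currently local. `prune_stale` and the user names are hypothetical.

```python
def prune_stale(stored, local_users):
    """Drop stored zone mappings whose user is not currently local."""
    return {name: zone for name, zone in stored.items() if name in local_users}


stored = {"gw-router_snat": 0, "pod1": 4}

# ovn-monitor-all=true: all datapaths are visible right after the restart,
# so every stored mapping still has a local user and is kept.
kept_monitor_all = prune_stale(stored, {"gw-router_snat", "pod1"})

# Conditional monitoring: the initial monitor condition is empty, nothing
# looks local yet, and every stored mapping is treated as stale.
kept_conditional = prune_stale(stored, set())
```

Once the ports and routers are reclaimed, the second case has no stored mappings left to match against, so every zone looks newly allocated and is flushed.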

We should try to fix this but it shouldn't affect ovn-kubernetes in any
way. I suggest we focus on figuring out why the old
ct-zone-<gw-router>-snat=0 mapping disappears from the local ovsdb, or
why the local ovsdb or the br-int Bridge record disappears altogether,
and that we lower the priority/severity of this BZ.

Comment 3 Tim Rozet 2022-05-18 14:31:57 UTC
Talked with Dumitru on Slack and showed him that this problem happens when simply restarting ovn-controller without touching anything else. He was able to reproduce it and is working on finding the root cause. Bug should stay urgent. This problem does not happen for me on an older version of OVN:

=============== ovn-controller  start_controller
2022-05-18T14:17:59Z|00001|vlog|INFO|opened log file /var/log/ovn/ovn-controller.log
Starting ovn-controller.
run as: /usr/share/ovn/scripts/ovn-ctl --no-monitor start_controller --ovn-controller-log=-vconsole:info
=============== ovn-controller ========== running
2022-05-18T14:17:59.015Z|00002|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2022-05-18T14:17:59.015Z|00003|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2022-05-18T14:17:59.017Z|00004|main|INFO|OVN internal version is : [21.03.1-20.16.1-56.0]
2022-05-18T14:17:59.017Z|00005|main|INFO|OVS IDL reconnected, force recompute.
2022-05-18T14:17:59.017Z|00006|reconnect|INFO|tcp:172.18.0.3:6642: connecting...
2022-05-18T14:17:59.017Z|00007|main|INFO|OVNSB IDL reconnected, force recompute.
2022-05-18T14:17:59.017Z|00008|reconnect|INFO|tcp:172.18.0.3:6642: connected
2022-05-18T14:17:59.027Z|00009|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2022-05-18T14:17:59.027Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2022-05-18T14:17:59.027Z|00011|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2022-05-18T14:17:59.035Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
2022-05-18T14:17:59.036Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
2022-05-18T14:17:59.036Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
2022-05-18T14:18:14.048Z|00012|memory|INFO|8440 kB peak resident set size after 15.0 seconds
2022-05-18T14:18:14.048Z|00013|memory|INFO|lflow-cache-entries-cache-expr:5 lflow-cache-entries-cache-matches:409 lflow-cache-size-KB:2074

Comment 4 Dumitru Ceara 2022-05-18 16:12:33 UTC
It turns out that even in the ovn-monitor-all=true case there was a bug causing ovn-controller to fail to restore the CT zone with ID 0.

I posted a patch to fix that:
https://patchwork.ozlabs.org/project/ovn/list/?series=300921&state=*

Comment 5 Dumitru Ceara 2022-05-18 16:13:11 UTC
My initial testing didn't show this because I was using a zone-id different from 0.
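
This bug report does not spell out the root cause, so the following is only an illustration of one common way an ID of 0 gets mishandled while every other ID works: a lookup that returns 0 for "not found". The names `restore_buggy` and `restore_fixed` are hypothetical and do not correspond to ovn-controller functions.

```python
stored = {"gw-router_snat": 0, "pod1": 4}


def restore_buggy(name):
    """Return the stored zone for `name`, or None if there is none."""
    # A lookup that yields 0 for "not found" cannot tell a missing entry
    # apart from a stored zone ID of 0.
    zone = stored.get(name, 0)
    return zone if zone else None


def restore_fixed(name):
    """Return the stored zone for `name`, or None if there is none."""
    # Test for presence explicitly, so that 0 is a valid stored value.
    return stored[name] if name in stored else None


print(restore_buggy("pod1"))            # non-zero zone is restored
print(restore_buggy("gw-router_snat"))  # zone 0 is lost
print(restore_fixed("gw-router_snat"))  # zone 0 is restored
```

A test that only uses non-zero zone IDs passes against both variants, which matches the observation that testing with a zone ID other than 0 did not show the problem.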

Comment 6 Tim Rozet 2022-05-18 16:55:02 UTC
To clarify: (In reply to Tim Rozet from comment #3)
> Talked with Dumitru on slack and showed him this problem happens with simply
> restarting ovn-controller and not touching anything else. He was able to
> reproduce and is working on root causing. Bug should stay urgent. This
> problem does not happen for me on an older version of OVN:

It does actually happen in the older OVN version. In KIND, we were defaulting to local gw mode back then, so I couldn't reproduce it because we don't set CT zone 0 on local gw setups.

Comment 10 ying xu 2022-06-10 02:31:45 UTC
Reproduced on version:
ovn-2021-21.03.0-21.el8fdp.x86_64

[root@dell-per740-53 ls-lr]# ovn-nbctl set logical_router rtr options:snat-ct-zone=0
[root@dell-per740-53 ls-lr]# ovn-appctl ct-zone-list
0307b0bd-19de-4366-a285-78a695e0a291_snat 1
vm1 2
0307b0bd-19de-4366-a285-78a695e0a291_dnat 3
[root@dell-per740-53 ls-lr]# ovn-sbctl show
Chassis hv1
    hostname: dell-per740-53.rhts.eng.pek2.redhat.com
    Encap geneve
        ip: "127.0.0.1"
        options: {csum="true"}
    Port_Binding vm1
[root@dell-per740-53 ls-lr]# ovn-nbctl lr-add rtr -- set logical_router rtr options:chassis=hv1
ovn-nbctl: rtr: a router with this name already exists
[root@dell-per740-53 ls-lr]# ovn-nbctl  set logical_router rtr options:chassis=hv1
[root@dell-per740-53 ls-lr]# ovn-appctl ct-zone-list
0275178c-a738-4620-8d07-f162c580e2d7_snat 0
rtr-ls 4
0307b0bd-19de-4366-a285-78a695e0a291_snat 1
vm1 2
ls-rtr 5
0307b0bd-19de-4366-a285-78a695e0a291_dnat 3
0275178c-a738-4620-8d07-f162c580e2d7_dnat 6
[root@dell-per740-53 ls-lr]# ovn-appctl exit
[root@dell-per740-53 ls-lr]# ovs-appctl vlog/disable-rate-limit
[root@dell-per740-53 ls-lr]# ovs-appctl vlog/set vconn:dbg
[root@dell-per740-53 ls-lr]# > /var/log/openvswitch/ovs-vswitchd.log
[root@dell-per740-53 ls-lr]# systemctl start ovn-controller
[root@dell-per740-53 ls-lr]# grep -i flush /var/log/openvswitch/ovs-vswitchd.log |grep zone_id=0
2022-06-10T01:44:36.421Z|00056|vconn|DBG|unix#4: received: NXT_CT_FLUSH_ZONE (OF1.5) (xid=0x6): zone_id=0
(zone_id=0 should not be flushed)
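
The same check can be scripted instead of chaining two greps. The sketch below assumes the debug log format shown above (vconn at dbg level); `flushed_zones` is a hypothetical helper, and both sample lines are copied from logs earlier in this report.

```python
import re

# Matches vconn debug lines for a received conntrack zone flush request and
# captures the zone ID.
FLUSH_RE = re.compile(r"NXT_CT_FLUSH_ZONE\b.*\bzone_id=(\d+)")


def flushed_zones(log_lines):
    """Return the zone IDs of all NXT_CT_FLUSH_ZONE messages, in log order."""
    zones = []
    for line in log_lines:
        match = FLUSH_RE.search(line)
        if match:
            zones.append(int(match.group(1)))
    return zones


sample = [
    "2022-05-18T14:17:59.027Z|00011|rconn|INFO|"
    "unix:/var/run/openvswitch/br-int.mgmt: connected",
    "2022-06-10T01:44:36.421Z|00056|vconn|DBG|unix#4: received: "
    "NXT_CT_FLUSH_ZONE (OF1.5) (xid=0x6): zone_id=0",
]

zones = flushed_zones(sample)
print(0 in zones)  # True means zone 0 was flushed
```

Against a real log, passing the open file object works the same way, for example `flushed_zones(open("/var/log/openvswitch/ovs-vswitchd.log"))`.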

Verified on versions:
ovn-2021-21.12.0-73.el8fdp.x86_64

ovn22.03-22.03.0-52.el8fdp.x86_64

:: [ 22:03:08 ] :: [  BEGIN   ] :: Running 'ovn-appctl ct-zone-list'
e09c3b23-286d-40c0-a0b7-1d3463614a5f_snat 1
e09c3b23-286d-40c0-a0b7-1d3463614a5f_dnat 2
a53f9487-27e0-4d4c-884f-3bd3314bccfb_dnat 3
a53f9487-27e0-4d4c-884f-3bd3314bccfb_snat 0
:: [ 22:03:08 ] :: [   PASS   ] :: Command 'ovn-appctl ct-zone-list' (Expected 0, got 0)
:: [ 22:03:13 ] :: [  BEGIN   ] :: Running '> /var/log/openvswitch/ovs-vswitchd.log'
:: [ 22:03:13 ] :: [   PASS   ] :: Command '> /var/log/openvswitch/ovs-vswitchd.log' (Expected 0, got 0)
:: [ 22:03:13 ] :: [  BEGIN   ] :: Running 'systemctl start ovn-controller'
:: [ 22:03:13 ] :: [   PASS   ] :: Command 'systemctl start ovn-controller' (Expected 0, got 0)
:: [ 22:03:18 ] :: [  BEGIN   ] :: Running 'grep -i flush /var/log/openvswitch/ovs-vswitchd.log |grep zone_id=0'
:: [ 22:03:18 ] :: [   PASS   ] :: Command 'grep -i flush /var/log/openvswitch/ovs-vswitchd.log |grep zone_id=0' (Expected 1, got 1)
(grep finds no match: zone 0 is no longer flushed)

Comment 12 errata-xmlrpc 2022-06-30 18:00:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5446