Bug 2224199
| Summary: | ovn-controller replace CT zone UUID names with LR/LS names | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Surya Seetharaman <surya> |
| Component: | ovn23.09 | Assignee: | Ales Musil <amusil> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | FDP 23.A | CC: | amusil, ctrautma, dcbw, dceara, jiji, mmichels, ovn-bot |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ovn23.09-23.09.0-alpha.102.el9fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2024-01-24 11:17:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Surya Seetharaman
2023-07-20 07:35:00 UTC
FWIW, I think the title is misleading. There's nothing blocking the host from using *any* CT zone it wants. IMO ovn-controller should flush all zones it uses. There is one special case: when ovn-controller shares a zone with the host. That normally happens when LR.options:snat-ct-zone=<ZONE> is set in the NB. Only in that case *might* it be acceptable to not flush the zone, if the CMS explicitly requests that.

Summarizing what we discussed offline:

The real underlying problem is that ovn-controller was flushing CT zone 0 during an ovn-kubernetes migration from non-IC to IC deployments. That's because the SB database was being reconstructed and the datapath associated with the gateway router that had snat-ct-zone=0 set was changing UUID. The ovn-controller mechanism that avoids flushing zones already in use by OVN matches on datapath UUIDs, not names, so ovn-controller incorrectly assumed that the logical datapath had changed.

It's not desirable to add more configuration knobs for avoiding CT zone flush (those would have to be per switch/router and would over-complicate the code). Instead, we can change ovn-controller to try matching on both UUID and switch/router name when mapping required CT zones to already existing ones.

Once ovn-kubernetes upgrades to a version of OVN that supports both mappings, no flush will happen when the SB UUID changes unless the SB datapath name changes too. That should properly fix any traffic disruption issues caused by conntrack flush when upgrading from non-IC to IC deployments.
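To make the proposed matching concrete, here is a minimal Python sketch of the idea described above. It is not the ovn-controller implementation (which is written in C); the key format `<uuid-or-name>_<suffix>`, the function names, and the zone allocation starting at 1 are all assumptions made for illustration. The point is only that a datapath whose UUID changed but whose name did not can still find its previous zone, so that zone is not flushed.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical per-datapath zone suffixes, used only for this sketch.
SUFFIXES = ("snat", "dnat")


def find_existing_zone(existing: Dict[str, int], dp_uuid: str,
                       dp_name: str, suffix: str) -> Optional[int]:
    """Look up a stored zone by UUID-based key first, then by name-based key."""
    for key in (f"{dp_uuid}_{suffix}", f"{dp_name}_{suffix}"):
        if key in existing:
            return existing[key]
    return None


def plan_zones(existing: Dict[str, int],
               datapaths: Iterable[Tuple[str, str]]
               ) -> Tuple[Dict[str, int], Set[int]]:
    """Return (name-keyed zone assignment, zones that need a flush).

    `existing` maps stored keys to zone ids; `datapaths` yields
    (uuid, name) pairs for the datapaths that currently need zones.
    """
    assigned: Dict[str, int] = {}
    used: Set[int] = set()
    pending = []

    # First pass: reuse any zone that matches on UUID or on name.
    for dp_uuid, dp_name in datapaths:
        for suffix in SUFFIXES:
            key = f"{dp_name}_{suffix}"
            zone = find_existing_zone(existing, dp_uuid, dp_name, suffix)
            if zone is None:
                pending.append(key)
            else:
                assigned[key] = zone
                used.add(zone)

    # Second pass: allocate fresh zones; only these are flushed.
    to_flush: Set[int] = set()
    next_zone = 1  # assumption: allocation starts at 1 in this sketch
    for key in pending:
        while next_zone in used:
            next_zone += 1
        assigned[key] = next_zone
        used.add(next_zone)
        to_flush.add(next_zone)

    return assigned, to_flush


# The SB UUID changed but the router name did not: zones are reused.
existing = {"GR_node1_snat": 0, "GR_node1_dnat": 5}
assigned, flush = plan_zones(existing, [("new-uuid", "GR_node1")])

# Only UUID-keyed entries exist and the UUID changed: zones are reallocated.
assigned_old, flush_old = plan_zones({"old-uuid_snat": 0},
                                     [("new-uuid", "GR_node1")])
```

In the first call nothing is scheduled for flushing because the name-based lookup succeeds; in the second call there is no name-keyed entry to fall back on, which mirrors the behaviour that caused the disruption during the migration.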
ovn23.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2227121
ovn23.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2227122
ovn23.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2227123

Upstream patch applied: http://patchwork.ozlabs.org/project/ovn/patch/20230726124239.66275-1-amusil@redhat.com/

*** Bug 2227121 has been marked as a duplicate of this bug. ***

With reproducer in https://bugzilla.redhat.com/show_bug.cgi?id=2227123#c3

Verified on ovn23.09-23.09.0-73.el9:

[root@kvm-01-guest22 bz2224199]# rpm -qa | grep -E "openvswitch3.1|ovn23.09"
openvswitch3.1-3.1.0-70.el9fdp.x86_64
ovn23.09-23.09.0-73.el9fdp.x86_64
ovn23.09-central-23.09.0-73.el9fdp.x86_64
ovn23.09-host-23.09.0-73.el9fdp.x86_64

+ ovn-nbctl set logical_router lr1 options:chassis=hv1
+ ovn-nbctl lr-nat-add lr1 snat 172.17.1.1 192.168.1.1
+ ovn-nbctl --wait=hv sync
+ ovs-vsctl list bridge
_uuid               : 5df93910-da9a-4805-9875-a616b9f9e7e9
auto_attach         : []
controller          : []
datapath_id         : "00000aa33b1ec712"
datapath_type       : system
datapath_version    : "<unknown>"
external_ids        : {ct-zone-lr1-ls1="10", ct-zone-lr1-ls2="9", ct-zone-lr1_dnat="1", ct-zone-lr1_snat="11", ct-zone-ls1-lr1="8", ct-zone-ls1_dnat="12", ct-zone-ls1_snat="5", ct-zone-ls1p1="3", ct-zone-ls1p2="7", ct-zone-ls2-lr1="6", ct-zone-ls2_dnat="4", ct-zone-ls2_snat="2", ct-zone-ls2p1="13", ovn-nb-cfg="1", ovn-nb-cfg-ts="1701742776848", ovn-startup-ts="1701742766697"}   <=== ct-zone name is related to ls/lr name
fail_mode           : secure
flood_vlans         : []
flow_tables         : {}
ipfix               : []
mcast_snooping_enable: false
mirrors             : []
name                : br-int
netflow             : []
other_config        : {disable-in-band="true", hwaddr="0a:a3:3b:1e:c7:12"}
ports               : [1c98001b-be02-40c9-960d-a1599d3edc64, 258e41ca-f789-4c4f-862d-1d802075874b, 71938d9c-0784-495b-a803-913768e001a9, fe17825f-f75b-4112-a3f6-cacb412ba3ae]
protocols           : []
rstp_enable         : false
rstp_status         : {}
sflow               : []
status              : {}
stp_enable          : false

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn23.09 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0392
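For anyone repeating the verification above, the check on the `external_ids` column of `br-int` can be scripted. The following Python sketch is an assumption-laden helper, not part of OVN or the reproducer: it parses a pasted `external_ids` string, collects the `ct-zone-*` entries, and reports any whose name still begins with a UUID. The helper names and the sample string (a shortened copy of the output above) are illustrative only. Note that a logical port that is itself named with a UUID would also be reported, so a hit needs manual inspection.

```python
import re
from typing import Dict, List

# Matches entries such as ct-zone-lr1_snat="11" in an external_ids map.
CT_ZONE_RE = re.compile(r'ct-zone-([^=\s,{]+)="(\d+)"')

# A name that starts with a canonical lowercase UUID.
UUID_PREFIX_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def parse_ct_zones(external_ids: str) -> Dict[str, int]:
    """Extract {zone name: zone id} from an external_ids string."""
    return {name: int(zone)
            for name, zone in CT_ZONE_RE.findall(external_ids)}


def uuid_named_zones(zones: Dict[str, int]) -> List[str]:
    """Return the zone names that still begin with a UUID, sorted."""
    return sorted(name for name in zones if UUID_PREFIX_RE.match(name))


# Shortened sample taken from the verification output in this report.
sample = ('{ct-zone-lr1_dnat="1", ct-zone-lr1_snat="11", '
          'ct-zone-ls1p1="3", ovn-nb-cfg="1"}')
zones = parse_ct_zones(sample)
leftover = uuid_named_zones(zones)
```

With the fixed build, `leftover` is expected to be empty for router and switch zones, since those are keyed by LR/LS name rather than by datapath UUID.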