Bug 2126406

Summary: ovn-controller seems to have installed two ct-zones with same ID
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Surya Seetharaman <surya>
Component: ovn22.03
Assignee: Mark Michelson <mmichels>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: high
Priority: high
Version: FDP 22.L
CC: amusil, ctrautma, gparente, jiji, mmichels, palonsor
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovn22.03-22.03.0-120.el8fdp
Last Closed: 2022-12-15 15:26:33 UTC
Type: Bug

Comment 4 Mark Michelson 2022-10-20 19:04:23 UTC
Hi, I have an update on this.

In the example provided in the private comment, there was a logical port and an SNAT zone that were both assigned zone "0".

To preface, zone 0 is typically never assigned automatically by ovn-controller since it is the default conntrack zone in the system, and using zone 0 has the potential for issues. The only way that zone 0 should be used by ovn-controller is if it is specifically requested for SNAT. If a CMS is specifically requesting zone 0, then it is assumed to know what it is doing and ovn-controller allows it. This means that zone 0 should still never be assigned to a logical port.

Looking at logs, it appears that when ovn-controller started up, it restored CT zones from the OVS database. At that point, the SNAT zone and logical port already had zone 0 assigned to them. Looking at the code, it's very clear how once this situation occurs, it will not get cleared up. This will be easy to fix.

What's not so clear is how the OVS database had those assignments in the first place. One possible way this could happen is through manual manipulation of the database. If someone were to manually add external_ids:ct-zone-<logical_port_name>=0 to the OVS database and then restart ovn-controller, then that would assign zone 0 to the port at startup. However, I highly doubt that is what happened here.

Looking further in the code, I see that there is a bug in the ovn-controller incremental processing that can allow a logical port to get assigned zone 0. This, on its own, is another easy thing to fix. However, it does not explain how both a logical port and an SNAT zone were able to get the same zone assignment. If a logical port were assigned zone 0, and then later we came across a requested SNAT zone 0, then the auto-assigned zone 0 on the logical port should be re-assigned.
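
To make the expected behaviour concrete, here is a small Python model of it. This is only a sketch, not ovn-controller's actual C code: the names `ZoneTable`, `assign_auto` and `request_snat` are invented for illustration, and the placeholder names `lr0_snat` and `ls0-hv` just stand in for an SNAT zone user and a logical port.

```python
class ZoneTable:
    """Toy model of CT zone assignment (hypothetical, not the real code)."""

    MAX_ZONE = 65535  # conntrack zone IDs are 16-bit values

    def __init__(self):
        self.zones = {}  # zone user name -> zone ID

    def _next_free(self):
        used = set(self.zones.values())
        # Start at 1: zone 0 is the system default and is never auto-assigned.
        for zone in range(1, self.MAX_ZONE + 1):
            if zone not in used:
                return zone
        raise RuntimeError("no free CT zone")

    def assign_auto(self, name):
        """Automatically assign a zone to a user such as a logical port."""
        if name not in self.zones:
            self.zones[name] = self._next_free()
        return self.zones[name]

    def request_snat(self, name, zone):
        """Honour an explicitly requested SNAT zone, even zone 0."""
        displaced = [n for n, z in self.zones.items()
                     if z == zone and n != name]
        self.zones[name] = zone
        # Anyone who was holding the requested zone gets a fresh one.
        for other in displaced:
            del self.zones[other]
            self.assign_auto(other)
        return zone


table = ZoneTable()
table.zones["ls0-hv"] = 0          # simulate the buggy auto-assignment
table.request_snat("lr0_snat", 0)  # the CMS explicitly requests zone 0
print(table.zones)
```

In this model the logical port is moved off zone 0 as soon as the SNAT request arrives, which is why the incremental processing bug alone should not leave both users sharing the zone.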

I already have the easy fixes taken care of, but I'm going to look further into why it is possible to get into the broken state to begin with. There are some code smells I'm spotting, but I haven't actually formulated a scenario where we would assign the same zone ID multiple times.

Comment 5 Mark Michelson 2022-10-21 19:12:13 UTC
Mitigation (and potential fix) posted here: https://patchwork.ozlabs.org/project/ovn/patch/20221021183759.4192249-1-mmichels@redhat.com/

As the message states, I couldn't find how it was possible with current OVN code for conflicting CT zones to be assigned like this. However, I have added measures that should fix a conflict if we load one from the OVSDB. I also have added measures to prevent zone zero from being assigned to a logical port.
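
The two measures can be pictured roughly as follows. This is a hedged Python sketch of the idea only and does not reproduce the posted patch: `restore_zones` and its arguments are hypothetical, and the only detail taken from this report is the `ct-zone-<name>` key format in the bridge's `external_ids`. The assumed policy is that requested SNAT zones win, and that any restored entry that is zone 0 or duplicates an already-claimed zone is dropped so it can be re-assigned later.

```python
PREFIX = "ct-zone-"


def restore_zones(external_ids, requested_snat):
    """Restore zone assignments from bridge external_ids (illustrative only).

    external_ids:   dict of str -> str, as stored on the integration bridge
    requested_snat: dict of zone user name -> explicitly requested zone ID
    Returns (restored, dropped): the accepted assignments and the names
    whose stored zone was rejected and must be assigned again.
    """
    restored = {}
    owner = {}    # zone ID -> name that currently holds it
    dropped = []

    # Explicit SNAT requests take priority over anything in the database.
    for name, zone in requested_snat.items():
        restored[name] = zone
        owner[zone] = name

    for key, value in external_ids.items():
        if not key.startswith(PREFIX):
            continue  # unrelated external_ids entry
        name = key[len(PREFIX):]
        if name in requested_snat:
            continue  # already handled above
        zone = int(value)
        # Zone 0 is never valid for an auto-assigned user, and a zone
        # that is already owned indicates a conflict in the database.
        if zone == 0 or zone in owner:
            dropped.append(name)
            continue
        restored[name] = zone
        owner[zone] = name

    return restored, dropped
```

Names in `dropped` would then go through the normal automatic assignment path, which never hands out zone 0.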

Comment 7 OVN Bot 2022-11-22 19:30:08 UTC
ovn22.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144933
ovn22.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144939
ovn22.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144940
ovn22.09 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144946
ovn22.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2144947

Comment 10 Jianlin Shi 2022-12-02 05:50:03 UTC
Tested with the reproducer in https://bugzilla.redhat.com/show_bug.cgi?id=2144940#c3:

Reproduced on ovn22.03-22.03.0-118.el8:

+ ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=0                                   
+ ovn-appctl -t ovn-controller exit --restart                                                         
+ ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv=0                                           
+ ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv1=0                                          
+ ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv2=0                                          
+ systemctl start ovn-controller                                                                      
+ ovn-nbctl --wait=hv sync                                                                            
+ sleep 2                                                                                             
+ ovn-appctl ct-zone-list                                                                             
ls0-hv2 0                                                                                             
ec1d7f9b-0ee9-4da1-8c2e-d1147b940832_snat 1                                                           
ls0-hv1 0                                                                                             
31c0b51b-0d08-47c2-954a-6f8cc1423a45_dnat 5                                                           
ec1d7f9b-0ee9-4da1-8c2e-d1147b940832_dnat 4                                                           
31c0b51b-0d08-47c2-954a-6f8cc1423a45_snat 0                                                           
ls0-hv 0     

<=== multiple zone=0
                                                                                         
[root@dell-per750-18 bz2126406]#  rpm -qa | grep -E "openvswitch2.17|ovn22.03"                        
openvswitch2.17-2.17.0-65.el8fdp.x86_64                                                               
ovn22.03-22.03.0-118.el8fdp.x86_64                                                                    
ovn22.03-host-22.03.0-118.el8fdp.x86_64                                                               
ovn22.03-central-22.03.0-118.el8fdp.x86_64

Verified on ovn22.03-22.03.0-125.el8:

+ ovn-nbctl --wait=hv set logical_router lr0 options:snat-ct-zone=0                                   
+ ovn-appctl -t ovn-controller exit --restart
+ ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv=0
+ ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv1=0                                          
+ ovs-vsctl set bridge br-int external_ids:ct-zone-ls0-hv2=0                                          
+ systemctl start ovn-controller                                                                      
+ ovn-nbctl --wait=hv sync                                                                            
+ sleep 2
+ ovn-appctl ct-zone-list                                                                             
ls0-hv2 2
f9cc1f68-da6a-4d85-ae37-622ae005e4e8_snat 0                                                           
3eba066d-1d52-46f5-9914-232c83ebf01e_snat 1                                                           
ls0-hv1 3
3eba066d-1d52-46f5-9914-232c83ebf01e_dnat 5                                                           
ls0-hv 4                                                                                              
f9cc1f68-da6a-4d85-ae37-622ae005e4e8_dnat 6

<=== only one zone=0

[root@dell-per750-18 bz2126406]# rpm -qa | grep -E "openvswitch2.17|ovn22.03"                         
ovn22.03-22.03.0-125.el8fdp.x86_64                                                                    
openvswitch2.17-2.17.0-65.el8fdp.x86_64                                                               
ovn22.03-central-22.03.0-125.el8fdp.x86_64                                                            
ovn22.03-host-22.03.0-125.el8fdp.x86_64
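
For anyone repeating this check, the comparison of the two listings can be automated. The helper below is a minimal sketch that assumes the `ct-zone-list` output has the two-column `<name> <zone>` form shown above; the function name `find_shared_zones` is made up for this example.

```python
from collections import defaultdict


def find_shared_zones(ct_zone_list_output):
    """Return {zone ID: [names]} for every zone held by more than one user."""
    by_zone = defaultdict(list)
    for line in ct_zone_list_output.splitlines():
        parts = line.split()
        # Skip blank lines and anything that is not "<name> <zone>".
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, zone = parts
        by_zone[int(zone)].append(name)
    return {zone: names for zone, names in by_zone.items() if len(names) > 1}
```

Feeding it the captured output of `ovn-appctl ct-zone-list` should give a non-empty dictionary for the -118 listing (four users on zone 0) and an empty one for the -125 listing.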

Comment 12 errata-xmlrpc 2022-12-15 15:26:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn22.03 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:9059