Description

What problem/issue/behavior are you having trouble with? What do you expect to see?

In our Shift-On-Stack production environment, a number of VMs in a couple of tenant projects are suffering from traffic loss to/from the DC. It appears to affect VMs connected to a tenant running OCP, which was recently upgraded from version 4.10.52 to 4.11.34. The OCP upgrade overlapped with our own work to upgrade the Cisco firmware on the nodes running OpenStack 16.1.6. That firmware breaks the NICs, requiring us to manually remap them back to their original names with udev rules.

On the surface, these changes appear to have worked well, but we're starting to see side effects: some of the OCP worker VMs are not able to reach the DC (time sync is not working, nslookups are failing and ping is not going through), traffic through the geneve tunnel doesn't seem to reach the tenant router, and we have a number of TLS-related "protocol errors" in the SBDB logs on our controllers.

We attempted to sync the northbound database with the following command:

```
podman exec -it neutron_api neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --ovn-neutron_sync_mode repair --debug
Deprecated: Option "vif_type" from group "ovn" is deprecated for removal (The port VIF type is now determined based on the OVN chassis information when the port is bound to a host.).  Its value may be silently ignored in the future.
Deprecated: Option "ovn_l3_mode" from group "ovn" is deprecated for removal (This option is no longer used. Native L3 support in OVN is always used.).  Its value may be silently ignored in the future.
Error: non zero exit code: 1: OCI runtime error
```

but it broke with an exception:

```
2023-05-03 14:54:55.727 250864 ERROR ovsdbapp.backend.ovs_idl.transaction [req-564e4e5e-bd95-4475-bcb4-cb326a12bd16 - - - - -] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
    command.run_idl(txn)
  File "/usr/lib/python3.6/site-packages/ovsdbapp/schema/ovn_northbound/commands.py", line 121, in run_idl
    self.direction, self.priority, self.match))
RuntimeError: ACL (to-lport, 1002, outport == @pg_ea91cc4b_8bb6_4fa5_8758_7940349667dc && ip4 && ip4.src == 10.30.186.0/22 && tcp && tcp.dst == 27017) already exists
2023-05-03 14:54:55.727 250864 CRITICAL neutron_ovn_db_sync_util [req-564e4e5e-bd95-4475-bcb4-cb326a12bd16 - - - - -] Unhandled error: RuntimeError: ACL (to-lport, 1002, outport == @pg_ea91cc4b_8bb6_4fa5_8758_7$
40349667dc && ip4 && ip4.src == 10.30.186.0/22 && tcp && tcp.dst == 27017) already exists
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util Traceback (most recent call last):
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 111, in transaction
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     yield self._nested_txns_map[cur_thread_id]
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util KeyError: 140104017450816
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util During handling of the above exception, another exception occurred:
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util Traceback (most recent call last):
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/bin/neutron-ovn-db-sync-util", line 10, in <module>
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     sys.exit(main())
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/networking_ovn/cmd/neutron_ovn_db_sync_util.py", line 221, in main
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     synchronizer.do_sync()
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/networking_ovn/ovn_db_sync.py", line 99, in do_sync
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     self.sync_acls(ctx)
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/networking_ovn/ovn_db_sync.py", line 283, in sync_acls
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     txn.add(self.ovn_api.pg_acl_add(**acla))
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     next(self.gen)
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 196, in transaction
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     yield t
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     next(self.gen)
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     del self._nested_txns_map[cur_thread_id]
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     self.result = self.commit()
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     raise result.ex
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     txn.results.put(txn.do_commit())
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     command.run_idl(txn)
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/schema/ovn_northbound/commands.py", line 121, in run_idl
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     self.direction, self.priority, self.match))
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util RuntimeError: ACL (to-lport, 1002, outport == @pg_ea91cc4b_8bb6_4fa5_8758_7940349667dc && ip4 && ip4.src == 10.30.186.0/22 && tcp && tcp.dst == 2701$
) already exists
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util
```

This is on 16.1.7 ...
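
When triaging a failed sync like the one above, it can help to pull the direction, priority and match out of the "already exists" message so the offending ACL can be looked up in the northbound database. The following is only a sketch: `parse_acl_exists` and `AclKey` are hypothetical helper names, and the regular expression is inferred from the message format seen in this log, not taken from ovsdbapp.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the log message in this report (an assumption,
# not an ovsdbapp-defined format).
ACL_EXISTS_RE = re.compile(
    r"ACL \((?P<direction>[a-z-]+), (?P<priority>\d+), (?P<match>.+)\) already exists"
)


class AclKey(NamedTuple):
    """Hypothetical container for the fields that identify the ACL."""
    direction: str
    priority: int
    match: str


def parse_acl_exists(line: str) -> Optional[AclKey]:
    """Return the ACL fields from an 'already exists' line, or None."""
    m = ACL_EXISTS_RE.search(line)
    if m is None:
        return None
    return AclKey(m["direction"], int(m["priority"]), m["match"])
```

The helper expects the whole message on one line, so it should be fed the untruncated `RuntimeError` line rather than the copies that the terminal cut off with `$`.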
(In reply to David Hill from comment #0)
> RuntimeError: ACL (to-lport, 1002, outport == @pg_ea91cc4b_8bb6_4fa5_8758_7940349667dc && ip4 && ip4.src == 10.30.186.0/22 && tcp && tcp.dst == 27017) already exists
>
> This is on 16.1.7 ...

This is fixed with bug 2080224 in 16.1.9.

I'll look at the sosreports. The protocol errors might mean that a connection string was changed or that certificates were re-generated. In that case ovn-controller won't be able to connect to the SB DBs and won't apply any changes to the data plane.
To summarize, the problem seems to be that the ovn-controller process/container was not restarted after the new certificates were installed. That led to connection failures from ovn-controller to the southbound database. We should revisit why the containers were not restarted when the new certificates were installed.
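
One way to spot this condition on a node is to compare the certificate file's modification time with the time the process or container started. The following is a minimal sketch under stated assumptions: `restart_needed` and `check_cert` are hypothetical helper names, the certificate path is whatever the deployment uses, and the start time is assumed to be supplied by the caller as epoch seconds (for instance taken from the container runtime's inspect output).

```python
import os


def restart_needed(cert_mtime: float, process_start: float) -> bool:
    """Return True if the certificate changed after the process started.

    Both arguments are epoch seconds. A process that started before the
    certificate was rewritten may still be using the old certificate.
    """
    return cert_mtime > process_start


def check_cert(cert_path: str, process_start: float) -> bool:
    """Compare a certificate file's mtime with a caller-supplied start time."""
    return restart_needed(os.path.getmtime(cert_path), process_start)
```

A `True` result is only a hint that a restart may be due; it does not prove the running process is using a stale certificate.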