Bug 2193212 - traffic not passing through geneve tunnel for some tenants breaking traffic flow to DC
Summary: traffic not passing through geneve tunnel for some tenants breaking traffic f...
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: x86_64
OS: All
Priority: high
Severity: medium
Target Milestone: z6
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Jakub Libosvar
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-04 17:40 UTC by David Hill
Modified: 2023-07-31 15:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
| --- | --- | --- | --- | --- | --- | --- |
| Red Hat Issue Tracker | OSP-24794 | 0 | None | None | None | 2023-05-04 17:42:52 UTC |

Description David Hill 2023-05-04 17:40:01 UTC
What problem/issue/behavior are you having trouble with?  What do you expect to see?
In our Shift-On-Stack production environment, a number of VMs in a couple of tenant projects are suffering from traffic loss to/from the DC. It appears to affect VMs connected to the tenant running OCP, which was recently upgraded from version 4.10.52 to 4.11.34. The OCP upgrade overlapped with our own work to upgrade the Cisco firmware on the nodes running OpenStack 16.1.6; the firmware breaks the NICs, requiring us to manually remap them back to their original names with udev rules. On the surface, these changes appear to have worked well, but we're starting to see side effects: some of the OCP worker VMs cannot reach the DC (time sync is not working, nslookups are failing and pings are not going through), traffic through the geneve tunnel doesn't seem to reach the tenant router, and we have a number of TLS-related "protocol errors" in the SBDB logs on our controllers.

We attempted to sync the northbound database with the following command:
```
podman exec -it neutron_api neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --ovn-neutron_sync_mode repair --debug
Deprecated: Option "vif_type" from group "ovn" is deprecated for removal (The port VIF type is now determined based on the OVN chassis information when the port is bound to a host.).  Its value may be silently ignored in the future.
Deprecated: Option "ovn_l3_mode" from group "ovn" is deprecated for removal (This option is no longer used. Native L3 support in OVN is always used.).  Its value may be silently ignored in the future.
Error: non zero exit code: 1: OCI runtime error
```
but it broke with an exception:
```
2023-05-03 14:54:55.727 250864 ERROR ovsdbapp.backend.ovs_idl.transaction [req-564e4e5e-bd95-4475-bcb4-cb326a12bd16 - - - - -] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
    command.run_idl(txn)
  File "/usr/lib/python3.6/site-packages/ovsdbapp/schema/ovn_northbound/commands.py", line 121, in run_idl
    self.direction, self.priority, self.match))
RuntimeError: ACL (to-lport, 1002, outport == @pg_ea91cc4b_8bb6_4fa5_8758_7940349667dc && ip4 && ip4.src == 10.30.186.0/22 && tcp && tcp.dst == 27017) already exists

2023-05-03 14:54:55.727 250864 CRITICAL neutron_ovn_db_sync_util [req-564e4e5e-bd95-4475-bcb4-cb326a12bd16 - - - - -] Unhandled error: RuntimeError: ACL (to-lport, 1002, outport == @pg_ea91cc4b_8bb6_4fa5_8758_7940349667dc && ip4 && ip4.src == 10.30.186.0/22 && tcp && tcp.dst == 27017) already exists
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util Traceback (most recent call last):
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 111, in transaction
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     yield self._nested_txns_map[cur_thread_id]
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util KeyError: 140104017450816
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util During handling of the above exception, another exception occurred:
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util Traceback (most recent call last):
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/bin/neutron-ovn-db-sync-util", line 10, in <module>
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     sys.exit(main())
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/networking_ovn/cmd/neutron_ovn_db_sync_util.py", line 221, in main
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     synchronizer.do_sync()
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/networking_ovn/ovn_db_sync.py", line 99, in do_sync
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     self.sync_acls(ctx)
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/networking_ovn/ovn_db_sync.py", line 283, in sync_acls
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     txn.add(self.ovn_api.pg_acl_add(**acla))
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     next(self.gen)
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 196, in transaction
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     yield t
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     next(self.gen)
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     del self._nested_txns_map[cur_thread_id]
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     self.result = self.commit()
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     raise result.ex
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 128, in run
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     txn.results.put(txn.do_commit())
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     command.run_idl(txn)
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util   File "/usr/lib/python3.6/site-packages/ovsdbapp/schema/ovn_northbound/commands.py", line 121, in run_idl
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util     self.direction, self.priority, self.match))
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util RuntimeError: ACL (to-lport, 1002, outport == @pg_ea91cc4b_8bb6_4fa5_8758_7940349667dc && ip4 && ip4.src == 10.30.186.0/22 && tcp && tcp.dst == 27017) already exists
2023-05-03 14:54:55.727 250864 ERROR neutron_ovn_db_sync_util
```

This is on 16.1.7 ...

Comment 3 Jakub Libosvar 2023-05-04 20:05:59 UTC
(In reply to David Hill from comment #0)
> We attempted to sync the northbound database with the following command:
> [...]
> but it broke with an exception:
> [...]
> RuntimeError: ACL (to-lport, 1002, outport ==
> @pg_ea91cc4b_8bb6_4fa5_8758_7940349667dc && ip4 && ip4.src == 10.30.186.0/22
> && tcp && tcp.dst == 27017) already exists
> [...]
> This is on 16.1.7 ...

The sync-util failure is fixed by bug 2080224 in 16.1.9.
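
For illustration only, here is a minimal toy model of the failure mode seen in the traceback. It is not ovsdbapp or networking-ovn code: `Acl`, `PortGroup` and the `may_exist` flag are hypothetical names, and the assumption that an ACL is identified by direction, priority and match is taken from the wording of the error message. It shows how a repair pass that blindly re-adds an ACL already present aborts with `RuntimeError`, while a caller that tolerates duplicates stays idempotent. It does not claim to describe how bug 2080224 resolves the problem.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Acl:
    """Toy ACL record; the three fields mirror the tuple in the error message."""
    direction: str
    priority: int
    match: str


@dataclass
class PortGroup:
    """Hypothetical stand-in for a northbound Port_Group row."""
    name: str
    acls: List[Acl] = field(default_factory=list)

    def acl_add(self, acl: Acl, may_exist: bool = False) -> Acl:
        # Look for an ACL with the same identifying tuple.
        for existing in self.acls:
            if existing == acl:
                if may_exist:
                    # Tolerate the duplicate and hand back the existing entry.
                    return existing
                raise RuntimeError(
                    "ACL (%s, %s, %s) already exists"
                    % (acl.direction, acl.priority, acl.match)
                )
        self.acls.append(acl)
        return acl
```

With this model, a second `acl_add()` of the same ACL raises unless `may_exist=True` is passed, in which case the port group is left unchanged.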

I'll look at the sosreports. The protocol errors might mean a connection string was changed or certificates were regenerated; in that case ovn-controller won't be able to connect to the SB DBs and won't apply any changes to the data plane.

Comment 4 Jakub Libosvar 2023-05-08 15:24:28 UTC
To summarize, the problem seems to be that the ovn-controller process/container was not restarted after new certificates were installed. That led to connection failures from ovn-controller to the southbound database. We should revisit why the containers were not restarted when the new certificates were installed.
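
One way to spot this condition on a Linux host is to compare the start time of the running process with the modification time of the certificate file. The sketch below is only an assumption about how such a check could be written: the function names are hypothetical, the PID and certificate path must be supplied by the operator (they are deployment-specific and not taken from this bug), and it presumes the process only reads its certificate at startup.

```python
import os


def parse_starttime_ticks(stat_line: str) -> int:
    """Return field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat."""
    # Field 2 (the command name) is parenthesised and may contain spaces,
    # so only split the text that follows the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 22 sits at index 19.
    return int(after_comm[19])


def read_boot_time() -> int:
    """Return the boot time in seconds since the epoch (btime in /proc/stat)."""
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("btime "):
                return int(line.split()[1])
    raise RuntimeError("btime not found in /proc/stat")


def process_start_time(pid: int) -> float:
    """Return the start time of a process in seconds since the epoch."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_starttime_ticks(f.read())
    return read_boot_time() + ticks / os.sysconf("SC_CLK_TCK")


def is_stale(process_start: float, cert_mtime: float) -> bool:
    """True if the certificate was modified after the process started."""
    return cert_mtime > process_start


def needs_restart(pid: int, cert_path: str) -> bool:
    """Combine the helpers above for one process and one certificate file."""
    return is_stale(process_start_time(pid), os.stat(cert_path).st_mtime)
```

Calling `needs_restart(pid, cert_path)` with the host PID of ovn-controller and the path of its certificate returns `True` when the file is newer than the process, which would suggest the process is still using the old certificate.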

