Description of problem:

The customer was unable to create new network components in an OSP 16.2 environment. For example, trying to create a new network through the OpenStack GUI resulted in:

  Failed to create network "test": Request Failed: internal server error while processing your request.
  Neutron server returns request_ids: ['req-c2713ea8-6ecf-4d4b-950a-f74b093c832f']

This happened across the whole environment, for all users in all tenants. According to the customer, the problems started at roughly the same time as a failover of the network connections from one physical switch in the datacenter to another.

Checking the Neutron server log before the resolution showed multiple failures:

grep fail /var/log/containers/neutron/server.log

2023-08-14 09:02:52.258 27 ERROR neutron.api.v2.resource [req-a9c5555c-d14c-4b9a-acef-12ae186bba08 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] add_router_interface failed: No details.: neutron.plugins.ml2.common.exceptions.MechanismDriverError
2023-08-14 09:05:09.065 22 INFO neutron.api.v2.resource [req-d43bf298-30b6-474d-82b3-728c24dfe038 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] remove_router_interface failed (client error): The resource could not be found.
2023-08-14 09:05:15.428 20 INFO neutron.pecan_wsgi.hooks.translation [req-18113fb4-8a83-4750-9281-a39182c614c8 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:05:18.202 23 INFO neutron.pecan_wsgi.hooks.translation [req-aa0fb2a7-a640-42f4-93d6-7e98f7fabc9d 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:15:46.088 16 INFO neutron.pecan_wsgi.hooks.translation [req-46d9ff84-dc12-4541-b2f4-c2b879fbc939 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:16:00.101 18 INFO neutron.api.v2.resource [req-f2a4324c-022a-426f-8265-b27325605778 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:16:03.180 18 INFO neutron.api.v2.resource [req-17cbcfc0-445e-4099-964f-d070ba14d43c 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:20:10.008 19 INFO neutron.api.v2.resource [req-012f1a56-ff5f-4f4e-8f04-94484d4688e0 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:20:26.867 24 INFO neutron.pecan_wsgi.hooks.translation [req-4a8a4680-4755-487d-8e85-f5019f590601 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:30:30.078 23 INFO neutron.api.v2.resource [req-37a79104-f372-4818-85b8-32de6929534d 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:30:33.142 19 INFO neutron.api.v2.resource [req-0f21b798-0750-427d-b1ce-0cad0176d4f3 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:32:59.255 26 INFO neutron.pecan_wsgi.hooks.translation [req-9ec44497-0a80-4a10-be35-2ca9901ce099 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:33:09.515 17 ERROR neutron.plugins.ml2.managers [req-072e3350-8d8d-495e-b69e-9113141dfb2f 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] Mechanism driver 'ovn' failed in update_port_postcommit: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=c50dd7e2-360c-4035-9d03-224f3df68ccb
2023-08-14 09:33:09.516 17 ERROR neutron.plugins.ml2.plugin [req-072e3350-8d8d-495e-b69e-9113141dfb2f 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] mechanism_manager.update_port_postcommit failed for port c50dd7e2-360c-4035-9d03-224f3df68ccb: neutron.plugins.ml2.common.exceptions.MechanismDriverError
2023-08-14 09:33:16.707 17 INFO neutron.pecan_wsgi.hooks.translation [req-a5d2f9d6-cac3-42ad-bdd7-dc80d2412c07 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
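The RowNotFound error above means the neutron-server worker could not find a Logical_Switch_Port in its view of the OVN northbound database even though Neutron's own database still referenced the port. As a hedged diagnostic sketch (the ovn-dbs container name varies per deployment; run on the controller hosting the OVN DB master), the NB database can be queried directly for the port from the error message:

# Query the NB DB for the port that triggered RowNotFound
sudo podman exec ovn-dbs-bundle-podman-0 \
    ovn-nbctl find Logical_Switch_Port name=c50dd7e2-360c-4035-9d03-224f3df68ccb

If the row is returned, the NB database itself is intact and the stale data lives in neutron-server's in-memory IDL copy, which would match the fact that restarting the service fixed the problem; empty output would mean the port is genuinely missing from the NB database.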
Pacemaker status looked healthy:

pcs status
Cluster name: tripleo_cluster
Cluster Summary:
  Stack: corosync
  Current DC: mase-ostkctl01-s06 (version 2.0.5-9.el8_4.1-ba59be7122) - partition with quorum
  Last updated: Mon Aug 14 09:41:40 2023
  Last change: Mon Aug 14 08:07:13 2023 by hacluster via crmd on mase-ostkctl01-s06
  15 nodes configured
  45 resource instances configured

Node List:
  Online: [ mase-ostkctl01-s06 mase-ostkctl02-s06 mase-ostkctl03-s06 ]
  GuestOnline: [ galera-bundle-0@mase-ostkctl01-s06 galera-bundle-1@mase-ostkctl03-s06 galera-bundle-2@mase-ostkctl02-s06 ovn-dbs-bundle-0@mase-ostkctl03-s06 ovn-dbs-bundle-1@mase-ostkctl01-s06 ovn-dbs-bundle-2@mase-ostkctl02-s06 rabbitmq-bundle-0@mase-ostkctl02-s06 rabbitmq-bundle-1@mase-ostkctl03-s06 rabbitmq-bundle-2@mase-ostkctl01-s06 redis-bundle-0@mase-ostkctl01-s06 redis-bundle-1@mase-ostkctl03-s06 redis-bundle-2@mase-ostkctl02-s06 ]

Full List of Resources:
  ip-10.1.1.10 (ocf::heartbeat:IPaddr2): Started mase-ostkctl01-s06
  ip-195.227.240.85 (ocf::heartbeat:IPaddr2): Started mase-ostkctl02-s06
  ip-10.1.0.12 (ocf::heartbeat:IPaddr2): Started mase-ostkctl01-s06
  ip-10.1.0.10 (ocf::heartbeat:IPaddr2): Started mase-ostkctl01-s06
  Container bundle set: haproxy-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-haproxy:pcmklatest]:
    haproxy-bundle-podman-0 (ocf::heartbeat:podman): Started mase-ostkctl01-s06
    haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started mase-ostkctl02-s06
    haproxy-bundle-podman-2 (ocf::heartbeat:podman): Started mase-ostkctl03-s06
  Container bundle set: galera-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-mariadb:pcmklatest]:
    galera-bundle-0 (ocf::heartbeat:galera): Master mase-ostkctl01-s06
    galera-bundle-1 (ocf::heartbeat:galera): Master mase-ostkctl03-s06
    galera-bundle-2 (ocf::heartbeat:galera): Master mase-ostkctl02-s06
  Container bundle set: rabbitmq-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-rabbitmq:pcmklatest]:
    rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started mase-ostkctl02-s06
    rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started mase-ostkctl03-s06
    rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started mase-ostkctl01-s06
  Container bundle set: redis-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-redis:pcmklatest]:
    redis-bundle-0 (ocf::heartbeat:redis): Master mase-ostkctl01-s06
    redis-bundle-1 (ocf::heartbeat:redis): Slave mase-ostkctl03-s06
    redis-bundle-2 (ocf::heartbeat:redis): Slave mase-ostkctl02-s06
  Container bundle set: ovn-dbs-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-ovn-northd:pcmklatest]:
    ovn-dbs-bundle-0 (ocf::ovn:ovndb-servers): Slave mase-ostkctl03-s06
    ovn-dbs-bundle-1 (ocf::ovn:ovndb-servers): Master mase-ostkctl01-s06
    ovn-dbs-bundle-2 (ocf::ovn:ovndb-servers): Slave mase-ostkctl02-s06
  ip-10.1.0.11 (ocf::heartbeat:IPaddr2): Started mase-ostkctl01-s06
  Container bundle: openstack-cinder-volume [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-cinder-volume:pcmklatest]:
    openstack-cinder-volume-podman-0 (ocf::heartbeat:podman): Started mase-ostkctl02-s06

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

The customer's controllers and computes are always connected to two switches. At the time in question one of the switches was restarted and the connections failed over to the other. The problems started close in time to the switch restart and failover. All computes were affected by the issue.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.2.0 GA (Train)
puppet-neutron-15.6.0-2.20210601015533.7f36270.el8ost.2.noarch
rhosp-openvswitch-2.15-4.el8ost.1.noarch

How reproducible:
Happens every time.

Steps to Reproduce:
1. Create a new network/router/port

Actual results:
Creation fails with an internal server error (see above). After analyzing the logs of the neutron-api container, the following was run:

systemctl restart tripleo_neutron_api.service

This resolved all of the issues.
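For reference, the workaround with a basic verification step (a sketch assuming the default TripleO service name and log path from this report; run on each controller where neutron-server shows the errors):

# Restart only the neutron-server container; the OVN DBs are not touched
sudo systemctl restart tripleo_neutron_api.service
sudo systemctl is-active tripleo_neutron_api.service
# Confirm the MechanismDriverError / TimeoutException messages have stopped
sudo tail -f /var/log/containers/neutron/server.log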
Example of what was seen before the restart:

2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 42, in execute
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     t.add(self)
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     next(self.gen)
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     del self._nested_txns_map[cur_thread_id]
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     self.result = self.commit()
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 57, in commit
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     timeout=self.timeout)
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7fcd7db1e198>] exceeded timeout 180 seconds
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command [req-14978cda-23df-4748-babf-163d3fdb0514 f88759c7fab04371b52c2b4135e68f7e 5d90e0bfd1a94454a6e48e68cd4426fe - - -] Error executing command: ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7fcd7c357358>] exceeded timeout 180 seconds
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 54, in commit
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     result = self.results.get(timeout=self.timeout)
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     return waiter.wait()
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     return get_hub().switch()
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     return self.greenlet.switch()
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command queue.Empty
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command During handling of the above exception, another exception occurred:
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 42, in execute
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     t.add(self)
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     next(self.gen)
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     del self._nested_txns_map[cur_thread_id]
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     self.result = self.commit()
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 57, in commit
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     timeout=self.timeout)
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7fcd7c357358>] exceeded timeout 180 seconds
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command

Expected results:
Network components created as expected.

Additional info:
Attached to the case are sosreports from Controller1 taken before the issue was resolved, and a complete set of controller sosreports taken after it was resolved.
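Given the timing correlation with the switch failover, one hypothesis (not confirmed in this report) is that neutron-server was left holding broken TCP connections to the OVN databases, so every ovsdbapp transaction waited out the 180-second timeout until the service was restarted. A minimal check, assuming the OSP default OVN NB/SB database ports 6641 and 6642:

# List TCP sessions from the controller toward the OVN NB/SB databases,
# including which process owns each session
sudo ss -tnp | grep -E ':664[12]'
# Sessions still pointing at the pre-failover path, or missing entirely, would be
# consistent with connections silently broken by the switch restart.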
*** This bug has been marked as a duplicate of bug 2128911 ***