Bug 2232728 - [OSP16.2] Unable to create neutron components
Summary: [OSP16.2] Unable to create neutron components
Keywords:
Status: CLOSED DUPLICATE of bug 2128911
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Terry Wilson
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-18 08:32 UTC by Matsvei Hauryliuk
Modified: 2023-08-22 13:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-22 13:22:52 UTC
Target Upstream Version:
Embargoed:




Links
System                 ID         Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  OSP-27575  0        None      None    None     2023-08-18 08:33:26 UTC

Description Matsvei Hauryliuk 2023-08-18 08:32:16 UTC
Description of problem:
The customer (Cu) was unable to create new network components in an OSP 16.2 environment.
For example, creating a new network through the OpenStack GUI resulted in:

Failed to create network "test": Request Failed: internal server error while processing your request. Neutron server returns request_ids: ['req-c2713ea8-6ecf-4d4b-950a-f74b093c832f']
This happens across the whole environment, for all users in all tenants.

According to Cu feedback, at the time this problem started the network connections failed over from one physical switch in the datacenter to another.

Running grep fail on /var/log/containers/neutron/server.log (before the resolution) showed multiple failures:

2023-08-14 09:02:52.258 27 ERROR neutron.api.v2.resource [req-a9c5555c-d14c-4b9a-acef-12ae186bba08 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] add_router_interface failed: No details.: neutron.plugins.ml2.common.exceptions.MechanismDriverError
2023-08-14 09:05:09.065 22 INFO neutron.api.v2.resource [req-d43bf298-30b6-474d-82b3-728c24dfe038 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] remove_router_interface failed (client error): The resource could not be found.
2023-08-14 09:05:15.428 20 INFO neutron.pecan_wsgi.hooks.translation [req-18113fb4-8a83-4750-9281-a39182c614c8 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:05:18.202 23 INFO neutron.pecan_wsgi.hooks.translation [req-aa0fb2a7-a640-42f4-93d6-7e98f7fabc9d 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:15:46.088 16 INFO neutron.pecan_wsgi.hooks.translation [req-46d9ff84-dc12-4541-b2f4-c2b879fbc939 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:16:00.101 18 INFO neutron.api.v2.resource [req-f2a4324c-022a-426f-8265-b27325605778 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:16:03.180 18 INFO neutron.api.v2.resource [req-17cbcfc0-445e-4099-964f-d070ba14d43c 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:20:10.008 19 INFO neutron.api.v2.resource [req-012f1a56-ff5f-4f4e-8f04-94484d4688e0 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:20:26.867 24 INFO neutron.pecan_wsgi.hooks.translation [req-4a8a4680-4755-487d-8e85-f5019f590601 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:30:30.078 23 INFO neutron.api.v2.resource [req-37a79104-f372-4818-85b8-32de6929534d 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:30:33.142 19 INFO neutron.api.v2.resource [req-0f21b798-0750-427d-b1ce-0cad0176d4f3 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] show failed (client error): The resource could not be found.
2023-08-14 09:32:59.255 26 INFO neutron.pecan_wsgi.hooks.translation [req-9ec44497-0a80-4a10-be35-2ca9901ce099 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
2023-08-14 09:33:09.515 17 ERROR neutron.plugins.ml2.managers [req-072e3350-8d8d-495e-b69e-9113141dfb2f 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] Mechanism driver 'ovn' failed in update_port_postcommit: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=c50dd7e2-360c-4035-9d03-224f3df68ccb
2023-08-14 09:33:09.516 17 ERROR neutron.plugins.ml2.plugin [req-072e3350-8d8d-495e-b69e-9113141dfb2f 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] mechanism_manager.update_port_postcommit failed for port c50dd7e2-360c-4035-9d03-224f3df68ccb: neutron.plugins.ml2.common.exceptions.MechanismDriverError
2023-08-14 09:33:16.707 17 INFO neutron.pecan_wsgi.hooks.translation [req-a5d2f9d6-cac3-42ad-bdd7-dc80d2412c07 495b9f35865c47f6ac1f63c145493907 4f319b4ae84b44fdb8c693f74b9e7da7 - default default] GET failed (client error): The resource could not be found.
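
The grep output above mixes client errors (INFO, "resource could not be found") with server-side failures (ERROR, MechanismDriverError). A minimal Python sketch for separating the two by counting lines per level and logger follows. The regular expression is an assumption based only on the line layout visible in this excerpt, and the function name summarize is hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# "<date> <time> <pid> <LEVEL> <logger> [req-... ] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"(?:\[(?P<req>req-[0-9a-f-]+)[^\]]*\] )?(?P<msg>.*)$"
)


def summarize(lines):
    """Count log lines per (level, logger) pair; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m["level"], m["logger"])] += 1
    return counts


# Three lines copied from the server.log excerpt in this report.
sample = [
    "2023-08-14 09:02:52.258 27 ERROR neutron.api.v2.resource "
    "[req-a9c5555c-d14c-4b9a-acef-12ae186bba08 495b9f35865c47f6ac1f63c145493907 "
    "4f319b4ae84b44fdb8c693f74b9e7da7 - default default] add_router_interface failed: "
    "No details.: neutron.plugins.ml2.common.exceptions.MechanismDriverError",
    "2023-08-14 09:05:09.065 22 INFO neutron.api.v2.resource "
    "[req-d43bf298-30b6-474d-82b3-728c24dfe038 495b9f35865c47f6ac1f63c145493907 "
    "4f319b4ae84b44fdb8c693f74b9e7da7 - default default] remove_router_interface failed "
    "(client error): The resource could not be found.",
    "2023-08-14 09:33:09.515 17 ERROR neutron.plugins.ml2.managers "
    "[req-072e3350-8d8d-495e-b69e-9113141dfb2f 495b9f35865c47f6ac1f63c145493907 "
    "4f319b4ae84b44fdb8c693f74b9e7da7 - default default] Mechanism driver 'ovn' failed in "
    "update_port_postcommit: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find "
    "Logical_Switch_Port with name=c50dd7e2-360c-4035-9d03-224f3df68ccb",
]

if __name__ == "__main__":
    for (level, logger), n in sorted(summarize(sample).items()):
        print(level, logger, n)
```

Filtering the result on the ERROR level leaves only the ML2/OVN mechanism driver failures, which are the lines relevant to this report.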

Pacemaker status looked healthy:

pcs status
Cluster name: tripleo_cluster
Cluster Summary:
    Stack: corosync
    Current DC: mase-ostkctl01-s06 (version 2.0.5-9.el8_4.1-ba59be7122) - partition with quorum
    Last updated: Mon Aug 14 09:41:40 2023
    Last change:  Mon Aug 14 08:07:13 2023 by hacluster via crmd on mase-ostkctl01-s06
    15 nodes configured
    45 resource instances configured

Node List:
    Online: [ mase-ostkctl01-s06 mase-ostkctl02-s06 mase-ostkctl03-s06 ]
    GuestOnline: [ galera-bundle-0@mase-ostkctl01-s06 galera-bundle-1@mase-ostkctl03-s06 galera-bundle-2@mase-ostkctl02-s06 ovn-dbs-bundle-0@mase-ostkctl03-s06 ovn-dbs-bundle-1@mase-ostkctl01-s06 ovn-dbs-bundle-2@mase-ostkctl02-s06 rabbitmq-bundle-0@mase-ostkctl02-s06 rabbitmq-bundle-1@mase-ostkctl03-s06 rabbitmq-bundle-2@mase-ostkctl01-s06 redis-bundle-0@mase-ostkctl01-s06 redis-bundle-1@mase-ostkctl03-s06 redis-bundle-2@mase-ostkctl02-s06 ]

Full List of Resources:
    ip-10.1.1.10        (ocf::heartbeat:IPaddr2):        Started mase-ostkctl01-s06
    ip-195.227.240.85   (ocf::heartbeat:IPaddr2):        Started mase-ostkctl02-s06
    ip-10.1.0.12        (ocf::heartbeat:IPaddr2):        Started mase-ostkctl01-s06
    ip-10.1.0.10        (ocf::heartbeat:IPaddr2):        Started mase-ostkctl01-s06
    Container bundle set: haproxy-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-haproxy:pcmklatest]:
        haproxy-bundle-podman-0   (ocf::heartbeat:podman):         Started mase-ostkctl01-s06
        haproxy-bundle-podman-1   (ocf::heartbeat:podman):         Started mase-ostkctl02-s06
        haproxy-bundle-podman-2   (ocf::heartbeat:podman):         Started mase-ostkctl03-s06
    Container bundle set: galera-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-mariadb:pcmklatest]:
        galera-bundle-0   (ocf::heartbeat:galera):         Master mase-ostkctl01-s06
        galera-bundle-1   (ocf::heartbeat:galera):         Master mase-ostkctl03-s06
        galera-bundle-2   (ocf::heartbeat:galera):         Master mase-ostkctl02-s06
    Container bundle set: rabbitmq-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-rabbitmq:pcmklatest]:
        rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster):       Started mase-ostkctl02-s06
        rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster):       Started mase-ostkctl03-s06
        rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster):       Started mase-ostkctl01-s06
    Container bundle set: redis-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-redis:pcmklatest]:
        redis-bundle-0    (ocf::heartbeat:redis):  Master mase-ostkctl01-s06
        redis-bundle-1    (ocf::heartbeat:redis):  Slave mase-ostkctl03-s06
        redis-bundle-2    (ocf::heartbeat:redis):  Slave mase-ostkctl02-s06
    Container bundle set: ovn-dbs-bundle [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-ovn-northd:pcmklatest]:
        ovn-dbs-bundle-0  (ocf::ovn:ovndb-servers):        Slave mase-ostkctl03-s06
        ovn-dbs-bundle-1  (ocf::ovn:ovndb-servers):        Master mase-ostkctl01-s06
        ovn-dbs-bundle-2  (ocf::ovn:ovndb-servers):        Slave mase-ostkctl02-s06
    ip-10.1.0.11        (ocf::heartbeat:IPaddr2):        Started mase-ostkctl01-s06
    Container bundle: openstack-cinder-volume [cluster.common.tag/mase-ostk2-production-environment-rhosp_16_2-rhosp_16_2_containers-cinder-volume:pcmklatest]:
        openstack-cinder-volume-podman-0  (ocf::heartbeat:podman):         Started mase-ostkctl02-s06

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
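
The pcs output shows the state of the cluster resources, not the state of the neutron-server connection to the OVN databases, so a healthy cluster does not rule out the timeouts reported further down. When triaging, it may still help to know which controller currently hosts the OVN DB master. The following is a rough sketch that extracts the ovn-dbs-bundle roles from saved pcs status text. The function name ovn_db_roles is hypothetical and the pattern is based only on the line layout in the output above.

```python
import re

# Assumed layout, taken from the pcs output above:
# "ovn-dbs-bundle-N  (ocf::ovn:ovndb-servers):  <Role> <node>"
ROLE_RE = re.compile(
    r"^\s*(?P<res>ovn-dbs-bundle-\d+)\s+\(ocf::ovn:ovndb-servers\):\s+"
    r"(?P<role>\S+)\s+(?P<node>\S+)\s*$"
)


def ovn_db_roles(pcs_output):
    """Return {resource: (role, node)} for each ovn-dbs-bundle replica found."""
    roles = {}
    for line in pcs_output.splitlines():
        m = ROLE_RE.match(line)
        if m:
            roles[m["res"]] = (m["role"], m["node"])
    return roles


# Lines copied from the pcs status output in this report.
sample = """
        ovn-dbs-bundle-0  (ocf::ovn:ovndb-servers):        Slave mase-ostkctl03-s06
        ovn-dbs-bundle-1  (ocf::ovn:ovndb-servers):        Master mase-ostkctl01-s06
        ovn-dbs-bundle-2  (ocf::ovn:ovndb-servers):        Slave mase-ostkctl02-s06
"""

if __name__ == "__main__":
    masters = [node for role, node in ovn_db_roles(sample).values() if role == "Master"]
    print("OVN DB master on:", masters)
```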

Cu's controllers and computes are each connected to two switches. One of these switches was restarted, and the connections failed over to the other one. The problems started close to the time of the switch restart and the failover.

All computes were experiencing this issue.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.2.0 GA (Train)
puppet-neutron-15.6.0-2.20210601015533.7f36270.el8ost.2.noarch
rhosp-openvswitch-2.15-4.el8ost.1.noarch 

How reproducible:
Happens each time.

Steps to Reproduce:
1. Create a new network, router, or port.

Actual results:
After analyzing the logs of the neutron-api container, the service was restarted:
systemctl restart tripleo_neutron_api.service

The restart resolved all of the issues.
Example of the errors logged before the restart:

2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 42, in execute
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     t.add(self)
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     next(self.gen)
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     del self._nested_txns_map[cur_thread_id]
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     self.result = self.commit()
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 57, in commit
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command     timeout=self.timeout)
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7fcd7db1e198>] exceeded timeout 180 seconds
2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command [req-14978cda-23df-4748-babf-163d3fdb0514 f88759c7fab04371b52c2b4135e68f7e 5d90e0bfd1a94454a6e48e68cd4426fe - - -] Error executing command: ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7fcd7c357358>] exceeded timeout 180 seconds
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 54, in commit
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     result = self.results.get(timeout=self.timeout)
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/eventlet/queue.py", line 322, in get
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     return waiter.wait()
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/eventlet/queue.py", line 141, in wait
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     return get_hub().switch()
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 298, in switch
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     return self.greenlet.switch()
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command queue.Empty
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command During handling of the above exception, another exception occurred:
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 42, in execute
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     t.add(self)
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     next(self.gen)
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     del self._nested_txns_map[cur_thread_id]
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     self.result = self.commit()
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command   File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 57, in commit
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command     timeout=self.timeout)
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.exceptions.TimeoutException: Commands [<ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7fcd7c357358>] exceeded timeout 180 seconds
2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command
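
Both tracebacks end in the same ovsdbapp TimeoutException, logged by different worker processes (24 and 17). To check whether every neutron-server worker is affected, the timeouts can be counted per process ID. This is a sketch only: the pattern assumes the line layout shown above, the function name timeouts_per_worker is hypothetical, and it counts only the final traceback line so that the preceding "Error executing command" line for the same event is not counted twice.

```python
import re
from collections import Counter

# Matches only the closing line of a traceback, e.g.
# "<date> <time> <pid> ERROR ovsdbapp.backend.ovs_idl.command
#  ovsdbapp.exceptions.TimeoutException: ... exceeded timeout 180 seconds"
TIMEOUT_RE = re.compile(
    r"^\S+ \S+ (?P<pid>\d+) ERROR ovsdbapp\.backend\.ovs_idl\.command "
    r"ovsdbapp\.exceptions\.TimeoutException: .* "
    r"exceeded timeout (?P<secs>\d+) seconds$"
)


def timeouts_per_worker(lines):
    """Count OVSDB transaction timeouts per neutron-server worker PID."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line.strip())
        if m:
            counts[int(m["pid"])] += 1
    return counts


# Lines copied from the log excerpt in this report.
sample = [
    "2023-08-14 11:07:14.837 24 ERROR ovsdbapp.backend.ovs_idl.command "
    "ovsdbapp.exceptions.TimeoutException: Commands "
    "[<ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7fcd7db1e198>] "
    "exceeded timeout 180 seconds",
    "2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command "
    "[req-14978cda-23df-4748-babf-163d3fdb0514 f88759c7fab04371b52c2b4135e68f7e "
    "5d90e0bfd1a94454a6e48e68cd4426fe - - -] Error executing command: "
    "ovsdbapp.exceptions.TimeoutException: Commands "
    "[<ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7fcd7c357358>] "
    "exceeded timeout 180 seconds",
    "2023-08-14 11:10:13.848 17 ERROR ovsdbapp.backend.ovs_idl.command "
    "ovsdbapp.exceptions.TimeoutException: Commands "
    "[<ovsdbapp.backend.ovs_idl.command.DbSetCommand object at 0x7fcd7c357358>] "
    "exceeded timeout 180 seconds",
]

if __name__ == "__main__":
    for pid, n in sorted(timeouts_per_worker(sample).items()):
        print("worker", pid, "timeouts:", n)
```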

Expected results:
Network components created as expected.

Additional info:
Attached to the case are sosreports from Controller1 taken before the issue was resolved, and a complete set of controller sosreports taken after the issue was resolved.

Comment 2 Terry Wilson 2023-08-22 13:22:52 UTC

*** This bug has been marked as a duplicate of bug 2128911 ***

