Bug 1955031 - [neutron]http_retries config option value not being used for calls to the port binding API
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z7
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: melanie witt
QA Contact: James Parker
Duplicates: 1968747 1973635 2017803 2103781
Depends On: 1968722 1968723 1968735
 
Reported: 2021-04-29 10:01 UTC by Jakub Libosvar
Modified: 2024-12-20 19:58 UTC (History)
CC: 18 users

Fixed In Version: openstack-nova-20.4.1-1.20210708183304.1ee93b9.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1968722
Environment:
Last Closed: 2021-12-09 20:19:04 UTC


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1929886 0 None None None 2021-05-28 00:50:00 UTC
OpenStack gerrit 795177 0 None NEW Honor [neutron]http_retries in the manual client 2021-06-07 22:19:42 UTC
Red Hat Issue Tracker OSP-3483 0 None None None 2021-11-18 11:31:32 UTC
Red Hat Product Errata RHBA-2021:3762 0 None None None 2021-12-09 20:19:31 UTC

Internal Links: 2017803

Description Jakub Libosvar 2021-04-29 10:01:51 UTC
Description of problem:
If Neutron is slow and the activate step from https://specs.openstack.org/openstack/neutron-specs/_images/seqdiag-7d93af8770687c1a95685208b8e6ca0553a8fb25.png fails, multiple port bindings are left in the Neutron database and the VM ends up in ERROR state.

Even though the activation eventually succeeds on the Neutron side (just slowly) and the live migration itself was successful, things are left in an inconsistent, broken state.

Version-Release number of selected component (if applicable):
OSP 16.1.5

How reproducible:
Always

Steps to Reproduce:
1. Put some sleep, like 120 seconds, in Neutron activate method: https://github.com/openstack/neutron/blob/150396625aaa870640a6fb9d636aeb90cea4da3e/neutron/plugins/ml2/plugin.py#L2556
2. Run live migration
3. Observe ml2_port_bindings table for the VM port

Actual results:
Two port bindings remain and the VM is in ERROR state

Expected results:
Nova could be more resilient to failures during activation (perhaps by retrying the port activation) or have a mechanism to finish the migration in such a state, because everything is actually correct from the VM's standpoint.


Additional info:

Comment 2 Artom Lifshitz 2021-05-03 16:36:21 UTC
Nova does have the [neutron]http_retries config option, which one can set to any number > 0 to make Nova retry all calls to the Neutron API if the initial request errors out.

I'm not sure what else Nova can do here. Once we're at the point where we have to activate the port binding on the destination, everything is set up correctly from Nova's POV, but until we get confirmation from Neutron that the port binding has been updated, we can't reasonably move the instance back to ACTIVE. So I feel like setting it to ERROR if the port binding update fails is the correct thing to do. We'll discuss this with the wider team on our bug triage call on Wednesday.

Comment 3 melanie witt 2021-05-03 19:45:47 UTC
(In reply to Artom Lifshitz from comment #2)
> Nova does have the [neutron]http_retries config option, that one can set to
> any number > 0 to make Nova retry all calls to the Neutron API if the
> initial request errors out.

Yeah, this defaults to 3 retries and it does apply to a timeout situation (the retries handle "ConnectionError", which covers things like timeout, SSL error, connection refused, etc.) [1].

But if neutron is being consistently slow, it's possible that all 4 attempts will still time out each time.
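Concretely, keystoneauth's connect retries work out to retries + 1 total attempts, with a backoff that starts at 0.5s and doubles between attempts (the n-cpu log excerpt below shows "Retrying in 0.5s."). The following is an illustrative stdlib-only simulation of that pattern, not the actual keystoneauth code; it just demonstrates why a consistently slow neutron can exhaust all 4 attempts:

```python
# Simulation of keystoneauth-style connect retries: retries=3 means up to
# 4 total attempts, with a backoff that starts at 0.5s and doubles each
# time (0.5, 1.0, 2.0). Illustrative sketch only, not keystoneauth itself.

def call_with_retries(request, retries=3, initial_backoff=0.5):
    backoffs = []      # recorded instead of actually sleeping
    backoff = initial_backoff
    for attempt in range(retries + 1):
        try:
            return request(), backoffs
        except ConnectionError:
            if attempt == retries:
                raise  # all attempts exhausted; caller sees the failure
            backoffs.append(backoff)
            backoff *= 2
```

If every attempt raises (a consistently slow backend), the caller still ends up with a ConnectionError after 4 tries, which is the situation described above.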

It's not clear to me what the behavior of the API is if the binding is already activated and we keep doing the PUT to activate it. The API is unfortunately not documented [2].

I'm not sure whether this is a good idea, but I wonder if we could catch a timeout exception from the PUT to activate the port and, in that case, do a GET for the port bindings and check whether the destination binding is ACTIVE and the source INACTIVE (i.e. whether the activation actually did work). If so, go ahead and put the instance into ACTIVE state as a successfully completed migration.
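A minimal sketch of that fallback idea. All names here (activate_binding, get_bindings, the exception class) are hypothetical stand-ins for illustration, not real nova or neutron API calls:

```python
# Hypothetical sketch of the fallback described above: if the PUT to
# activate the destination port binding times out, GET the bindings and
# check whether the activation actually took effect on the neutron side.
# activate_binding() and get_bindings() are stand-ins, not real nova code.

class BindingActivateTimeout(Exception):
    """Stand-in for a keystoneauth connect timeout during the PUT."""

def finish_migration(port_id, dest_host, activate_binding, get_bindings):
    try:
        activate_binding(port_id, dest_host)
        return "ACTIVE"
    except BindingActivateTimeout:
        # The PUT timed out, but the activation may still have worked on
        # the neutron side. Verify via a GET of the port bindings.
        bindings = get_bindings(port_id)   # {host: status}
        dest_active = bindings.get(dest_host) == "ACTIVE"
        others_inactive = all(
            status == "INACTIVE"
            for host, status in bindings.items() if host != dest_host)
        if dest_active and others_inactive:
            return "ACTIVE"  # treat as a successfully completed migration
        return "ERROR"
```

The design question is whether a GET after a timed-out PUT is trustworthy enough to declare the migration complete; the sketch only shows the control flow being proposed.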

[1] https://github.com/openstack/keystoneauth/blob/112bcae1fbec355fcb58da07ae1d9f1adf8b77ba/keystoneauth1/exceptions/connection.py#L38
[2] https://bugs.launchpad.net/neutron/+bug/1853873

Comment 6 melanie witt 2021-05-26 01:52:25 UTC
I have reproduced the issue in a local devstack by applying the following patch, as suggested in comment 0:

--- a/neutron/plugins/ml2/plugin.py
+++ b/neutron/plugins/ml2/plugin.py
@@ -13,6 +13,8 @@
 #    License for the specific language governing permissions and limitations
 #    under the License.

+import time
+
 from eventlet import greenthread
 import netaddr
 from netaddr.strategy import eui48
@@ -2566,6 +2568,7 @@ class Ml2Plugin(db_base_plugin_v2.NeutronDbPluginV2,
             cur_context, _, try_again = self._commit_port_binding(
                 original_context, bind_context, need_notify=True,
                 update_binding_levels=False)
+            time.sleep(120)
             if not try_again:
                 self.notifier.binding_deactivate(context, port_id,
                                                  active_binding.host,


And then doing live migrations:

  (openstack) server create --image 57e9f98f-c7c9-4aac-a289-95ea6bcbcc83 --flavor c1 eight
  (openstack) server migrate --live --block-migration eight

I have found that the behavior on the nova side is actually a bug: our [neutron]http_retries config option is _not_ being honored for calls to the neutron port binding API. In our neutron API client code, we actually have two different ways of creating a client to talk to neutron [1], and when the [neutron]http_retries config option was added to nova, the second client (the one that calls the neutron port binding API) was missed when the default retry was added. So the call to the port binding /activate API does not do any retries by default.

You can work around the bug by adding to your nova.conf on the compute hosts:

  [neutron]
  connect_retries = 3

and the keystoneauth code we use in the clients will pick up the setting automatically.

When retries are actually being done, nova will time out during the first call, then retry and get a 409 Conflict response from neutron, which nova treats as a successful port binding activation, and the live migration completes without error. These excerpts show the behavior with [neutron]connect_retries = 3 on the nova side and the neutron side:


n-cpu:

May 26 01:09:14 ubuntu-subnode nova-compute[411810]: INFO keystoneauth.session [None req-9327f0c5-e70f-4d54-93af-cae935d1e3bd None None] Failure: Request to http://192.168.33.10:9696/v2.0/ports/d235be7e-6bf8-4403-afbf-ab2963e6574b/bindings/ubuntu-controller/activate timed out. Retrying in 0.5s.
May 26 01:09:15 ubuntu-subnode nova-compute[411810]: WARNING nova.network.neutron [None req-9327f0c5-e70f-4d54-93af-cae935d1e3bd None None] Binding for port d235be7e-6bf8-4403-afbf-ab2963e6574b and host ubuntu-controller is already active.

q-svc:

May 26 01:09:15 ubuntu-controller neutron-server[144111]: INFO neutron.wsgi [req-8160f605-287b-494d-bb22-db3f8b3ee756 req-d439d52a-c18a-43fc-9c7d-79087b44dbae service neutron] 192.168.33.11 "PUT /v2.0/ports/d235be7e-6bf8-4403-afbf-ab2963e6574b/bindings/ubuntu-controller/activate HTTP/1.1" status: 409  len: 386 time: 0.1463201

[...]

May 26 01:10:45 ubuntu-controller neutron-server[144112]: INFO neutron.wsgi [req-8160f605-287b-494d-bb22-db3f8b3ee756 req-17d0d362-c8a1-4f67-bd49-0c68f50715ef service neutron] 192.168.33.11 "PUT /v2.0/ports/d235be7e-6bf8-4403-afbf-ab2963e6574b/bindings/ubuntu-controller/activate HTTP/1.1" status: 200  len: 443 time: 120.8294811


On the neutron side, we can see here ^ that the 409 Conflict reply (to the retry) actually comes back before the reply to the original first request, because of the sleep(120) that was patched in to reproduce the issue.

To fix the nova bug, we need to pass CONF.neutron.http_retries as connect_retries to the keystoneauth-based client used for calling the port binding API.

[1] https://github.com/openstack/nova/blob/b0cd985f0c09088098f74cc0cb1df616cc0ef12b/nova/network/neutron.py#L232-L274
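The shape of the fix can be sketched as follows. This is a simplified illustration, not the actual nova patch: the classes and function names are stand-ins, and the real code threads CONF.neutron.http_retries into a keystoneauth1 session as its connect_retries parameter for the second client as well as the first:

```python
# Simplified sketch of the fix: both neutron client constructors must
# thread the same [neutron]http_retries value into the underlying
# HTTP client as connect_retries. The classes here are illustrative
# stand-ins for nova's real keystoneauth-based client-building code.

class StubKsaClient:
    """Stand-in for a keystoneauth-based HTTP client."""
    def __init__(self, connect_retries=None):
        # connect_retries=None means no retries are attempted by default,
        # which was the buggy behavior for the port binding client.
        self.connect_retries = connect_retries

def get_neutron_client(conf):
    # First client: already honored the option before the fix.
    return StubKsaClient(connect_retries=conf["http_retries"])

def get_binding_client(conf):
    # Second client (used for the port binding API): before the fix,
    # connect_retries was simply not passed here.
    return StubKsaClient(connect_retries=conf["http_retries"])
```

With both constructors wired this way, a single [neutron]http_retries setting governs retries for all neutron API calls, including the port binding /activate call.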

Comment 7 melanie witt 2021-05-26 02:10:00 UTC
I found that the repeating neutron DuplicateEntry error is actually caused by a neutron side retry sequence:

q-svc:

May 26 00:59:49 ubuntu-controller neutron-server[144110]: DEBUG neutron.plugins.ml2.managers [req-7f1bf57f-6fdb-4014-9abd-a7317f3c54e2 req-30feb9f0-e0fa-4f48-802b-31e78181805a service neutron] DB exception raised by Mechanism driver 'openvswitch' in update_port_precommit {{(pid=144110) _call_on_drivers /opt/stack/neutron/neutron/plugins/ml2/managers.py:481}}
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     self.dialect.do_execute(
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 608, in do_execute
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     cursor.execute(statement, parameters)
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/pymysql/cursors.py", line 148, in execute
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     result = self._query(query)
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/pymysql/cursors.py", line 310, in _query
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     conn.query(q)
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 548, in query
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 775, in _read_query_result
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     result.read()
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 1156, in read
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     first_packet = self.connection._read_packet()
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 725, in _read_packet
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     packet.raise_for_error()
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/pymysql/protocol.py", line 221, in raise_for_error
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     err.raise_mysql_exception(self._data)
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers   File "/usr/local/lib/python3.8/dist-packages/pymysql/err.py", line 143, in raise_mysql_exception
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers     raise errorclass(errno, errval)
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers pymysql.err.IntegrityError: (1062, "Duplicate entry '17c5e5e6-be12-44b0-898f-80e0d731fd71-ubuntu-subnode' for key 'ml2_port_bindings.PRIMARY'")

[...]

May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers [SQL: UPDATE ml2_port_bindings SET host=%(host)s, vif_type=%(vif_type)s, vif_details=%(vif_details)s WHERE ml2_port_bindings.port_id = %(ml2_port_bindings_port_id)s AND ml2_port_bindings.host = %(ml2_port_bindings_host)s]
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers [parameters: {'host': 'ubuntu-subnode', 'vif_type': 'unbound', 'vif_details': '', 'ml2_port_bindings_port_id': '17c5e5e6-be12-44b0-898f-80e0d731fd71', 'ml2_port_bindings_host': 'ubuntu-controller'}]
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers (Background on this error at: http://sqlalche.me/e/13/gkpj)
May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR neutron.plugins.ml2.managers
May 26 00:59:49 ubuntu-controller neutron-server[144110]: DEBUG neutron_lib.db.api [req-7f1bf57f-6fdb-4014-9abd-a7317f3c54e2 req-30feb9f0-e0fa-4f48-802b-31e78181805a service neutron] Retry wrapper got retriable exception: (pymysql.err.IntegrityError) (1062, "Duplicate entry '17c5e5e6-be12-44b0-898f-80e0d731fd71-ubuntu-subnode' for key 'ml2_port_bindings.PRIMARY'")
May 26 00:59:49 ubuntu-controller neutron-server[144110]: [SQL: UPDATE ml2_port_bindings SET host=%(host)s, vif_type=%(vif_type)s, vif_details=%(vif_details)s WHERE ml2_port_bindings.port_id = %(ml2_port_bindings_port_id)s AND ml2_port_bindings.host = %(ml2_port_bindings_host)s]
May 26 00:59:49 ubuntu-controller neutron-server[144110]: [parameters: {'host': 'ubuntu-subnode', 'vif_type': 'unbound', 'vif_details': '', 'ml2_port_bindings_port_id': '17c5e5e6-be12-44b0-898f-80e0d731fd71', 'ml2_port_bindings_host': 'ubuntu-controller'}]
May 26 00:59:49 ubuntu-controller neutron-server[144110]: (Background on this error at: http://sqlalche.me/e/13/gkpj) {{(pid=144110) wrapped /usr/local/lib/python3.8/dist-packages/neutron_lib/db/api.py:183}}
May 26 00:59:49 ubuntu-controller neutron-server[144110]: DEBUG oslo_db.api [req-7f1bf57f-6fdb-4014-9abd-a7317f3c54e2 req-30feb9f0-e0fa-4f48-802b-31e78181805a service neutron] Performing DB retry for function neutron.plugins.ml2.plugin.Ml2Plugin.update_port {{(pid=144110) wrapper /usr/local/lib/python3.8/dist-packages/oslo_db/api.py:155}}

[...]

I think this ^ would be a separate bug to open for the neutron component.

Comment 8 Jakub Libosvar 2021-05-26 15:31:34 UTC
(In reply to melanie witt from comment #7)
> I found that the repeating neutron DuplicateEntry error is actually caused
> by a neutron side retry sequence:
> 
> q-svc:
> 
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: DEBUG
> neutron.plugins.ml2.managers [req-7f1bf57f-6fdb-4014-9abd-a7317f3c54e2
> req-30feb9f0-e0fa-4f48-802b-31e78181805a service neutron] DB exception
> raised by Mechanism driver 'openvswitch' in update_port_precommit
> {{(pid=144110) _call_on_drivers
> /opt/stack/neutron/neutron/plugins/ml2/managers.py:481}}
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers Traceback (most recent call last):
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line
> 1276, in _execute_context
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     self.dialect.do_execute(
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line
> 608, in do_execute
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     cursor.execute(statement, parameters)
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/pymysql/cursors.py", line 148, in
> execute
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     result = self._query(query)
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/pymysql/cursors.py", line 310, in
> _query
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     conn.query(q)
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 548,
> in query
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     self._affected_rows =
> self._read_query_result(unbuffered=unbuffered)
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 775,
> in _read_query_result
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     result.read()
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 1156,
> in read
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     first_packet =
> self.connection._read_packet()
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/pymysql/connections.py", line 725,
> in _read_packet
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     packet.raise_for_error()
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/pymysql/protocol.py", line 221, in
> raise_for_error
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     err.raise_mysql_exception(self._data)
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers   File
> "/usr/local/lib/python3.8/dist-packages/pymysql/err.py", line 143, in
> raise_mysql_exception
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers     raise errorclass(errno, errval)
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers pymysql.err.IntegrityError: (1062, "Duplicate
> entry '17c5e5e6-be12-44b0-898f-80e0d731fd71-ubuntu-subnode' for k
> ey 'ml2_port_bindings.PRIMARY'")
> 
> [...]
> 
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers [SQL: UPDATE ml2_port_bindings SET
> host=%(host)s, vif_type=%(vif_type)s, vif_details=%(vif_details)s WHERE
> ml2_port_bindings.port_id = %(ml2_port_bindings_port_id)s AND
> ml2_port_bindings.host = %(ml2_port_bindings_host)s]
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers [parameters: {'host': 'ubuntu-subnode',
> 'vif_type': 'unbound', 'vif_details': '', 'ml2_port_bindings_port_id':
> '17c5e5e6-be12-44b0-898f-80e0d731fd71', 'ml2_port_bindings_host':
> 'ubuntu-controller'}]
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers (Background on this error at:
> http://sqlalche.me/e/13/gkpj)
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: ERROR
> neutron.plugins.ml2.managers
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: DEBUG
> neutron_lib.db.api [req-7f1bf57f-6fdb-4014-9abd-a7317f3c54e2
> req-30feb9f0-e0fa-4f48-802b-31e78181805a service neutron] Retry wrapper got
> retriable exception: (pymysql.err.IntegrityError) (1062, "Duplicate entry
> '17c5e5e6-be12-44b0-898f-80e0d731fd71-ubuntu-subnode' for key
> 'ml2_port_bindings.PRIMARY'")
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: [SQL: UPDATE
> ml2_port_bindings SET host=%(host)s, vif_type=%(vif_type)s,
> vif_details=%(vif_details)s WHERE ml2_port_bindings.port_id =
> %(ml2_port_bindings_port_id)s AND ml2_port_bindings.host =
> %(ml2_port_bindings_host)s]
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: [parameters:
> {'host': 'ubuntu-subnode', 'vif_type': 'unbound', 'vif_details': '',
> 'ml2_port_bindings_port_id': '17c5e5e6-be12-44b0-898f-80e0d731fd
> 71', 'ml2_port_bindings_host': 'ubuntu-controller'}]
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: (Background on
> this error at: http://sqlalche.me/e/13/gkpj) {{(pid=144110) wrapped
> /usr/local/lib/python3.8/dist-packages/neutron_lib/db/api.py:183}}
> May 26 00:59:49 ubuntu-controller neutron-server[144110]: DEBUG oslo_db.api
> [req-7f1bf57f-6fdb-4014-9abd-a7317f3c54e2
> req-30feb9f0-e0fa-4f48-802b-31e78181805a service neutron] Performing DB
> retry for func
> tion neutron.plugins.ml2.plugin.Ml2Plugin.update_port {{(pid=144110) wrapper
> /usr/local/lib/python3.8/dist-packages/oslo_db/api.py:155}}
> 
> [...]
> 
> I think this ^ would be a separate bug to open for the neutron component.

I think this is actually correct behavior. Neutron should retry the port binding multiple times - even if the DB is in bad shape. This is triggered on every port update to fix failed_binding statuses. I'll needinfo Slawek to keep me honest here and to get his opinion.

Comment 9 melanie witt 2021-05-26 15:40:04 UTC
(In reply to Jakub Libosvar from comment #8)
> (In reply to melanie witt from comment #7)
> > I think this ^ would be a separate bug to open for the neutron component.
> 
> I think this is actually correct behavior. Neutron should retry the port
> binding multiple times - even if the DB is in bad shape. This is triggered
> on every port update to fix failed_binding statuses. I'll needinfo Slawek to
> keep me honest here and to get his opinion.

Yeah... what I found odd about the behavior is that it continues to retry forever, AFAICT, and it didn't need to: the activation eventually worked (delayed by the sleep(120)), but it continued to try to apply the same DB update repeatedly anyway, even though the DB was already in the desired state.

Comment 10 Slawek Kaplonski 2021-05-27 07:23:44 UTC
I'm not sure if it should keep trying to update the DB forever. IMHO it should stop at some point. Please open a new BZ for that and I will try to reproduce such a case and take a deeper look.

Comment 11 Jakub Libosvar 2021-05-27 07:28:17 UTC
(In reply to melanie witt from comment #9)
> (In reply to Jakub Libosvar from comment #8)
> > (In reply to melanie witt from comment #7)
> > > I think this ^ would be a separate bug to open for the neutron component.
> > 
> > I think this is actually correct behavior. Neutron should retry the port
> > binding multiple times - even if the DB is in bad shape. This is triggered
> > on every port update to fix failed_binding statuses. I'll needinfo Slawek to
> > keep me honest here and to get his opinion.
> 
> Yeah.. what I found odd about the behavior is that it continues to retry
> forever AFAICT and it didn't need to because the activation eventually
> worked (delayed by sleep(120)) but it continued to try to apply the same DB
> update repeatedly anyway, even though the DB was already in the desired
> state.

A-ha, I missed that it never stops attempting the DB update. Thanks for the clarification.

Comment 14 melanie witt 2021-06-07 21:50:59 UTC
*** Bug 1968747 has been marked as a duplicate of this bug. ***

Comment 17 smooney 2021-06-18 12:26:38 UTC
*** Bug 1973635 has been marked as a duplicate of this bug. ***

Comment 43 errata-xmlrpc 2021-12-09 20:19:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3762

Comment 44 Slawek Kaplonski 2022-01-03 19:38:24 UTC
*** Bug 2017803 has been marked as a duplicate of this bug. ***

Comment 46 Bernard Cafarelli 2022-07-11 09:45:25 UTC
*** Bug 2103781 has been marked as a duplicate of this bug. ***

