Bug 1860448 - OVN transaction could not be completed due to a race condition
Summary: OVN transaction could not be completed due to a race condition
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z9
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Arnau Verdaguer
QA Contact: Bharath M V
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-24 15:38 UTC by Eduardo Olivares
Modified: 2022-12-07 20:25 UTC (History)

Fixed In Version: python-networking-ovn-7.3.1-1.20221011003657.4e24f4c.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 2132280 2132405 (view as bug list)
Environment:
Last Closed: 2022-12-07 20:24:45 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 821401 0 None MERGED [ovn] Specify port type if it's a router port when updating 2022-10-11 13:29:53 UTC
Red Hat Issue Tracker OSP-478 0 None None None 2021-11-18 14:32:36 UTC
Red Hat Product Errata RHBA-2022:8795 0 None None None 2022-12-07 20:25:20 UTC

Description Eduardo Olivares 2020-07-24 15:38:40 UTC
Description of problem:
A similar issue was reported upstream: https://bugs.launchpad.net/neutron/+bug/1885898

Please take a look at the logs pasted here: http://pastebin.test.redhat.com/887526

All controller logs can be found here: http://file.mad.redhat.com/eolivare/ovn-race-bug-16.1/

This is the test failing due to this bug: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-network-networking-ovn-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve/93/testReport/junit/neutron_tempest_plugin.scenario.test_connectivity/NetworkConnectivityTest/test_connectivity_through_2_routers_id_8944b90d_1766_4669_bd8a_672b5d106bb7_/


The test fails when trying to connect via SSH to a VM instance with a floating IP (FIP). According to the logs, a port was not added correctly to the router, and as a result the FIP is not reachable. The neutron API nevertheless answered the request with a 200 OK, which is also wrong.

Apparently, two transactions on that port were running simultaneously: the first from controller-1 and the second from controller-2.
This is from controller-2:
2020-07-24 09:03:23.416 26 INFO networking_ovn.db.revision [req-9a7f0860-874b-400c-8da1-8e9dd7bc0225 - - - - -] Successfully bumped revision number for resource 506b38c9-9c96-4ea3-ad73-1df17fcb5d39 (type: ports) to 5
Meanwhile, controller-1:
2020-07-24 09:03:23.792 30 INFO networking_ovn.db.revision [req-f075e29f-18fb-4847-a8ca-a75f18691a4d 16fba420c546457da720cb2460389130 f748c640989141599d3d9a416fdfd1ee - default default] Successfully bumped revision number for resource 506b38c9-9c96-4ea3-ad73-1df17fcb5d39 (type: router_ports) to 4

And finally, the transaction crashes on controller-1:
2020-07-24 09:03:23.972 30 ERROR ovsdbapp.backend.ovs_idl.transaction [req-f075e29f-18fb-4847-a8ca-a75f18691a4d 16fba420c546457da720cb2460389130 f748c640989141599d3d9a416fdfd1ee - default default] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 122, in run
    txn.results.put(txn.do_commit())                                            
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
    command.run_idl(txn)                                                        
  File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/commands.py", line 1021, in run_idl
    resource_id=self.name, resource_type=self.resource_type)                    
networking_ovn.common.exceptions.RevisionConflict: OVN revision number for 506b38c9-9c96-4ea3-ad73-1df17fcb5d39 (type: ports) is equal or higher than the given resource. Skipping update
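
The log sequence above can be illustrated with a minimal sketch. This is a toy model, not the networking-ovn code: `ToyOvnDb`, its `update` method and the local `RevisionConflict` class are hypothetical names, and the sketch assumes that the row in OVN already carried revision 5 (from controller-2) when controller-1 tried to write revision 4.

```python
class RevisionConflict(Exception):
    """Local stand-in for the exception seen in the traceback (hypothetical)."""


class ToyOvnDb:
    """Toy stand-in for the OVN DB: maps a resource id to its stored revision."""

    def __init__(self):
        self.revisions = {}

    def update(self, resource_id, revision):
        # Assumption: an update is only applied when its revision is strictly
        # higher than the one already stored, as the error message suggests.
        stored = self.revisions.get(resource_id, -1)
        if stored >= revision:
            raise RevisionConflict(
                "revision %d for %s is not higher than stored revision %d"
                % (revision, resource_id, stored))
        self.revisions[resource_id] = revision


db = ToyOvnDb()
port = "506b38c9-9c96-4ea3-ad73-1df17fcb5d39"

# controller-2 writes first, with the higher revision number.
db.update(port, 5)

# controller-1 then arrives with the lower revision number and is rejected.
try:
    db.update(port, 4)
    outcome = "applied"
except RevisionConflict:
    outcome = "skipped"

print(outcome, db.revisions[port])
```

In this model the second update is dropped and the stored revision stays at 5, which would match the "Skipping update" message logged on controller-1.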





Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20200723.n.0
ovn2.13-host-2.13.0-37.el8fdp.x86_64
python3-networking-ovn-7.2.1-0.20200611111150.18fabca.el8ost.noarch

The test did not fail with RHOS-16.1-RHEL-8-20200714.n.0, which has the same OVN versions. This is expected, because the issue is caused by a race condition.


How reproducible:
Not often; the issue is caused by a race condition.


Steps to Reproduce:
1. Run the tempest test neutron_tempest_plugin.scenario.test_connectivity.NetworkConnectivityTest.test_connectivity_through_2_routers (the issue was reproduced with it).

Comment 21 errata-xmlrpc 2022-12-07 20:24:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795

