Description of problem:
After the overcloud update (but before the overcloud reboot) many tempest tests fail because valid neutron API calls return 'Server error'.

Link to the job: http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-update-13_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/2/

The neutron server log contains multiple errors like "Logical port XXXX is not bound to a chassis: MechanismDriverError", see below:

2020-02-23 20:02:19.649 36 ERROR neutron.plugins.ml2.managers
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event [req-cbcf2731-9877-4d3c-bd36-5d612bd019ac - - - - -] Unexpected exception in notify_loop: MechanismDriverError
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event Traceback (most recent call last):
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/ovsdbapp/event.py", line 117, in notify_loop
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     match.run(event, row, updates)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/networking_ovn/ovsdb/ovsdb_monitor.py", line 211, in run
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     self.driver.set_port_status_up(row.name)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/networking_ovn/ml2/mech_driver.py", line 775, in set_port_status_up
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     const.PORT_STATUS_ACTIVE)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1855, in update_port_status
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     context, {port_id: status}, host)[port_id]
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 634, in inner
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     return f(self, context, *args, **kwargs)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 161, in wrapped
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     return method(*args, **kwargs)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 91, in wrapped
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     setattr(e, '_RETRY_EXCEEDED', True)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     self.force_reraise()
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     six.reraise(self.type_, self.value, self.tb)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 87, in wrapped
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     return f(*args, **kwargs)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 147, in wrapper
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     ectxt.value = e.inner_exc
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     self.force_reraise()
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     six.reraise(self.type_, self.value, self.tb)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 135, in wrapper
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     return f(*args, **kwargs)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 126, in wrapped
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     LOG.debug("Retry wrapper got retriable exception: %s", e)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     self.force_reraise()
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     six.reraise(self.type_, self.value, self.tb)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 122, in wrapped
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     return f(*dup_args, **dup_kwargs)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1870, in update_port_statuses
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     context, port_dbs_by_id[port_id], status, host)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1882, in _safe_update_individual_port_db_status
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     ectx.reraise = bool(db.get_port(context, port_id))
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     self.force_reraise()
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     six.reraise(self.type_, self.value, self.tb)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1878, in _safe_update_individual_port_db_status
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     context, port, status, host)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 1946, in _update_individual_port_db_status
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     self.mechanism_manager.update_port_postcommit(mech_context)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 723, in update_port_postcommit
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     continue_on_failure=True)
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/managers.py", line 473, in _call_on_drivers
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event     errors=errors
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event MechanismDriverError
2020-02-23 20:02:19.700 36 ERROR ovsdbapp.event
2020-02-23 20:02:21.789 36 WARNING networking_ovn.ml2.mech_driver [req-cbcf2731-9877-4d3c-bd36-5d612bd019ac - - - - -] Logical port 82d73d17-f93d-4a0c-9619-234a162dc564 is not bound to a chassis: MechanismDriverError

Note: the minor update job consists of the following steps:
1. Overcloud deployment
2. Tempest run - there were no failed tests
3. Undercloud update
4. Overcloud update
5. 2nd tempest run - there were 74 failed tests, customers may experience issues
6. Overcloud reboot
7. 3rd tempest run - there were no failed tests

Version-Release number of selected component (if applicable):
Minor update from 13z8 (2019-08-27.2, the last release with openvswitch 2.9) to 2020-02-20.4 (the latest puddle, with openvswitch 2.11)
python-networking-ovn-4.0.4-2.el7ost.noarch
openvswitch2.11-2.11.0-35.el7fdp.x86_64

How reproducible:
100%

Steps to Reproduce:
Run this job: http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-update-13_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/

Note: this is the staging Jenkins; the job is identical to https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-update-13_director-rhel-virthost-3cont_2comp_2net-ipv4-geneve-composable/
However, I had to run the job with ir_tripleo_upgrade_floatingip_check: 'no' in order to avoid a failure caused by https://bugzilla.redhat.com/show_bug.cgi?id=1786468

Actual results:
74 tests failed on the 2nd tempest run

Expected results:
No failed tests on the 2nd tempest run

Additional info:
Logs can be found at http://file.tlv.redhat.com/~rsafrono/bug_sosreport/osp13_minor_update/
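As a diagnostic aid (not part of the reproducer), the "not bound to a chassis" condition can be cross-checked directly against the OVN southbound DB while the tests are failing, for example with a small sketch like the one below. Assumptions: ovn-sbctl can reach the SB DB from where it runs (on OSP 13 that typically means the OVN DB container on a controller), and chassis_for_port is just an illustrative helper name:

#!/usr/bin/env python
"""Illustrative helper (not part of the reproducer): check whether a
logical port is bound to a chassis in the OVN southbound DB.

Assumption: ovn-sbctl is on PATH and can reach the southbound DB from
wherever this is run (on OSP 13, e.g. inside the OVN DB container).
"""
import subprocess
import sys


def chassis_for_port(logical_port):
    # 'find Port_Binding logical_port=<id>' is a generic ovn-sbctl DB command;
    # --bare/--columns=chassis reduces the output to the chassis reference
    # (empty if the port is unbound, no rows if it has no Port_Binding at all).
    out = subprocess.check_output(
        ['ovn-sbctl', '--bare', '--columns=chassis',
         'find', 'Port_Binding', 'logical_port=%s' % logical_port])
    return out.decode('utf-8', 'replace').strip()


if __name__ == '__main__':
    # e.g. the port from the warning above: 82d73d17-f93d-4a0c-9619-234a162dc564
    port = sys.argv[1]
    chassis = chassis_for_port(port)
    if chassis:
        print('Logical port %s is bound to chassis %s' % (port, chassis))
    else:
        print('Logical port %s is NOT bound to a chassis' % port)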
Binding fails because the VMs' security groups try to use port groups while the maintenance task has not yet successfully migrated them away from address sets:

2020-02-23 20:27:18.459 [c2] 37 DEBUG networking_ovn.common.maintenance [req-fe795db1-bb36-4e90-bfef-b35a9cff0f7b - - - - -] Maintenance task: Fixing resource a940c8ac-f657-45b5-a849-42730a6b41be (type: security_groups) at delete check_for_inconsistencies /usr/lib/python2.7/site-packages/networking_ovn/common/maintenance.py:325
2020-02-23 20:27:18.461 [c2] 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): DelAddrSetCommand(if_exists=True, name=as_ip4_a940c8ac_f657_45b5_a849_42730a6b41be) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2020-02-23 20:27:18.461 [c2] 37 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=1): DelAddrSetCommand(if_exists=True, name=as_ip6_a940c8ac_f657_45b5_a849_42730a6b41be) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2020-02-23 20:27:18.463 [c2] 37 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it: RuntimeError: OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it
2020-02-23 20:27:18.463 [c2] 37 ERROR ovsdbapp.backend.ovs_idl.transaction [req-fe795db1-bb36-4e90-bfef-b35a9cff0f7b - - - - -] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 122, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 115, in do_commit
    raise RuntimeError(msg)
RuntimeError: OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it
: RuntimeError: OVSDB Error: The transaction failed because the IDL has been configured to require a database lock but didn't get it yet or has already lost it

I need to investigate why the DB lock was lost.
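The error itself means the OVSDB IDL was configured to require a database lock but does not currently hold it, so its transactions are refused. A quick way to see whether the address-set to port-group migration ever completes is to compare the two NB tables; the sketch below is only a diagnostic aid (not part of any fix) and assumes ovn-nbctl can reach the northbound DB and that the NB schema already has the Port_Group table (OVN 2.10+):

#!/usr/bin/env python
"""Diagnostic sketch (not from the fix): compare leftover security-group
Address_Set rows with Port_Group rows in the OVN NB DB.

Assumptions: ovn-nbctl is on PATH and can reach the northbound DB
(e.g. run inside the OVN DB container on a controller), and the NB
schema already contains the Port_Group table (OVN 2.10+).
"""
import subprocess


def nb_names(table):
    # 'list <table>' is a generic ovn-nbctl DB command; --bare/--columns=name
    # trims the output to one name per row.
    out = subprocess.check_output(
        ['ovn-nbctl', '--bare', '--columns=name', 'list', table])
    return [line for line in out.decode('utf-8', 'replace').splitlines() if line]


if __name__ == '__main__':
    # networking-ovn's per-security-group address sets use the as_ip4_/as_ip6_
    # prefixes visible in the maintenance-task log above.
    leftover = [n for n in nb_names('Address_Set')
                if n.startswith(('as_ip4_', 'as_ip6_'))]
    port_groups = nb_names('Port_Group')
    print('security-group address sets still present: %d' % len(leftover))
    print('port groups present: %d' % len(port_groups))
    # Leftover as_ip4_*/as_ip6_* sets long after the update indicate the
    # maintenance task has not finished the migration (here, because its
    # transactions are refused while the OVSDB lock is not held).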
I never found the reason why the lock was lost. Since OSP 13 is in ELS, I'm going to close this bug, but feel free to reopen it if the problem is hit again.