Red Hat Bugzilla – Bug 1285879
Race condition puts ovs agent in resync
Last modified: 2016-04-26 15:08:22 EDT
Cloned from launchpad bug 1499488.
The following code is from neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent.treat_devices_added_or_updated():
    devices_details_list = (
        self.plugin_rpc.get_devices_details_list_and_failed_devices(
            ...))
    ...
        #TODO(rossella_s) handle better the resync in next patches,
        # this is just to preserve the current behavior
    ...
    devices = devices_details_list.get('devices')
    vif_by_id = self.int_br.get_vifs_by_ids(
        [vif['device'] for vif in devices])
The race window lies between get_devices_details_list_and_failed_devices() and get_vifs_by_ids(). If a VM is deleted in that window, its OVS port goes away and get_vifs_by_ids() raises an exception. The exception propagates to the exception handler in rpc_loop, which puts the agent in resync, so the next rpc_loop iteration rescans ALL ports. On a highly scaled system this resync can take many minutes, during which all new plug requests time out.
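To make the sequence above concrete, here is a minimal, self-contained simulation of the race. It is only a sketch: FakeBridge, PortNotFound, rpc_loop_iteration and the on_gap hook are hypothetical stand-ins invented for illustration, not the real agent or OVS library code.

```python
class PortNotFound(Exception):
    """Hypothetical stand-in for the error raised when an OVS port is gone."""


class FakeBridge:
    """Toy stand-in for the integration bridge; not the real OVS library."""

    def __init__(self, ports):
        self.ports = set(ports)

    def get_vifs_by_ids(self, port_ids):
        # Mimics the strict lookup: any missing port aborts the whole call.
        missing = [p for p in port_ids if p not in self.ports]
        if missing:
            raise PortNotFound(missing)
        return {p: "vif-" + p for p in port_ids}


def rpc_loop_iteration(bridge, get_details, on_gap=None):
    """Run one simplified loop pass; return True if a full resync is needed."""
    try:
        devices = get_details()          # step 1: fetch device details
        if on_gap:
            on_gap()                     # a VM may be deleted right here
        bridge.get_vifs_by_ids(devices)  # step 2: look the ports up in OVS
        return False
    except Exception:
        # Broad handler, as the report describes: any error forces a resync.
        return True


bridge = FakeBridge(["p1", "p2"])
no_race = rpc_loop_iteration(bridge, lambda: ["p1", "p2"])
raced = rpc_loop_iteration(bridge, lambda: ["p1", "p2"],
                           on_gap=lambda: bridge.ports.discard("p2"))
print(no_race, raced)
```

In this toy model a single port vanishing between the two steps is enough to flag a full resync, even though every other port was processed without trouble.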
get_vifs_by_ids() was added under this patch: https://review.openstack.org/#/c/186734/
The exception is raised for the missing port because the new get_vifs_by_ids() method does not pass if_exists=True in its call to get_ports_attributes(). A grep of that file shows that every other call to get_ports_attributes() passes if_exists=True.
I believe the fix is to simply start passing if_exists=True in get_vifs_by_ids.
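The effect of the proposed fix can be sketched as follows. This is a toy model under stated assumptions, not the actual patch: the two functions below reuse the names from this report, but their signatures and the ports_db dictionary are simplified and hypothetical, and the real methods take different arguments.

```python
def get_ports_attributes(ports_db, names, if_exists=False):
    """Toy lookup: return the rows for the named ports.

    With if_exists=False a missing port raises; with if_exists=True
    it is silently skipped. (Simplified, hypothetical signature.)
    """
    rows = []
    for name in names:
        if name not in ports_db:
            if if_exists:
                continue
            raise RuntimeError("no such port: %s" % name)
        rows.append(ports_db[name])
    return rows


def get_vifs_by_ids(ports_db, port_ids, if_exists=True):
    """Map each requested port id to its row, or None if the port is gone."""
    rows = get_ports_attributes(ports_db, port_ids, if_exists=if_exists)
    found = {row["name"]: row for row in rows}
    return {pid: found.get(pid) for pid in port_ids}


# "p2" was deleted between the details lookup and the port lookup.
ports_db = {"p1": {"name": "p1", "ofport": 1}}

tolerant = get_vifs_by_ids(ports_db, ["p1", "p2"])  # if_exists=True
try:
    get_vifs_by_ids(ports_db, ["p1", "p2"], if_exists=False)
    strict_raised = False
except RuntimeError:
    strict_raised = True
```

With if_exists=True the deleted port simply maps to None, so the caller can skip it and continue with the remaining devices instead of falling into the resync path.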
Will be resolved via OSP 8 rebase before GA.
*** This bug has been marked as a duplicate of bug 1289994 ***