Cloned from Launchpad bug 1499488.

Description:

The following code is from neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent.treat_devices_added_or_updated():

    devices_details_list = (
        self.plugin_rpc.get_devices_details_list_and_failed_devices(
            self.context,
            devices,
            self.agent_id,
            self.conf.host))
    if devices_details_list.get('failed_devices'):
        #TODO(rossella_s) handle better the resync in next patches,
        # this is just to preserve the current behavior
        raise DeviceListRetrievalError(devices=devices)

    devices = devices_details_list.get('devices')
    vif_by_id = self.int_br.get_vifs_by_ids(
        [vif['device'] for vif in devices])

The race condition sits between get_devices_details_list_and_failed_devices() and get_vifs_by_ids(). If a VM is deleted in that window, the OVS port goes away and get_vifs_by_ids() raises an exception. That exception propagates to the exception handler in rpc_loop and puts the agent into resync, so the next rpc_loop iteration rescans ALL ports. On a highly scaled system this resync can take many minutes, during which all new plug requests time out.

get_vifs_by_ids() was added under this patch: https://review.openstack.org/#/c/186734/

The exception is raised for the missing port because the new get_vifs_by_ids() method does not pass if_exists=True on its call to get_ports_attributes(). A grep within that file shows every other call to get_ports_attributes() passing if_exists=True. I believe the fix is simply to start passing if_exists=True in get_vifs_by_ids().

Specification URL (additional information): https://bugs.launchpad.net/neutron/+bug/1499488
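To illustrate the race and the proposed fix, here is a minimal, self-contained sketch. It is not the neutron code: FakeBridge, PortNotFound, and the simplified method signatures (including passing port names directly to get_vifs_by_ids) are hypothetical stand-ins chosen only to show how an if_exists-style flag changes the outcome when a port disappears between the RPC call and the lookup.

```python
class PortNotFound(Exception):
    """Stand-in for the error raised when a named OVS port no longer exists."""


class FakeBridge:
    """Hypothetical, simplified model of an OVS bridge (not the neutron class)."""

    def __init__(self, ports):
        # Map of port name -> interface attributes, e.g. {"tap1": {"iface-id": "a"}}
        self.ports = dict(ports)

    def get_ports_attributes(self, port_names, if_exists=False):
        rows = []
        for name in port_names:
            if name not in self.ports:
                if if_exists:
                    continue  # skip ports that were deleted in the meantime
                raise PortNotFound(name)
            rows.append(self.ports[name])
        return rows

    def get_vifs_by_ids(self, port_ids, port_names, if_exists=True):
        # Devices without a matching row map to None instead of aborting the call.
        result = dict.fromkeys(port_ids)
        for row in self.get_ports_attributes(port_names, if_exists=if_exists):
            if row["iface-id"] in result:
                result[row["iface-id"]] = row
        return result


bridge = FakeBridge({"tap1": {"iface-id": "a"}, "tap2": {"iface-id": "b"}})
names = list(bridge.ports)   # port names captured before the VM is deleted
del bridge.ports["tap2"]     # VM deleted inside the race window

# With if_exists=True the vanished port is reported as None; no exception.
vifs = bridge.get_vifs_by_ids(["a", "b"], names)
```

In this model, calling get_vifs_by_ids(["a", "b"], names, if_exists=False) raises PortNotFound, which corresponds to the exception that forces the agent into a full resync. With the flag set, the caller can handle the single missing device and carry on with the rest.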
Will be resolved via OSP 8 rebase before GA.

*** This bug has been marked as a duplicate of bug 1289994 ***