Bug 1285879 - Race condition puts ovs agent in resync
Status: CLOSED DUPLICATE of bug 1289994
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assigned To: lpeer
QA Contact: Ofer Blaut
Reported: 2015-11-26 15:25 EST by Brent Eagles
Modified: 2016-04-26 15:08 EDT

Doc Type: Bug Fix
Last Closed: 2015-12-16 21:58:39 EST

External Trackers
Tracker           ID       Priority  Status  Summary  Last Updated
Launchpad         1499488  None      None    None     Never
OpenStack gerrit  227517   None      None    None     Never

Description Brent Eagles 2015-11-26 15:25:22 EST
Cloned from launchpad bug 1499488.

Description:

The following code is from neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent.treat_devices_added_or_updated():

        devices_details_list = (
            self.plugin_rpc.get_devices_details_list_and_failed_devices(
                self.context,
                devices,
                self.agent_id,
                self.conf.host))
        if devices_details_list.get('failed_devices'):
            #TODO(rossella_s) handle better the resync in next patches,
            # this is just to preserve the current behavior
            raise DeviceListRetrievalError(devices=devices)

        devices = devices_details_list.get('devices')
        vif_by_id = self.int_br.get_vifs_by_ids(
            [vif['device'] for vif in devices])

The race window lies between get_devices_details_list_and_failed_devices() and get_vifs_by_ids(). If a VM is deleted in that window, its OVS port goes away and get_vifs_by_ids() raises an exception. The exception propagates to the handler in rpc_loop and puts the agent in resync, so the next rpc_loop iteration rescans ALL ports. On a highly scaled system, this resync can take many minutes, during which new plug requests all time out.
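
The sequence described above can be reproduced with a toy model. This is only a sketch: FakeBridge, PortNotFound, rpc_loop_iteration and the delete_between_calls parameter are hypothetical names invented for illustration, not Neutron code, and the real agent's error handling is more involved.

```python
# Toy model of the race; none of these classes are Neutron code.

class PortNotFound(Exception):
    """Stand-in for the error raised when an OVS port no longer exists."""


class FakeBridge:
    """Hypothetical integration bridge holding a dict of port -> attributes."""

    def __init__(self, ports):
        self.ports = dict(ports)

    def get_vifs_by_ids(self, port_ids):
        # Strict lookup: a single missing port fails the whole batch.
        result = {}
        for port_id in port_ids:
            if port_id not in self.ports:
                raise PortNotFound(port_id)
            result[port_id] = self.ports[port_id]
        return result


def rpc_loop_iteration(bridge, devices, delete_between_calls=None):
    """Return True if the iteration ends by requesting a full resync."""
    try:
        # Step 1: the plugin reports details for every requested device.
        details = [{"device": d} for d in devices]
        # Race window: a VM is deleted and its OVS port disappears.
        if delete_between_calls is not None:
            bridge.ports.pop(delete_between_calls, None)
        # Step 2: look the same devices up on the bridge.
        bridge.get_vifs_by_ids([d["device"] for d in details])
    except PortNotFound:
        return True  # the next loop would rescan ALL ports
    return False


bridge = FakeBridge({"port-a": {"ofport": 1}, "port-b": {"ofport": 2}})
resync_without_race = rpc_loop_iteration(bridge, ["port-a", "port-b"])
resync_with_race = rpc_loop_iteration(
    bridge, ["port-a", "port-b"], delete_between_calls="port-b")
print(resync_without_race, resync_with_race)
```

In this model a single deleted port is enough to fail the whole batch lookup, which is the behavior that forces the full rescan.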

get_vifs_by_ids() was added under this patch: https://review.openstack.org/#/c/186734/

The missing port raises an exception because the new get_vifs_by_ids() method does not pass if_exists=True in its call to get_ports_attributes(). A grep of that file shows every other call to get_ports_attributes() passing if_exists=True.

I believe the fix is to simply start passing if_exists=True in get_vifs_by_ids.
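
To illustrate the difference the flag is expected to make, here is a minimal sketch. lookup_ports is a hypothetical stand-in, not the real get_ports_attributes(), and it assumes that if_exists=True means ports that have disappeared are skipped rather than reported as errors.

```python
# Illustrative only: lookup_ports is a hypothetical stand-in, not ovs_lib code.

def lookup_ports(table, names, if_exists=False):
    """Return attributes for the named ports.

    With if_exists=False a missing port raises KeyError; with
    if_exists=True missing ports are silently skipped (assumed semantics).
    """
    found = []
    for name in names:
        if name not in table:
            if if_exists:
                continue  # port vanished between calls; ignore it
            raise KeyError(name)
        found.append(table[name])
    return found


table = {"port-a": {"name": "port-a", "ofport": 1}}
requested = ["port-a", "port-b"]  # port-b was deleted in the race window

try:
    lookup_ports(table, requested)
    strict_failed = False
except KeyError:
    strict_failed = True

tolerant = lookup_ports(table, requested, if_exists=True)
print(strict_failed, [p["name"] for p in tolerant])
```

With the tolerant lookup, the surviving ports are still processed and the deleted one is simply absent from the result, so no exception reaches rpc_loop.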

Specification URL (additional information):

https://bugs.launchpad.net/neutron/+bug/1499488
Comment 1 Assaf Muller 2015-12-16 21:58:39 EST
Will be resolved via OSP 8 rebase before GA.

*** This bug has been marked as a duplicate of bug 1289994 ***