Bug 1285879 - Race condition puts ovs agent in resync
Summary: Race condition puts ovs agent in resync
Keywords:
Status: CLOSED DUPLICATE of bug 1289994
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assignee: lpeer
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-26 20:25 UTC by Brent Eagles
Modified: 2016-04-26 19:08 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-12-17 02:58:39 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1499488 0 None None None Never
OpenStack gerrit 227517 0 None None None Never

Description Brent Eagles 2015-11-26 20:25:22 UTC
Cloned from launchpad bug 1499488.

Description:

The following code is from neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent.treat_devices_added_or_updated():

        devices_details_list = (
            self.plugin_rpc.get_devices_details_list_and_failed_devices(
                self.context,
                devices,
                self.agent_id,
                self.conf.host))
        if devices_details_list.get('failed_devices'):
            #TODO(rossella_s) handle better the resync in next patches,
            # this is just to preserve the current behavior
            raise DeviceListRetrievalError(devices=devices)

        devices = devices_details_list.get('devices')
        vif_by_id = self.int_br.get_vifs_by_ids(
            [vif['device'] for vif in devices])

The race condition occurs between the calls to get_devices_details_list_and_failed_devices() and get_vifs_by_ids(). If a VM is deleted in that window, the OVS port goes away and get_vifs_by_ids() raises an exception, which bumps us out to the exception handler in rpc_loop and puts us in resync, causing the next rpc_loop iteration to rescan ALL ports. On a highly scaled system, this resync can take many minutes, during which all new plug requests time out.

get_vifs_by_ids() was added under this patch: https://review.openstack.org/#/c/186734/

The exception is raised for the missing port because the new get_vifs_by_ids method does not pass if_exists=True on its call to get_ports_attributes(). A grep within that file shows every other call to get_ports_attributes passing if_exists=True.

I believe the fix is to simply start passing if_exists=True in get_vifs_by_ids.
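
To illustrate why if_exists=True matters here, the following is a minimal, self-contained sketch. It does not use neutron's code: ToyBridge, PortNotFound, and the module-level get_vifs_by_ids() below are hypothetical stand-ins, and the argument lists are assumptions made for the example. It only models the behaviour described above, where a port that vanishes between two steps either raises or is silently skipped.

```python
class PortNotFound(Exception):
    """Raised by the toy bridge when a requested port no longer exists."""


class ToyBridge:
    """Hypothetical stand-in for an OVS bridge; NOT neutron's real class."""

    def __init__(self, ports):
        # Maps port name -> iface-id (the VIF id attached to that port).
        self.ports = dict(ports)

    def get_ports_attributes(self, names, if_exists=False):
        """Return one row per port; missing ports raise unless if_exists."""
        rows = []
        for name in names:
            if name not in self.ports:
                if if_exists:
                    continue  # tolerate ports deleted in the meantime
                raise PortNotFound(name)
            rows.append({"name": name, "iface-id": self.ports[name]})
        return rows


def get_vifs_by_ids(bridge, port_names, wanted_ids, if_exists):
    """Map each wanted id to its row, or None when no port carries it."""
    rows = bridge.get_ports_attributes(port_names, if_exists=if_exists)
    by_id = {row["iface-id"]: row for row in rows}
    return {vif_id: by_id.get(vif_id) for vif_id in wanted_ids}


bridge = ToyBridge({"tap1": "vif-1", "tap2": "vif-2"})
snapshot = list(bridge.ports)   # port names seen before the VM goes away
del bridge.ports["tap2"]        # VM deleted: its OVS port disappears

# Without if_exists, the stale port name makes the whole lookup fail.
try:
    get_vifs_by_ids(bridge, snapshot, ["vif-1", "vif-2"], if_exists=False)
    raised = False
except PortNotFound:
    raised = True

# With if_exists=True, the vanished port is skipped and maps to None.
result = get_vifs_by_ids(bridge, snapshot, ["vif-1", "vif-2"], if_exists=True)
```

In this toy model the caller can treat a None entry as "device already gone" and skip it, instead of taking an exception that forces a full resync.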

Specification URL (additional information):

https://bugs.launchpad.net/neutron/+bug/1499488

Comment 1 Assaf Muller 2015-12-17 02:58:39 UTC
Will be resolved via OSP 8 rebase before GA.

*** This bug has been marked as a duplicate of bug 1289994 ***

