Bug 1652405

Summary: Race condition while locking the ironic node and Stale VIF ports in ironic
Product: Red Hat OpenStack
Reporter: PURANDHAR SAIRAM MANNIDI <pmannidi>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED WONTFIX
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium
Docs Contact:
Priority: medium
Version: 13.0 (Queens)
CC: bfournie, dasmith, dtantsur, eglynn, jhakimra, jkreger, kchamart, mbooth, mburns, mwitt, pmannidi, sbauza, sgordon, vromanso
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-10-15 10:59:28 UTC
Type: Bug

Description PURANDHAR SAIRAM MANNIDI 2018-11-22 01:34:53 UTC
Description of problem:
During a race to lock the same node for an overcloud deployment, Ironic is left with stale VIF ports. The VIF ports are not deleted when the instance is deleted, and every subsequent instance spawn fails because of the stale VIF still attached to the node.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 13 

How reproducible:
Once

Steps to Reproduce:
1. Deploy an overcloud from director with just two nodes (one controller + one compute), where one node has incorrect capabilities.
2. The deployment fails because two instances try to lock the same node.


Actual results:
Deployment fails and leaves a stale VIF attached to the baremetal node.

Expected results:
Even if deployment fails, it shouldn't leave a stale VIF port.
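
To illustrate the expected behavior, here is a minimal sketch in Python. It is not Nova or Ironic code: FakeNode, NodeLocked, spawn and run_race are hypothetical names, the node is an in-memory stand-in, and the ordering (VIF attached first, node claimed second) is an assumption made only for the illustration. Two requests race for the same node, and the one that loses the lock detaches the VIF it attached before giving up.

```python
import threading


class NodeLocked(Exception):
    """Raised when the node is already reserved by another request."""


class FakeNode:
    """Hypothetical in-memory stand-in for a baremetal node (not the Ironic API)."""

    def __init__(self):
        self.vifs = set()
        self.reserved_by = None
        self._mutex = threading.Lock()

    def attach_vif(self, vif_id):
        with self._mutex:
            self.vifs.add(vif_id)

    def detach_vif(self, vif_id):
        with self._mutex:
            self.vifs.discard(vif_id)

    def reserve(self, instance_id):
        # Only one instance may hold the node; later callers are rejected.
        with self._mutex:
            if self.reserved_by is not None:
                raise NodeLocked(f"node already reserved by {self.reserved_by}")
            self.reserved_by = instance_id


def spawn(node, instance_id, vif_id):
    """Attach a VIF, then try to claim the node; roll the VIF back on failure."""
    node.attach_vif(vif_id)
    try:
        node.reserve(instance_id)
    except NodeLocked:
        # Expected behavior from this report: a failed spawn leaves no stale VIF.
        node.detach_vif(vif_id)
        return False
    return True


def run_race():
    """Start two competing spawns against one node and return the outcome."""
    node = FakeNode()
    results = {}

    def worker(instance_id, vif_id):
        results[instance_id] = spawn(node, instance_id, vif_id)

    threads = [
        threading.Thread(target=worker, args=(f"instance-{i}", f"vif-{i}"))
        for i in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return node, results


if __name__ == "__main__":
    node, results = run_race()
    print("spawn results:", results)
    print("VIFs left on node:", node.vifs)
```

Whichever request wins, the node should end up with exactly one VIF, the one belonging to the instance that holds the reservation. The reported failure corresponds to the case where the rollback in the except branch never happens.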

Comment 3 Bob Fournier 2018-11-26 18:52:05 UTC
Seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1535766. 

Sai - can we see the version of openstack-ironic being used, to confirm they have that fix?

Comment 5 Bob Fournier 2018-11-30 16:44:42 UTC
Is the issue that the deployment was started before the deletion had completed, so cleaning of the node was still in progress?

Comment 8 Bob Fournier 2018-12-06 15:09:08 UTC
Moving this to compute to take a look based on Comment 7.

Comment 13 Matthew Booth 2019-10-15 10:59:28 UTC
I am closing this bug as it has not been addressed for a very long time. Please feel free to reopen if it is still relevant.