Bug 1652405

Summary:	Race condition while locking the ironic node and Stale VIF ports in ironic
Product:	Red Hat OpenStack	Reporter:	PURANDHAR SAIRAM MANNIDI <pmannidi>
Component:	openstack-nova	Assignee:	OSP DFG:Compute <osp-dfg-compute>
Status:	CLOSED WONTFIX	QA Contact:	OSP DFG:Compute <osp-dfg-compute>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	13.0 (Queens)	CC:	bfournie, dasmith, dtantsur, eglynn, jhakimra, jkreger, kchamart, mbooth, mburns, mwitt, pmannidi, sbauza, sgordon, vromanso
Target Milestone:	---	Keywords:	Triaged, ZStream
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-10-15 10:59:28 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description PURANDHAR SAIRAM MANNIDI 2018-11-22 01:34:53 UTC

Description of problem:
Ironic is left with stale VIF ports after a race condition in which two instances try to lock the same overcloud node. The VIF ports are not deleted when the instance is deleted, and all subsequent instance spawns fail because of the stale VIF still attached to the node.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 13 

How reproducible:
Once

Steps to Reproduce:
1. Deploy an overcloud from director with just two nodes (one controller + one compute), where one node has incorrect capabilities.
2. The deployment fails because two instances try to lock the same node.


Actual results:
Deployment fails and leaves a stale VIF attached to the baremetal node.

Expected results:
Even if deployment fails, it shouldn't leave a stale VIF port.
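
For illustration only, the expected behaviour can be sketched as a toy Python model. This is not Nova or Ironic code: FakeNode, NodeLocked, spawn and the cleanup_on_failure flag are hypothetical names, and the sketch assumes the VIF is attached before the attempt to lock the node fails, which this report does not confirm.

```python
import threading


class NodeLocked(Exception):
    """Hypothetical error: another request already holds the node lock."""


class FakeNode:
    """Toy stand-in for a baremetal node; not Ironic's real data model."""

    def __init__(self):
        self._lock = threading.Lock()
        self.vifs = set()

    def attach_vif(self, vif_id):
        self.vifs.add(vif_id)

    def detach_vif(self, vif_id):
        self.vifs.discard(vif_id)

    def reserve(self):
        # Non-blocking acquire: fail immediately if the node is already locked.
        if not self._lock.acquire(blocking=False):
            raise NodeLocked("node is locked by another request")


def spawn(node, vif_id, cleanup_on_failure=True):
    """Attach a VIF, then try to lock the node (assumed ordering)."""
    node.attach_vif(vif_id)
    try:
        node.reserve()
    except NodeLocked:
        if cleanup_on_failure:
            # Expected behaviour: the losing request removes its own VIF.
            node.detach_vif(vif_id)
        return False
    return True


# Without cleanup, the request that loses the race leaves its VIF behind.
buggy = FakeNode()
spawn(buggy, "vif-a")
spawn(buggy, "vif-b", cleanup_on_failure=False)
print(sorted(buggy.vifs))

# With cleanup, only the VIF of the request that won the lock remains.
fixed = FakeNode()
spawn(fixed, "vif-a")
spawn(fixed, "vif-b")
print(sorted(fixed.vifs))
```

In this model the first request keeps the lock, so the second request always fails. The only difference between the two runs is whether the failing request detaches the VIF it attached.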

Comment 3 Bob Fournier 2018-11-26 18:52:05 UTC

Seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1535766. 

Sai - can we see the version of openstack-ironic being used, to confirm they have that fix?

Comment 5 Bob Fournier 2018-11-30 16:44:42 UTC

Is the issue that the deployment was done before the deletion was complete, so cleaning of the node was still in progress?

Comment 8 Bob Fournier 2018-12-06 15:09:08 UTC

Moving this to compute to take a look based on Comment 7.

Comment 13 Matthew Booth 2019-10-15 10:59:28 UTC

I am closing this bug as it has not been addressed for a very long time. Please feel free to reopen if it is still relevant.