Bug 1652405 - Race condition while locking the ironic node and Stale VIF ports in ironic
Summary: Race condition while locking the ironic node and Stale VIF ports in ironic
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-11-22 01:34 UTC by PURANDHAR SAIRAM MANNIDI
Modified: 2023-03-21 19:08 UTC (History)
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-15 10:59:28 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-13874 0 None None None 2022-03-13 17:12:43 UTC

Description PURANDHAR SAIRAM MANNIDI 2018-11-22 01:34:53 UTC
Description of problem:
Ironic is left with stale VIF ports after a race to lock the same node during overcloud deployment. The VIF ports are not deleted when the instance is deleted, and all subsequent instance spawns fail because of the stale VIF still attached to the node.

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform 13 

How reproducible:
Once

Steps to Reproduce:
1. Deploy an overcloud from director with just two nodes (one controller + one compute), where one node has incorrect capabilities.
2. The deployment fails because two instances try to lock the same node.


Actual results:
Deployment fails and leaves a stale VIF attached to the baremetal node.
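
A rough way to spot the leftover state described above, as a sketch only: treat a VIF as stale when the node still lists it but has no instance associated. The `NodeRecord` shape and `find_stale_vifs` helper are hypothetical and stand in for whatever node data is collected from ironic; they are not an ironic or nova API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeRecord:
    # Hypothetical record shape for illustration, not an ironic API response.
    name: str
    instance_uuid: Optional[str] = None
    vifs: List[str] = field(default_factory=list)


def find_stale_vifs(nodes: List[NodeRecord]) -> Dict[str, List[str]]:
    """Return {node name: VIF IDs} for nodes that hold VIFs but no instance."""
    return {
        node.name: list(node.vifs)
        for node in nodes
        if node.instance_uuid is None and node.vifs
    }


# Made-up sample data: node-1 has a VIF attached but no instance.
nodes = [
    NodeRecord("node-0", instance_uuid="inst-a", vifs=["vif-1"]),
    NodeRecord("node-1", instance_uuid=None, vifs=["vif-2"]),
    NodeRecord("node-2", instance_uuid=None, vifs=[]),
]
stale = find_stale_vifs(nodes)
print(stale)
```

Whether "no instance but a VIF attached" is the right test for every environment is an assumption; it matches the symptom reported here, where the instance is gone and the VIF remains.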

Expected results:
Even if deployment fails, it shouldn't leave a stale VIF port.
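
The expected behaviour can be pictured with a toy model, offered as an illustration under assumptions rather than as nova or ironic code: a VIF is attached, the node reservation is attempted, and the attach is undone if the reservation is lost to another caller. `FakeNode`, `NodeLocked` and `deploy` are hypothetical names invented for this sketch.

```python
import threading
from typing import Set


class NodeLocked(Exception):
    """Raised when another caller already holds the node (hypothetical)."""


class FakeNode:
    """Toy stand-in for a baremetal node; not ironic's data model."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.vifs: Set[str] = set()

    def reserve(self) -> None:
        # Non-blocking acquire: the loser of the race fails immediately.
        if not self._lock.acquire(blocking=False):
            raise NodeLocked("node is reserved by another instance")


def deploy(node: FakeNode, vif_id: str) -> bool:
    """Attach a VIF, then reserve the node; undo the attach on failure."""
    node.vifs.add(vif_id)
    try:
        node.reserve()
    except NodeLocked:
        # Cleanup step: without this line the VIF would be left behind.
        node.vifs.discard(vif_id)
        return False
    return True


node = FakeNode()
first = deploy(node, "vif-controller")   # wins the reservation
second = deploy(node, "vif-compute")     # loses it and cleans up its VIF
print(first, second, sorted(node.vifs))
```

The two calls run one after the other to keep the outcome deterministic; the first never releases the node, so the second always hits the locked path.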

Comment 3 Bob Fournier 2018-11-26 18:52:05 UTC
Seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1535766. 

Sai - can we see the version of openstack-ironic being used, to confirm they have that fix?

Comment 5 Bob Fournier 2018-11-30 16:44:42 UTC
Is the issue that the deployment was started before the deletion had completed, so cleaning of the node was still in progress?

Comment 8 Bob Fournier 2018-12-06 15:09:08 UTC
Moving this to compute to take a look based on Comment 7.

Comment 13 Matthew Booth 2019-10-15 10:59:28 UTC
I am closing this bug as it has not been addressed for a very long time. Please feel free to reopen if it is still relevant.

