Bug 1725189

Summary: Port detach fails when compute host is unreachable
Product: Red Hat OpenStack
Reporter: Carlos Goncalves <cgoncalves>
Component: openstack-nova
Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED WONTFIX
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: alifshit, astupnik, dasmith, eglynn, jhakimra, kchamart, michjohn, mvalsecc, njohnston, sbauza, sgordon, smooney, sputhenp, vromanso
Target Milestone: ---
Keywords: Reopened, Triaged
Target Release: ---
Hardware: Unspecified   
OS: Unspecified   
Doc Type: If docs needed, set a value
Last Closed: 2023-05-30 18:00:09 UTC
Type: Bug

Description Carlos Goncalves 2019-06-28 15:49:27 UTC
This bug report is being opened to track, on the OSP side, a bug that has been confirmed upstream in Nova. The bug impacts Octavia use cases.

"When a compute host is unreachable, a port detach for a VM on that host will not complete until the host is reachable again. In some cases, this may last for an extended period or even indefinitely (for example, a host is powered down for hardware maintenance, and possibly needs to be removed from the fleet entirely). This is problematic for multiple reasons:

1) The port should not be deleted in this state (it can be, but for reasons outside the scope of this bug, that is not recommended). Thus, the quota cannot be reclaimed by the project.
2) The port cannot be reassigned to another VM. This means that for projects that rely heavily on maintaining a published IP (or possibly even a published port ID), there is no way to proceed. For example, if Octavia wanted to allow failing over from one VM to another in a VM down event (as would happen if the host was powered off) without using AAP, it would be unable to do so, leading to an extended downtime.

Nova will supposedly clean up such resources after the host has been powered up, but that could take hours or possibly never happen. So, there should be a way to force the port to detach regardless of ability to reach the compute host, and simply allow the cleanup to happen on that host in the future (if possible) but immediately release the port for delete or rebinding."

Description copied from https://bugs.launchpad.net/nova/+bug/1827746
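
To illustrate the client-side effect described above, here is a minimal Python sketch of how a caller might bound its wait on a detach that never completes. It is only an illustration under stated assumptions, not Nova or Octavia code: wait_for_port_detach and list_attached_port_ids are hypothetical names, and the listing callable stands in for whatever API call returns the port IDs currently attached to the VM.

```python
import time
from typing import Callable, Iterable


def wait_for_port_detach(
    list_attached_port_ids: Callable[[], Iterable[str]],  # hypothetical lister
    port_id: str,
    timeout: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once port_id is no longer attached, False on timeout."""
    deadline = clock() + timeout
    while True:
        # The detach is considered complete when the port disappears
        # from the VM's list of attached ports.
        if port_id not in set(list_attached_port_ids()):
            return True
        # With an unreachable compute host this branch is the expected
        # outcome: the port stays attached until the host comes back.
        if clock() >= deadline:
            return False
        sleep(interval)
```

A False return only tells the caller that the detach is still pending. It does not release the port, which is why the report asks for a way to force the detach.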

Comment 1 Matthew Booth 2019-07-05 15:18:36 UTC
It's not clear that we would be able to backport this to OSP13.

Comment 6 Michael Johnson 2019-07-18 14:57:49 UTC
VM delete via nova gets stuck in "pending-delete". Port detach also gets stuck and does not complete until the instance is up.

If I remember right, port delete will also not complete, but deleting the port is a problem anyway, as the IP would then be released and potentially allocated to a different port in the project, thus losing the VIP fixed IP.

Comment 9 stchen 2020-11-04 19:52:07 UTC
Closing as EOL. OSP 16.0 has been retired as of Oct 27, 2020.

Comment 10 Carlos Goncalves 2020-11-05 16:53:52 UTC
Reopening BZ. This bug impacts other OSP supported versions (13, 16.1) as well as future releases (16.2, 17).

Comment 19 Artom Lifshitz 2023-05-30 18:00:09 UTC
With the deadline for patches for the last maintenance release for 16.2 fast approaching (August 9th), it is not realistic to start, fix, and backport this BZ in time. Closing as WONTFIX to set realistic expectations. There is still a possibility that this will get fixed upstream (https://bugs.launchpad.net/nova/+bug/1827746), and the fix would make its way into our product at some point in the future.