
Bug 957888

Summary: Exceptions when deleting a VM can leave it stuck in task_state deleting and render it "undeletable"
Product: Red Hat OpenStack
Component: openstack-nova
Version: 3.0
Target Release: 3.0
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Brent Eagles <beagles>
Assignee: RHOS Maint <rhos-maint>
QA Contact: Ami Jeain <ajeain>
CC: dallan, ndipanov, yeylon, ykaul
Keywords: Triaged
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-05-08 19:19:20 UTC

Description Brent Eagles 2013-04-29 19:54:23 UTC
If an exception occurs while deleting an instance, nova can leave the instance with a task_state of "deleting". Nova operations generally refuse to modify an instance whose task_state is not "None". Because there is no way to reset this state with the command line tools, the instance is effectively stuck. The only resolution is direct manipulation of the database.
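The failure mode described above can be sketched with a toy model. This is only an illustration: the Instance class, begin_task() and delete() are hypothetical names, not nova's actual code, and the sketch assumes a simple guard that rejects any operation while task_state is set.

```python
# Toy model only: all names here are hypothetical, not nova's code.
class Instance:
    def __init__(self, name):
        self.name = name
        self.task_state = None  # None means no operation is in progress


class InstanceBusy(Exception):
    pass


def begin_task(instance, task):
    """Refuse to start a task while another one is recorded as in progress."""
    if instance.task_state is not None:
        raise InstanceBusy(f"{instance.name} is busy: {instance.task_state}")
    instance.task_state = task


def delete(instance, backend_delete):
    begin_task(instance, "deleting")
    backend_delete(instance)    # if this raises, task_state is never cleared
    instance.task_state = None


def failing_backend(instance):
    raise RuntimeError("hypervisor error during delete")


vm = Instance("vm-1")
try:
    delete(vm, failing_backend)
except RuntimeError:
    pass  # the exception escapes before task_state is reset

# vm.task_state is still "deleting", so guarded operations are now rejected.
```

In this model any later begin_task() call on vm raises InstanceBusy. Whether real nova treats a repeated delete the same way is a separate question, taken up in the comments on this bug.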

In some cases, the VMs are removed and the data left behind. In others, both the data and the VMs are left.

How reproducible:

Varies. Some reports seem to imply a race condition involving multiple duplicate requests, but it may simply be a timeout and retry under heavy load.

Direct steps to reproduce TBD.

This may be a duplicate of or at least related to:
https://bugzilla.redhat.com/show_bug.cgi?id=957267

Comment 2 Brent Eagles 2013-04-29 20:05:47 UTC
This bug is at the root of https://bugzilla.redhat.com/show_bug.cgi?id=918530

Comment 4 Brent Eagles 2013-05-08 19:09:08 UTC
This bug is a *little* bogus in that normally the VM is not permanently stuck. You have to do something "unnecessary" to get it to that point. First, how does the VM get stuck in "deleting"? There are a number of ways this could occur, but the one that seemed to be occurring in the bug that initiated this report is a race condition involving multiple delete calls. It is possible for a domain to be successfully "looked up" while a previous delete is pending on libvirt. This is often caught by an exception block in the libvirt driver's _destroy() method, but there is a small window where the previous process may also "undefine" the domain, resulting in a different error code and causing things to back out. Needless to say, this is difficult to reproduce without actually "hacking" the code to open a window for this weirdness to occur. I suppose there are a number of other possible causes (and possibly other bugs) that might make this happen more frequently in the wild.
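The interleaving described above can be replayed deterministically in a toy model. This is a sketch under loud assumptions: FakeHypervisor, its error codes and destroy_tolerant() are hypothetical stand-ins for libvirt and the driver's _destroy() method, not the real code, and the two deletes are stepped by hand rather than run as threads.

```python
# Toy model only: FakeHypervisor and its error codes are hypothetical,
# standing in for libvirt; this is not the nova libvirt driver.
class HypervisorError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code


class FakeHypervisor:
    def __init__(self):
        self.domains = {"vm-1": "running"}

    def lookup(self, name):
        if name not in self.domains:
            raise HypervisorError("NO_DOMAIN")
        return name

    def destroy(self, name):
        state = self.domains.get(name)
        if state is None:
            raise HypervisorError("NO_DOMAIN")
        if state == "shutoff":
            raise HypervisorError("OPERATION_INVALID")
        self.domains[name] = "shutoff"

    def undefine(self, name):
        if name not in self.domains:
            raise HypervisorError("NO_DOMAIN")
        del self.domains[name]


def destroy_tolerant(hv, handle):
    """Destroy a domain, tolerating only the 'already shut off' error."""
    try:
        hv.destroy(handle)
    except HypervisorError as exc:
        if exc.code != "OPERATION_INVALID":
            raise


def second_delete(hv, first_delete_undefines):
    handle = hv.lookup("vm-1")        # delete B: lookup succeeds
    hv.destroy("vm-1")                # delete A: destroys the domain
    if first_delete_undefines:
        hv.undefine("vm-1")           # delete A: also undefines it in time
    try:
        destroy_tolerant(hv, handle)  # delete B: destroy
        return "tolerated"
    except HypervisorError as exc:
        return f"backed out: {exc.code}"


common = second_delete(FakeHypervisor(), first_delete_undefines=False)
narrow = second_delete(FakeHypervisor(), first_delete_undefines=True)
```

In the common interleaving the second delete hits the tolerated error and carries on. In the narrow window the domain is already gone, so a different code surfaces and the second delete backs out.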

The *little* bogus part is that when I was initially investigating this and found that several VMs had task_state "deleting" and were still active, I ran "nova reset-state" on them. That's bad, apparently, because THEN you can't delete the records through the APIs. If you aren't so foolish as to run "reset-state", the second delete clears things up.

I'm leaving this open for the moment while I see if there isn't something that can be done about the "stuck after reset-state" issue.

Comment 5 Brent Eagles 2013-05-08 19:19:20 UTC
... and that is also bogus. This also currently works. As the task_state thing is covered in other issues and the distinguishing premise of this bug report (i.e. "undeletable") is invalid, I'm closing this one.