Bug 861508

Summary: terminating instances that have failed to launch leaves fixed_ips, floating_ips, and instances tables in inconsistent state
Product: Red Hat OpenStack
Reporter: Dan Yocum <dyocum>
Component: openstack-nova
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED WONTFIX
QA Contact: Nir Magnezi <nmagnezi>
Severity: medium
Priority: high
Version: 1.0 (Essex)
CC: markmc
Target Milestone: ga
Target Release: 2.1
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-03-06 07:50:37 EST
Type: Bug

Description Dan Yocum 2012-09-28 17:23:51 EDT
Description of problem:

When VMs fail to launch due to an error (whatever error that may be), "terminating" the instance sometimes fails to reset the fixed_ips, floating_ips, and instances tables. Those tables then have to be updated manually to restore the ability to launch new VMs.

I'm using the FlatDHCP network manager.

Version-Release number of selected component (if applicable):

Essex

How reproducible:

too often to count

Steps to Reproduce:

I don't know of a reliable way to reproduce this bug. I think it can be triggered by a timeout to the qpid server, particularly when it takes an abnormally long time to spawn the VM. Before you ask, I have increased qpid_heartbeat to 60s.

Then, while the VM is purportedly "building", issue the "terminate" command. The database then gets into its paradoxical state of "building" and "deleting" at the same time, which it never leaves until I manually edit the instances, floating_ips, and fixed_ips tables.
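
The stuck state described above can be looked for with a read-only query. The following is a minimal sketch, not Nova code: it assumes the instances table exposes vm_state and task_state columns holding the values 'building' and 'deleting', it uses an in-memory SQLite table as a stand-in for the real database, and find_stuck_instances is a hypothetical helper name.

```python
import sqlite3

# Hypothetical, cut-down stand-in for the instances table. The column names
# vm_state and task_state are assumptions; check them against the real schema.
SCHEMA = """
CREATE TABLE instances (
    id INTEGER PRIMARY KEY,
    vm_state TEXT,
    task_state TEXT,
    deleted INTEGER DEFAULT 0
);
"""


def find_stuck_instances(conn):
    """Return ids of instances that are both 'building' and 'deleting'."""
    rows = conn.execute(
        "SELECT id FROM instances "
        "WHERE vm_state = 'building' AND task_state = 'deleting' "
        "ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Build three sample rows and report which ones look stuck."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO instances (id, vm_state, task_state, deleted) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "active", None, 0),          # healthy instance
            (2, "building", "deleting", 0),  # the paradoxical state
            (3, "building", "spawning", 0),  # still launching normally
        ],
    )
    return find_stuck_instances(conn)


if __name__ == "__main__":
    print(demo())
```

The query only reads; any repair of the rows it reports would still be a manual step, as described above.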

  
Actual results:


Expected results:


Additional info:
Comment 2 Dan Yocum 2012-12-04 12:49:42 EST
This bug has bitten me again. Its effects are cascading and cumulative.

Here are some more details:

Apparently, the terminated_at field is sometimes not updated with a valid datetime and is left NULL. However, deleted is set to 1, and deleted_at is updated with the time at which the instance was deleted.

In this state, the DHCP release of the fixed IP assigned to the instance isn't called, and the fixed_ips table maintains that the IP is still assigned to a now-terminated instance. Since the fixed IP isn't cleared out, the floating IP in the floating_ips table is never cleared out either.
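
The chain described above (instance marked deleted, terminated_at left NULL, fixed IP never released, floating IP never cleared) can be checked with a read-only join. This is a minimal sketch under assumptions: it supposes that fixed_ips has an instance_id column and that floating_ips has a fixed_ip_id column. Those two column names, the cut-down schema, and the helper find_orphaned_ips are illustrative, and SQLite stands in for the real database.

```python
import sqlite3

# Cut-down, hypothetical versions of the three tables named in this report.
# deleted, deleted_at and terminated_at come from the report; instance_id and
# fixed_ip_id are assumed link columns.
SCHEMA = """
CREATE TABLE instances (
    id INTEGER PRIMARY KEY,
    deleted INTEGER DEFAULT 0,
    deleted_at TEXT,
    terminated_at TEXT
);
CREATE TABLE fixed_ips (
    id INTEGER PRIMARY KEY,
    address TEXT,
    instance_id INTEGER
);
CREATE TABLE floating_ips (
    id INTEGER PRIMARY KEY,
    address TEXT,
    fixed_ip_id INTEGER
);
"""


def find_orphaned_ips(conn):
    """Return (instance id, fixed address, floating address) for every fixed
    IP still tied to an instance with deleted = 1 and terminated_at NULL."""
    return conn.execute(
        "SELECT i.id, fx.address, fl.address "
        "FROM instances AS i "
        "JOIN fixed_ips AS fx ON fx.instance_id = i.id "
        "LEFT JOIN floating_ips AS fl ON fl.fixed_ip_id = fx.id "
        "WHERE i.deleted = 1 AND i.terminated_at IS NULL "
        "ORDER BY i.id"
    ).fetchall()


def demo():
    """Populate one live, one half-deleted and one cleanly deleted instance."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?, ?, ?)",
        [
            (1, 0, None, None),                                    # live
            (2, 1, "2012-12-04 12:00:00", None),                   # half-deleted
            (3, 1, "2012-12-03 09:00:00", "2012-12-03 09:00:00"),  # clean
        ],
    )
    conn.executemany(
        "INSERT INTO fixed_ips VALUES (?, ?, ?)",
        [(1, "10.0.0.2", 1), (2, "10.0.0.3", 2)],
    )
    conn.execute("INSERT INTO floating_ips VALUES (1, '192.0.2.10', 2)")
    return find_orphaned_ips(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The LEFT JOIN is deliberate: a leaked fixed IP is reported even when no floating IP is attached to it.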

Eventually, and for reasons I still do not understand, automatic assignment of a floating IP will not take place on ANY NEWLY CREATED VM.

But!  And here it gets confusing - if I assign a floating IP manually, *2* floating IPs will get assigned to an instance, immediately.
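
The double assignment can be spotted by counting floating IPs per instance. The following is a rough sketch rather than a supported tool: it assumes floating_ips links to fixed_ips through a fixed_ip_id column and fixed_ips links to instances through an instance_id column (both column names are assumptions), and it runs against an in-memory SQLite stand-in. The helper name instances_with_multiple_floating_ips is hypothetical.

```python
import sqlite3

# Hypothetical minimal schema; fixed_ip_id and instance_id are assumed names.
SCHEMA = """
CREATE TABLE fixed_ips (
    id INTEGER PRIMARY KEY,
    instance_id INTEGER
);
CREATE TABLE floating_ips (
    id INTEGER PRIMARY KEY,
    address TEXT,
    fixed_ip_id INTEGER
);
"""


def instances_with_multiple_floating_ips(conn):
    """Return (instance id, floating IP count) for instances holding more
    than one floating IP."""
    return conn.execute(
        "SELECT fx.instance_id, COUNT(fl.id) "
        "FROM floating_ips AS fl "
        "JOIN fixed_ips AS fx ON fl.fixed_ip_id = fx.id "
        "GROUP BY fx.instance_id "
        "HAVING COUNT(fl.id) > 1 "
        "ORDER BY fx.instance_id"
    ).fetchall()


def demo():
    """Instance 10 holds two floating IPs, instance 11 holds one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO fixed_ips VALUES (?, ?)",
        [(1, 10), (2, 11)],
    )
    conn.executemany(
        "INSERT INTO floating_ips VALUES (?, ?, ?)",
        [
            (1, "192.0.2.10", 1),
            (2, "192.0.2.11", 1),  # second floating IP on the same fixed IP
            (3, "192.0.2.12", 2),
        ],
    )
    return instances_with_multiple_floating_ips(conn)


if __name__ == "__main__":
    print(demo())
```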
Comment 3 Mark McLoughlin 2012-12-07 04:45:36 EST
Sounds like a nasty bug and something that's reasonably likely to have since been fixed upstream.

Since it's so nasty, it's probably worth our while figuring out a reproducer on Essex and then confirming whether it's fixed in Folsom.