Bug 861508

Summary:	terminating instances that have failed to launch leaves fixed_ips, floating_ips, and instances tables in inconsistent state
Product:	Red Hat OpenStack	Reporter:	Dan Yocum <dyocum>
Component:	openstack-nova	Assignee:	RHOS Maint <rhos-maint>
Status:	CLOSED WONTFIX	QA Contact:	Nir Magnezi <nmagnezi>
Severity:	medium	Docs Contact:
Priority:	high
Version:	1.0 (Essex)	CC:	markmc
Target Milestone:	ga
Target Release:	2.1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2013-03-06 12:50:37 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Dan Yocum 2012-09-28 21:23:51 UTC

Description of problem:

When VMs fail to launch due to an error (whatever error that may be), "terminating" the instance sometimes fails to reset the fixed_ips, floating_ips, and instances table requiring manually updating those tables to restore the ability launch new VMs.

I'm using flatDHCP network manager.

Version-Release number of selected component (if applicable):

Essex

How reproducible:

too often to count

Steps to Reproduce:

I don't know of a reliable way to reproduce this bug.  I think that it's possible by causing a timeout to the qpid server particularly when it takes an abnormally long period of time to spawn the VM. Before you ask, I have increased the qpid_heartbeat to 60s.  

Then while the VM is purported "building" issue the "terminate" command and then the database gets into it's paradoxical state of "building" and "deleting" at the same time, from which it never leaves until I manually edit the instances, floating_ip, and fixed_ip tables.

  
Actual results:


Expected results:


Additional info:

Comment 2 Dan Yocum 2012-12-04 17:49:42 UTC

This bug has bit me again.  It's effects are cascading and cumulative.

Here are some more details:

Apparently, sometimes the terminated_at field is not updated with a valid datetime, leaving it NULL, however, deleted is set to 1, and deleted_at is updated with the datetime that the time the instance was deleted.  

In this state, the dhcp release of the fixed IP assigned to the instance isn't called and the fixed_ip tables maintains that the IP is still assigned to a now terminated instance.  Since the fixed IP isn't cleared out, the floating IP in the floating_ips table is never cleared out, either.  

Eventually, and for reasons I still do not understand, automatic assignment of a floating IP will not take place on ANY NEWLY CREATE VM.

But!  And here it gets confusing - if I assign a floating IP manually, *2* floating IPs will get assigned to an instance, immediately.

Comment 3 Mark McLoughlin 2012-12-07 09:45:36 UTC

Sounds like a nasty bug and something that's reasonably likely to be since fixed upstream.

Since it's so nasty, it's probably worth our while figuring out a reproducer on Essex and then confirming whether it's fixed in Folsom.