861508 – terminating instances that have failed to launch leaves fixed_ips, floating_ips, and instances tables in inconsistent state


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-nova
Sub Component:
Version:	1.0 (Essex)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	medium
Target Milestone:	ga
Target Release:	2.1
Assignee:	RHOS Maint
QA Contact:	Nir Magnezi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-09-28 21:23 UTC by Dan Yocum
Modified:	2016-04-26 21:26 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-03-06 12:50:37 UTC
Target Upstream Version:
Embargoed:

Attachments

Description Dan Yocum 2012-09-28 21:23:51 UTC

Description of problem:

When VMs fail to launch due to an error (whatever that error may be), "terminating" the instance sometimes fails to reset the fixed_ips, floating_ips, and instances tables, so those tables have to be updated manually before new VMs can be launched again.

I'm using the FlatDHCP network manager.

Version-Release number of selected component (if applicable):

Essex

How reproducible:

too often to count

Steps to Reproduce:

I don't know of a reliable way to reproduce this bug. I think it can be triggered by a timeout to the qpid server, particularly when spawning the VM takes an abnormally long time. Before you ask: I have already increased qpid_heartbeat to 60s.

Then, while the VM is purportedly "building", issue the "terminate" command. The database then gets into its paradoxical state of being "building" and "deleting" at the same time, which it never leaves until I manually edit the instances, floating_ips, and fixed_ips tables.

  
Actual results:


Expected results:


Additional info:

Comment 2 Dan Yocum 2012-12-04 17:49:42 UTC

This bug has bitten me again. Its effects are cascading and cumulative.

Here are some more details:

Apparently, the terminated_at field is sometimes not updated with a valid datetime and is left NULL; however, deleted is set to 1, and deleted_at is set to the time the instance was deleted.

In this state, the DHCP release of the fixed IP assigned to the instance is never called, and the fixed_ips table still records the IP as assigned to the now-terminated instance. Because the fixed IP isn't cleared, the corresponding floating IP in the floating_ips table is never cleared either.
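The inconsistent state described above can be detected with a few read-only queries. The following is a minimal sketch using Python's sqlite3 module against a hypothetical, heavily simplified schema: the columns fixed_ips.instance_id, fixed_ips.allocated and floating_ips.fixed_ip_id, and the sample addresses, are assumptions for illustration, and the real Essex nova schema (usually on MySQL) has many more columns and may differ in detail.

```python
import sqlite3

# HYPOTHETICAL, simplified schema loosely modelled on the columns named in this
# report (deleted, deleted_at, terminated_at). The real Essex nova schema differs.
SCHEMA = """
CREATE TABLE instances (
    id INTEGER PRIMARY KEY,
    deleted INTEGER NOT NULL DEFAULT 0,
    deleted_at TEXT,
    terminated_at TEXT
);
CREATE TABLE fixed_ips (
    id INTEGER PRIMARY KEY,
    address TEXT NOT NULL,
    instance_id INTEGER,          -- assumed FK to instances.id
    allocated INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE floating_ips (
    id INTEGER PRIMARY KEY,
    address TEXT NOT NULL,
    fixed_ip_id INTEGER           -- assumed FK to fixed_ips.id
);
"""


def find_inconsistencies(conn):
    """Return (half_terminated_ids, stale_fixed_addrs, stale_floating_addrs)."""
    # Instances marked deleted but never given a terminated_at timestamp.
    half_terminated = [row[0] for row in conn.execute(
        "SELECT id FROM instances "
        "WHERE deleted = 1 AND terminated_at IS NULL ORDER BY id")]
    # Fixed IPs still allocated to an instance that has been deleted.
    stale_fixed = [row[0] for row in conn.execute(
        "SELECT f.address FROM fixed_ips f "
        "JOIN instances i ON f.instance_id = i.id "
        "WHERE i.deleted = 1 AND f.allocated = 1 ORDER BY f.address")]
    # Floating IPs still associated with one of those stale fixed IPs.
    stale_floating = [row[0] for row in conn.execute(
        "SELECT fl.address FROM floating_ips fl "
        "JOIN fixed_ips f ON fl.fixed_ip_id = f.id "
        "JOIN instances i ON f.instance_id = i.id "
        "WHERE i.deleted = 1 AND f.allocated = 1 ORDER BY fl.address")]
    return half_terminated, stale_fixed, stale_floating


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Instance 1 was cleanly terminated; instance 2 is in the state described above.
    conn.executemany("INSERT INTO instances VALUES (?, ?, ?, ?)", [
        (1, 1, "2012-12-04 10:00:00", "2012-12-04 10:00:00"),
        (2, 1, "2012-12-04 11:00:00", None),
    ])
    conn.executemany("INSERT INTO fixed_ips VALUES (?, ?, ?, ?)", [
        (1, "10.0.0.2", None, 0),   # released properly
        (2, "10.0.0.3", 2, 1),      # still held by deleted instance 2
    ])
    conn.execute("INSERT INTO floating_ips VALUES (1, '192.0.2.10', 2)")
    print(find_inconsistencies(conn))
    # prints ([2], ['10.0.0.3'], ['192.0.2.10'])
```

The queries only read the tables; they do not attempt the manual cleanup described in this report.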

Eventually, and for reasons I still do not understand, automatic assignment of a floating IP stops happening for ANY NEWLY CREATED VM.

But, and here it gets confusing: if I assign a floating IP manually, *2* floating IPs are immediately assigned to the instance.

Comment 3 Mark McLoughlin 2012-12-07 09:45:36 UTC

Sounds like a nasty bug, and one that has reasonably likely been fixed upstream since.

Since it's so nasty, it's probably worth our while figuring out a reproducer on Essex and then confirming whether it's fixed in Folsom.
