Description of problem:
When a VM fails to launch because of an error (whatever that error may be), "terminating" the instance sometimes fails to reset the fixed_ips, floating_ips, and instances tables. Those tables then have to be updated manually to restore the ability to launch new VMs. I'm using the FlatDHCP network manager.

Version-Release number of selected component (if applicable):
Essex

How reproducible:
Too often to count

Steps to Reproduce:
I don't know of a reliable way to reproduce this bug. I think it can be triggered by causing a timeout to the qpid server, particularly when it takes an abnormally long time to spawn the VM. Before you ask, I have increased qpid_heartbeat to 60s. Then, while the VM is purportedly "building", issue the "terminate" command. The database gets into its paradoxical state of "building" and "deleting" at the same time, which it never leaves until I manually edit the instances, floating_ips, and fixed_ips tables.

Actual results:

Expected results:

Additional info:
This bug has bitten me again. Its effects are cascading and cumulative. Here are some more details: Apparently, the terminated_at field is sometimes not updated with a valid datetime and is left NULL, even though deleted is set to 1 and deleted_at is updated with the time the instance was deleted. In this state, the DHCP release of the fixed IP assigned to the instance isn't called, and the fixed_ips table still shows the IP as assigned to a now-terminated instance. Since the fixed IP isn't cleared out, the floating IP in the floating_ips table is never cleared out, either. Eventually, and for reasons I still do not understand, automatic assignment of a floating IP will not take place on ANY NEWLY CREATED VM. But, and here it gets confusing: if I assign a floating IP manually, *2* floating IPs get assigned to the instance immediately.
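The inconsistent state described above could be found with a query along these lines. This is only a rough sketch against a toy in-memory SQLite schema, not the real Nova database: deleted, deleted_at, and terminated_at are the columns named above, while instance_id and fixed_ip_id are assumed link columns, and the helper names and sample addresses are made up for illustration.

```python
import sqlite3

# Toy schema with only the columns needed for the check. The link columns
# instance_id and fixed_ip_id are assumptions, not confirmed by the report.
SCHEMA = """
CREATE TABLE instances (
    id INTEGER PRIMARY KEY,
    deleted INTEGER NOT NULL DEFAULT 0,
    deleted_at TEXT,
    terminated_at TEXT
);
CREATE TABLE fixed_ips (
    id INTEGER PRIMARY KEY,
    address TEXT NOT NULL,
    instance_id INTEGER
);
CREATE TABLE floating_ips (
    id INTEGER PRIMARY KEY,
    address TEXT NOT NULL,
    fixed_ip_id INTEGER
);
"""

# Deleted instances that still own a fixed IP, plus any floating IP
# that is still attached to that fixed IP.
STALE_QUERY = """
SELECT i.id, fx.address, fl.address
FROM instances AS i
JOIN fixed_ips AS fx ON fx.instance_id = i.id
LEFT JOIN floating_ips AS fl ON fl.fixed_ip_id = fx.id
WHERE i.deleted = 1
ORDER BY i.id, fx.id
"""


def find_stale_ips(conn):
    """Return (instance_id, fixed_ip, floating_ip or None) for each leaked IP."""
    return conn.execute(STALE_QUERY).fetchall()


def build_demo_db():
    """Build a small in-memory database containing one stuck instance."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO instances VALUES (?, ?, ?, ?)", [
        (1, 0, None, None),                                    # still running
        (2, 1, "2012-08-01 10:00:00", "2012-08-01 10:00:00"),  # cleanly terminated
        (3, 1, "2012-08-02 11:00:00", None),                   # stuck: terminated_at is NULL
    ])
    conn.executemany("INSERT INTO fixed_ips VALUES (?, ?, ?)", [
        (1, "10.0.0.2", 1),
        (2, "10.0.0.3", None),  # released after instance 2 was terminated
        (3, "10.0.0.4", 3),     # never released
    ])
    conn.executemany("INSERT INTO floating_ips VALUES (?, ?, ?)", [
        (1, "192.0.2.10", 1),
        (2, "192.0.2.11", 3),   # still attached to the leaked fixed IP
    ])
    return conn


if __name__ == "__main__":
    for row in find_stale_ips(build_demo_db()):
        print(row)
```

In the demo data only instance 3 is reported, together with the fixed and floating IPs it still holds. Against a real deployment the table and column names would need to be checked first, and any cleanup should be done only after backing up the database.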
Sounds like a nasty bug, and one that's reasonably likely to have since been fixed upstream. Since it's so nasty, it's probably worth our while to figure out a reproducer on Essex and then confirm whether it's fixed in Folsom.