Bug 1083620 - nova network disassociate and nova delete leads to VM in ERROR state
Summary: nova network disassociate and nova delete leads to VM in ERROR state
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.0 (Juno)
Assignee: Brent Eagles
QA Contact: Jaroslav Henner
URL:
Whiteboard:
Depends On:
Blocks: 1076100
 
Reported: 2014-04-02 15:11 UTC by Jaroslav Henner
Modified: 2019-09-09 16:17 UTC (History)
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-04-15 16:15:01 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
network.log.bz2 (93.83 KB, application/x-bzip2)
2014-04-02 15:11 UTC, Jaroslav Henner
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Launchpad 967832 0 None None None Never
Launchpad 1288230 0 None None None Never
Launchpad 1351315 0 None None None Never
Red Hat Bugzilla 1072938 0 medium CLOSED A project shouldn't be deleted when there are instances running 2021-02-22 00:41:40 UTC

Internal Links: 1072938

Description Jaroslav Henner 2014-04-02 15:11:22 UTC
Created attachment 881852 [details]
network.log.bz2

Description of problem:

There are several issues with the OpenStack entities garbage collection:

1) IMHO `keystone tenant-delete` should broadcast a message to all components that the tenant is being deleted, so every component can delete its related data.
2) If not 1), then `nova scrub` should remove all the data related to the tenant. Currently (checking https://github.com/openstack/nova/blob/master/nova/cmd/manage.py) it only does the equivalent of

nova network-disassociate
nova secgroup-delete 

IMHO it should also delete the keypairs, floating IPs, VMs, Neutron routers and networks ... which it probably doesn't.

`nova network-disassociate` on a network with a VM that has a port on it leaves the VM in the ERROR state, produces ERRORs and tracebacks in the logs, and the VM seems to persist -- it cannot be deleted.

3) `nova scrub` should not die when deleting the default secgroups fails.
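As a sketch of what a fuller `nova scrub` might do, the per-tenant cleanup can be approximated from the CLI. The `table_ids` helper, the ordering of the steps, and the assumption that deleting VMs and floating IPs is sufficient are all mine, not documented nova behaviour; the command names are the Havana-era clients:

```shell
# Hypothetical manual per-tenant cleanup; ordering and completeness are
# assumptions, not documented nova behaviour.
TENANT=foo

# Helper (assumed): print the first data column of a client ASCII table,
# skipping the top border and the header row.
table_ids() {
  awk -F'|' 'NR > 3 && NF > 2 { gsub(/ /, "", $2); if ($2 != "") print $2 }'
}

# Only attempt the cleanup where the client is actually installed.
if command -v nova >/dev/null 2>&1; then
  # Delete the tenant's VMs before touching the network.
  OS_TENANT_NAME=$TENANT nova list | table_ids | while read -r id; do
    nova delete "$id"
  done
  # Release the tenant's floating IPs (assuming the address is the
  # first column of `nova floating-ip-list`).
  OS_TENANT_NAME=$TENANT nova floating-ip-list | table_ids | while read -r ip; do
    nova floating-ip-delete "$ip"
  done
fi
```

Keypairs, Neutron routers and networks would need the same treatment; the point is that none of this happens inside `nova scrub` today.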


This is using the libvirt driver with VlanManager.


Version-Release number of selected component (if applicable):
openstack-nova-api.noarch                    2013.2.2-2.el6ost            @puddle
openstack-nova-cert.noarch                   2013.2.2-2.el6ost            @puddle
openstack-nova-common.noarch                 2013.2.2-2.el6ost            @puddle
openstack-nova-compute.noarch                2013.2.2-2.el6ost            @puddle
openstack-nova-conductor.noarch              2013.2.2-2.el6ost            @puddle
openstack-nova-console.noarch                2013.2.2-2.el6ost            @puddle
openstack-nova-network.noarch                2013.2.2-2.el6ost            @puddle
openstack-nova-novncproxy.noarch             2013.2.2-2.el6ost            @puddle
openstack-nova-objectstore.noarch            2013.2.2-2.el6ost            @puddle
openstack-nova-scheduler.noarch              2013.2.2-2.el6ost            @puddle


How reproducible:
always

Steps to Reproduce:
# openstack-config --set /etc/nova/nova.conf DEFAULT network_manager nova.network.manager.VlanManager
# openstack-config --set /etc/nova/nova.conf DEFAULT  vlan_interface eth0
# /etc/init.d/openstack-nova-network restart

# # Delete the network without a VlanID set (our preparation script is creating it)
# nova-manage network delete 192.168.32.0/22
2014-04-02 14:02:47.319 29148 INFO nova.network.driver [-] Loading network driver 'nova.network.linux_net'

# # Create a similar one, but with the VlanID set:
# nova-manage network create foo 192.168.32.0/22
# nova-manage network list
id   	IPv4              	IPv6           	start address  	DNS1           	DNS2           	VlanID         	project        	uuid           
2    	192.168.32.0/24   	None           	192.168.32.3   	8.8.4.4        	None           	100            	None           	42cb9bf3-6e7c-43ad-be2f-5a8d112

# # Create a tenant:
keystone tenant-create --name foo
keystone user-role-add --user admin --role Member --tenant foo
OS_TENANT_NAME=foo nova boot --image cirros-0.3.1-x86_64-uec --flavor m1.tiny foo_vm
...

nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ACTIVE | -          | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+

# # The network gets auto-associated:
# nova-manage network list | head
id   	IPv4              	IPv6           	start address  	DNS1           	DNS2           	VlanID         	project        	uuid           
2    	192.168.32.0/24   	None           	192.168.32.3   	8.8.4.4        	None           	100            	e2b0caea396b4e94ba532c1979bade9a	42cb9bf3-6e7c-43ad-be2f-5a8d112c56bb

# # Try deleting the related entities:
# nova scrub e2b0caea396b4e94ba532c1979bade9a
ERROR: Unable to delete system group 'default' (HTTP 400) (Request-ID: req-a41b7f5d-7d7e-4cfa-b165-d964ef42a443)
# nova scrub foo
# nova scrub e2b0caea396b4e94ba532c1979bade9a
ERROR: Unable to delete system group 'default' (HTTP 400) (Request-ID: req-72d6b346-21f2-427c-afe7-947826ba0707)



# # Now when I delete the tenant foo...
# keystone tenant-delete foo
# nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ACTIVE | -          | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+

# # The VM is still there:
# nova delete afdc97c2-5c07-443c-ba69-39fef99981d3
# nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ACTIVE | deleting   | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+

And after a while the delete fails:

[root@jhenner-node-permanent ~(keystone_admin)]# nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ACTIVE | deleting   | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+

# nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ERROR  | -          | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+



Actual results:
ERRORs and tracebacks in the logs; the VM is left in the ERROR state and cannot be deleted.

Expected results:
All tenant-related data is cleaned up after `nova scrub` finishes.

Additional info:

Comment 1 Jaroslav Henner 2014-04-22 14:32:48 UTC
I believe this was discovered on RHOS 4.0, so setting the field accordingly.

Comment 2 Brent Eagles 2014-05-23 20:20:40 UTC
This BZ would be better served split into multiple parts. Addressing the issues:

1. There is an upstream endeavor on this. As you might imagine, having the components coordinate reliably on a keystone action is complicated: it must tolerate transient communication issues and timeouts, and essentially behave as a system-wide transaction in which the tenant is the final element actually removed. In other words, this isn't going to be resolved any time soon.

2. I think it is outside of nova's scope to call the other services that might hold tenant info to "scrub" it. Giving each component an API that can be used to recover from an out-of-order operation -- like deleting a tenant without cleaning up the other services -- is more within reach. Once that is achieved, orchestration or admin tools can do that particular work.

3. Deleting the default security group isn't really allowed. Either the API needs to filter that group out or the implementation needs to permit it. However, it cannot really be deleted until all of the nova elements that might need it on behalf of that tenant go away - or at least until nothing depends on the default security group ID being static for that tenant.
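For reference, the filtering approach for 3. can be sketched from the CLI: skip the "default" group before issuing deletes, so the HTTP 400 reported above never aborts the run. The `scrub_secgroups` helper and the column positions in the `nova secgroup-list` table are assumptions:

```shell
# Sketch: delete every security group for the tenant except "default",
# whose deletion nova rejects with HTTP 400. Column layout is assumed
# to be | Id | Name | Description |.
scrub_secgroups() {
  # stdin: `nova secgroup-list` output; prints the IDs of non-default groups.
  awk -F'|' 'NR > 3 && NF > 2 {
    id = $2; name = $3
    gsub(/ /, "", id); gsub(/ /, "", name)
    if (id != "" && name != "default") print id
  }'
}

# Only attempt the deletes where the client is actually installed.
if command -v nova >/dev/null 2>&1; then
  OS_TENANT_NAME=foo nova secgroup-list | scrub_secgroups | while read -r id; do
    nova secgroup-delete "$id"
  done
fi
```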

I recommend reducing the scope of this BZ to 1., as it has the relevant trackers in place. 2. is in the realm of 1., insofar as it is likely out of nova's scope and the problems and complexities of 1. would apply to 2. in any case - putting it out of reach for the foreseeable future. 3. is arguably an independent bug, has a fairly well-defined scope, and should be filed separately, preferably upstream. If this is acceptable to you, please reply to the NEEDINFO and I will make the changes.

