Created attachment 881852 [details]
network.log.bz2

Description of problem:

There are several issues with the garbage collection of OpenStack entities:

1) IMHO `keystone tenant-delete` should broadcast a message to all components that the tenant is being deleted, so that every component can delete all related data.

2) Failing 1), `nova scrub` should remove all the data related to the tenant. Currently (checking https://github.com/openstack/nova/blob/master/nova/cmd/manage.py) it does the equivalent of:

  nova network-disassociate
  nova secgroup-delete

IMHO it should also be deleting the keys, floating IPs, VMs, and neutron routers and networks, which it probably doesn't. Running `nova network-disassociate` on a network that still has a VM with a port on it leaves the VM in ERROR state, produces ERRORs and tracebacks in the logs, and the VM seems to persist -- it cannot be deleted.

3) `nova scrub` should not die on deleting the default security groups.

This is using the libvirt driver with VlanManager.

Version-Release number of selected component (if applicable):
openstack-nova-api.noarch         2013.2.2-2.el6ost  @puddle
openstack-nova-cert.noarch        2013.2.2-2.el6ost  @puddle
openstack-nova-common.noarch      2013.2.2-2.el6ost  @puddle
openstack-nova-compute.noarch     2013.2.2-2.el6ost  @puddle
openstack-nova-conductor.noarch   2013.2.2-2.el6ost  @puddle
openstack-nova-console.noarch     2013.2.2-2.el6ost  @puddle
openstack-nova-network.noarch     2013.2.2-2.el6ost  @puddle
openstack-nova-novncproxy.noarch  2013.2.2-2.el6ost  @puddle
openstack-nova-objectstore.noarch 2013.2.2-2.el6ost  @puddle
openstack-nova-scheduler.noarch   2013.2.2-2.el6ost  @puddle

How reproducible:
always

Steps to Reproduce:
# openstack-config --set /etc/nova/nova.conf DEFAULT network_manager nova.network.manager.VlanManager
# openstack-config --set /etc/nova/nova.conf DEFAULT vlan_interface eth0
# /etc/init.d/openstack-nova-network restart

# # Delete the network without a VlanID set (our preparation script is creating it)
# nova-manage network delete 192.168.32.0/22
2014-04-02
14:02:47.319 29148 INFO nova.network.driver [-] Loading network driver 'nova.network.linux_net'

# # Create a similar one, but with the VlanID set:
# nova-manage network create foo 192.168.32.0/22
# nova-manage network list
id  IPv4             IPv6  start address  DNS1     DNS2  VlanID  project  uuid
2   192.168.32.0/24  None  192.168.32.3   8.8.4.4  None  100     None     42cb9bf3-6e7c-43ad-be2f-5a8d112

# # Create a tenant:
keystone tenant-create --name foo
keystone user-role-add --user admin --role Member --tenant foo
OS_TENANT_NAME=foo nova boot --image cirros-0.3.1-x86_64-uec --flavor m1.tiny foo_vm
...
nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ACTIVE | -          | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+

# # The network gets auto-associated:
# nova-manage network list | head
id  IPv4             IPv6  start address  DNS1     DNS2  VlanID  project                           uuid
2   192.168.32.0/24  None  192.168.32.3   8.8.4.4  None  100     e2b0caea396b4e94ba532c1979bade9a  42cb9bf3-6e7c-43ad-be2f-5a8d112c56bb

# # Try deleting the related entities:
# nova scrub e2b0caea396b4e94ba532c1979bade9a
ERROR: Unable to delete system group 'default' (HTTP 400) (Request-ID: req-a41b7f5d-7d7e-4cfa-b165-d964ef42a443)
# nova scrub foo
# nova scrub e2b0caea396b4e94ba532c1979bade9a
ERROR: Unable to delete system group 'default' (HTTP 400) (Request-ID: req-72d6b346-21f2-427c-afe7-947826ba0707)

# # Now when I delete the tenant foo...
# keystone tenant-delete foo
# nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ACTIVE | -          | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+

# # The VM is still there:
# nova delete afdc97c2-5c07-443c-ba69-39fef99981d3
# nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ACTIVE | deleting   | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+

And after a while it fails to delete it:

[root@jhenner-node-permanent ~(keystone_admin)]# nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
| afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ACTIVE | deleting   | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+
# nova list --all-tenants
+--------------------------------------+--------+--------+------------+-------------+------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks         |
+--------------------------------------+--------+--------+------------+-------------+------------------+
|
afdc97c2-5c07-443c-ba69-39fef99981d3 | foo_vm | ERROR  | -          | Running     | foo=192.168.32.3 |
+--------------------------------------+--------+--------+------------+-------------+------------------+

Actual results:
ERRORs and tracebacks in the logs.

Expected results:
Everything is cleaned up after `nova scrub` finishes.

Additional info:
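For illustration, the ordering constraints that a complete `nova scrub` would have to respect can be sketched as follows. This is a hypothetical helper, not code from nova; the resource names and the dependency map are my own simplification of what the reproduction above shows (a network cannot be disassociated while a VM still holds a port on it, and the default security group is still referenced while servers exist):

```python
# Illustrative sketch only: which per-tenant resources must be gone before
# others can be deleted. All names here are hypothetical, not nova's.
DEPENDS_ON = {
    "network": ["port"],           # disassociating with live ports breaks the VM
    "port": ["server"],            # ports belong to running servers
    "floating_ip": ["server"],
    "security_group": ["server"],  # referenced while servers exist
    "keypair": [],
    "server": [],
}

def deletion_order(resources):
    """Order resource types so dependants are deleted before their holders."""
    order, seen = [], set()

    def visit(resource):
        if resource in seen:
            return
        seen.add(resource)
        for dep in DEPENDS_ON.get(resource, []):
            visit(dep)  # delete what depends on `resource` first
        order.append(resource)

    for resource in resources:
        visit(resource)
    return order
```

With an ordering like this, servers (and their ports) go first, so the `network-disassociate` step never hits a live port and the secgroup deletion never hits a group that a VM still needs.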
I believe this was discovered on RHOS 4.0, so setting the field accordingly.
This BZ would be better served split into multiple parts. Addressing the issues:

1. There is an upstream endeavor on this. As you might imagine, having the components coordinate reliably on a keystone action is complicated: it must consider transient communication issues and timeouts, and essentially behave as a system-wide transaction with the tenant being the final element actually removed. In other words, this isn't going to be resolved any time soon.

2. I think it is outside of nova's scope to call the other services that might have tenant info to "scrub". Having each component expose an API that can be used to recover from an out-of-order operation -- like deleting a tenant without cleaning up the other services -- is more within reach. Once that is achieved, orchestration or admin tools can do that particular work.

3. Deleting the default security group isn't really allowed. Either the API needs to filter that group out or the implementation needs to allow it. However, it cannot really be deleted until all of the nova elements that might need it on behalf of that tenant go away -- or at least depend on the static nature of a default security group ID relative to that tenant.

I recommend reducing the scope of this BZ to 1., as it has the relevant trackers in place. 2. is in the realm of 1., insofar as it is likely out of nova's scope, and the problems and complexities of 1. would apply to 2. in any case -- putting it out of reach for the foreseeable future. 3. is arguably an independent bug with a fairly well-defined scope, and should be filed separately, preferably upstream.

If this is acceptable to you, please reply to the NEEDINFO and I will make the changes.
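On 3., the cheaper of the two options (filtering the group out on the API-consumer side) could look roughly like the sketch below. The function and parameter names are hypothetical, not the actual manage.py code; it just shows the scrub loop skipping the undeletable 'default' group and collecting failures instead of aborting on the first HTTP 400:

```python
def scrub_security_groups(groups, delete_fn):
    """Delete a tenant's security groups, skipping the 'default' group the
    API refuses to delete, so the scrub does not die on HTTP 400.
    `groups` is a list of {"id": ..., "name": ...} dicts; `delete_fn`
    performs the actual deletion. Hypothetical sketch, not nova code."""
    errors = []
    for group in groups:
        if group["name"] == "default":
            continue  # undeletable while the tenant's resources exist
        try:
            delete_fn(group["id"])
        except Exception as exc:
            # Keep scrubbing the remaining groups instead of aborting.
            errors.append((group["id"], exc))
    return errors
```

The alternative (allowing the deletion server-side) would, as noted above, still have to wait until nothing belonging to the tenant references the group.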