Description of problem:
The bug occurs in our case with a Ceph cluster as the storage backend for OpenStack. If someone tries to delete a virtual machine with attached volumes while the storage backend is overwhelmed or not responding, the VM is deleted but the associated volumes remain as "Attached to None". Once you are in this state, you can't delete them from Cinder, either from the CLI or from Horizon, and they are still hanging around in the Ceph cluster.

Version-Release number of selected component (if applicable):
ceph-common-0.80.5-4.el7ost.x86_64
openstack-cinder-2014.2.1-3.el7ost.noarch
python-ceph-0.80.5-4.el7ost.x86_64
python-cinder-2014.2.1-3.el7ost.noarch
python-cinderclient-1.1.1-1.el7ost.noarch

How reproducible:
Don't know.

Steps to Reproduce:
Try to delete a virtual machine with attached volumes while the storage backend is overwhelmed or not responding; the VM is deleted but the associated volumes remain as "Attached to None".

Actual results:
Volumes remain as "Attached to None".

Expected results:
Volumes are not attached.

Additional info:
It's really hard to say what exactly happened; Cinder's log is configured with severity INFO, so it doesn't contain any DEBUG messages. According to the SOS report it seems like cinder-volume was off for 3 days between Feb 24 and Feb 27, or at least it didn't write any logs:

2015-02-24 20:39:07.741 48436 INFO cinder.openstack.common.service [-] Child 48521 killed by signal 15
2015-02-27 14:05:50.915 32618 INFO cinder.openstack.common.service [-] Starting 1 workers

If I take the first volume that appears as "Attached to None on /dev/vda" in Horizon (RHELOSP_CASE_STALLED_VOLUMES_2.jpg), id = d24a4b61-370b-4f06-8b6a-203e9b00cbd9, and search for this volume in the log, I see the following.

At 2015-02-26 15:27:58.076 it was still attached to instance 8a32e673-9400-4073-81c3-947490a3a82e on /dev/vda.

15 seconds later there is an attempt to delete this volume:

2015-02-26 15:28:10.866 32904 INFO cinder.api.v1.volumes [req-9c4e9501-9f43-410e-89c6-8eeb25331e8f 662d11847ec3498a955204fa8d843130 fdaf2abe1ed841748768c31ab4595650 - - -] Delete volume with id: d24a4b61-370b-4f06-8b6a-203e9b00cbd9

It fails:

2015-02-26 15:28:10.933 32904 INFO cinder.api.openstack.wsgi [req-9c4e9501-9f43-410e-89c6-8eeb25331e8f 662d11847ec3498a955204fa8d843130 fdaf2abe1ed841748768c31ab4595650 - - -] http://10.64.111.97:8776/v1/fdaf2abe1ed841748768c31ab4595650/volumes/d24a4b61-370b-4f06-8b6a-203e9b00cbd9 returned with HTTP 400

So the volume still appears to be attached to the same instance:

2015-02-26 15:28:30.379 32906 INFO cinder.api.v1.volumes [req-4a2b92bc-b5e0-4492-ba99-e4180d1e50f6 662d11847ec3498a955204fa8d843130 fdaf2abe1ed841748768c31ab4595650 - - -] vol={'migration_status': None, 'availability_zone': u'nova', 'terminated_at': None, 'updated_at': datetime.datetime(2015, 2, 26, 15, 14, 23), 'provider_geometry': None, 'replication_extended_status': None, 'replication_status': u'disabled', 'snapshot_id': None, 'ec2_id': None, 'mountpoint': u'/dev/vda', 'deleted_at': None, 'id': u'd24a4b61-370b-4f06-8b6a-203e9b00cbd9', 'size': 20L, 'user_id': u'dc82521a45f4428aaf2b0bb5b431b08c', 'attach_time': u'2015-02-26T14:48:34.553852', 'attached_host': None, 'display_description': u'', 'volume_admin_metadata': u'[<cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x5025c10>, <cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x50253d0>]', 'project_id': u'4a3ea3b05da24a2f94b5882fc539ccab', 'launched_at': datetime.datetime(2015, 2, 26, 14, 47, 40), 'scheduled_at': datetime.datetime(2015, 2, 26, 14, 44, 31), 'status': u'available', 'volume_type_id': None, 'deleted': False, 'provider_location': None, 'host': u'ha-controller#DEFAULT', 'consistencygroup_id': None, 'source_volid': None, 'provider_auth': None, 'display_name': u'', 'instance_uuid': u'8a32e673-9400-4073-81c3-947490a3a82e', 'bootable': True, 'created_at': datetime.datetime(2015, 2, 26, 14, 44, 32), 'attach_status': u'attached', 'volume_type': None, 'consistencygroup': None, 'volume_metadata': [], '_name_id': None, 'encryption_key_id': None, 'replication_driver_data': None, 'metadata': {u'readonly': u'False', u'attached_mode': u'rw'}}

But then, 3 minutes later, another action happens (unfortunately, it's hard to say which action it was without debug mode):

2015-02-26 15:31:27.235 32903 INFO cinder.api.openstack.wsgi [req-2696e830-fbc0-4448-ad0d-d01f840a88dd dc82521a45f4428aaf2b0bb5b431b08c 4a3ea3b05da24a2f94b5882fc539ccab - - -] POST http://10.64.111.97:8776/v1/4a3ea3b05da24a2f94b5882fc539ccab/volumes/d24a4b61-370b-4f06-8b6a-203e9b00cbd9/action
2015-02-26 15:31:27.606 32903 INFO cinder.api.openstack.wsgi [req-2696e830-fbc0-4448-ad0d-d01f840a88dd dc82521a45f4428aaf2b0bb5b431b08c 4a3ea3b05da24a2f94b5882fc539ccab - - -] http://10.64.111.97:8776/v1/4a3ea3b05da24a2f94b5882fc539ccab/volumes/d24a4b61-370b-4f06-8b6a-203e9b00cbd9/action returned with HTTP 202
2015-02-26 15:31:27.608 32903 INFO eventlet.wsgi.server [req-2696e830-fbc0-4448-ad0d-d01f840a88dd dc82521a45f4428aaf2b0bb5b431b08c 4a3ea3b05da24a2f94b5882fc539ccab - - -] 10.64.111.11 - - [26/Feb/2015 15:31:27] "POST /v1/4a3ea3b05da24a2f94b5882fc539ccab/volumes/d24a4b61-370b-4f06-8b6a-203e9b00cbd9/action HTTP/1.1" 202 187 0.376937

And finally the volume appears as detached, without a mountpoint and without an instance id:

2015-02-26 15:31:31.940 32899 INFO cinder.api.v1.volumes [req-b6b290a9-08de-45d7-9196-7613776e70a5 dc82521a45f4428aaf2b0bb5b431b08c 4a3ea3b05da24a2f94b5882fc539ccab - - -] vol={'migration_status': None, 'availability_zone': u'nova', 'terminated_at': None, 'updated_at': datetime.datetime(2015, 2, 26, 15, 31, 28), 'provider_geometry': None, 'replication_extended_status': None, 'replication_status': u'disabled', 'snapshot_id': None, 'ec2_id': None, 'mountpoint': None, 'deleted_at': None, 'id': u'd24a4b61-370b-4f06-8b6a-203e9b00cbd9', 'size': 20L, 'user_id': u'dc82521a45f4428aaf2b0bb5b431b08c', 'attach_time': None, 'attached_host': None, 'display_description': u'', 'volume_admin_metadata': u'[<cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x5979a10>]', 'project_id': u'4a3ea3b05da24a2f94b5882fc539ccab', 'launched_at': datetime.datetime(2015, 2, 26, 14, 47, 40), 'scheduled_at': datetime.datetime(2015, 2, 26, 14, 44, 31), 'status': u'available', 'volume_type_id': None, 'deleted': False, 'provider_location': None, 'host': u'ha-controller#DEFAULT', 'consistencygroup_id': None, 'source_volid': None, 'provider_auth': None, 'display_name': u'', 'instance_uuid': None, 'bootable': True, 'created_at': datetime.datetime(2015, 2, 26, 14, 44, 32), 'attach_status': u'detached', 'volume_type': None, 'consistencygroup': None, 'volume_metadata': [], '_name_id': None, 'encryption_key_id': None, 'replication_driver_data': None, 'metadata': {u'readonly': u'False'}}

I still can't find when Nova actually deleted the instance, but it seems it was deleted at some point:

2015-02-26 15:27:02.912 3824 INFO nova.api.openstack.wsgi [req-41d7ad11-d46c-41bf-b96f-5c816c5eacf7 None] HTTP exception thrown: Instance could not be found
2015-02-26 15:27:02.914 3824 INFO nova.osapi_compute.wsgi.server [req-41d7ad11-d46c-41bf-b96f-5c816c5eacf7 None] 10.64.111.12 "GET /v2/fdaf2abe1ed841748768c31ab4595650/servers/8a32e673-9400-4073-81c3-947490a3a82e HTTP/1.1" status: 404 len: 267 time: 0.0935590

So I guess this is one of those flows where Nova deletes an instance even if Cinder reports back an error when detaching a volume.
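For anyone scripting cleanup, the inconsistent state can be spotted from the same fields that show up in the vol= dicts in the cinder-api log. A minimal sketch (the helper function and the standalone dicts below are mine for illustration, not Cinder code):

```python
# Hypothetical helper: flag a Cinder volume record that is stuck in the
# "Attached to None" state described in this bug. Field names match the
# vol= dicts logged by cinder-api; the function itself is not part of Cinder.

def is_stuck_attached(vol):
    """True if the record still claims an attachment but has no instance."""
    attached = vol.get('attach_status') == 'attached'
    orphaned = vol.get('instance_uuid') is None
    return attached and orphaned

# Consistent record, like the 15:28:30 log entry: attached to an instance.
attached_vol = {'attach_status': 'attached',
                'instance_uuid': '8a32e673-9400-4073-81c3-947490a3a82e',
                'mountpoint': '/dev/vda'}

# The failure mode from this bug: still "attached", but to no instance.
stuck_vol = {'attach_status': 'attached',
             'instance_uuid': None,
             'mountpoint': '/dev/vda'}
```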
*** Bug 1226064 has been marked as a duplicate of this bug. ***
In Liberty, some work is being done in Cinder to improve the API interactions between Cinder and Nova. It is possible that work will improve the error recovery in this case, but I don't have many specifics I can provide for now.
Russ, thanks for the update. Let me summarize: "Attached to None" is the result, not the root cause. The Cinder folks are working to introduce "force-detach" in Mitaka to allow volumes to get back to normal after that unfortunate result. Until then, as a workaround, you have to manually update the volume metadata in the Cinder DB. If this problem happens too often, we need Cinder and Nova logs to understand and fix the root cause on that specific environment. Please reopen the case if it happens again on the customer's environment.
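For reference, the manual DB workaround amounts to an UPDATE that clears the stale attachment fields. The sketch below runs against a throwaway in-memory SQLite table rather than the real (MySQL) cinder database, and the column set is a simplification based on the vol= dicts in the logs; on a real deployment, back up the database and verify the actual schema first.

```python
import sqlite3

# Stand-in for the cinder "volumes" table; the real schema has many more
# columns. The row mirrors the stuck volume from this report.
conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE volumes (
    id TEXT PRIMARY KEY,
    status TEXT,
    attach_status TEXT,
    instance_uuid TEXT,
    mountpoint TEXT)""")
conn.execute("INSERT INTO volumes VALUES (?, ?, ?, ?, ?)",
             ('d24a4b61-370b-4f06-8b6a-203e9b00cbd9',
              'available', 'attached', None, '/dev/vda'))

# The workaround: clear the stale attachment metadata so the volume can
# be deleted normally afterwards.
conn.execute("""UPDATE volumes
                SET attach_status = 'detached',
                    instance_uuid = NULL,
                    mountpoint = NULL
                WHERE id = ?""",
             ('d24a4b61-370b-4f06-8b6a-203e9b00cbd9',))

row = conn.execute(
    "SELECT status, attach_status, mountpoint FROM volumes").fetchone()
```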
Tested using:
openstack-cinder-2014.2.4-11.el7ost.noarch
python-cinderclient-1.1.1-4.el7ost.noarch
python-cinder-2014.2.4-11.el7ost.noarch

Verification flow:

[root@panther13 ~(keystone_admin)]# glance image-create --disk-format iso --container-format bare --name cirros --file dsl-4.4.10.iso
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 5cb7e0d4506c249b78bbe0cd4695b865     |
| container_format | bare                                 |
| created_at       | 2016-12-12T08:05:49                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | iso                                  |
| id               | ca0e6d96-9dd3-48e7-9b89-7596fb4abefc |
| is_public        | False                                |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | cirros                               |
| owner            | a9c6433306c24883a2f89e43fca9c448     |
| protected        | False                                |
| size             | 52328448                             |
| status           | active                               |
| updated_at       | 2016-12-12T08:05:49                  |
| virtual_size     | None                                 |
+------------------+--------------------------------------+

[root@panther13 ~(keystone_admin)]# glance image-list
+--------------------------------------+--------+-------------+------------------+----------+--------+
| ID                                   | Name   | Disk Format | Container Format | Size     | Status |
+--------------------------------------+--------+-------------+------------------+----------+--------+
| ca0e6d96-9dd3-48e7-9b89-7596fb4abefc | cirros | iso         | bare             | 52328448 | active |
+--------------------------------------+--------+-------------+------------------+----------+--------+

[root@panther13 ~(keystone_admin)]# nova boot --image ca0e6d96-9dd3-48e7-9b89-7596fb4abefc --flavor 1 --nic net-id=3325c72f-a537-4a3f-8952-60bd38ef00e6 vm
+--------------------------------------+-----------------------------------------------+
| Property                             | Value                                         |
+--------------------------------------+-----------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                        |
| OS-EXT-AZ:availability_zone          |                                               |
| OS-EXT-SRV-ATTR:host                 | -                                             |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                             |
| OS-EXT-SRV-ATTR:instance_name        | instance-00000004                             |
| OS-EXT-STS:power_state               | 0                                             |
| OS-EXT-STS:task_state                | scheduling                                    |
| OS-EXT-STS:vm_state                  | building                                      |
| OS-SRV-USG:launched_at               | -                                             |
| OS-SRV-USG:terminated_at             | -                                             |
| accessIPv4                           |                                               |
| accessIPv6                           |                                               |
| adminPass                            | QBmK3UfPNAMZ                                  |
| config_drive                         |                                               |
| created                              | 2016-12-12T08:07:33Z                          |
| flavor                               | m1.tiny (1)                                   |
| hostId                               |                                               |
| id                                   | ff64ad78-217c-4b9a-ba90-6f2bf6fea400          |
| image                                | cirros (ca0e6d96-9dd3-48e7-9b89-7596fb4abefc) |
| key_name                             | -                                             |
| metadata                             | {}                                            |
| name                                 | vm                                            |
| os-extended-volumes:volumes_attached | []                                            |
| progress                             | 0                                             |
| security_groups                      | default                                       |
| status                               | BUILD                                         |
| tenant_id                            | a9c6433306c24883a2f89e43fca9c448              |
| updated                              | 2016-12-12T08:07:33Z                          |
| user_id                              | 7db381d6683340b996aec64672330458              |
+--------------------------------------+-----------------------------------------------+

[root@panther13 ~(keystone_admin)]# nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| ff64ad78-217c-4b9a-ba90-6f2bf6fea400 | vm   | ACTIVE | -          | Running     | private=10.0.0.3 |
+--------------------------------------+------+--------+------------+-------------+------------------+

[root@panther13 ~(keystone_admin)]# cinder create 1
+---------------------+--------------------------------------+
| Property            | Value                                |
+---------------------+--------------------------------------+
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| created_at          | 2016-12-12T08:07:58.267074           |
| display_description | None                                 |
| display_name        | None                                 |
| encrypted           | False                                |
| id                  | 02ff9e50-e32c-4f63-9dcc-e440d8a0b87e |
| metadata            | {}                                   |
| size                | 1                                    |
| snapshot_id         | None                                 |
| source_volid        | None                                 |
| status              | creating                             |
| volume_type         | None                                 |
+---------------------+--------------------------------------+

[root@panther13 ~(keystone_admin)]# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| ID                                   | Status    | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 02ff9e50-e32c-4f63-9dcc-e440d8a0b87e | available | None         | 1    | None        | false    |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+

[root@panther13 ~(keystone_admin)]# nova list
+--------------------------------------+------+---------+------------+-------------+------------------+
| ID                                   | Name | Status  | Task State | Power State | Networks         |
+--------------------------------------+------+---------+------------+-------------+------------------+
| ff64ad78-217c-4b9a-ba90-6f2bf6fea400 | vm   | SHUTOFF | -          | Shutdown    | private=10.0.0.3 |
+--------------------------------------+------+---------+------------+-------------+------------------+

[root@panther13 ~(keystone_admin)]# nova volume-attach ff64ad78-217c-4b9a-ba90-6f2bf6fea400 02ff9e50-e32c-4f63-9dcc-e440d8a0b87e /dev/vdb
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/hdb                             |
| id       | 02ff9e50-e32c-4f63-9dcc-e440d8a0b87e |
| serverId | ff64ad78-217c-4b9a-ba90-6f2bf6fea400 |
| volumeId | 02ff9e50-e32c-4f63-9dcc-e440d8a0b87e |
+----------+--------------------------------------+

[root@panther13 ~(keystone_admin)]# cinder list
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Display Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+
| 02ff9e50-e32c-4f63-9dcc-e440d8a0b87e | in-use | None         | 1    | None        | false    | ff64ad78-217c-4b9a-ba90-6f2bf6fea400 |
+--------------------------------------+--------+--------------+------+-------------+----------+--------------------------------------+

[root@panther13 ~(keystone_admin)]# nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| ff64ad78-217c-4b9a-ba90-6f2bf6fea400 | vm   | ACTIVE | -          | Running     | private=10.0.0.3 |
+--------------------------------------+------+--------+------------+-------------+------------------+

[root@panther13 ~(keystone_admin)]# nova volume-detach ff64ad78-217c-4b9a-ba90-6f2bf6fea400 02ff9e50-e32c-4f63-9dcc-e440d8a0b87e
[root@panther13 ~(keystone_admin)]# nova delete ff64ad78-217c-4b9a-ba90-6f2bf6fea400
Request to delete server ff64ad78-217c-4b9a-ba90-6f2bf6fea400 has been accepted.

[root@panther13 ~(keystone_admin)]# nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

[root@panther13 ~(keystone_admin)]# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| ID                                   | Status    | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 02ff9e50-e32c-4f63-9dcc-e440d8a0b87e | available | None         | 1    | None        | false    |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0156.html