Bug 1815594 - [OSP13] Cinder volume in reserved state. Why ?
Summary: [OSP13] Cinder volume in reserved state. Why ?
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-os-brick
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: ---
Assignee: Pablo Caruana
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-20 16:33 UTC by ggrimaux
Modified: 2021-10-07 16:39 UTC (History)
18 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-23 10:40:42 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description ggrimaux 2020-03-20 16:33:34 UTC
Description of problem:
While deploying 8 instance with volumes one of the volume was stuck in 'reserved' state.

Client was able to clean it up.

Now they would like to know if we can figure out why it got stuck in that state.

Looking at the logs (sosreport) I couldn't find what was wrong.

volume_driver=cinder.volume.drivers.emc.vnx.driver.EMCVNXDriver
storage_protocol=iSCSI

More output in the next comment.

Version-Release number of selected component (if applicable):
openstack-cinder-12.0.7-5.el7ost.noarch

How reproducible:
Unknown, happened once.

Steps to Reproduce:
1.Unknown, happened once.
2.
3.

Actual results:
Volume was stuck in 'reserved' state

Expected results:
Volume be available like the other ones.

Additional info:
Have sosreports from the environment.

Comment 1 ggrimaux 2020-03-20 16:35:37 UTC
The volume state at the time:

(overcloud) [stack@ieatrheldir4a ~]$ openstack volume show 4ee80bab-342c-4d2f-a104-c94c847b0bc7
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                          | Value                                                                                                                                                                                                                                                                |
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| attachments                    | []                                                                                                                                                                                                                                                                   |
| availability_zone              | nova                                                                                                                                                                                                                                                                 |
| bootable                       | true                                                                                                                                                                                                                                                                 |
| consistencygroup_id            | None                                                                                                                                                                                                                                                                 |
| created_at                     | 2020-03-13T18:49:53.000000                                                                                                                                                                                                                                           |
| description                    | None                                                                                                                                                                                                                                                                 |
| encrypted                      | False                                                                                                                                                                                                                                                                |
| id                             | 4ee80bab-342c-4d2f-a104-c94c847b0bc7                                                                                                                                                                                                                                 |
| migration_status               | None                                                                                                                                                                                                                                                                 |
| multiattach                    | False                                                                                                                                                                                                                                                                |
| name                           | worker-app-plicdfn3-root-volume-ccd-c4a004                                                                                                                                                                                                                           |
| os-vol-host-attr:host          | hostgroup@ieatvnx-76#Cloud_4A_pool                                                                                                                                                                                                                                   |
| os-vol-mig-status-attr:migstat | None                                                                                                                                                                                                                                                                 |
| os-vol-mig-status-attr:name_id | None                                                                                                                                                                                                                                                                 |
| os-vol-tenant-attr:tenant_id   | b17375fff9724e5c812d9e9df63d19f9                                                                                                                                                                                                                                     |
| properties                     | snapcopy='False'                                                                                                                                                                                                                                                     |
| replication_status             | disabled                                                                                                                                                                                                                                                             |
| size                           | 40                                                                                                                                                                                                                                                                   |
| snapshot_id                    | None                                                                                                                                                                                                                                                                 |
| source_volid                   | None                                                                                                                                                                                                                                                                 |
| status                         | reserved                                                                                                                                                                                                                                                             |
| type                           | Thin                                                                                                                                                                                                                                                                 |
| updated_at                     | 2020-03-13T18:57:25.000000                                                                                                                                                                                                                                           |
| user_id                        | 80e47b6d05f74ca18c188e462f0d23d4                                                                                                                                                                                                                                     |
| volume_image_metadata          | {u'container_format': u'bare', u'min_ram': u'0', u'disk_format': u'qcow2', u'image_name': u'eccd-2.7.0-296-node', u'image_id': u'91c73a22-0310-4447-8bd5-43d53fbea14f', u'checksum': u'e547e35fba5611ced873b4e080f50277', u'min_disk': u'0', u'size': u'2016542720'} |
+--------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The only "relevant" log I found about this volume:

2020-03-13 18:49:54.031 17 DEBUG cinder.volume.api [req-b0de07bf-2b74-419d-bc84-740fc37d93aa 80e47b6d05f74ca18c188e462f0d23d4 b17375fff9724e5c812d9e9df63d19f9 - default default] Task 'cinder.volume.flows.api.create_volume.EntryCreateTask;volume:create' (2604d07d-7f22-4ae1-b0b5-79f97cafa479) transitioned into state 'SUCCESS' from state 'RUNNING' with result '{'volume': Volume(_name_id=None,admin_metadata=<?>,attach_status='detached',availability_zone='nova',bootable=False,cluster=<?>,cluster_name=None,consistencygroup=<?>,consistencygroup_id=None,created_at=2020-03-13T18:49:53Z,deleted=False,deleted_at=None,display_description=None,display_name='worker-app-plicdfn3-root-volume-ccd-c4a004',ec2_id=None,encryption_key_id=None,glance_metadata=<?>,group=<?>,group_id=None,host=None,id=4ee80bab-342c-4d2f-a104-c94c847b0bc7,launched_at=None,metadata={},migration_status=None,multiattach=False,previous_status=None,project_id='b17375fff9724e5c812d9e9df63d19f9',provider_auth=None,provider_geometry=None,provider_id=None,provider_location=None,replication_driver_data=None,replication_extended_status=None,replication_status=None,scheduled_at=None,service_uuid=None,shared_targets=True,size=40,snapshot_id=None,snapshots=<?>,source_volid=None,status='creating',terminated_at=None,updated_at=None,user_id='80e47b6d05f74ca18c188e462f0d23d4',volume_attachment=<?>,volume_type=<?>,volume_type_id=2460c5a1-f6e7-4a52-96fb-ebddb9b72ab4), 'volume_properties': VolumeProperties(attach_status='detached',availability_zone='nova',cgsnapshot_id=None,consistencygroup_id=None,display_description=None,display_name='worker-app-plicdfn3-root-volume-ccd-c4a004',encryption_key_id=None,group_id=None,group_type_id=<?>,metadata={},multiattach=False,project_id='b17375fff9724e5c812d9e9df63d19f9',qos_specs=None,replication_status=<?>,reservations=['681cbf6b-37a7-44a3-a1c7-528ae3f2e643','a8a00e30-bc63-428f-b86e-ae3542a62787','0574cc4f-ba18-41e1-8c70-46e67629a5a4','70883a3c-68e4-4022-9d4e-3d80cd2f5a3e'],size=40,snapshot_id=None,source_replicaid=<?>,source_volid=None,status='creating',user_id='80e47b6d05f74ca18c188e462f0d23d4',volume_type_id=2460c5a1-f6e7-4a52-96fb-ebddb9b72ab4), 'volume_id': '4ee80bab-342c-4d2f-a104-c94c847b0bc7'}' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:183

Comment 5 Ryan Liang 2020-03-24 01:53:36 UTC
Could you please upload the sosreport logs? I'll try to find more clues in nova compute and cinder volume logs.
Because lots of operations will mark the volume reserved.

Comment 6 ggrimaux 2020-03-31 13:05:15 UTC
@Ryan Liang is there any place I can drop the sosreport logs for you ?

I can't upload those publicly since they have PII info in them and if I put them private you can't see them. Also I have 20mb limit for files.

Please let me know if you have a safe place for me to upload them to.

Thank you

Comment 7 Ryan Liang 2020-04-01 01:46:04 UTC
Replied to you via email. Please check it.

Comment 8 ggrimaux 2020-04-01 17:58:44 UTC
Hi Ryan!

I uploaded the cinder and nova logs in a subfolder called BZ1815594.

Please confirm you have everything you need!

Thank you

Comment 9 ggrimaux 2020-04-08 08:21:29 UTC
Hi Ryan!

You found anything in the logs explaining the situation the client experienced ?

Please let us know!

Thank you.

Comment 10 Ryan Liang 2020-04-08 09:03:58 UTC
## 1. Volume 4ee80bab-342c-4d2f-a104-c94c847b0bc7 was created successfully.

Log file: sosreport-cloud4a-controller-1-2020-03-16-ieynuii/var/log/containers/cinder/cinder-volume.log.9.gz
Request ID: req-b0de07bf-2b74-419d-bc84-740fc37d93aa

122766 2020-03-13 18:57:04.214 54 INFO cinder.volume.manager [req-b0de07bf-2b74-419d-bc84-740fc37d93aa 80e47b6d05f74ca18c188e462f0d23d4        b17375fff9724e5c812d9e9df63d19f9 - default default] Created volume successfully.

## 2. Nova sent requests to Cinder to create an attachment for volume 4ee80bab-342c-4d2f-a104-c94c847b0bc7 on a VM.
NOTE: The volume was set to `reserved` in this step.

Attachment ID: f0e07a87-9e31-45bb-aa6b-40368b1117aa
Cinder Request ID: req-1805257c-c6c8-4986-afd2-1f6c4b72b218

./sosreport-cloud4a-controller-0-2020-03-16-bmhcizu/var/log/containers/nova/nova-api.log.33.gz:36778:2020-03-13 18:57:25.062 24 DEBUG cinderclient.v3.client [req-7ab340f9-f0a4-4f41-9ed1-2666c38bab6b 80e47b6d05f74ca18c188e462f0d23d4 b17375fff9724e5c812d9e9df63d19f9 - default default] REQ: curl -g -i -X POST http://10.148.31.138:8776/v3/b17375fff9724e5c812d9e9df63d19f9/attachments -H "X-Service-Token: {SHA1}8bd3abe5cdaf9716881233547e0b8c1576e4f615" -H "Accept: application/json" -H "User-Agent: python-cinderclient" -H "OpenStack-API-Version: volume 3.44" -H "X-Auth-Token: {SHA1}4c1902de141cfa4b9f011235ddcac33f05ec0a09" -H "Content-Type: application/json" -H "X-OpenStack-Request-ID: req-7ab340f9-f0a4-4f41-9ed1-2666c38bab6b" -d '{"attachment": {"instance_uuid": "a03c334b-eaf0-4b73-92af-2242b894a9bb", "connector": null, "volume_uuid": "4ee80bab-342c-4d2f-a104-c94c847b0bc7"}}' _http_log_request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:372
./sosreport-cloud4a-controller-0-2020-03-16-bmhcizu/var/log/containers/nova/nova-api.log.33.gz:36780:RESP BODY: {"attachment": {"status": "reserved", "detached_at": "", "connection_info": {}, "attached_at": "", "attach_mode": null, "instance": "a03c334b-eaf0-4b73-92af-2242b894a9bb", "volume_id": "4ee80bab-342c-4d2f-a104-c94c847b0bc7", "id": "f0e07a87-9e31-45bb-aa6b-40368b1117aa"}}

./sosreport-cloud4a-controller-0-2020-03-16-bmhcizu/var/log/containers/nova/nova-api.log.33.gz:36782:2020-03-13 18:57:25.155 24 DEBUG cinderclient.v3.client [req-7ab340f9-f0a4-4f41-9ed1-2666c38bab6b 80e47b6d05f74ca18c188e462f0d23d4 b17375fff9724e5c812d9e9df63d19f9 - default default] POST call to cinderv3 for http://10.148.31.138:8776/v3/b17375fff9724e5c812d9e9df63d19f9/attachments used request id req-1805257c-c6c8-4986-afd2-1f6c4b72b218 request /usr/lib/python2.7/site-packages/keystoneauth1/session.py:722

./sosreport-cloud4a-controller-0-2020-03-16-bmhcizu/var/log/containers/cinder/cinder-api.log.33.gz:33040:2020-03-13 18:57:25.069 23 INFO cinder.api.openstack.wsgi [req-1805257c-c6c8-4986-afd2-1f6c4b72b218 80e47b6d05f74ca18c188e462f0d23d4 b17375fff9724e5c812d9e9df63d19f9 - default default] POST http://10.148.31.138:8776/v3/b17375fff9724e5c812d9e9df63d19f9/attachments

./sosreport-cloud4a-controller-0-2020-03-16-bmhcizu/var/log/containers/cinder/cinder-api.log.33.gz:33041:2020-03-13 18:57:25.070 23 DEBUG cinder.api.openstack.wsgi [req-1805257c-c6c8-4986-afd2-1f6c4b72b218 80e47b6d05f74ca18c188e462f0d23d4 b17375fff9724e5c812d9e9df63d19f9 - default default] Action: 'create', calling method: <function version_select at 0x7f7b0897a668>, body: {"attachment": {"instance_uuid": "a03c334b-eaf0-4b73-92af-2242b894a9bb", "connector": null, "volume_uuid": "4ee80bab-342c-4d2f-a104-c94c847b0bc7"}} _process_stack /usr/lib/python2.7/site-packages/cinder/api/openstack/wsgi.py:868

## 3. There should be a request to set the attachment complete. But I did NOT find any logs (nova or cinder) about this.

## Summary:
Nova should send a request to set the attachment complete but didn't.
The requests should be like:

Use attachment id: f0e07a87-9e31-45bb-aa6b-40368b1117aa to locate in logs.

./sosreport-cloud4a-controller-0-2020-03-16-bmhcizu/var/log/containers/cinder/cinder-api.log.33.gz:33191:2020-03-13 18:57:36.171 22 INFO cinder.api.openstack.wsgi [req-9fabee55-59d4-47bc-80a9-627c1db3285e 80e47b6d05f74ca18c188e462f0d23d4 b17375fff9724e5c812d9e9df63d19f9 - default default] POST http://10.148.31.138:8776/v3/b17375fff9724e5c812d9e9df63d19f9/attachments/fcc8e4de-172c-4c54-9afa-4d4aea1b00b9/action
./sosreport-cloud4a-controller-0-2020-03-16-bmhcizu/var/log/containers/cinder/cinder-api.log.33.gz:33192:2020-03-13 18:57:36.171 22 DEBUG cinder.api.openstack.wsgi [req-9fabee55-59d4-47bc-80a9-627c1db3285e 80e47b6d05f74ca18c188e462f0d23d4 b17375fff9724e5c812d9e9df63d19f9 - default default] Action body: {"os-complete": null} get_method /usr/lib/python2.7/site-packages/cinder/api/openstack/wsgi.py:985

The sosreport didn't contain any nova compute logs. Don't know why the nova didn't send the request to set the attachment complete.

Comment 35 Pablo Caruana 2020-07-07 18:16:32 UTC
Just for keeping things crystal clear, as different environments, component versions were involved not necessary being the same cause/effect

From the volumes studied  c0f3d1f8-a352-4610-86d1-4cc8e3f7d32f and 6e8500fc-238e-447a-a547-70d0a6edd88e were stuck in a reserved state.

Basically the attachment of these volumes was delayed by ~44 hours in Nova due to libvirt hanging while attaching a different volume e0d2043b-9bd1-440a-8cb1-0ef02b271469 to instance d3b5ca0a-c5e4-4c60-8a63-adaddff21b91. By the time that cleared and we attempted to complete the attachments of c0f3d1f8-a352-4610-86d1-4cc8e3f7d32f and 6e8500fc-238e-447a-a547-70d0a6edd88e the user token associated with each request had expired leaving these volumes in a reserved state.

The simpler way  to recover volumes c0f3d1f8-a352-4610-86d1-4cc8e3f7d32f and 6e8500fc-238e-447a-a547-70d0a6edd88e by deleting any incomplete attachments, returning the volumes to an available state ready for attachment.

$ cinder attachment-list --volume-id c0f3d1f8-a352-4610-86d1-4cc8e3f7d32f
$ cinder attachment-delete $attachment_id

$ cinder attachment-list --volume-id 6e8500fc-238e-447a-a547-70d0a6edd88e
$ cinder attachment-delete $attachment_id

Regarding why the attachment of volume e0d2043b-9bd1-440a-8cb1-0ef02b271469 did hang leading to this,  os-brick incorrectly told Nova to attach /dev/dm-80 to the instance when it was actually faulty from a previous issue and already queuing data thus locking up libvirt when Nova asked it to attach the device to the instance.

This feels more on the RHEL bits os-brick and nova are depending as we observed in the past situations  were reported problems at device mapper with rhel 7.6 where  multipathd can occasionally lose track of the number of active paths  (There are plent of multipathd: xx:yy: reinstate around those devices)

Recomendation from storage PoV is updating to device-mapper-multipath-0.4.9-128.el7 at least on the compute nodes were this situation was noticed.
https://access.redhat.com/errata/RHBA-2020:1066


Additionally there were situations  were  device names in /dev/disk/by-path symlinks  are not persistent / unique and sg3_utils need to be updated at least to sg3_utils-1.37-19.el7
https://access.redhat.com/errata/RHBA-2020:1093 that can also lead the os-brick to this kind of situation

General recommendation for os-brick would be being sure running python2-os-brick-2.3.9-2.el7ost.noarch.rpm on the each compute node side at least.



Moving forward, if there are additional examples to look into at we'd suggest opening fresh bugs associated with a single case/env until we can prove what the actual underlying issue is. Additionally please make sure to clearly document the volumes, instances and if possible request ids we should be looking at.

Thanks!


Note You need to log in before you can comment on or make changes to this bug.