Bug 1056037 - GlusterFS Snapshot delete of attached volume fails if it runs > 10 minutes
Summary: GlusterFS Snapshot delete of attached volume fails if it runs > 10 minutes
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-cinder
Version: 4.0
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 6.0 (Juno)
Assignee: Eric Harney
QA Contact: Dafna Ron
URL:
Whiteboard: storage
Duplicates: 1101504 (view as bug list)
Depends On: 1066167
Blocks: 1033652 1040711 1045196
 
Reported: 2014-01-21 12:53 UTC by Yogev Rabl
Modified: 2016-04-27 05:30 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Cinder has a fixed timeout for GlusterFS driver snapshot create and delete operations.
Consequence: If a snapshot create/delete operation takes longer than 10 minutes to complete, Cinder fails it even if it is still working correctly.
Fix: Have Nova send Cinder updates during the process so that Cinder knows the job is still active.
Result: Snapshot operations can take as long as required without timing out, as long as activity is still reported.
Clone Of:
Clones: 1066167 1078975 (view as bug list)
Environment:
Last Closed: 2014-10-09 13:27:29 UTC
Target Upstream Version:
Embargoed:


Attachments
the cinder & compute logs (483.00 KB, application/x-bzip)
2014-01-21 12:53 UTC, Yogev Rabl


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1273984 0 None None None Never
OpenStack gerrit 69759 0 None None None Never
OpenStack gerrit 69761 0 None None None Never

Description Yogev Rabl 2014-01-21 12:53:27 UTC
Created attachment 853217 [details]
the cinder & compute logs

Description of problem:
While reproducing Bug 1033652, Cinder was unable to delete the snapshot of the volume attached to the instance. 

The system was installed with a GlusterFS back end configured in the Packstack answer file. 
Both the Cinder and Nova Compute servers had FUSE installed:
fuse-libs-2.8.3-4.el6.x86_64
glusterfs-fuse-3.4.0.57rhs-1.el6_5.x86_64
fuse-2.8.3-4.el6.x86_64

SELinux was configured accordingly: 
# getsebool virt_use_fusefs
virt_use_fusefs --> on

Steps performed:
1. Created a volume from an image: 
# cinder create --image-id 52572739-a5e7-4232-a184-e267934cdd15 30
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2014-01-21T12:29:52.175178      |
| display_description |                 None                 |
|     display_name    |                 None                 |
|          id         | 83fc7617-7a95-4c6b-b631-28bbf991c120 |
|       image_id      | 52572739-a5e7-4232-a184-e267934cdd15 |
|       metadata      |                  {}                  |
|         size        |                  30                  |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+

2. Launched an instance named 'verify_bug' from the volume.
3. Created a snapshot of the instance, named 'verify_bug_snap':
# cinder snapshot-list
+--------------------------------------+--------------------------------------+----------------+------------------------------+------+
|                  ID                  |              Volume ID               |     Status     |         Display Name         | Size |
+--------------------------------------+--------------------------------------+----------------+------------------------------+------+
| 84c59525-63a9-4ebb-9125-e26e97bc1f51 | 83fc7617-7a95-4c6b-b631-28bbf991c120 |   available    | snapshot for verify_bug_snap |  30  |
+--------------------------------------+--------------------------------------+----------------+------------------------------+------+
From the Nova compute server:
# ll /var/lib/nova/mnt/600bd85f165b39eac20b9779f0281317
-rw-rw-rw-. 1 qemu qemu 32212254720 Jan 21 14:35 volume-83fc7617-7a95-4c6b-b631-28bbf991c120
-rw-r--r--. 1 qemu qemu     7602176 Jan 21  2014 volume-83fc7617-7a95-4c6b-b631-28bbf991c120.84c59525-63a9-4ebb-9125-e26e97bc1f51
-rw-r--r--. 1  165  165         223 Jan 21 14:36 volume-83fc7617-7a95-4c6b-b631-28bbf991c120.info

The content of the info file is: 
# cat /var/lib/nova/mnt/600bd85f165b39eac20b9779f0281317/volume-83fc7617-7a95-4c6b-b631-28bbf991c120.info
{
 "84c59525-63a9-4ebb-9125-e26e97bc1f51": "volume-83fc7617-7a95-4c6b-b631-28bbf991c120.84c59525-63a9-4ebb-9125-e26e97bc1f51",
 "active": "volume-83fc7617-7a95-4c6b-b631-28bbf991c120.84c59525-63a9-4ebb-9125-e26e97bc1f51"
}
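
The .info file above is how the GlusterFS driver tracks the qcow2 snapshot chain: each snapshot ID maps to its backing file, and "active" names the file currently in use. As an illustration only, here is a minimal sketch of parsing it; the helper below is hypothetical and not part of Cinder's actual code:

import json
import os

# Paths taken from the listing above; adjust for your environment.
MOUNT = '/var/lib/nova/mnt/600bd85f165b39eac20b9779f0281317'
VOLUME = 'volume-83fc7617-7a95-4c6b-b631-28bbf991c120'

def read_snapshot_info(mount_point, volume_name):
    """Return (snapshot-id -> backing-file map, active file name)."""
    info_path = os.path.join(mount_point, '%s.info' % volume_name)
    with open(info_path) as f:
        info = json.load(f)
    active_file = info.get('active', volume_name)  # fall back to the base image
    snapshots = dict((k, v) for k, v in info.items() if k != 'active')
    return snapshots, active_file

snapshots, active = read_snapshot_info(MOUNT, VOLUME)
print('active file: %s' % active)
for snap_id, backing_file in snapshots.items():
    print('snapshot %s -> %s' % (snap_id, backing_file))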

4. Deleted the snapshot: 
# cinder snapshot-delete 84c59525-63a9-4ebb-9125-e26e97bc1f51
# cinder snapshot-list
+--------------------------------------+--------------------------------------+----------------+------------------------------+------+
|                  ID                  |              Volume ID               |     Status     |         Display Name         | Size |
+--------------------------------------+--------------------------------------+----------------+------------------------------+------+
| 84c59525-63a9-4ebb-9125-e26e97bc1f51 | 83fc7617-7a95-4c6b-b631-28bbf991c120 |    deleting    | snapshot for verify_bug_snap |  30  |
+--------------------------------------+--------------------------------------+----------------+------------------------------+------+


Version-Release number of selected component (if applicable):
python-novaclient-2.15.0-2.el6ost.noarch
python-nova-2013.2.1-2.el6ost.noarch
openstack-nova-compute-2013.2.1-2.el6ost.noarch
openstack-nova-common-2013.2.1-2.el6ost.noarch
libvirt-client-0.10.2-29.el6_5.2.x86_64
libvirt-0.10.2-29.el6_5.2.x86_64
libvirt-python-0.10.2-29.el6_5.2.x86_64
python-cinderclient-1.0.7-2.el6ost.noarch
openstack-cinder-2013.2.1-5.el6ost.noarch
python-cinder-2013.2.1-5.el6ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a volume from an image
2. Boot an instance from the volume
3. Create a snapshot from the instance.
4. Delete the snapshot

Actual results:
The snapshot deletion hangs; if interrupted, the snapshot moves to the error state, and as a result the volume cannot be deleted either.

Expected results:
The user can delete the snapshot.

Additional info:

The Cinder and Compute logs are attached.

Comment 2 Yogev Rabl 2014-01-21 12:55:03 UTC
This bug blocks the following bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1033652
https://bugzilla.redhat.com/show_bug.cgi?id=1040711

Comment 3 Eric Harney 2014-01-21 18:57:22 UTC
The basic problem here is that Cinder has a fixed timeout when waiting for snapshot_delete operations on the Nova side to complete.  If they take too long (even when things are functioning correctly), Cinder prematurely fails the operation.

To fix this, we need to have Nova send back updates of the job's percent complete while the block job is in progress.  Cinder can then reset its timeout window based on these updates.  (This should be doable without changing how the APIs between Cinder and Nova work today.)
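
As a rough illustration of that idea (not the actual Cinder or Nova code; get_snapshot_progress() is a hypothetical stand-in for the progress updates Nova would report), the waiting loop would reset its deadline whenever new activity is seen, instead of enforcing a single fixed cap:

import time

class SnapshotTimeout(Exception):
    pass

def wait_for_snapshot_job(get_snapshot_progress, idle_timeout=600, poll_interval=5):
    """Fail only after idle_timeout seconds with no change in reported
    progress, rather than after a fixed overall timeout."""
    last_progress = None
    deadline = time.time() + idle_timeout
    while True:
        progress = get_snapshot_progress()  # e.g. a percent-complete value from Nova
        if progress == 'complete':
            return
        if progress != last_progress:
            # Activity was reported: push the deadline out again.
            last_progress = progress
            deadline = time.time() + idle_timeout
        if time.time() > deadline:
            raise SnapshotTimeout('no progress reported for %d seconds' % idle_timeout)
        time.sleep(poll_interval)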


For testing in the meantime:
The longest operations occur when deleting the only snapshot that exists, because in that case the whole base disk image has to be copied into the snapshot file.  Deleting a snapshot while other snapshots still exist should be much quicker, which lets you avoid this bug while testing other pieces of this feature.

Comment 5 Dafna Ron 2014-05-27 11:24:13 UTC
*** Bug 1101504 has been marked as a duplicate of this bug. ***

Comment 8 Sean Cohen 2014-10-09 13:27:29 UTC
This likely indicates a version of libvirt with known bugs in this area.  Closing pending further information on reproduction.

