Created attachment 919370 [details]
Description of problem: While running Cinder sanity tests, `nova volume-detach` failed for one of the volumes. Cinder uses Gluster as the backend; note that another volume attached and detached just fine.
See compute.log trace :
2014-07-20 13:31:25.325 25142 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: expected string or buffer
Version-Release number of selected component (if applicable):
Happened only with one volume, other volume worked fine.
Steps to Reproduce:
1. Created instance
2. Created volume
3. Attached volume, wrote to it.
4. Tried to detach the volume; its state changed to "detaching" and stayed there.
5. As a temporary workaround, terminating the instance caused the volume to detach (and delete, as I had also tried force-deleting the volume); see the end of the log.
Actual results: status stuck in "detaching"; the volume remains attached.
Expected results: the volume detaches without error.
It would also be a good RFE to add a Nova force-volume-detach option.
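Until such an option exists, a client-side mitigation is to avoid detaching the instant the volume reports attached and instead poll until the status settles. A minimal sketch; the helper name and polling scheme are illustrative only and not part of any OpenStack client:

```python
import time

def wait_for_volume_status(get_status, wanted, timeout=60.0, interval=1.0):
    """Poll a volume's status until it reaches `wanted` or we time out.

    `get_status` is any zero-argument callable returning the current
    status string (e.g. a wrapper around `cinder show`); purely
    illustrative, not an OpenStack API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == wanted:
            return True
        time.sleep(interval)
    return False
```

Waiting a little after the status flips to 'in-use' also narrows the race described below, since it gives the compute manager time to persist the BDM record before a detach arrives.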
This looks like https://bugs.launchpad.net/nova/+bug/1327218 that we keep hitting sporadically in the gate as well (see http://status.openstack.org/elastic-recheck/).
I do not think that this is related to the Gluster backend - it seems like a race in the way Nova interacts with Cinder.
The upstream bug linked here seems to be fixed by this commit in Juno (the actual upstream commit from which this is cherry-picked is noted at the bottom):
Author: Matt Riedemann <email@example.com>
Date: Wed Mar 18 12:42:42 2015 -0700
Save bdm.connection_info before calling volume_api.attach_volume
There is a race in attach/detach of a volume where the volume status
goes to 'in-use' before the bdm.connection_info data is stored in the
database. Since attach is a cast, the caller can see the volume go to
'in-use' and immediately try to detach the volume and blow up in the
compute manager because bdm.connection_info isn't set in the database yet.
This fixes the issue by saving the connection_info immediately before
calling volume_api.attach_volume (which sets the volume status to 'in-use').
NOTE(mriedem): The block_device conflicts are due to using dot
notation when accessing object fields and in kilo the context is
no longer passed to bdm.save(). The test conflicts are due to moving
the test modules in kilo and passing the context on save().
(cherry picked from commit 6fb2ef96d6aaf9ca0ad394fd7621ef1e6003f5a1)
This bug affects RHOS 5.0 on RHEL 7.1 as well, and a customer has confirmed that it is fixed by the patch for https://bugs.launchpad.net/nova/+bug/1327218 .
A comment from the case:
"When creating volumes under heavy load, attaching them to a Nova instance, and right away trying to detach them, some volumes get stuck in the 'detaching' state.
I found the following bug on Launchpad, and after applying the fix manually in our lab, the issue was resolved there:
I also saw that there is also a bugzilla entry for RHOSP 5/EL 6: https://bugzilla.redhat.com/show_bug.cgi?id=1121390
On upstream it is backported for stable/juno: https://review.openstack.org/#/c/166017/
Would it be possible to also make it available for our RHOSP Juno?"
*** This bug has been marked as a duplicate of bug 1265745 ***
When I go to bug 1265745, it says I'm not authorized to see it :-(
Can updates still be provided on this bug, or the authorization loosened on 1265745? Thanks.
(In reply to Charles Crouch from comment #7)
> When I go to bug 1265745, it says I'm not authorized to see it :-(
> Can updates still be provided on this bug, or the authorization loosened on
> 1265745? Thanks.
Rather than opening it up, I can point you to the public errata pages for the fix, listing the versions where this landed:
openstack-nova bug fix advisory - openstack-nova-2014.1.5-7.el7ost
openstack-nova bug fix advisory - openstack-nova-2014.1.5-6.el6ost
Thanks very much Lee.