Created attachment 919370 [details]
Nova compute.log

Description of problem:
While testing Cinder sanity, I failed to nova volume-detach one of the volumes. Cinder uses Gluster as the backend. Another volume attached/detached just fine.

See the compute.log trace:
2014-07-20 13:31:25.325 25142 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: expected string or buffer

Version-Release number of selected component (if applicable):
RHEL 6.5
openstack-nova-compute-2014.1.1-2.el6ost.noarch
python-novaclient-2.17.0-2.el6ost.noarch
openstack-cinder-2014.1.1-1.el6ost.noarch
python-cinderclient-1.0.9-1.el6ost.noarch
python-cinder-2014.1.1-1.el6ost.noarch

How reproducible:
Happened only with one volume; the other volume worked fine.

Steps to Reproduce:
1. Created an instance.
2. Created a volume.
3. Attached the volume and wrote to it.
4. Tried to detach; the volume status changed to "detaching" and stayed there.
5. As a temporary workaround, terminating the instance caused the volume to detach (and delete, since I had also tried force-deleting the volume); see the end of the log.

Actual results:
Status stays "detaching" and the volume remains attached.

Expected results:
The volume should detach without error.

It would also be a good RFE to add a nova force-volume-detach option:
https://blueprints.launchpad.net/nova/+spec/add-force-detach-to-nova
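For reference, the attach/detach sequence from the steps above is roughly the following, expressed with python-novaclient/python-cinderclient (a sketch only: the credentials, auth URL, instance UUID and device name are placeholders, and the exact client constructor arguments depend on the client release):

import time

from cinderclient import client as cinder_client
from novaclient import client as nova_client

# Placeholder credentials/endpoint -- adjust for the actual environment.
AUTH = ('admin', 'PASSWORD', 'admin', 'http://controller:5000/v2.0')
nova = nova_client.Client('1.1', *AUTH)
cinder = cinder_client.Client('1', *AUTH)

server_id = 'INSTANCE_UUID'  # instance created beforehand (step 1)

# Step 2: create a 1 GB volume on the Gluster-backed Cinder and wait for it.
volume = cinder.volumes.create(1)
while cinder.volumes.get(volume.id).status != 'available':
    time.sleep(1)

# Step 3: attach it to the instance (data is then written to it in the guest).
nova.volumes.create_server_volume(server_id, volume.id, '/dev/vdb')
while cinder.volumes.get(volume.id).status != 'in-use':
    time.sleep(1)

# Step 4: detach; with the affected packages the volume can get stuck in
# 'detaching' and compute.log shows "expected string or buffer".
nova.volumes.delete_server_volume(server_id, volume.id)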
This looks like https://bugs.launchpad.net/nova/+bug/1327218, which we keep hitting sporadically in the gate as well (see http://status.openstack.org/elastic-recheck/). I do not think this is related to the Gluster backend - it looks like a race in the way Nova interacts with Cinder.
The upstream bug linked here appears to be fixed by this commit in Juno (the upstream commit it was cherry-picked from is noted at the bottom):

commit bbf6348997fee02f9dadd556565f44005e2c7f23
Author: Matt Riedemann <mriedem.com>
Date:   Wed Mar 18 12:42:42 2015 -0700

    Save bdm.connection_info before calling volume_api.attach_volume

    There is a race in attach/detach of a volume where the volume status
    goes to 'in-use' before the bdm.connection_info data is stored in the
    database. Since attach is a cast, the caller can see the volume go to
    'in-use' and immediately try to detach the volume and blow up in the
    compute manager because bdm.connection_info isn't set in the database.

    This fixes the issue by saving the connection_info immediately before
    calling volume_api.attach_volume (which sets the volume status to
    'in-use').

    Closes-Bug: #1327218

    Conflicts:
            nova/tests/unit/compute/test_compute.py
            nova/tests/unit/virt/test_block_device.py
            nova/virt/block_device.py

    NOTE(mriedem): The block_device conflicts are due to using dot notation
    when accessing object fields and in kilo the context is no longer
    passed to bdm.save(). The test conflicts are due to moving the test
    modules in kilo and passing the context on save().

    Change-Id: Ib95c8f7b66aca0c4ac7b92d140cbeb5e85c2717f
    (cherry picked from commit 6fb2ef96d6aaf9ca0ad394fd7621ef1e6003f5a1)
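For anyone reading along, the ordering change the commit describes boils down to the following (a toy sketch with invented stand-in classes, not the actual nova/virt/block_device.py code):

# All names below are illustrative stand-ins, not Nova's real classes.

class FakeBDM(object):
    """Stand-in for the block device mapping row in the Nova database."""
    def __init__(self):
        self.connection_info = None   # what a later detach will read

class FakeVolumeAPI(object):
    """Stand-in for the Cinder volume API as seen from Nova."""
    def __init__(self):
        self.status = 'available'
    def attach_volume(self):
        self.status = 'in-use'        # callers polling the status may now detach

def attach_before_fix(bdm, volume_api, connection_info):
    # Old ordering: the volume shows 'in-use' before connection_info is stored,
    # so a detach issued in that window finds connection_info == None and the
    # compute manager fails ("expected string or buffer").
    volume_api.attach_volume()
    bdm.connection_info = connection_info

def attach_after_fix(bdm, volume_api, connection_info):
    # Fixed ordering: store connection_info first, then let the volume go
    # 'in-use', so a racing detach always has the data it needs.
    bdm.connection_info = connection_info
    volume_api.attach_volume()

The important point is only the relative ordering of saving the connection_info and flipping the volume status; the real change lives in nova/virt/block_device.py as described in the commit message.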
This bug affects RHOS 5.0 on RHEL 7.1 as well, and I have confirmation from a customer that it is resolved by the fix for https://bugs.launchpad.net/nova/+bug/1327218.
A comment from the case:

"When creating volumes under heavy load, attaching them to a nova instance and right away trying to detach those volumes, some volumes are stuck in 'detaching' state.

I found the following bug on launchpad and after implementing it manually in our lab, the issue was fixed there:
https://bugs.launchpad.net/nova/+bug/1327218

I also saw that there is also a bugzilla entry for RHOSP 5/EL 6:
https://bugzilla.redhat.com/show_bug.cgi?id=1121390

On upstream it is backported for stable/juno:
https://review.openstack.org/#/c/166017/

Would it be possible to also make it available for our RHOSP Juno?"
*** This bug has been marked as a duplicate of bug 1265745 ***
When I go to bug 1265745, it says I'm not authorized to see it :-( Can updates still be provided on this bug, or the authorization loosened on 1265745? Thanks.
(In reply to Charles Crouch from comment #7)
> When I go to bug 1265745, it says I'm not authorized to see it :-(
> Can updates still be provided on this bug, or the authorization loosened on
> 1265745? Thanks.

Rather than opening it up, I can point you towards the public errata pages for the fix, listing the versions where it landed:

openstack-nova bug fix advisory - openstack-nova-2014.1.5-7.el7ost
https://rhn.redhat.com/errata/RHBA-2015-2070.html

openstack-nova bug fix advisory - openstack-nova-2014.1.5-6.el6ost
https://rhn.redhat.com/errata/RHBA-2015-2075.html
Thanks very much Lee.