Bug 1121390

Summary: Nova volume-detach fails; volume stuck in 'detaching' state (Cinder using Gluster backend)
Product: Red Hat OpenStack
Component: openstack-nova
Version: 5.0 (RHEL 6)
Target Release: 5.0 (RHEL 6)
Target Milestone: z6
Keywords: ZStream
Hardware: x86_64
OS: Linux
Severity: high
Priority: high
Status: CLOSED DUPLICATE
Reporter: Tzach Shefi <tshefi>
Assignee: Nikola Dipanov <ndipanov>
QA Contact: nlevinki <nlevinki>
CC: charcrou, dhill, eglynn, kchamart, lyarwood, ndipanov, sgordon, sreber, yeylon
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-12-15 16:45:42 UTC

Attachments:
Nova compute.log

Description Tzach Shefi 2014-07-20 10:54:01 UTC
Created attachment 919370
Nova compute.log

Description of problem: While testing Cinder sanity, I failed to nova volume-detach one of the volumes. Cinder uses Gluster as the backend; another volume attached and detached just fine.

See the compute.log trace:
2014-07-20 13:31:25.325 25142 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: expected string or buffer                                   


Version-Release number of selected component (if applicable):
RHEL 6.5
openstack-nova-compute-2014.1.1-2.el6ost.noarch
python-novaclient-2.17.0-2.el6ost.noarch
openstack-cinder-2014.1.1-1.el6ost.noarch
python-cinderclient-1.0.9-1.el6ost.noarch
python-cinder-2014.1.1-1.el6ost.noarch


How reproducible:
Happened with only one volume; the other volume worked fine.

Steps to Reproduce:
1. Created instance
2. Created volume 
3. Attached volume, wrote to it.
4. Tried to detach the volume; its state changed to 'detaching' and stayed there (see the reproduction sketch below).
5. As a temporary workaround, terminating the instance caused the volume to detach (and delete, as I had also tried force-deleting the volume); see the end of the log.
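
A minimal sketch of the sequence above using the python-cinderclient (1.0.9) and python-novaclient (2.17.0) APIs from this environment; the auth URL, credentials, instance UUID and device path are placeholders, not values taken from this report:

# Minimal reproduction sketch (Python). Auth URL, credentials, instance UUID
# and device path below are placeholders, not values from this report.
import time

from cinderclient import client as cinder_client
from novaclient import client as nova_client

AUTH = ('admin', 'secret', 'admin', 'http://controller:5000/v2.0')
cinder = cinder_client.Client('1', *AUTH)   # python-cinderclient 1.0.9
nova = nova_client.Client('2', *AUTH)       # python-novaclient 2.17.0

# 2. Create a 1 GB volume and wait for it to become available.
vol = cinder.volumes.create(1, display_name='detach-test')
while cinder.volumes.get(vol.id).status != 'available':
    time.sleep(1)

# 3. Attach it to an existing, ACTIVE instance.
instance_id = 'INSTANCE_UUID'
nova.volumes.create_server_volume(instance_id, vol.id, '/dev/vdb')
while cinder.volumes.get(vol.id).status != 'in-use':
    time.sleep(1)

# 4. Detach immediately; when the bug hits, the volume stays in 'detaching'.
nova.volumes.delete_server_volume(instance_id, vol.id)
time.sleep(30)
print(cinder.volumes.get(vol.id).status)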

Actual results:
Volume status is stuck in 'detaching' and the volume remains attached.

Expected results:
Volume should detach without error.


Possible RFE: add a Nova force-volume-detach option.
https://blueprints.launchpad.net/nova/+spec/add-force-detach-to-nova

Comment 1 Nikola Dipanov 2014-07-30 15:12:37 UTC
This looks like https://bugs.launchpad.net/nova/+bug/1327218 that we keep hitting sporadically in the gate as well (see http://status.openstack.org/elastic-recheck/).

I do not think that this is related to the Gluster backend - it seems like a race in the way Nova interacts with Cinder.

Comment 2 Kashyap Chamarthy 2015-06-08 16:21:13 UTC
The upstream bug linked here seems to be fixed by this commit in Juno (the actual upstream commit from which this is cherry picked is at the bottom):


commit bbf6348997fee02f9dadd556565f44005e2c7f23
Author: Matt Riedemann <mriedem.com>
Date: Wed Mar 18 12:42:42 2015 -0700

    Save bdm.connection_info before calling volume_api.attach_volume

    There is a race in attach/detach of a volume where the volume status
    goes to 'in-use' before the bdm.connection_info data is stored in the
    database. Since attach is a cast, the caller can see the volume go to
    'in-use' and immediately try to detach the volume and blow up in the
    compute manager because bdm.connection_info isn't yet stored in the
    database.

    This fixes the issue by saving the connection_info immediately before
    calling volume_api.attach_volume (which sets the volume status to
    'in-use').

    Closes-Bug: #1327218

    Conflicts:
            nova/tests/unit/compute/test_compute.py
            nova/tests/unit/virt/test_block_device.py
            nova/virt/block_device.py

    NOTE(mriedem): The block_device conflicts are due to using dot
    notation when accessing object fields and in kilo the context is
    no longer passed to bdm.save(). The test conflicts are due to moving
    the test modules in kilo and passing the context on save().

    Change-Id: Ib95c8f7b66aca0c4ac7b92d140cbeb5e85c2717f
    (cherry picked from commit 6fb2ef96d6aaf9ca0ad394fd7621ef1e6003f5a1)
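
In other words, the essence of the fix is an ordering change in Nova's virt block-device attach path: persist connection_info on the block device mapping before volume_api.attach() flips the volume to 'in-use'. A simplified sketch of that ordering, not the literal nova/virt/block_device.py code (signature and error handling abbreviated):

# Simplified sketch of the ordering change described above, not the literal
# Nova code.
def attach(self, context, instance, volume_api, virt_driver):
    connector = virt_driver.get_volume_connector(instance)
    connection_info = volume_api.initialize_connection(
        context, self['volume_id'], connector)

    virt_driver.attach_volume(context, connection_info, instance,
                              self['mount_device'])

    # The fix: store connection_info on the BDM *before* the volume is
    # reported 'in-use', so an immediate detach can always read it back.
    self['connection_info'] = connection_info
    self.save(context)

    # Only now tell Cinder the volume is attached ('in-use').
    volume_api.attach(context, self['volume_id'], instance['uuid'],
                      self['mount_device'])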

Comment 3 David Hill 2015-10-06 17:16:50 UTC
This bug affects RHOS 5.0 on RHEL 7.1 as well and I have the confirmation from a customer that this bug is fixed with https://bugs.launchpad.net/nova/+bug/1327218 .

Comment 4 David Hill 2015-10-06 17:19:32 UTC
A comment from the case:

"When creating volumes under heavy load, attaching them to a nova instance and right away trying to detach those volumes, some volumes are stuck in 'detaching' state.
I found the following bug on launchpad and after implementing it manually in our lab, the issue was fixed there: 
https://bugs.launchpad.net/nova/+bug/1327218

I also saw that there is a Bugzilla entry for RHOSP 5/EL 6: https://bugzilla.redhat.com/show_bug.cgi?id=1121390

On upstream it is backported for stable/juno: https://review.openstack.org/#/c/166017/

Would it be possible to also make it available for our RHOSP Juno?"

Comment 6 Lee Yarwood 2015-12-15 16:45:42 UTC

*** This bug has been marked as a duplicate of bug 1265745 ***

Comment 7 Charles Crouch 2015-12-15 16:52:46 UTC
When I go to bug 1265745, it says I'm not authorized to see it :-(
Can updates still be provided on this bug, or the authorization loosened on 1265745? Thanks.

Comment 8 Lee Yarwood 2015-12-15 17:02:25 UTC
(In reply to Charles Crouch from comment #7)
> When I go to bug 1265745, it says I'm not authorized to see it :-(
> Can updates still be provided on this bug, or the authorization loosened on
> 1265745? Thanks.

Rather than opening it up, I can point you towards the public errata pages for the fix, listing the versions where this landed:

openstack-nova bug fix advisory - openstack-nova-2014.1.5-7.el7ost
https://rhn.redhat.com/errata/RHBA-2015-2070.html

openstack-nova bug fix advisory - openstack-nova-2014.1.5-6.el6ost
https://rhn.redhat.com/errata/RHBA-2015-2075.html

Comment 9 Charles Crouch 2015-12-15 17:19:20 UTC
Thanks very much Lee.