Bug 1489980

Summary: OSP 10 unable to remove instance; RBD logs show "ondisk = -16 ((16) Device or resource busy)" and "-1 librbd: cannot obtain exclusive lock - not removing"
Product: Red Hat OpenStack Reporter: Vikhyat Umrao <vumrao>
Component: openstack-nova Assignee: Kashyap Chamarthy <kchamart>
Status: CLOSED DUPLICATE QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium Docs Contact:
Priority: high    
Version: 10.0 (Newton) CC: ceph-eng-bugs, dasmith, eglynn, jdillama, jquinn, jwaterwo, kchamart, lyarwood, mbooth, mwitt, sbauza, sgordon, srevivo, vromanso, vumrao
Target Milestone: --- Keywords: Triaged, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-28 13:24:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vikhyat Umrao 2017-09-08 21:27:27 UTC
Description of problem:
OSP 10 is unable to remove an instance; the RBD logs show "ondisk = -16 ((16) Device or resource busy)" and "-1 librbd: cannot obtain exclusive lock - not removing"

Version-Release number of selected component (if applicable):
OSP 10
Red Hat Ceph Storage 2.1 - 10.2.3-17.el7cp


This looks to me like an OSP 10 Nova RBD driver issue rather than a librbd issue, because the customer is able to delete the RBD image with the rbd command.

I will add the logs and more information in the next comment. Most probably we need to send this bug to the OSP team, but I am creating it with us first to cross-verify that I am not missing anything.

Comment 1 Vikhyat Umrao 2017-09-08 21:28:59 UTC
It looks like the Nova RBD driver is holding the lock, so the delete thread is not able to take the lock and fails to remove the image.

Comment 7 Jason Dillaman 2017-09-11 14:01:44 UTC
Please confirm whether the rbd-mirror daemon is active on a secondary cluster and actively mirroring the images that cannot be deleted.

The fact that "rbd status" shows zero watchers on the image is suspicious, since the exclusive-lock code showed that the lock owner was alive. Perhaps the qemu process was not shut down cleanly, so the image had a watcher for 30 seconds until it timed out. If you can re-create this scenario, try to collect a series of "rbd status" dumps (with timestamps) so they can be aligned with the Nova logs.
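
One possible way to gather those timestamped dumps is a small polling script. This is only a sketch under stated assumptions: the function names, the sample count, and the interval are illustrative and not part of any Ceph or Nova tooling, and the timestamps are taken in UTC, so they may need shifting to match the time zone used in the Nova logs. The only external command it relies on is "rbd status <pool>/<image>".

```python
import subprocess
import time
from datetime import datetime, timezone


def run_rbd_status(image_spec):
    """Run `rbd status <pool>/<image>` and return its combined output."""
    result = subprocess.run(
        ["rbd", "status", image_spec],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def collect_status_dumps(image_spec, count, interval,
                         runner=run_rbd_status, sleep=time.sleep):
    """Return a list of (UTC timestamp, output) pairs, one per sample.

    `runner` and `sleep` are injectable (hypothetical hooks) so the loop
    can be exercised without a Ceph cluster.
    """
    samples = []
    for i in range(count):
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f UTC")
        samples.append((stamp, runner(image_spec)))
        if i < count - 1:
            sleep(interval)  # wait between samples, but not after the last one
    return samples


def format_samples(samples):
    """Render the samples as text with one timestamp header per dump."""
    lines = []
    for stamp, output in samples:
        lines.append("=== " + stamp + " ===")
        lines.append(output.rstrip())
    return "\n".join(lines)
```

For example, print(format_samples(collect_status_dumps("<pool>/<image>", 60, 1))) would sample once per second for about a minute, which covers the 30-second watcher timeout mentioned above. The pool and image names are placeholders to be replaced with the image that fails to delete.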

Comment 54 Lee Yarwood 2019-10-28 13:24:12 UTC

*** This bug has been marked as a duplicate of bug 1636190 ***