Description of problem:
OSP 10 is unable to remove an instance; the RBD logs show "ondisk = -16 ((16) Device or resource busy)" and "-1 librbd: cannot obtain exclusive lock - not removing".
Version-Release number of selected component (if applicable):
OSP 10
Red Hat Ceph Storage 2.1 - 10.2.3-17.el7cp
This looks to me like an OSP 10 Nova RBD driver issue rather than a librbd issue, because the customer is able to delete the RBD image with the rbd command.
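For reference, a minimal sketch of the manual deletion path that reportedly succeeds. The pool and image names here are assumptions; substitute the actual Nova ephemeral pool (often "vms") and the stuck instance disk image.

```shell
# Assumed names for illustration only -- replace with the real ones.
POOL=vms
IMAGE=instance_disk

if command -v rbd >/dev/null 2>&1; then
    # Direct librbd-backed deletion, bypassing Nova.
    rbd rm "$POOL/$IMAGE"
else
    echo "rbd CLI not found; run this on a Ceph client node"
fi
```

If this succeeds while the Nova delete fails, the lock contention is likely specific to how the Nova RBD driver opens the image.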
I will add the logs and more information in the next comment. Most probably we need to send this bug to the OSP team, but I am filing it here first to cross-verify that I am not missing anything.
Please confirm whether or not the rbd-mirror daemon is active on a secondary cluster and actively mirroring the images that cannot be deleted.
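A quick way to check this from the CLI, as a sketch; the pool and image names are assumptions and should be replaced with the real ones:

```shell
# Assumed names for illustration only.
POOL=vms
IMAGE=instance_disk

if command -v rbd >/dev/null 2>&1; then
    # Per-pool mirroring state and rbd-mirror daemon health
    # (run this against the secondary cluster as well).
    rbd mirror pool status "$POOL" --verbose
    # The journaling feature on the image indicates it is mirror-enabled.
    rbd info "$POOL/$IMAGE" | grep -i features
else
    echo "rbd CLI not found; run this on a Ceph client node"
fi
```

If the image is mirror-enabled, rbd-mirror on the peer cluster holds a watch/lock of its own, which would explain the "cannot obtain exclusive lock" failure.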
The fact that "rbd status" shows zero watchers on the image is suspicious, since the exclusive-lock code showed that the lock owner was alive. Perhaps the qemu process was not shut down cleanly, so the image kept a watcher for up to 30 seconds until the watch timed out. If you can re-create this scenario, collect a series of "rbd status" dumps (with timestamps) so they can be aligned with the Nova logs.
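The collection loop could look like the following sketch; pool/image names, iteration count, and interval are assumptions to adjust for the reproduction window:

```shell
# Assumed names and timing for illustration only.
POOL=vms
IMAGE=instance_disk

if command -v rbd >/dev/null 2>&1; then
    # Capture a timestamped series of watcher and lock listings
    # while the Nova delete is re-attempted.
    for _ in 1 2 3; do
        printf '=== %s ===\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        rbd status "$POOL/$IMAGE"      # current watchers on the image header
        rbd lock list "$POOL/$IMAGE"   # advisory locks, if any
        sleep 1
    done
else
    echo "rbd CLI not found; run this on a Ceph client node"
fi
```

Using UTC timestamps makes it easier to line the dumps up with the Nova log entries, which are typically in UTC as well.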