Description of problem:
Sometimes, when a disk detach operation does not complete, the volume status stays in "detaching". It should change to a failed state to give the operator information.

Version-Release number of selected component (if applicable):
OSP 13

How reproducible:
Always

Steps to Reproduce:
- Manually delete the volume from the Hitachi backend
- Perhaps others

Actual results:
Status stays in "detaching".

Expected results:
The detach should fail after some time.

Additional info:
Firstly, I think this is a bug, not an RFE, and the bug report should be: "Volume remains in detaching state after failing to detach".

I believe the correct behaviour should probably be to set the volume state back to in-use and add an instance fault. I'll need to understand exactly how to reproduce the problem, though.

Could you please provide clear steps to reproduce the initial detaching state, along with the corresponding DEBUG logs for cinder api, cinder volume, nova api, and nova compute?
Investigate https://review.openstack.org/#/c/590439/3/nova/virt/block_device.py
(In reply to Matthew Booth from comment #4)
> Investigate
> https://review.openstack.org/#/c/590439/3/nova/virt/block_device.py

Adding a note here based on discussion in #rhos-compute:

I realized that this bug fix ^ doesn't apply to Newton, as the code is quite different. The bug fix above restored a roll_detaching call that was erroneously removed during a different, previous bug fix. But the erroneous removal happened _after_ Newton (OSP 10). In Newton, the roll_detaching calls during detach failures are in the compute manager.

Looking at the code, I noticed that roll_detaching is _not_ called when driver.detach_volume raises DiskNotFound (which seems like it would be raised if the volume was previously deleted manually from the storage backend):

https://github.com/openstack/nova/blob/newton-eol/nova/compute/manager.py#L4744

Here we see that roll_detaching is called when Exception is caught, but it is not called when DiskNotFound is caught. I think this might be the bug.
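To make the suspected code path concrete, here is a minimal, self-contained sketch of the exception-handling pattern described above. It is not the nova code: FakeVolumeAPI, driver_detach, and the two failing callables are hypothetical stand-ins, and DiskNotFound is redefined locally rather than imported from nova.exception. It only illustrates that a handler which swallows DiskNotFound without calling roll_detaching leaves the status untouched, while the generic Exception handler rolls it back. Whatever the real compute manager does after the swallowed exception is not modelled here.

```python
class DiskNotFound(Exception):
    """Local stand-in for nova.exception.DiskNotFound (assumption)."""


class FakeVolumeAPI:
    """Hypothetical stand-in for the volume API; tracks one volume's status."""

    def __init__(self):
        # The volume is assumed to already be in 'detaching' when detach starts.
        self.status = "detaching"

    def roll_detaching(self, volume_id):
        # Rolling back a failed detach returns the volume to 'in-use'.
        self.status = "in-use"


def driver_detach(detach_fn, volume_api, volume_id):
    """Simplified shape of the handler pattern discussed in this comment."""
    try:
        detach_fn()
    except DiskNotFound:
        # Swallowed: note that roll_detaching is NOT called on this path.
        pass
    except Exception:
        # Generic failure: roll the status back, then re-raise.
        volume_api.roll_detaching(volume_id)
        raise


def raise_disk_not_found():
    raise DiskNotFound("disk missing on backend")


def raise_generic():
    raise RuntimeError("some other detach failure")


if __name__ == "__main__":
    api = FakeVolumeAPI()
    driver_detach(raise_disk_not_found, api, "vol-1")
    print("after DiskNotFound:", api.status)

    api = FakeVolumeAPI()
    try:
        driver_detach(raise_generic, api, "vol-1")
    except RuntimeError:
        pass
    print("after generic error:", api.status)
```

If this pattern does match the real behaviour, one possible fix would be to call roll_detaching in the DiskNotFound branch as well, but that needs to be confirmed against the actual code and logs.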
I have closed this bug as it has been waiting for more info for at least 4 weeks. We only do this to ensure that we don't accumulate stale bugs which can't be addressed. If you are able to provide the requested information, please feel free to re-open this bug.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days