The issue here is that volume operations interrupted when cinder-volume goes down are not rolled back, which leaves the volume status in an inconsistent state. There are a few other implications:

1. There's no easy way to know that something has gone wrong on one of the cinder-volume nodes after the rpc.cast is sent, unless we add a sentinel that watches the volume's status changes (see the sketch below).

2. If cinder-volume goes down *while* extending a volume, we can't simply restore the volume status because we don't actually know what the real state is. The volume at this point could be completely broken, and there's no way to know that without manual inspection.

We could think about possible solutions for this issue, but that would go well beyond this bug and Icehouse, and it requires discussion upstream - if it is even considered a real issue. The thing is, if cinder-scheduler sends a cast to cinder-volume and cinder-volume then goes down *while* executing that command, the wrong status on the volume is probably the least important issue a cloud operator would have. This issue probably requires a blueprint; I'm cloning it to keep track of where it came from.
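As a rough illustration of the sentinel idea from point 1, here is a minimal sketch that polls volume statuses with python-cinderclient and flags volumes that sit in a transient state for too long. The credential handling, the 15-minute threshold, and the choice to only report (rather than reset) the state are assumptions for illustration; as noted above, an interrupted extend can leave the backing volume in an unknown state, so automatically resetting the status is not necessarily safe.

    import os
    import time

    from cinderclient import client

    # States that should only ever be transient; a volume stuck in one of
    # these after cinder-volume died is the symptom described in this bug.
    TRANSIENT_STATES = {'creating', 'extending', 'deleting',
                        'attaching', 'detaching'}
    STUCK_AFTER = 15 * 60   # seconds before a volume counts as "stuck" (illustrative)
    POLL_INTERVAL = 60

    def main():
        # Admin credentials are assumed to come from the usual OS_* env vars.
        cinder = client.Client('2',
                               os.environ['OS_USERNAME'],
                               os.environ['OS_PASSWORD'],
                               os.environ['OS_TENANT_NAME'],
                               os.environ['OS_AUTH_URL'])

        first_seen = {}  # volume id -> time it was first seen in a transient state
        while True:
            now = time.time()
            for vol in cinder.volumes.list(search_opts={'all_tenants': 1}):
                if vol.status in TRANSIENT_STATES:
                    first_seen.setdefault(vol.id, now)
                    if now - first_seen[vol.id] > STUCK_AFTER:
                        # Only report; an operator still has to inspect the
                        # backend before deciding whether a reset-state is safe.
                        print('volume %s stuck in %s for more than %d seconds'
                              % (vol.id, vol.status, STUCK_AFTER))
                else:
                    first_seen.pop(vol.id, None)
            time.sleep(POLL_INTERVAL)

    if __name__ == '__main__':
        main()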
This bug is too general and doesn't have a specific scenario
Verified in:
python-cinder-2015.1.0-3.el7ost.noarch
openstack-cinder-2015.1.0-3.el7ost.noarch

action: create a new volume & restart the cinder services
result: error in the Cinder scheduler: cinder-volume is not available

action: extend a volume & restart the cinder-volume service
result: the volume is stuck in 'extending' status
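For completeness, the create/extend side of those verification steps can be driven with python-cinderclient along these lines. This is a sketch assuming admin credentials in the usual OS_* environment variables and an illustrative volume name; restarting the cinder services between the calls is done out-of-band (e.g. with systemctl on the controller node) and is not shown.

    import os

    from cinderclient import client

    cinder = client.Client('2',
                           os.environ['OS_USERNAME'],
                           os.environ['OS_PASSWORD'],
                           os.environ['OS_TENANT_NAME'],
                           os.environ['OS_AUTH_URL'])

    # Step 1: create a 1 GB volume, then restart the cinder services while
    # it is still in 'creating' (restart done manually, not shown here).
    vol = cinder.volumes.create(size=1, name='verify-extend-bug')

    # Step 2: once the volume is 'available', extend it to 2 GB, then restart
    # only cinder-volume and check whether the volume stays in 'extending'.
    cinder.volumes.extend(vol, 2)

    print(cinder.volumes.get(vol.id).status)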
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2015:1548