There is currently a known issue when using a Red Hat Ceph Storage (RHCS) back end for volumes: instances can be prevented from rebooting, and data corruption may occur. This happens when all of the following conditions are met:
+
* RHCS is the back end for instance volumes.
* RHCS has multiple storage pools for volumes.
* A volume is being retyped where the new type requires the volume to be stored in a different pool than its current location.
* The retype call uses the `on-demand` migration_policy.
* The volume is attached to an instance.
+
Workaround: Do not retype `in-use` volumes when all of the listed conditions are met.
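+
For reference, a retype call that meets the last two bullet conditions looks like the following. The volume and type names here are hypothetical placeholders:
+
----
# Retype an attached volume to a type backed by a different pool,
# letting Cinder migrate the data on demand.
openstack volume set --type fast-rbd --retype-policy on-demand attached-vol
----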
Created attachment 1979481: reproduction notes
Description of problem:
Retyping an in-use RBD volume with migration moves the RBD image to the new pool but does not update the volume's location in the instance that is using it.
Steps to Reproduce:
1. Attach an RBD volume to an instance.
2. Retype the volume, with migration, to a type that moves it to a different RBD pool (a different cinder-volume back end); a condensed command sketch follows these steps.
3. Compare the RBD image's actual location (`rbd ls volumes`) with the location recorded in the instance VM (`virsh dumpxml`).
4. Reboot the instance with `openstack server reboot`.
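A condensed sketch of the reproduction, assuming two RBD-backed volume types (rbd-a, rbd-b) mapped to different pools; all names below are hypothetical placeholders:

# 1. Create an RBD volume and attach it to a running instance.
openstack volume create --size 1 --type rbd-a repro-vol
openstack server add volume repro-server repro-vol

# 2. Retype with on-demand migration to a type backed by a different RBD pool.
openstack volume set --type rbd-b --retype-policy on-demand repro-vol

# 3. Compare the image's real location with what the instance definition records.
rbd ls volumes                                 # old pool: image is gone after migration
rbd ls volumes-b                               # new pool: image is here
virsh dumpxml instance-00000001 | grep rbd     # still references the old pool

# 4. Reboot the instance; it fails to come back up.
openstack server reboot repro-server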
Actual results:
The instance cannot boot: its libvirt definition still references the volume at its old location, which no longer exists after the migration.
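For illustration, the mismatch observed in step 3 looks roughly like this; the pool names, image UUID, and monitor address are hypothetical:

$ rbd ls volumes-b
volume-1b6e6d6e-0f3a-4f7e-9c1e-000000000000

$ virsh dumpxml instance-00000001 | grep -A1 "protocol="
    <source protocol='rbd' name='volumes/volume-1b6e6d6e-0f3a-4f7e-9c1e-000000000000'>
      <host name='192.0.2.10' port='6789'/>

libvirt still points at the old pool (volumes) while the image now lives in volumes-b, so the domain cannot find its disk on reboot.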
Additional info:
The upstream bug contains more detailed notes on reproduction:
https://bugs.launchpad.net/cinder/+bug/2019190
https://paste.openstack.org/raw/bNpzkjbeXrmTCwNHfDGs/