Bug 2226366

Summary: [RBD] Retyping of in-use boot volumes renders instances unusable (possible data corruption)
Product: Red Hat OpenStack
Reporter: Eric Harney <eharney>
Component: openstack-cinder
Assignee: Eric Harney <eharney>
Status: MODIFIED
QA Contact: Evelina Shames <eshames>
Severity: high
Docs Contact: Andy Stillman <astillma>
Priority: high
Version: 17.1 (Wallaby)
CC: aruffin, astupnik, brian.rosmaita, gcharot, gfidente, ifrangs, lsvaty, mwitt, pgrist
Target Milestone: z1
Keywords: Regression, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-cinder-18.2.2-17.1.20230816200905.f6b44fc.el9osttrunk
Doc Type: Known Issue
Doc Text:
There is currently a known issue when using a Red Hat Ceph Storage (RHCS) back end for volumes that can prevent instances from being rebooted, and may lead to data corruption. This occurs when all of the following conditions are met:

* RHCS is the back end for instance volumes.
* RHCS has multiple storage pools for volumes.
* A volume is being retyped where the new type requires the volume to be stored in a different pool than its current location.
* The retype call uses the `on-demand` migration_policy.
* The volume is attached to an instance.

Workaround: Do not retype `in-use` volumes that meet all of these listed conditions.
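The conditions in the Doc Text can be written as a single check. The following is a minimal sketch, not part of Cinder: the `RetypeRequest` class, its field names, and `retype_hits_known_issue` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class RetypeRequest:
    """Hypothetical description of a planned retype (not a Cinder API object)."""
    backend_is_rbd: bool    # RHCS/RBD is the back end for the volume
    source_pool: str        # pool that currently holds the RBD image
    target_pool: str        # pool required by the new volume type
    migration_policy: str   # "never" or "on-demand"
    volume_status: str      # for example "available" or "in-use"


def retype_hits_known_issue(req: RetypeRequest) -> bool:
    """Return True only when every condition listed in the Doc Text is met.

    A differing source and target pool implies that more than one pool exists,
    so the "multiple storage pools" condition is covered by that comparison.
    """
    return (
        req.backend_is_rbd
        and req.source_pool != req.target_pool
        and req.migration_policy == "on-demand"
        and req.volume_status == "in-use"
    )
```

A request that fails any one of the comparisons, such as a retype of a detached volume, falls outside the workaround.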
Story Points: ---
Clone Of:
: 2229174
Environment:
Last Closed:
Type: Bug
Attachments:
  Description          Flags
  reproduction notes   none

Description Eric Harney 2023-07-25 19:57:21 UTC
Created attachment 1979481: reproduction notes

Description of problem:
Retyping an in-use RBD volume moves the RBD image but does not update the image location recorded in the instance that uses the volume.


Steps to Reproduce:
1.  Attach an RBD volume to an instance.
2.  Retype the volume with migration to a type that places it in a different RBD pool (a different cinder-volume back end).
3.  Compare the RBD volume's location (rbd ls volumes) with the location recorded in the instance's domain XML (virsh dumpxml).
4.  Reboot the instance with openstack server reboot.
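
The comparison in step 3 can be automated. The sketch below assumes the domain XML from virsh dumpxml describes RBD disks with a `<source protocol='rbd' name='pool/image'>` element, and that the image names per pool have already been collected from rbd ls. The function names, the pool name `volumes2`, and the image name in the usage note are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def rbd_sources(domain_xml: str) -> list[str]:
    """Return the 'pool/image' names of all RBD-backed disks in a domain XML."""
    root = ET.fromstring(domain_xml)
    return [
        src.get("name")
        for src in root.findall("./devices/disk/source[@protocol='rbd']")
        if src.get("name")
    ]


def stale_sources(domain_xml: str, pool_images: dict[str, set[str]]) -> list[str]:
    """Return RBD sources the instance references that are missing from their pool.

    pool_images maps a pool name to the set of image names listed for that
    pool (for example, the output of `rbd ls <pool>` split into lines).
    """
    stale = []
    for name in rbd_sources(domain_xml):
        pool, _, image = name.partition("/")
        if image not in pool_images.get(pool, set()):
            stale.append(name)
    return stale
```

For example, if the domain XML still names `volumes/volume-1234` while that image is now listed only under `volumes2`, `stale_sources` returns `["volumes/volume-1234"]`, which matches the mismatch this bug describes.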

Actual results:
Instance cannot boot.

Additional info:
The upstream bug contains more detailed notes on reproduction:
https://bugs.launchpad.net/cinder/+bug/2019190
https://paste.openstack.org/raw/bNpzkjbeXrmTCwNHfDGs/

Comment 8 Brian Rosmaita 2023-08-04 15:07:18 UTC
The "known issue" BZ for 17.1 GA is https://bugzilla.redhat.com/show_bug.cgi?id=2229174

Comment 9 Andy Stillman 2023-08-09 13:26:28 UTC
*** Bug 2229174 has been marked as a duplicate of this bug. ***