Bug 2226366

Summary: [RBD] Retyping of in-use boot volumes renders instances unusable (possible data corruption)
Product: Red Hat OpenStack
Component: openstack-cinder
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 17.1 (Wallaby)
Target Milestone: z1
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Reporter: Eric Harney <eharney>
Assignee: Eric Harney <eharney>
QA Contact: Yosi Ben Shimon <ybenshim>
Docs Contact: Ian Frangs <ifrangs>
CC: aruffin, astupnik, brian.rosmaita, gcharot, gfidente, ifrangs, lsvaty, ltoscano, mariel, mwitt, pgrist
Keywords: Regression, Triaged
Fixed In Version: openstack-cinder-18.2.2-1.20230518161045.el9ost
Doc Type: Bug Fix
Doc Text:
Before this update, retyping an `in-use` Red Hat Ceph Storage (RHCS) volume to a volume type that stores it in a different pool from its current one could corrupt or lose data. With this update, the Block Storage RHCS back end resolves this issue.
Clone Of:
: 2229174
Last Closed: 2023-09-20 00:29:46 UTC
Type: Bug
Attachments:
reproduction notes (Flags: none)

Description Eric Harney 2023-07-25 19:57:21 UTC
Created attachment 1979481
reproduction notes

Description of problem:
Retyping an in-use RBD volume moves the RBD image but does not update the volume's location in the instance that uses it.


Steps to Reproduce:
1. Have an RBD volume attached to an instance.
2. Retype the volume with migration allowed (migration policy `on-demand`) to a volume type that moves it to a different RBD pool (a different cinder-volume back end).
3. Compare the RBD image's location (`rbd ls volumes`) with the location recorded in the instance's domain XML (`virsh dumpxml`).
4. Reboot the instance with `openstack server reboot`.
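Step 3 can also be checked with a short script. The following is a minimal Python sketch, assuming the output of `virsh dumpxml <instance>` and of `rbd ls <pool>` for each relevant pool has already been captured as text. It relies on libvirt describing an RBD disk as `<source protocol='rbd' name='pool/image'>`. The sample XML, the pool name `volumes-b`, the image name `volume-1234`, and the helper functions are hypothetical and only illustrate the comparison. A disk is flagged when the image it points to is not listed in that pool, which is the mismatch this bug leaves behind.

```python
import xml.etree.ElementTree as ET

# Hypothetical `virsh dumpxml` output after the retype: the domain still
# points at the old pool "volumes".
SAMPLE_XML = """
<domain type='kvm'>
  <name>instance-00000001</name>
  <devices>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='rbd' name='volumes/volume-1234'>
        <host name='192.0.2.10' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def parse_rbd_ls(output: str) -> list[str]:
    # `rbd ls <pool>` prints one image name per line.
    return [line.strip() for line in output.splitlines() if line.strip()]


def rbd_disks(domain_xml: str) -> dict[str, str]:
    """Map each disk target (e.g. 'vda') to its RBD source 'pool/image'."""
    root = ET.fromstring(domain_xml.strip())
    disks = {}
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        target = disk.find("target")
        if source is None or target is None:
            continue
        if source.get("protocol") == "rbd":
            disks[target.get("dev")] = source.get("name")
    return disks


def stale_disks(domain_xml: str, images_by_pool: dict[str, list[str]]) -> list[str]:
    """Return disk targets whose image is not listed in the pool the domain uses."""
    stale = []
    for dev, name in rbd_disks(domain_xml).items():
        pool, _, image = name.partition("/")
        if image not in images_by_pool.get(pool, []):
            stale.append(dev)
    return stale


# Hypothetical `rbd ls` output after the retype: the image now lives in the
# hypothetical pool "volumes-b" and is gone from "volumes".
images = {
    "volumes": parse_rbd_ls(""),
    "volumes-b": parse_rbd_ls("volume-1234\n"),
}
print(stale_disks(SAMPLE_XML, images))  # ['vda']
```

Working on captured text keeps the comparison testable without a live cluster; on a real deployment the strings would come from running the two commands on the relevant hosts.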

Actual results:
The instance cannot boot.

Additional info:
The upstream bug contains more detailed notes on reproduction:
https://bugs.launchpad.net/cinder/+bug/2019190
https://paste.openstack.org/raw/bNpzkjbeXrmTCwNHfDGs/

Comment 8 Brian Rosmaita 2023-08-04 15:07:18 UTC
The "known issue" BZ for 17.1 GA is https://bugzilla.redhat.com/show_bug.cgi?id=2229174

Comment 9 Andy Stillman 2023-08-09 13:26:28 UTC
*** Bug 2229174 has been marked as a duplicate of this bug. ***

Comment 25 Luigi Toscano 2023-09-14 13:53:32 UTC
After manually applying the reproduction steps, which confirmed the fix, the scenario was additionally verified by running the tests from the WIP tempest and cinder-tempest-plugin patches that reproduce the problem, namely:
- https://review.opendev.org/c/openstack/tempest/+/890360
- https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/894189

All of those tests now pass (they failed before the fix). Kudos to Yosi for most of the verification.

openstack-cinder-18.2.2-1.20230518161045.el9ost.noarch
python3-cinder-18.2.2-1.20230518161045.el9ost.noarch
python3-cinder-common-18.2.2-1.20230518161045.el9ost.noarch
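To check whether a node already runs a build at least as new as the one listed under Fixed In Version, the installed package can be compared with it using rpm's version ordering. The sketch below is a simplified, hypothetical Python approximation of that ordering: it ignores epochs and the `~` and `^` operators, and only strips a few common architecture suffixes. The helper names are illustrative; for real systems, rpm's own comparison (for example `rpmdev-vercmp` from rpmdevtools) should be treated as authoritative.

```python
import re

# Simplified, hypothetical helper: approximates rpm's version ordering but
# ignores epochs and the '~' / '^' operators.
FIXED_NVR = "openstack-cinder-18.2.2-1.20230518161045.el9ost"
ARCH_SUFFIX = re.compile(r"\.(noarch|x86_64|aarch64|ppc64le|s390x)$")


def _segments(s: str) -> list[str]:
    # Split into runs of digits or letters; other characters act as separators.
    return re.findall(r"\d+|[A-Za-z]+", s)


def vercmp(a: str, b: str) -> int:
    """Return 1 if a is newer, -1 if b is newer, 0 if they compare equal."""
    sa, sb = _segments(a), _segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() != y.isdigit():
            # A numeric segment sorts after an alphabetic one.
            return 1 if x.isdigit() else -1
        if x.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return 1 if xi > yi else -1
        elif x != y:
            return 1 if x > y else -1
    # All shared segments are equal: the string with segments left over is newer.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


def split_nvr(nvr: str) -> tuple[str, str, str]:
    # "name-version-release[.arch]" -> (name, version, release)
    name, version, release = ARCH_SUFFIX.sub("", nvr).rsplit("-", 2)
    return name, version, release


def has_fix(installed_nvr: str, fixed_nvr: str = FIXED_NVR) -> bool:
    _, iv, ir = split_nvr(installed_nvr)
    _, fv, fr = split_nvr(fixed_nvr)
    # Compare versions first; only fall back to releases when versions tie.
    result = vercmp(iv, fv) or vercmp(ir, fr)
    return result >= 0


print(has_fix("openstack-cinder-18.2.2-1.20230518161045.el9ost.noarch"))  # True
print(has_fix("openstack-cinder-18.2.1-1.20230101000000.el9ost.noarch"))  # False
```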

Comment 31 errata-xmlrpc 2023-09-20 00:29:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:5138