Description of problem:
OSP14 introduces the RBD Cinder migration driver (BZ 1262068). When performing a cinder migrate, the driver is indeed used; however, when doing a cinder retype it is not, and the generic driver is used instead (the volume is migrated through the controllers).

Version-Release number of selected component (if applicable): 14

How reproducible: Always

Steps to Reproduce:
1. Define two Cinder volume types pointing to two different Cinder backends hosted on the same Ceph cluster (different pools):

cinder type-list
+--------------------------------------+----------+----------------------+-----------+
| ID                                   | Name     | Description          | Is_Public |
+--------------------------------------+----------+----------------------+-----------+
| 7b51de01-37c1-419b-85ca-dd0de7df3b2e | fast     | Fast Volume Type     | True      |
| ec995825-5232-417b-aca6-3aab621c0f7f | standard | Standard Volume Type | True      |
+--------------------------------------+----------+----------------------+-----------+

2. Do a cinder retype:

cinder retype --migration-policy on-demand test fast

3. The migration process uses the generic driver.

Actual results:
Migration is not driver assisted.

Expected results:
Migration should use the relevant (RBD in this case) driver.

Additional info:
Cinder volume debug logs: http://pastebin.test.redhat.com/677486
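For anyone scripting the reproduction, here is a minimal sketch of the same two-type setup using python-cinderclient; the Keystone session and the backend names are assumptions for illustration, not taken from this report:

from cinderclient import client

# Assumes an already-authenticated keystoneauth1 session; adjust to your environment.
cinder = client.Client('3', session=keystone_session)

# Two types, each pinned to a different backend via volume_backend_name.
# The backend names below are placeholders for the two RBD backends/pools.
fast = cinder.volume_types.create(name='fast', description='Fast Volume Type')
fast.set_keys({'volume_backend_name': 'ceph_fast'})

standard = cinder.volume_types.create(name='standard', description='Standard Volume Type')
standard.set_keys({'volume_backend_name': 'ceph_standard'})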
I believe this is a limitation of how the features were designed. Currently the driver-optimized retype is only called if the volume is not encrypted and the backend doesn't change. When the backend is different and migrations are enabled, we call the normal migration but tell it that it has a new type, and this prevents the manager from calling the driver-optimized migration because of this check in the volume manager's `migrate_volume` method:

if not force_host_copy and new_type_id is None:

This is because a driver retype doesn't migrate a volume, and a migration doesn't retype a volume. From a quick look I see 2 alternatives:

- Allow drivers to opt in to a 2-step process: optimized migration followed by a driver-specific retype.
- Call optimized migration on retype when the only difference between the types is the destination backend.

In my opinion we should enable the second option in any case, and it shouldn't be too complicated.
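As an illustration of that second option (a rough sketch only, not the actual upstream patch; the helper name and its placement are hypothetical), the retype path could strip volume_backend_name from both types' extra specs and, if everything else matches, request the migration without a new type so the check above no longer blocks the driver-assisted path:

def _types_differ_only_in_backend(old_specs, new_specs):
    """Illustrative helper: True when two volume types' extra specs are
    identical apart from volume_backend_name."""
    def strip_backend(specs):
        return {k: v for k, v in specs.items() if k != 'volume_backend_name'}
    return strip_backend(old_specs) == strip_backend(new_specs)


# Hypothetical use inside the retype flow: when only the backend changes,
# pass no new_type_id so "if not force_host_copy and new_type_id is None"
# in migrate_volume evaluates True and driver.migrate_volume gets a chance.
if _types_differ_only_in_backend(old_type['extra_specs'], new_type['extra_specs']):
    new_type_id = None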
I have opened a bug upstream for the second option I mentioned in comment #9 and have proposed a fix that should resolve most of the use cases: it calls driver-assisted migration when the volume types only change the backend, which covers almost all the cases described in this BZ's scenario.
Verified on:
openstack-cinder-18.2.1-0.20220605050357.9a473fd.el9ost.noarch

1. Defined two backends, each using its own dedicated Ceph backend/RBD pool:

(overcloud) [stack@undercloud-0 ~]$ cinder type-list
+--------------------------------------+-----------------+-------------+-----------+
| ID                                   | Name            | Description | Is_Public |
+--------------------------------------+-----------------+-------------+-----------+
| aeb5337d-7090-42d7-9067-29f99b336066 | tripleo_default | -           | True      |
| ec3de734-a560-4c3e-b442-2c301b1c83b6 | tripleo2        | -           | True      |
+--------------------------------------+-----------------+-------------+-----------+

(overcloud) [stack@undercloud-0 ~]$ cinder extra-specs-list
+--------------------------------------+-----------------+----------------------------------------------+
| ID                                   | Name            | extra_specs                                  |
+--------------------------------------+-----------------+----------------------------------------------+
| aeb5337d-7090-42d7-9067-29f99b336066 | tripleo_default | {}                                           |  -> the default type uses the default "tripleo_ceph" backend
| ec3de734-a560-4c3e-b442-2c301b1c83b6 | tripleo2        | {'volume_backend_name': 'tripleo_ceph_vol2'} |
+--------------------------------------+-----------------+----------------------------------------------+

Cinder service-list:
..
| cinder-volume | hostgroup@tripleo_ceph      | nova | enabled | up | 2022-07-07T12:08:26.000000 | - |
| cinder-volume | hostgroup@tripleo_ceph_vol2 | nova | enabled | up | 2022-07-07T12:08:26.000000 | - |

2. Created two volumes, one on each backend:

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+----------------------+------+-----------------+----------+-------------+
| ID                                   | Status    | Name                 | Size | Volume Type     | Bootable | Attached to |
+--------------------------------------+-----------+----------------------+------+-----------------+----------+-------------+
| 2d627d92-f101-41ab-82e2-c1bfbec4ebc0 | available | tripleo_default_volA | 1    | tripleo_default | false    |             |
| fe5fd6b7-ade9-4171-9db1-c3392a996a4f | available | tripleo2_volB        | 1    | tripleo2        | false    |             |
+--------------------------------------+-----------+----------------------+------+-----------------+----------+-------------+

3. Now retype each one to the other type/backend:

(overcloud) [stack@undercloud-0 ~]$ cinder retype --migration-policy on-demand tripleo_default_volA tripleo2

And the second one too:

(overcloud) [stack@undercloud-0 ~]$ cinder retype --migration-policy on-demand tripleo2_volB tripleo_default

(overcloud) [stack@undercloud-0 ~]$ cinder list
+--------------------------------------+-----------+----------------------+------+-----------------+----------+-------------+
| ID                                   | Status    | Name                 | Size | Volume Type     | Bootable | Attached to |
+--------------------------------------+-----------+----------------------+------+-----------------+----------+-------------+
| 2d627d92-f101-41ab-82e2-c1bfbec4ebc0 | available | tripleo_default_volA | 1    | tripleo2        | false    |             |
| fe5fd6b7-ade9-4171-9db1-c3392a996a4f | available | tripleo2_volB        | 1    | tripleo_default | false    |             |
+--------------------------------------+-----------+----------------------+------+-----------------+----------+-------------+

As can be seen, both volumes were retyped and switched over to the other volume type/backend.
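The same retype can also be driven programmatically, for example with python-cinderclient (a sketch that assumes an existing, authenticated Keystone session; the volume and type names are the ones from the listing above):

from cinderclient import client

cinder = client.Client('3', session=keystone_session)  # assumed pre-authenticated session

# The 'on-demand' policy lets Cinder migrate the volume to the other backend,
# which is what exercises the RBD assisted migration path.
vol = cinder.volumes.find(name='tripleo_default_volA')
cinder.volumes.retype(vol, 'tripleo2', 'on-demand')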
Checking the logs, here is one of the retypes ending with "Successful RBD assisted volume migration":

2022-07-07 12:13:05.456 11 DEBUG cinder.volume.manager [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] Issue driver.migrate_volume. migrate_volume /usr/lib/python3.9/site-packages/cinder/volume/manager.py:2609
2022-07-07 12:13:05.457 11 DEBUG cinder.volume.drivers.rbd [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] Attempting RBD assisted volume migration. volume: 2d627d92-f101-41ab-82e2-c1bfbec4ebc0, host: {'host': 'hostgroup@tripleo_ceph_vol2#tripleo_ceph_vol2', 'cluster_name': None, 'capabilities': {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 133.06, 'free_capacity_gb': 133.06, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0', 'location_info': 'ceph:/etc/ceph/ceph.conf:fd7ad824-a913-5b2f-acfc-fabb670e0ebc:openstack:vol2', 'backend_state': 'up', 'volume_backend_name': 'tripleo_ceph_vol2', 'replication_enabled': False, 'allocated_capacity_gb': 1, 'filter_function': None, 'goodness_function': None, 'timestamp': '2022-07-07T12:12:45.394331'}}, status=retyping. migrate_volume /usr/lib/python3.9/site-packages/cinder/volume/drivers/rbd.py:1924
2022-07-07 12:13:05.458 11 DEBUG os_brick.initiator.linuxrbd [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] opening connection to ceph cluster (timeout=-1). connect /usr/lib/python3.9/site-packages/os_brick/initiator/linuxrbd.py:70
2022-07-07 12:13:05.482 11 DEBUG cinder.volume.drivers.rbd [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] connecting to openstack@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). _do_conn /usr/lib/python3.9/site-packages/cinder/volume/drivers/rbd.py:480
2022-07-07 12:13:05.504 11 DEBUG cinder.volume.drivers.rbd [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] connecting to openstack@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). _do_conn /usr/lib/python3.9/site-packages/cinder/volume/drivers/rbd.py:480
2022-07-07 12:13:05.689 11 DEBUG cinder.volume.drivers.rbd [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] connecting to openstack@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). _do_conn /usr/lib/python3.9/site-packages/cinder/volume/drivers/rbd.py:480
2022-07-07 12:13:05.731 11 DEBUG cinder.volume.drivers.rbd [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] volume has no backup snaps _delete_backup_snaps /usr/lib/python3.9/site-packages/cinder/volume/drivers/rbd.py:1104
2022-07-07 12:13:05.732 11 DEBUG cinder.volume.drivers.rbd [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] Volume volume-2d627d92-f101-41ab-82e2-c1bfbec4ebc0 is not a clone. _get_clone_info /usr/lib/python3.9/site-packages/cinder/volume/drivers/rbd.py:1127
2022-07-07 12:13:05.736 11 DEBUG cinder.volume.drivers.rbd [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] deleting rbd volume volume-2d627d92-f101-41ab-82e2-c1bfbec4ebc0 delete_volume /usr/lib/python3.9/site-packages/cinder/volume/drivers/rbd.py:1247
2022-07-07 12:13:05.863 11 INFO cinder.volume.drivers.rbd [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] Successful RBD assisted volume migration.
2022-07-07 12:13:05.879 11 INFO cinder.volume.manager [req-506a06c2-b1e7-4923-8c1a-6cf0ac37f9f9 85321f14fc344d228df218d103c178a8 78cfade4d5854083964e41ac71921565 - - -] Migrate volume completed successfully.

Good to verify: this time the migration used the RBD assisted driver rather than the generic driver used previously.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543