Bug 2342822

Summary: [8.0z backport] RBD migration execute reports incorrect status when NBD export on the source is disconnected
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Bipin Kunal <bkunal>
Component: RBD
Assignee: Ilya Dryomov <idryomov>
Status: CLOSED ERRATA
QA Contact: Sunil Angadi <sangadi>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.0
CC: bhkaur, bkunal, ceph-eng-bugs, cephqe-warriors, idryomov, rgeorge, sangadi, tserlin
Target Milestone: ---
Target Release: 8.0z2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-19.2.0-76.el9cp
Doc Type: Bug Fix
Doc Text:
.RBD migration execute command no longer returns success when the import is interrupted
Previously, due to an implementation defect, the rbd migration execute command returned success even if the import was interrupted by a network issue. As a result, the rbd status command reported the migration as executed even though the import was left unfinished, and the imported image appeared partially or completely unusable to the user. With this fix, the rbd migration execute command no longer returns success when the import is interrupted by a network issue, and the import remains usable to the user.
Story Points: ---
Clone Of: 2339092
Environment:
Last Closed: 2025-03-06 14:21:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Bipin Kunal 2025-01-29 16:06:06 UTC
+++ This bug was initially created as a clone of Bug #2339092 +++

Description of problem:
-----------------------
While performing an RBD live migration from an external raw disk to Ceph NVMe over NBD, the NBD export on the source was disrupted while the migration execute command was running. The command's progress immediately jumped to 100% and it reported success, even though the deep copy had not completed because of the disruption.
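
For reference, the intended end-to-end flow for an import-only live migration is prepare, execute, and commit, with rbd migration abort as the documented way to back out before commit. A minimal sketch of that flow, using the pool, image, and spec file names from the steps below:

# rbd migration prepare --import-only --source-spec-path raw_nbd.json nvmeof_pool01/nginx
# rbd migration execute nvmeof_pool01/nginx
# rbd migration commit nvmeof_pool01/nginx

The failure reported here occurs during the execute step, i.e. before commit, while the image still carries the migrating feature shown in the rbd info output below.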

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
ceph version 19.2.0-53.el9cp (677d8728b1c91c14d54eedf276ac61de636606f8) squid (stable)

How reproducible:
-----------------
Always


Steps to Reproduce:
-------------------
1. On the client node from which the data is to be migrated to Ceph NVMe, start the qemu-nbd export

#  qemu-nbd --format raw /dev/mapper/mpathd -t --shared 10
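
Optionally, sanity-check that the export is reachable before moving on. This is not part of the original reproduction; qemu-img accepts NBD URIs, so querying the default export should report the raw size (IP and port assumed to match the spec file in step 2):

# qemu-img info nbd://172.20.60.98:10809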

2. On the ceph admin node, run the migration prepare command

# cat raw_nbd.json
{
    "type": "raw",
    "stream": {
        "type": "nbd",
        "uri": "nbd://172.20.60.98:10809"
    }
}

# rbd migration prepare --import-only --source-spec-path "raw_nbd.json" nvmeof_pool01/nginx

# rbd status nvmeof_pool01/nginx
Watchers: none
Migration:
	source: {"stream":{"type":"nbd","uri":"nbd://172.20.60.98:10809"},"type":"raw"}
	destination: nvmeof_pool01/nginx (10362b2b11594)
	state: prepared
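
The same state can also be read programmatically, which is useful in scripts. The field path below is inferred from the plain-text output above and should be treated as an assumption; jq must be installed:

# rbd status nvmeof_pool01/nginx --format json | jq -r '.migration.state'

At this point it should print "prepared", and "executed" after step 3.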


3. Run the migration execute command

# rbd migration execute nvmeof_pool01/nginx

4. While the previous command is in progress, stop the qemu-nbd export on the client node by interrupting the foreground qemu-nbd process started in step 1

#  qemu-nbd --format raw /dev/mapper/mpathd -t --shared 10
^C# ^C
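
Interrupting the process is just the simplest way to break the connection; any disruption of the NBD transport should behave the same. Two alternative ways to simulate a network issue, offered as illustrations rather than part of the original reproduction (run on the client node; remember to delete the iptables rule afterwards with iptables -D):

# pkill -f qemu-nbd
# iptables -A INPUT -p tcp --dport 10809 -j DROP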

5. Check the output of the execute command and the status of the rbd migration on the ceph admin node

# rbd migration execute nvmeof_pool01/nginx

Image migration: 1% complete...
Image migration: 100% complete...done.

# rbd status nvmeof_pool01/nginx
Watchers:
	watcher=172.20.60.95:0/1533589181 client.25911 cookie=139667371081888
	watcher=172.20.60.96:0/1292776105 client.25917 cookie=140526968709328
Migration:
	source: {"stream":{"type":"nbd","uri":"nbd://172.20.60.98:10809"},"type":"raw"}
	destination: nvmeof_pool01/nginx (10362b2b11594)
	state: executed

# rbd du nvmeof_pool01/nginx
NAME   PROVISIONED  USED
nginx      300 GiB  5.5 GiB

# rbd info  nvmeof_pool01/nginx
rbd image 'nginx':
	size 300 GiB in 76800 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 10362b2b11594
	block_name_prefix: rbd_data.10362b2b11594
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, migrating
	op_features:
	flags:
	create_timestamp: Mon Jan 20 23:49:17 2025
	access_timestamp: Mon Jan 20 23:52:33 2025
	modify_timestamp: Tue Jan 21 00:13:38 2025
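
The rbd du output above (5.5 GiB used out of 300 GiB provisioned) already suggests that the deep copy stopped early. A stronger check is to compare checksums of the source device and the destination image. This is a sketch rather than part of the original reproduction; it assumes the source device on the client is unchanged and that reading the full 300 GiB on both sides is acceptable. Differing digests confirm the import is incomplete.

On the client node:

# sha256sum /dev/mapper/mpathd

On the ceph admin node:

# rbd export nvmeof_pool01/nginx - | sha256sum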


Actual results:
---------------
The migration execute command jumps to 100% progress and reports success as soon as the export is stopped on the client.

Expected results:
-----------------
The execute command should error out, or keep retrying until the NBD export is available again. It should not report 100% progress and success.
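
Until the fixed build is deployed, a defensive pattern on affected versions is to verify the destination before committing and to abort the migration if the import looks incomplete (rbd migration abort cancels a migration that has not been committed). A sketch, reusing the assumed JSON field path and the checksum comparison from the earlier sketches:

# rbd status nvmeof_pool01/nginx --format json | jq -r '.migration.state'

If the state is "executed" but the checksums do not match, back out and retry the migration:

# rbd migration abort nvmeof_pool01/nginx

Only once the data verifies, finish with:

# rbd migration commit nvmeof_pool01/nginx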

Comment 11 errata-xmlrpc 2025-03-06 14:21:07 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fixes, and enhancement updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:2457