Bug 2342822 - [8.0z backport] RBD migration execute reports incorrect status when NBD export on the source is disconnected
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 8.0z2
Assignee: Ilya Dryomov
QA Contact: Sunil Angadi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2025-01-29 16:06 UTC by Bipin Kunal
Modified: 2025-03-06 14:21 UTC
CC List: 8 users

Fixed In Version: ceph-19.2.0-76.el9cp
Doc Type: Bug Fix
Doc Text:
.RBD migration execute command no longer returns success when the import is interrupted
Previously, due to an implementation defect, the rbd migration execute command returned success even if the import was interrupted due to a network issue. As a result, the rbd status command reported the migration as executed even though the import was left unfinished, and the import appeared partially or completely unusable to the user. With this fix, the rbd migration execute command no longer returns success when the import is interrupted due to a network issue, and the import remains usable to the user.
Clone Of: 2339092
Environment:
Last Closed: 2025-03-06 14:21:07 UTC
Embargoed:




Links
Ceph Project Bug Tracker 58185 (last updated 2025-01-30 10:12:00 UTC)
GitHub ceph/ceph pull request 61567 (merged): librbd: stop filtering async request error codes (last updated 2025-02-04 15:03:03 UTC)
Red Hat Issue Tracker RHCEPH-10534 (last updated 2025-01-29 16:06:42 UTC)
Red Hat Product Errata RHBA-2025:2457 (last updated 2025-03-06 14:21:11 UTC)

Description Bipin Kunal 2025-01-29 16:06:06 UTC
+++ This bug was initially created as a clone of Bug #2339092 +++

Description of problem:
-----------------------
While performing an RBD live migration from an external raw disk to a Ceph NVMe pool over NBD, disrupting the NBD export while the migration execute command is running causes the command's progress to jump immediately to 100% and report success, even though the deep copy did not complete because of the disruption.
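
For reference, the full import-only live-migration flow that these steps exercise consists of three rbd commands; the sequence below is only a sketch, with the pool and image names taken from the reproduction steps:

# rbd migration prepare --import-only --source-spec-path "raw_nbd.json" nvmeof_pool01/nginx
# rbd migration execute nvmeof_pool01/nginx
# rbd migration commit nvmeof_pool01/nginx

The commit step should only be run after execute has genuinely finished copying all data; a prepared or partially executed migration can be backed out with rbd migration abort.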

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
ceph version 19.2.0-53.el9cp (677d8728b1c91c14d54eedf276ac61de636606f8) squid (stable)

How reproducible:
-----------------
Always


Steps to Reproduce:
-------------------
1. On the client node from which the data is to be migrated to the Ceph NVMe pool, start the qemu-nbd export

#  qemu-nbd --format raw /dev/mapper/mpathd -t --shared 10

2. On the ceph admin node, run the migration prepare command

# cat raw_nbd.json
{
    "type": "raw",
    "stream": {
        "type": "nbd",
        "uri": "nbd://172.20.60.98:10809"
    }
}

# rbd migration prepare --import-only --source-spec-path "raw_nbd.json" nvmeof_pool01/nginx

# rbd status nvmeof_pool01/nginx
Watchers: none
Migration:
	source: {"stream":{"type":"nbd","uri":"nbd://172.20.60.98:10809"},"type":"raw"}
	destination: nvmeof_pool01/nginx (10362b2b11594)
	state: prepared


3. Run the migration execute command

# rbd migration execute nvmeof_pool01/nginx

4. While the previous command is in progress, stop the qemu-nbd export on the client node by interrupting the foreground qemu-nbd process (Ctrl-C)

#  qemu-nbd --format raw /dev/mapper/mpathd -t --shared 10
^C# ^C

5. Check the status of the RBD migration on the Ceph admin node

# rbd migration execute nvmeof_pool01/nginx

Image migration: 1% complete...
Image migration: 100% complete...done.

# rbd status nvmeof_pool01/nginx
Watchers:
	watcher=172.20.60.95:0/1533589181 client.25911 cookie=139667371081888
	watcher=172.20.60.96:0/1292776105 client.25917 cookie=140526968709328
Migration:
	source: {"stream":{"type":"nbd","uri":"nbd://172.20.60.98:10809"},"type":"raw"}
	destination: nvmeof_pool01/nginx (10362b2b11594)
	state: executed

# rbd du nvmeof_pool01/nginx
NAME   PROVISIONED  USED
nginx      300 GiB  5.5 GiB

# rbd info  nvmeof_pool01/nginx
rbd image 'nginx':
	size 300 GiB in 76800 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 10362b2b11594
	block_name_prefix: rbd_data.10362b2b11594
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, migrating
	op_features:
	flags:
	create_timestamp: Mon Jan 20 23:49:17 2025
	access_timestamp: Mon Jan 20 23:52:33 2025
	modify_timestamp: Tue Jan 21 00:13:38 2025
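
A rough way to confirm that the import is in fact incomplete despite the reported "executed" state (assuming the source device is still readable on the client node and is the same size as the image) is to checksum the source device and the destination image and compare:

On the client node:
# sha256sum /dev/mapper/mpathd

On the Ceph admin node:
# rbd export nvmeof_pool01/nginx - | sha256sum

If the deep copy was interrupted, the two checksums will not match, consistent with the rbd du output above showing only 5.5 GiB of the 300 GiB image in use.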


Actual results:
---------------
The migration execute command reports a successful execution as soon as the export is stopped on the client node.

Expected results:
-----------------
The execute command should either error out or keep retrying until the NBD export is available again; it should not report 100% progress and success.
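
With the expected (fixed) behavior, automation can gate the commit on the exit status of the execute command. A minimal sketch, reusing the image from the reproduction above:

if rbd migration execute nvmeof_pool01/nginx; then
    rbd migration commit nvmeof_pool01/nginx
else
    echo "rbd migration execute failed; not committing" >&2
    # retry once the NBD export is reachable again, or run: rbd migration abort nvmeof_pool01/nginx
fi

Before the fix, the first branch would be taken even when the NBD connection dropped mid-copy; with the fix, execute exits non-zero, so the migration is not reported as executed.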

Comment 11 errata-xmlrpc 2025-03-06 14:21:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fixes, and enhancement updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:2457

