Bug 1978526

Summary: Storage is not copied at all when doing VM live migration with --copy-storage-inc
Product: Red Hat Enterprise Linux 9 Reporter: Fangge Jin <fjin>
Component: libvirt Assignee: Virtualization Maintenance <virt-maint>
libvirt sub component: Live Migration QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: high    
Priority: unspecified CC: hhan, jdenemar, jsuchane, pkrempa, virt-maint, xuzhang
Version: 9.0 Keywords: Regression, Triaged
Target Milestone: beta   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-7.6.0-1.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1978716 (view as bug list) Environment:
Last Closed: 2021-12-07 21:57:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version: 7.6.0
Embargoed:
Bug Depends On:    
Bug Blocks: 1978716    
Attachments:
libvirtd log (flags: none)
The qmp log of src and dest hosts (flags: none)

Description Fangge Jin 2021-07-02 05:12:02 UTC
Created attachment 1797017
libvirtd log

Description of problem:
As subject

Version-Release number of selected component (if applicable):
qemu-kvm-6.0.0-7.el9.x86_64
libvirt-7.4.0-1.el9.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Start a vm with local storage

2. Pre-create the disk image on target host manually
# qemu-img create -f qcow2 /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2 10G

# qemu-img info /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2 -U
image: /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false


3. Migrate the VM with --copy-storage-inc
# virsh migrate avocado-vt-vm1 qemu+ssh://******/system --live --verbose --copy-storage-inc

4. Check the disk image on the destination host; its disk size is still small
# qemu-img info /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2 -U
image: /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 5.01 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

5. Try to log in to the VM on the destination host; the login fails

6. Check libvirtd.log; no blockdev-add command can be found



Actual results:
As description


Expected results:
Disk should be copied during migration when --copy-storage-inc is used


Additional info:
Can't reproduce with --copy-storage-all

Comment 1 Fangge Jin 2021-07-02 05:12:26 UTC
Created attachment 1797029
domain xml

Comment 2 Han Han 2021-07-02 08:19:43 UTC
Created attachment 1797066
The qmp log of src and dest hosts

Reproduced on libvirt-7.4.0-1.el9.x86_64 qemu-kvm-6.0.0-4.el9.x86_64.
See the qmp log files generated by qemu-monitor.stp.

Comment 3 Peter Krempa 2021-07-02 11:58:59 UTC
The problem happens because the wrong constant was used in the expression that determines whether storage migration needs to take place.

The original condition is:

bool storageMigration = flags & (VIR_MIGRATE_NON_SHARED_DISK | QEMU_MONITOR_MIGRATE_NON_SHARED_INC);

The correct one is:

bool storageMigration = flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC);

QEMU_MONITOR_MIGRATE_NON_SHARED_INC equals 0x04
VIR_MIGRATE_NON_SHARED_INC equals 0x80

Comment 4 Peter Krempa 2021-07-12 14:42:27 UTC
Fixed upstream:

commit b249fa78718cd6c21109b385b568ecd3d6a3a8dd
Author: Peter Krempa <pkrempa>
Date:   Fri Jul 2 14:17:58 2021 +0200

    NEWS: Mention implications of the bug in migration code
    
    Wrong flag use could have user-visible implications. Mention the fix.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

commit f58349c9c6d26d98e7c8c195b1160d0c0cfff080
Author: Peter Krempa <pkrempa>
Date:   Fri Jul 2 14:17:57 2021 +0200

    qemu: migration: Use correct flag constant for enabling storage migration
    
    The 'storageMigration' flag is supposed to be true if storage migration
    is requested, which is based on VIR_MIGRATE_NON_SHARED_DISK or
    VIR_MIGRATE_NON_SHARED_INC flags. The assignment to the variable used
    QEMU_MONITOR_MIGRATE_NON_SHARED_INC (0x04) instead of
    VIR_MIGRATE_NON_SHARED_INC (0x80), caused libvirtd to skip the actual
    copy of data.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1978526
    Fixes: da69f4b2084bff140238e450e264d6036ebef898
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

v7.5.0-44-gb249fa7871

Comment 5 Han Han 2021-07-16 03:25:42 UTC
Tested on libvirt v7.5.0-97-g133d05a15e and qemu-6.0.0-1.fc35.x86_64 with the steps from comment 0. PASS

Comment 9 Han Han 2021-08-23 02:43:40 UTC
Verified on libvirt-7.6.0-2.el9.x86_64 qemu-kvm-6.0.0-12.el9.x86_64:
1. Prepare an nbd based on an image with an OS
2. On both hosts, create an image whose backing file is the nbd
# qemu-img info /var/lib/libvirt/images/backing.qcow2 
image: /var/lib/libvirt/images/backing.qcow2
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 26 MiB
cluster_size: 65536
backing file: json:{"file":{"driver":"nbd","server":{"type":"inet","host":"10.0.150.247","port":"10809"}}}
backing file format: qcow2
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

3. Refresh the storage pool of backing image

4. Start a VM with the backing image, then complete a migration with --copy-storage-inc
# virsh migrate fedora34 qemu+ssh://root.150.247/system --live --verbose --copy-storage-inc
Migration: [100 %]

5. Log in to the VM and write some data to the disk
# dd if=/dev/zero of=file bs=10M count=10
10+0 records in
10+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.77315 s, 37.8 MB/s

No I/O error.

Comment 12 Han Han 2022-01-13 04:30:53 UTC
Covered by RHEL-120247