Bug 1823639 - Support copy storage migration on nvme disk
Summary: Support copy storage migration on nvme disk
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.3
Assignee: Michal Privoznik
QA Contact: Han Han
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-14 06:28 UTC by Han Han
Modified: 2020-11-17 17:48 UTC
CC List: 10 users

Fixed In Version: libvirt-6.5.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 17:48:08 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
the VM xml of src and dest (1.51 KB, application/gzip)
2020-09-16 08:36 UTC, Han Han

Comment 1 Han Han 2020-04-14 07:05:45 UTC
VM xml, dest xml, libvirtd log

Description of problem:
As the summary says: copy storage migration to an NVMe destination disk fails with an internal error.

Version-Release number of selected component (if applicable):
libvirt-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6141+0f540f16.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM
2. Attempt copy storage migration to an NVMe disk:
➜  ~ xmllint --xpath //disk /tmp/nvme.xml                   
<disk type="nvme" device="disk">
      <driver name="qemu" type="raw" cache="directsync" error_policy="report" rerror_policy="report" io="native" discard="unmap" detect_zeroes="unmap"/>
         <source type="pci" managed="yes" namespace="1">
           <address domain="0x0000" bus="0x44" slot="0x00" function="0x0"/>
        </source>
      <backingStore/>
      <target dev="sda" bus="sata"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>

➜  ~ virsh migrate demo qemu+ssh://XXXXX/system --xml /tmp/nvme.xml  --verbose --copy-storage-all
error: internal error: cannot precreate storage for disk type 'nvme'


Actual results:
As above

Expected results:
vm migrated

Additional info:
Referring to https://libvirt.org/formatdomain.html, the migration should be supported:

The difference between <disk type='nvme'> and <hostdev/> is that the latter is plain host device assignment with all its limitations (e.g. no live migration), while the former makes hypervisor to run the NVMe disk through hypervisor's block layer thus enabling all features provided by the layer (e.g. snapshots, domain migration, etc.).

Comment 2 Peter Krempa 2020-04-14 07:17:43 UTC
I don't think we will be able to pre-create storage for nvme and we are definitely not planning on adding a storage driver for NVMe disks. Without pre-creation (e.g. if you correctly format your destination drive) the storage migration should work.

Comment 3 Michal Privoznik 2020-04-14 14:21:17 UTC
Agreed with Peter here, I'm not quite sure how would one pre-create a PCI device. NVMe disks are PCI devices after all and thus are different to file based disks (where the corresponding file can be created anytime). In theory we could create NVMe namespaces, but that is not trivial and certainly supported by only a very few disks (in fact I haven't seen one yet). Therefore I think this should be closed.

Comment 4 Han Han 2020-04-15 00:36:07 UTC
(In reply to Michal Privoznik from comment #3)
> Agreed with Peter here, I'm not quite sure how would one pre-create a PCI
> device. NVMe disks are PCI devices after all and thus are different to file
> based disks (where the corresponding file can be created anytime). In theory
> we could create NVMe namespaces, but that is not trivial and certainly
> supported by only a very few disks (in fact I haven't seen one yet).
> Therefore I think this should be closed.

Note that the NVMe device at <address domain="0x0000" bus="0x44" slot="0x00" function="0x0"/> does exist on the destination host.
It should be reused by libvirt, just like copy-storage migration to a host block device.

Comment 5 Michal Privoznik 2020-04-15 12:04:43 UTC
(In reply to Han Han from comment #4)
>

Ah, so maybe what we should do is to silently skip pre-creation of an NVMe disk if requested. I mean, for network disks, we already do that:

https://gitlab.com/libvirt/libvirt/-/blob/master/src/qemu/qemu_migration.c#L223

and maybe we can do the same for NVMe disks.
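
Roughly, the idea would be to resolve the NVMe disk's PCI address to its sysfs path and let the existing "skip already existing disks" check do the rest. A sketch of the per-disk loop body (an approximation of the approach only, not a tested patch; the diskSrcPath initialisation shown here is an assumption, not taken from this report):

    g_autofree char *nvmePath = NULL;
    /* Assumption: the usual source path lookup; for an NVMe disk this is NULL. */
    const char *diskSrcPath = virDomainDiskGetSource(disk);

    /* For an NVMe disk, use the sysfs path of its PCI device instead ... */
    if (disk->src->type == VIR_STORAGE_TYPE_NVME) {
        virPCIDeviceAddressGetSysfsFile(&disk->src->nvme->pciAddr, &nvmePath);
        diskSrcPath = nvmePath;
    }

    /* ... so that a present PCI device counts as an already existing disk
     * and pre-creation is skipped. */
    if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
        (diskSrcPath && virFileExists(diskSrcPath))) {
        continue;
    }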

Comment 6 Michal Privoznik 2020-05-07 07:56:12 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2020-May/msg00253.html

but according to my review I think we need slightly better approach. My previous comment simplified things too much.

Comment 7 Han Han 2020-05-09 04:27:18 UTC
(In reply to Michal Privoznik from comment #6)
> Patch proposed upstream:
> 
> https://www.redhat.com/archives/libvir-list/2020-May/msg00253.html
> 
> but according to my review I think we need slightly better approach. My
> previous comment simplified things too much.

The v2 doesn't work:
➜  ~ virsh -k0 migrate demo qemu+ssh://hp-dl385g10-15.lab.eng.pek2.redhat.com/system --verbose --copy-storage-all --xml /tmp/demo.xml                                                                             
error: internal error: cannot precreate storage for disk type 'nvme'


It will not meet the condition:
        /* Skip disks we don't want to migrate and already existing disks. */
        if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
            (diskSrcPath && virFileExists(diskSrcPath))) {
            continue;
        }

Gdb debug process:
(gdb) s
qemuMigrationDstPrecreateStorage (incremental=false, migrate_disks=0x0, nmigrate_disks=0, nbd=0x7fffd800f5f0, vm=0x7fffd8008be0) at ../../src/qemu/qemu_migration.c:312                                           
312         if (!(conn = virGetConnectStorage()))
(gdb) n
315         for (i = 0; i < nbd->ndisks; i++) {
(gdb)
318             g_autofree char *nvmePath = NULL;
(gdb)
320             VIR_DEBUG("Looking up disk target '%s' (capacity=%llu)",
(gdb)
323             if (!(disk = virDomainDiskByTarget(vm->def, nbd->disks[i].target))) {
(gdb)
330             if (disk->src->type == VIR_STORAGE_TYPE_NVME) {
(gdb)
331                virPCIDeviceAddressGetSysfsFile(&disk->src->nvme->pciAddr, &nvmePath);
(gdb)
332                diskSrcPath = nvmePath;
(gdb)
338             if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
(gdb) p nvmePath
$1 = 0x7fffd801a970 "/sys/bus/pci/devices/0000:45:00.0"
(gdb) n
343             if (incremental) {
(gdb) p diskSrcPath

Comment 8 Michal Privoznik 2020-05-11 06:44:04 UTC
(In reply to Han Han from comment #7)
> (In reply to Michal Privoznik from comment #6)
> > Patch proposed upstream:
> > 
> > https://www.redhat.com/archives/libvir-list/2020-May/msg00253.html
> > 
> > but according to my review I think we need slightly better approach. My
> > previous comment simplified things too much.
> 
> The v2 doesn't work:
> ➜  ~ virsh -k0 migrate demo
> qemu+ssh://hp-dl385g10-15.lab.eng.pek2.redhat.com/system --verbose
> --copy-storage-all --xml /tmp/demo.xml                                      
> 
> error: internal error: cannot precreate storage for disk type 'nvme'
> 
> 
> It will not meet the condition:
>         /* Skip disks we don't want to migrate and already existing disks. */
>         if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
>             (diskSrcPath && virFileExists(diskSrcPath))) {
>             continue;
>         }
> 
> Gdb debug process:
> (gdb) s
> qemuMigrationDstPrecreateStorage (incremental=false, migrate_disks=0x0,
> nmigrate_disks=0, nbd=0x7fffd800f5f0, vm=0x7fffd8008be0) at
> ../../src/qemu/qemu_migration.c:312                                         
> 
> 312         if (!(conn = virGetConnectStorage()))
> (gdb) n
> 315         for (i = 0; i < nbd->ndisks; i++) {
> (gdb)
> 318             g_autofree char *nvmePath = NULL;
> (gdb)
> 320             VIR_DEBUG("Looking up disk target '%s' (capacity=%llu)",
> (gdb)
> 323             if (!(disk = virDomainDiskByTarget(vm->def,
> nbd->disks[i].target))) {
> (gdb)
> 330             if (disk->src->type == VIR_STORAGE_TYPE_NVME) {
> (gdb)
> 331               
> virPCIDeviceAddressGetSysfsFile(&disk->src->nvme->pciAddr, &nvmePath);
> (gdb)
> 332                diskSrcPath = nvmePath;
> (gdb)
> 338             if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks,
> migrate_disks) ||
> (gdb) p nvmePath
> $1 = 0x7fffd801a970 "/sys/bus/pci/devices/0000:45:00.0"
> (gdb) n
> 343             if (incremental) {
> (gdb) p diskSrcPath

This if() statement on line 338 looks like this:

        /* Skip disks we don't want to migrate and already existing disks. */
        if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
            (diskSrcPath && virFileExists(diskSrcPath))) {
            continue;
        }

Since diskSrcPath is set to nvmePath (and thus both are set), then I guess virFileExists() must have returned false for the path. Can you please check if that is so?

I've just tested what I was suggesting and it works for me. There is one quirk - the device has to exist. In my case:

$ test -e /sys/bus/pci/devices/0000\:02\:00.0/; echo $?
0
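
For reference, the sysfs path is built directly from the PCI address elements in the disk XML, and virFileExists() is essentially an access(F_OK) check, so the destination-side test can be reproduced with a small standalone program (a demo sketch only, not libvirt code; the hard-coded values come from the destination XML in this report):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Values from <address domain='0x0000' bus='0x44' slot='0x00'
         * function='0x0'/> in the destination disk XML. */
        unsigned int domain = 0x0000, bus = 0x44, slot = 0x00, function = 0x0;
        char path[64];

        /* Same format as the path seen in the gdb trace, e.g.
         * /sys/bus/pci/devices/0000:45:00.0 */
        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%04x:%02x:%02x.%x",
                 domain, bus, slot, function);

        /* virFileExists() boils down to access(path, F_OK) == 0. */
        printf("%s %s\n", path,
               access(path, F_OK) == 0 ? "exists" : "does not exist");
        return 0;
    }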

Comment 9 Han Han 2020-05-11 08:57:21 UTC
(In reply to Michal Privoznik from comment #8)
> (In reply to Han Han from comment #7)
> > (In reply to Michal Privoznik from comment #6)
> > > Patch proposed upstream:
> > > 
> > > https://www.redhat.com/archives/libvir-list/2020-May/msg00253.html
> > > 
> > > but according to my review I think we need slightly better approach. My
> > > previous comment simplified things too much.
> > 
> > The v2 doesn't work:
> > ➜  ~ virsh -k0 migrate demo
> > qemu+ssh://hp-dl385g10-15.lab.eng.pek2.redhat.com/system --verbose
> > --copy-storage-all --xml /tmp/demo.xml                                      
> > 
> > error: internal error: cannot precreate storage for disk type 'nvme'
> > 
> > 
> > It will not meet the condition:
> >         /* Skip disks we don't want to migrate and already existing disks. */
> >         if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
> >             (diskSrcPath && virFileExists(diskSrcPath))) {
> >             continue;
> >         }
> > 
> > Gdb debug process:
> > (gdb) s
> > qemuMigrationDstPrecreateStorage (incremental=false, migrate_disks=0x0,
> > nmigrate_disks=0, nbd=0x7fffd800f5f0, vm=0x7fffd8008be0) at
> > ../../src/qemu/qemu_migration.c:312                                         
> > 
> > 312         if (!(conn = virGetConnectStorage()))
> > (gdb) n
> > 315         for (i = 0; i < nbd->ndisks; i++) {
> > (gdb)
> > 318             g_autofree char *nvmePath = NULL;
> > (gdb)
> > 320             VIR_DEBUG("Looking up disk target '%s' (capacity=%llu)",
> > (gdb)
> > 323             if (!(disk = virDomainDiskByTarget(vm->def,
> > nbd->disks[i].target))) {
> > (gdb)
> > 330             if (disk->src->type == VIR_STORAGE_TYPE_NVME) {
> > (gdb)
> > 331               
> > virPCIDeviceAddressGetSysfsFile(&disk->src->nvme->pciAddr, &nvmePath);
> > (gdb)
> > 332                diskSrcPath = nvmePath;
> > (gdb)
> > 338             if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks,
> > migrate_disks) ||
> > (gdb) p nvmePath
> > $1 = 0x7fffd801a970 "/sys/bus/pci/devices/0000:45:00.0"
> > (gdb) n
> > 343             if (incremental) {
> > (gdb) p diskSrcPath
> 
> This if() statement on line 338 looks like this:
> 
>         /* Skip disks we don't want to migrate and already existing disks. */
>         if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
>             (diskSrcPath && virFileExists(diskSrcPath))) {
>             continue;
>         }
> 
> Since diskSrcPath is set to nvmePath (and thus both are set), then I guess
> virFileExists() must have returned false for the path. Can you please check
> if that is so?
> 
Yes. qemuMigrationAnyCopyDisk() returns true and virFileExists() returns false:
338             if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
(gdb) s
qemuMigrationAnyCopyDisk (nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=migrate_disks@entry=0x0, disk=<optimized out>, disk=<optimized out>) at ../../src/qemu/qemu_migration.c:283                        
283         if (nmigrate_disks) {
(gdb) finish
Run till exit from #0  qemuMigrationAnyCopyDisk (nmigrate_disks=nmigrate_disks@entry=0, migrate_disks=migrate_disks@entry=0x0, disk=<optimized out>, disk=<optimized out>) at ../../src/qemu/qemu_migration.c:283 
0x00007fffae5766d2 in qemuMigrationDstPrecreateStorage (incremental=false, migrate_disks=0x0, nmigrate_disks=0, nbd=0x7fffdc00e4b0, vm=0x7fffdc019dd0) at ../../src/qemu/qemu_migration.c:338                     
338             if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
Value returned is $4 = true
(gdb) s
virFileExists (path=path@entry=0x7fffdc01cca0 "/sys/bus/pci/devices/0000:45:00.0") at ../../src/util/virfile.c:1876                                                                                               
1876        return access(path, F_OK) == 0;
(gdb) finish
Run till exit from #0  virFileExists (path=path@entry=0x7fffdc01cca0 "/sys/bus/pci/devices/0000:45:00.0") at ../../src/util/virfile.c:1876                                                                        
0x00007fffae5766eb in qemuMigrationDstPrecreateStorage (incremental=false, migrate_disks=0x0, nmigrate_disks=0, nbd=0x7fffdc00e4b0, vm=0x7fffdc019dd0) at ../../src/qemu/qemu_migration.c:338                     
338             if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks) ||
Value returned is $5 = false


> I've just tested what I was suggesting and it works for me. There is one
> quirk - the device has to exist. In my case:
> 
> $ test -e /sys/bus/pci/devices/0000\:02\:00.0/; echo $?
> 0

Comment 10 Michal Privoznik 2020-05-11 09:17:06 UTC
(In reply to Han Han from comment #9)
> virFileExists (path=path@entry=0x7fffdc01cca0
> "/sys/bus/pci/devices/0000:45:00.0") at ../../src/util/virfile.c:1876       
> 
> 1876        return access(path, F_OK) == 0;
> (gdb) finish
> Run till exit from #0  virFileExists (path=path@entry=0x7fffdc01cca0
> "/sys/bus/pci/devices/0000:45:00.0") at ../../src/util/virfile.c:1876       
> 
> 0x00007fffae5766eb in qemuMigrationDstPrecreateStorage (incremental=false,
> migrate_disks=0x0, nmigrate_disks=0, nbd=0x7fffdc00e4b0, vm=0x7fffdc019dd0)
> at ../../src/qemu/qemu_migration.c:338                     
> 338             if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks,
> migrate_disks) ||
> Value returned is $5 = false
> 

This means that the path "/sys/bus/pci/devices/0000:45:00.0" is not accessible, i.e. the PCI device doesn't exist, and therefore libvirt tries to pre-create the NVMe disk, which fails. Are you sure the path really exists? E.g. ls -lZ $path or something similar.

Comment 11 Han Han 2020-05-11 09:54:02 UTC
(In reply to Michal Privoznik from comment #10)
> (In reply to Han Han from comment #9)
> > virFileExists (path=path@entry=0x7fffdc01cca0
> > "/sys/bus/pci/devices/0000:45:00.0") at ../../src/util/virfile.c:1876       
> > 
> > 1876        return access(path, F_OK) == 0;
> > (gdb) finish
> > Run till exit from #0  virFileExists (path=path@entry=0x7fffdc01cca0
> > "/sys/bus/pci/devices/0000:45:00.0") at ../../src/util/virfile.c:1876       
> > 
> > 0x00007fffae5766eb in qemuMigrationDstPrecreateStorage (incremental=false,
> > migrate_disks=0x0, nmigrate_disks=0, nbd=0x7fffdc00e4b0, vm=0x7fffdc019dd0)
> > at ../../src/qemu/qemu_migration.c:338                     
> > 338             if (!qemuMigrationAnyCopyDisk(disk, nmigrate_disks,
> > migrate_disks) ||
> > Value returned is $5 = false
> > 
> 
> This means that the path "/sys/bus/pci/devices/0000:45:00.0" is not
> accessible, i.e. the PCI device doesn't exist, and therefore libvirt tries
> to pre-create the NVMe disk, which fails. Are you sure the path really
> exists? E.g. ls -lZ $path or something similar.

Sorry, I used the wrong NVMe PCI address in the v2 test above...
With the correct address it works:
➜  ~ virsh -k0 migrate demo qemu+ssh://hp-dl385g10-15.lab.eng.pek2.redhat.com/system --verbose --copy-storage-all --xml /tmp/demo.xml
Migration: [100 %]

Comment 12 Michal Privoznik 2020-05-11 10:23:29 UTC
(In reply to Han Han from comment #11)

> Sorry I used the wrong nvme pci address on v2 test above...
> In fact it works:
> ➜  ~ virsh -k0 migrate demo
> qemu+ssh://hp-dl385g10-15.lab.eng.pek2.redhat.com/system --verbose
> --copy-storage-all --xml /tmp/demo.xml
> Migration: [100 %]

Great! Can you send the patch then please? I think this is the best behaviour we can expect. Creating NVMe disks is an oxymoron (we could potentially create namespaces, but let's not go down that hole). If a correct PCI address is provided then migration succeeds, and if a wrong PCI address is provided then libvirt prints a sensible error, in my opinion.

Comment 13 Michal Privoznik 2020-05-26 14:30:45 UTC
I've posted the patch upstream:

https://www.redhat.com/archives/libvir-list/2020-May/msg01152.html

Comment 14 Michal Privoznik 2020-06-02 10:44:34 UTC
Pushed to master as:

a5a297f387 qemu: Skip pre-creation of NVMe disks

v6.4.0-4-ga5a297f387

Comment 17 Han Han 2020-09-16 08:36:57 UTC
Created attachment 1715033 [details]
the VM xml of src and dest

Verified on 
libvirt-6.6.0-5.module+el8.3.0+8092+f9e72d7e.x86_64
qemu-kvm-5.1.0-7.module+el8.3.0+8099+dba2fe3e.x86_64

Steps:
1. Start a VM and prepare a destination VM XML with an NVMe disk:
<domain>
...
    <disk type='nvme' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <target dev='vda' bus='virtio'/>
          <source type='pci' managed='yes' namespace='1'>
             <address domain='0x0000' bus='0x44' slot='0x00' function='0x0'/>
          </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
...
</domain>

2. Format the NVMe disk on the dest host with the same virtual size as the src disk:
# virsh domblklist q35
 Target   Source
---------------------------------------------
 vda      /var/lib/libvirt/images/q35.qcow2

# qemu-img info -U /var/lib/libvirt/images/q35.qcow2
image: /var/lib/libvirt/images/q35.qcow2
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 850 MiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

➜  ~ qemu-img create /dev/nvme0n1 10G -f qcow2
Formatting '/dev/nvme0n1', fmt=qcow2 cluster_size=65536 compression_type=zlib size=10737418240 lazy_refcounts=off refcount_bits=16


3. Migrate the VM with --copy-storage-all and --xml:
# virsh migrate q35 qemu+ssh://XXXX/system --copy-storage-all --xml /tmp/src-dst-XMLs/q35-dst.xml

If the virtual size of the src disk is not equal to that of the dest disk, it will hit the error described in https://bugzilla.redhat.com/show_bug.cgi?id=1822538#c14

Comment 20 errata-xmlrpc 2020-11-17 17:48:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

