Description of problem:
When migrating a VM that uses hugepages from qemu 5.1 (RHEL-AV 8.3) to qemu 5.2 (RHEL-AV 8.4), migration fails with the error below:

qemu-kvm: Unknown ramblock "pc.ram", cannot accept migration
qemu-kvm: error while loading state for instance 0x0 of device 'ram'
qemu-kvm: load of migration failed: Invalid argument

Version-Release number of selected component (if applicable):
Source versions: libvirt-daemon-6.6.0-9.module+el8.3.1+9131+fb7f8c9f.x86_64 & qemu-kvm-5.1.0-16.module+el8.3.1+8958+410ab178.x86_64
Target versions: libvirt-daemon-6.10.0-1.module+el8.4.0+8898+a84e86e1.x86_64 & qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Start the VM with the configuration below on the source machine:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>
<vcpu placement='static'>4</vcpu>
...
<os>
  <type arch='x86_64' machine='pc-i440fx-rhel7.6.0'>hvm</type>
  <boot dev='hd'/>
</os>

2. # virsh migrate rhel8 qemu+ssh://10.73.xx.xx/system --live --verbose
root.xx.xx's password:
error: internal error: qemu unexpectedly closed the monitor: 2020-12-14T09:05:25.349045Z qemu-kvm: -device cirrus-vga,id=video0,bus=pci.0,addr=0x2: warning: 'cirrus-vga' is deprecated, please use a different VGA card instead
2020-12-14T09:05:25.669997Z qemu-kvm: Unknown ramblock "pc.ram", cannot accept migration
2020-12-14T09:05:25.670041Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2020-12-14T09:05:25.670457Z qemu-kvm: load of migration failed: Invalid argument

Expected results:
Migration succeeds.

Actual results:
Migration failed.

Additional info:
With 'x-use-canonical-path-for-ramblock-id=off' added, the migration succeeds. (Please see the details in comment 24 of bug 1836043.)
The full qemu command lines for the source and target machines are in the attachment of bug 1836043. Thanks.
Created attachment 1744242
domain xml file
On our 5.2, in info mtree I see:

address-space: cpu-memory-0
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000003fffffff (prio 0, ram): alias ram-below-4g @/objects/pc.ram 0000000000000000-000000003fffffff

whereas on upstream I see:

address-space: cpu-memory-0
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000003fffffff (prio 0, ram): alias ram-below-4g @pc.ram 0000000000000000-000000003fffffff

On ours, using -M q35 seems to give @pc.ram, so whatever broke is just on the pc side.
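To compare the two outputs mechanically, the name after '@' can be pulled out of the alias line. This is only a minimal Python sketch, not part of QEMU: the helper name `ram_alias_target` and the regular expression are assumptions for illustration, derived from the two sample lines in this comment.

```python
import re

# Matches the "alias <name> @<target>" part of an 'info mtree' line.
# The pattern is an assumption based on the sample lines in this comment.
ALIAS_RE = re.compile(r"alias\s+(\S+)\s+@(\S+)")


def ram_alias_target(mtree_line):
    """Return the name after '@' on an alias line, or None if there is no alias."""
    match = ALIAS_RE.search(mtree_line)
    return match.group(2) if match else None


downstream = ("0000000000000000-000000003fffffff (prio 0, ram): "
              "alias ram-below-4g @/objects/pc.ram 0000000000000000-000000003fffffff")
upstream = ("0000000000000000-000000003fffffff (prio 0, ram): "
            "alias ram-below-4g @pc.ram 0000000000000000-000000003fffffff")

print(ram_alias_target(downstream))
print(ram_alias_target(upstream))
```

If the two values differ between source and destination, the RAM block names in the migration stream will not match.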
Reproduced this bz on hosts [1] with CLIs [2].

[1]
src host: qemu-img-5.1.0-16.module+el8.3.1+8958+410ab178.x86_64
dst host: qemu-img-5.2.0-2.module+el8.4.0+9186+ec44380f.x86_64

[2]
src qemu cli:
/usr/libexec/qemu-kvm \
-name "mouse-vm" \
-sandbox off \
-machine pc-i440fx-rhel7.6.0 \
-cpu IvyBridge-IBRS \
...
-m 2560 \
-mem-path /dev/hugepages \
-mem-prealloc \
-overcommit mem-lock=off \

dst qemu cli:
/usr/libexec/qemu-kvm \
-name "mouse-vm" \
-sandbox off \
-machine pc-i440fx-rhel7.6.0,memory-backend=pc.ram \
-cpu IvyBridge-IBRS \
...
-m 2560 \
-object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=2684354560 \
-overcommit mem-lock=off \
-incoming defer \

However, migration did not succeed with the same CLIs and environment as the reproducer when the dst cli "-object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=2684354560" was changed to "-object memory-backend-file,id=pc.ram,mem-path=/dev/hugepages,prealloc=yes,size=2684354560,x-use-canonical-path-for-ramblock-id=off" (or to "-object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=2684354560,x-use-canonical-path-for-ramblock-id=off").
Migration failed with this error on the dst host:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x9a read: 3 device: 6 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load virtio-scsi:virtio
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-scsi'
qemu-kvm: load of migration failed: Invalid argument

virtio-scsi system disk cli:
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/glusterfs/rhel840-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \

Qi Jing, I see you use an IDE system disk; do you know the root cause of the failure with a virtio-scsi system disk?
Following Comment 3, I changed the qemu commands from the pc to the q35 machine type, but migration still fails in the same environment as Comment 4:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x71 read: a0 device: 0 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load xhci:parent_obj
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:02.0:00.0/xhci'
qemu-kvm: load of migration failed: Invalid argument

[root@lenovo-sr630-01 home]# diff dst.sh src.sh
4c4
< -machine q35,memory-backend=pc.ram \
---
> -machine q35 \
30c30,31
< -object memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=2684354560 \
---
> -mem-path /dev/hugepages \
> -mem-prealloc \
42d42
< -incoming defer \

src qemu command:
# cat src.sh
/usr/libexec/qemu-kvm \
-name "mouse-vm" \
-sandbox off \
-machine q35 \
-device vmcoreinfo \
-cpu IvyBridge-IBRS \
-nodefaults \
-vga std \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
-device nec-usb-xhci,id=usb1,bus=root0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/glusterfs/rhel840-64-virtio-scsi.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 2560 \
-mem-path /dev/hugepages \
-mem-prealloc \
-overcommit mem-lock=off \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm \
-qmp tcp:0:3333,server,nowait \
-qmp tcp:0:9999,server,nowait \
-qmp tcp:0:9888,server,nowait \
-serial tcp:0:4444,server,nowait \
-monitor stdio \
I have filed two bzs for the issues in Comment 4 & 5, following Dave's advice, thanks.
Bug 1912846 - qemu-kvm: Failed to load xhci:parent_obj during migration
Bug 1912842 - qemu-kvm: Failed to load virtio-scsi:virtio during migration
I think this is the x-use-canonical-path-for-ramblock-id hack which still looks like it's there to me in hw_compat_rhel_8_0_len and I see that's called in pc_machine_rhel760_options
(In reply to Li Xiaohui from comment #4)
> Qi Jing, I see you use ide system disk, do you know the root of the failure
> about virtio-scsi system disk?

For this disk issue, I asked @fjin on our team. She said that some downstream patches are not in the current RHEL-AV 8.4, and that migration testing has not started yet.
Nothing much seems to have changed recently on the qemu side:

/usr/libexec/qemu-kvm -cpu host -machine pc-i440fx-rhel7.6.0 -m 1G -mem-path /dev/shm -nographic

uses pc.ram on 5.2, 5.1 and 4.2 downstream qemu.

/usr/libexec/qemu-kvm -cpu host -M pc-i440fx-rhel7.6.0,accel=kvm,memory-backend=pc.ram -object memory-backend-memfd,id=pc.ram,size=1G -m 1G -nographic

uses /objects/pc.ram on 5.2.0, 5.1.0 and 5.0.0 downstream qemu (4.2 doesn't have memory-backend).

The hw_compat_rhel_8_0 has:

/* hw_compat_rhel_8_0 from hw_compat_3_1 */
{ "memory-backend-file", "x-use-canonical-path-for-ramblock-id", "true" },
/* hw_compat_rhel_8_0 from hw_compat_3_1 */
{ "memory-backend-memfd", "x-use-canonical-path-for-ramblock-id", "true" },

so I think that's what qemu is expecting to do for older machine types specified with -object.
The issue is that with

-machine memory-backend=pc.ram \
-object memory-backend-memfd,id=pc.ram

the created backend may change the MemoryRegion name to the canonical one, which prepends '/objects/' to the 'id', when the compat property is set to 'x-use-canonical-path-for-ramblock-id=on'.

So for old machine types where 'x-use-canonical-path-for-ramblock-id=on', -object creates a backend whose MemoryRegion 'id' contains '/objects/', and migration then fails because an old target QEMU, or a QEMU started without -machine memory-backend, expects a prefix-less 'id' in the migration stream.

It's possible to fix this on the libvirt side by always adding 'x-use-canonical-path-for-ramblock-id=off' to the backend that will be used with -machine memory-backend. (That's what legacy -m X does under the hood; the x-use-canonical-path-for-ramblock-id option is available since qemu-4.0.)

Another alternative could be not using '-machine memory-backend' for machines older than upstream 4.0 and downstream rhel8.0.
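The naming mismatch described above can be modelled in a few lines. This is a hedged Python sketch of the behaviour as explained in this comment, not QEMU code: the function names `ramblock_id` and `destination_accepts` are hypothetical, and the acceptance check is reduced to a plain name comparison.

```python
def ramblock_id(backend_id, use_canonical_path):
    # With the compat property on, the block is named by its canonical path,
    # which places the backend under '/objects/'. With it off, the bare id is used.
    if use_canonical_path:
        return "/objects/" + backend_id
    return backend_id


def destination_accepts(source_block, dest_block):
    # Simplified model: the destination only accepts a RAM block whose name
    # it already knows, so the two names must be identical.
    return source_block == dest_block


# Source started with legacy -m / -mem-path, so its block is the bare id.
source = "pc.ram"

# Destination started with -machine memory-backend=pc.ram on an old machine type.
dest_compat_on = ramblock_id("pc.ram", use_canonical_path=True)
dest_compat_off = ramblock_id("pc.ram", use_canonical_path=False)

print(destination_accepts(source, dest_compat_on))
print(destination_accepts(source, dest_compat_off))
```

In this model only the destination created with the property turned off ends up with the same block name as the source, which matches the observation that adding 'x-use-canonical-path-for-ramblock-id=off' gets past the ramblock error.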
Libvirt will use memory-backend only if it finds the 'default-ram-id' attribute in the 'query-machines' output (it's stored per machine type, just like other attributes: cpu-max, hotpluggable-cpus, etc.). When constructing the command line, if no 'default-ram-id' was reported, the old style is used (-m X).

Since migration between machine types is not supported, can we just make qemu NOT report default-ram-id? Or is it too late for that?

Also, the "x-" prefix doesn't inspire confidence ;-) Is there a plan to stabilize the attribute?
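The decision described above can be sketched as follows. This is an illustrative Python model, not libvirt's implementation: the function name `memory_args`, the plain dict standing in for one 'query-machines' entry, the choice of memory-backend-memfd, and the machine name 'hypothetical-old-machine' are all assumptions.

```python
def memory_args(machine_info, size_mib):
    """Pick memory arguments for one machine type.

    machine_info is a dict standing in for one 'query-machines' entry;
    only the 'name' and optional 'default-ram-id' keys are used here.
    """
    ram_id = machine_info.get("default-ram-id")
    if ram_id is None:
        # No default-ram-id reported: fall back to the old style (-m X).
        return ["-m", str(size_mib)]

    size_bytes = size_mib * 1024 * 1024
    return [
        "-machine", f"{machine_info['name']},memory-backend={ram_id}",
        # memory-backend-memfd is used only because it appears in this report.
        "-object", f"memory-backend-memfd,id={ram_id},size={size_bytes}",
        "-m", str(size_mib),
    ]


new_style = memory_args(
    {"name": "pc-i440fx-rhel7.6.0", "default-ram-id": "pc.ram"}, 2560)
old_style = memory_args({"name": "hypothetical-old-machine"}, 2560)

print(new_style)
print(old_style)
```

With 2560 MiB the computed size is 2684354560 bytes, the same value that appears in the dst command lines earlier in this report.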
(In reply to Michal Privoznik from comment #11)
> Libvirt will use memory-backend only if it finds 'default-ram-id' attribute
> in 'query-machines' output (it's stored per machine type, just like other
> attributes: cpu-max, hotpluggable-cpus, etc.). When constructing cmd line
> and no 'default-ram-id' was reported, then the old style is used (-m X).
> Since migration between machine types is not supported, can we just make
> qemu NOT report default-ram-id? Or is it too late for that?

Yes, it's too late for that, and there is no reason to avoid using memory-backend with old machine types as long as QEMU supports it.

> Also, the "x-" prefix doesn't strike confidence ;-) Is there a plan to
> stabilize the attribute?

'x-' is used not only for temporary knobs but also for internal ones. In this case it's a compat knob that's not going to change or be removed (removal could become possible only after we deprecate/remove the 4.0 machine type).

We can drop the 'x-' prefix if it's a must; it shouldn't break anything on the QEMU side (except that downstream will need to be careful on rebase and rename it as well).
(In reply to Igor Mammedov from comment #12)
> (In reply to Michal Privoznik from comment #11)
> > Libvirt will use memory-backend only if it finds 'default-ram-id' attribute
> > in 'query-machines' output (it's stored per machine type, just like other
> > attributes: cpu-max, hotpluggable-cpus, etc.). When constructing cmd line
> > and no 'default-ram-id' was reported, then the old style is used (-m X).
> > Since migration between machine types is not supported, can we just make
> > qemu NOT report default-ram-id? Or is it too late for that?
> yes, it's too late for that
> and there is not reason to avoid using memory-backend with old machine
> types as long as QEMU supports it.

Another idea (but not really I guess) is that along with 'default-ram-id' for given machine type qemu would report also value of 'x-use-canonical-path-for-ramblock-id'. And then libvirt would use -machine memory-backend if and only if both were true. It's basically the same idea as not reporting 'default-ram-id' but implemented differently.

> > Also, the "x-" prefix doesn't strike confidence ;-) Is there a plan to
> > stabilize the attribute?
> 'x-' is used not only for temporary but for internal knobs.
> In this case it's compat knob that's not going to change or be removed.
> (removal could become possible only after we deprecate/remove 4.0 machine
> type).
>
> We can drop 'x-' prefix if it's must, it shouldn't break anything on QEMU
> side (aside downstream shall be careful on rebase and rename it also).

Yeah, now that I think about it more, let's not rename this because then libvirt would have to differentiate whether it's talking to a version where it isn't renamed or it is renamed. Now it can rely solely on the fact that 'default-ram-id' reporting was implemented after 'x-use-canonical-....'.

But if libvirt enables this - we are stuck with it forever, aren't we? I mean, the object will always have non-canonical path, regardless of machine type or qemu version.
(In reply to Michal Privoznik from comment #13)
> (In reply to Igor Mammedov from comment #12)
> > (In reply to Michal Privoznik from comment #11)
> > > Libvirt will use memory-backend only if it finds 'default-ram-id' attribute
> > > in 'query-machines' output (it's stored per machine type, just like other
> > > attributes: cpu-max, hotpluggable-cpus, etc.). When constructing cmd line
> > > and no 'default-ram-id' was reported, then the old style is used (-m X).
> > > Since migration between machine types is not supported, can we just make
> > > qemu NOT report default-ram-id? Or is it too late for that?
> > yes, it's too late for that
> > and there is not reason to avoid using memory-backend with old machine
> > types as long as QEMU supports it.
>
> Another idea (but not really I guess) is that along with 'default-ram-id'
> for given machine type qemu would report also value of
> 'x-use-canonical-path-for-ramblock-id'. And then libvirt would use -machine
> memory-backend if and only if both were true. It's basically the same idea
> as not reporting 'default-ram-id' but implemented differently.

I'm afraid it's already too late for that.

> > > Also, the "x-" prefix doesn't strike confidence ;-) Is there a plan to
> > > stabilize the attribute?
> > 'x-' is used not only for temporary but for internal knobs.
> > In this case it's compat knob that's not going to change or be removed.
> > (removal could become possible only after we deprecate/remove 4.0 machine
> > type).
> >
> > We can drop 'x-' prefix if it's must, it shouldn't break anything on QEMU
> > side (aside downstream shall be careful on rebase and rename it also).
>
> Yeah, now that I think about it more, let's not rename this because then
> libvirt would have to differentiate whether it's talking to a version where
> it isn't renamed or it is renamed. Now it can rely solely on the fact that
> 'default-ram-id' reporting was implemented after 'x-use-canonical-....'.
>
> But if libvirt enables this - we are stuck with it forever, aren't we? I
> mean, the object will always have non-canonical path, regardless of machine
> type or qemu version.

It's non-canonical by default, so we are stuck with it until machine version 4.0 is deprecated and removed (in practice, for the foreseeable future). If you can limit the use of 'x-use-canonical-....' to 4.0 and older machine types, it will eventually fade out, and when 4.0 is removed, all libvirt would need to do is drop the by-then dead code that adds 'x-use-canonical-...'.
I've posted patches upstream: https://www.redhat.com/archives/libvir-list/2021-January/msg00601.html but they depend on Igor's patch: https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg01979.html
v2: https://www.redhat.com/archives/libvir-list/2021-January/msg00684.html
QEMU patch documenting use of 'x-use-canonical-path-for-ramblock-id' is in master now: commit 8db0b20415c129cf5e577a593a4a0372d90b7cc9 machine: add missing doc for memory-backend option
v3: https://listman.redhat.com/archives/libvir-list/2021-February/msg00629.html
Pushed upstream as:

677c90cc1d qemu: Do not Use canonical path for system memory
204dfbe15d qemu_capabilities: Introduce QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID

v7.0.0-384-g677c90cc1d

To POST:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2021-February/msg00265.html
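The idea behind the two commits, as discussed earlier in this bug, can be illustrated with a short Python sketch. It is not the libvirt code (which is C): the function name `system_memory_object` is hypothetical, the boolean parameter merely stands in for the QEMU_CAPS_X_USE_CANONICAL_PATH_FOR_RAMBLOCK_ID capability, and it is an assumption here that the capability gates whether the property is added.

```python
def system_memory_object(backend, ram_id, size_bytes, has_canonical_path_knob):
    """Build the -object value for the backend used with -machine memory-backend.

    has_canonical_path_knob is a stand-in for a capability check; when it is
    true, the property is switched off so the RAM block keeps its bare id.
    """
    props = [backend, f"id={ram_id}", f"size={size_bytes}"]
    if has_canonical_path_knob:
        props.append("x-use-canonical-path-for-ramblock-id=off")
    return ",".join(props)


with_knob = system_memory_object(
    "memory-backend-memfd", "pc.ram", 2684354560, True)
without_knob = system_memory_object(
    "memory-backend-memfd", "pc.ram", 2684354560, False)

print(with_knob)
print(without_knob)
```

Gating on a capability keeps the command line valid for a QEMU that does not know the property, while newer ones get the prefix-less block name expected by the migration stream.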
Verified with versions:
target: libvirt-7.0.0-4.module+el8.4.0+10093+e085f1eb.x86_64 & qemu-kvm-5.2.0-5.module+el8.4.0+9775+0937c167.x86_64
source: libvirt-6.6.0-9.module+el8.3.1+9131+fb7f8c9f.x86_64 & qemu-kvm-5.1.0-16.module+el8.3.1+8958+410ab178.x86_64

1. Start the VM with the configuration below on the source machine:
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
</memoryBacking>
<vcpu placement='static'>4</vcpu>
...
<os>
  <type arch='x86_64' machine='pc-i440fx-rhel7.6.0'>hvm</type>
  <boot dev='hd'/>
</os>

2. Migrate the VM back and forth; both migrations succeeded.
virsh migrate rhel qemu+ssh://10.*.*.*/system --live
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098