Description of problem:
When IOMMU is enabled on "virtio-net-pci", migration fails.

    -device virtio-net-pci,disable-legacy=on,disable-modern=off,iommu_platform=on

After the migration command is sent, the guest hangs and the following error is reported:

    (qemu) qemu-kvm: GLib: gmem.c:135: failed to allocate 17592186147560 bytes

Version-Release number of selected component (if applicable):
Host name: ibm-p9b-13.ibm2.lab.eng.bos.redhat.com
Host Distro: RHEL-8.3.0-20200912.n.0 BaseOS ppc64le
Host Kernel: 4.18.0-236.el8.ppc64le
Guest Kernel: 4.18.0-236.el8.ppc64le
Qemu-kvm: qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest on the source and on the destination. In both command lines, IOMMU is turned on by iommu_platform=on on the virtio-net-pci device.

Source:

    /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pseries \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 20480 \
    -smp 32,maxcpus=32,cores=16,threads=1,sockets=2 \
    -cpu 'host' \
    -chardev socket,server,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0,nowait \
    -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kar/vt_test_images/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-pci,mac=9a:4c:f5:51:1a:1f,id=idI0Gfwv,netdev=idSFFpRU,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on \
    -netdev tap,id=idSFFpRU,vhost=on \
    -vnc :20 \
    -rtc base=utc,clock=host \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio

Destination:

    /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pseries \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 20480 \
    -smp 32,maxcpus=32,cores=16,threads=1,sockets=2 \
    -cpu 'host' \
    -chardev socket,server,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor2,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial2,nowait \
    -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kar/vt_test_images/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-pci,mac=9a:4c:f5:51:1a:1f,id=idI0Gfwv,netdev=idSFFpRU,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on \
    -netdev tap,id=idSFFpRU,vhost=on \
    -vnc :30 \
    -rtc base=utc,clock=host \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio \
    -incoming defer

2. Send the migration commands.

On the destination:

    # nc -U /var/tmp/monitor-qmpmonitor2
    {"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 5}, "package": "qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8"}, "capabilities": ["oob"]}}
    {"execute": "qmp_capabilities", "id": "YQbCZ0Th"}
    {"return": {}, "id": "YQbCZ0Th"}
    {"execute": "migrate-incoming", "arguments": {"uri": "tcp:[::]:5467"}, "id": "xg5eKNBA"}
    {"return": {}, "id": "xg5eKNBA"}

On the source:

    # nc -U /var/tmp/monitor-qmpmonitor1
    {"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 5}, "package": "qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8"}, "capabilities": ["oob"]}}
    {"execute": "qmp_capabilities", "id": "YQbCZ0Th"}
    {"return": {}, "id": "YQbCZ0Th"}
    {"execute": "migrate", "arguments": {"uri": "tcp:localhost:5467", "blk": false, "inc": false}, "id": "hVNuVzEv"}
    {"return": {}, "id": "hVNuVzEv"}

3. The issue is hit and the guest hangs:

    (qemu) qemu-kvm: GLib: gmem.c:135: failed to allocate 17592186147560 bytes

Actual results:
The guest hangs.

Expected results:
The migration completes normally.

Additional info:
Not sure whether the x86 platform has the same issue. x86 test results will be added later.
Did not hit this issue when migrating a VM with the following qemu command line on x86 hosts.

Environments:
hosts: kernel-4.18.0-234.el8.x86_64 & qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8.x86_64
guest: kernel-4.18.0-232.el8.x86_64

qemu clis (the virtio-net-pci line is the configuration under test):

    /usr/libexec/qemu-kvm \
    -name "mouse-vm",debug-threads=on \
    -sandbox off \
    -machine q35 \
    -cpu Skylake-Client \
    -nodefaults \
    -device VGA \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
    -device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
    -device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
    -device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
    -device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
    -device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
    -device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
    -device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
    -device nec-usb-xhci,id=usb1,bus=root0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1 \
    -device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
    -device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2,disable-legacy=on,disable-modern=off,iommu_platform=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/nfs/rhel830-64-virtio-scsi-0818.qcow2,node-name=drive_sys1 \
    -blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
    -netdev tap,id=tap0,vhost=on \
    -m 4096 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -vnc :10 \
    -rtc base=utc,clock=host \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -qmp tcp:0:3333,server,nowait \
    -serial tcp:0:4444,server,nowait \
    -monitor stdio

Notes:
Also tried migration with a pc+seabios guest; still did not hit this issue.
(In reply to Li Xiaohui from comment #1)
Thanks. Since this issue is not hit on the x86 platform, setting it to ppc only.
The root cause is that QEMU allocates a bitmap to log pages that are dirtied by the in-kernel vhost device during migration. This bitmap accounts for all the guest RAM and DMA. In the case of POWER, DMA addresses start at 0x800000000000000ULL by default. This makes the bitmap size insanely big and g_malloc0() fails. The guest appears stuck because QEMU is actually aborting.

Lowering the DMA addresses to a smaller value, e.g. adding:

    -global spapr-pci-host-bridge.dma64_win_addr=0x80000000

on the QEMU command line allows the migration to succeed.

So this raises several questions:
- the default base for DMA addresses (0x800000000000000ULL) is a guest-visible setting we have had since QEMU-2.7.0, and I'm not sure we can/want to change it
- the vhost log bitmap as it is implemented today doesn't scale with huge addresses
- the virtqueue used ring address that is used to compute the size of the bitmap comes from the guest. It is certainly possible to hack an x86 guest to provide a similarly huge address and crash QEMU
- all the vhost code is architecture agnostic

Not sure if the POWER virt team is the best fit to address this.
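To put rough numbers on the analysis above, here is a minimal arithmetic sketch. It assumes the vhost log tracks one bit per 4 KiB page and allocates the bitmap in 8-byte chunks; those constants and the helper name log_bytes_for are illustrative assumptions, not taken from this report.

```python
# Sketch of the dirty-log size arithmetic. The constants below are
# assumptions (4 KiB pages, one bit per page, 8-byte log chunks).
LOG_PAGE = 0x1000                    # guest bytes tracked by one bit
CHUNK_BITS = 64                      # bits in one log chunk
CHUNK_BYTES = CHUNK_BITS // 8        # host bytes used by one log chunk
LOG_CHUNK = LOG_PAGE * CHUNK_BITS    # guest bytes covered by one log chunk


def log_bytes_for(last_addr: int) -> int:
    """Host bytes of bitmap needed to cover guest addresses 0..last_addr."""
    chunks = last_addr // LOG_CHUNK + 1
    return chunks * CHUNK_BYTES


DMA_BASE = 0x800000000000000   # default POWER DMA base quoted in the report
LOWERED_BASE = 0x80000000      # value used in the dma64_win_addr workaround

print(log_bytes_for(DMA_BASE))      # 2**44 + 8 bytes, roughly 16 TiB
print(log_bytes_for(LOWERED_BASE))  # 65544 bytes, roughly 64 KiB
```

Under these assumptions, covering an address at the default DMA base already needs about 2**44 bytes of bitmap, which is the same order of magnitude as the 17592186147560 bytes in the error message. With the lowered base the bitmap shrinks to about 64 KiB.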
Bug reproduction:

Host:
[root@ibm-p9b-27 home]# uname -r
4.18.0-236.el8.ppc64le
qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8.ppc64le
SLOF-20200717-1.gite18ddad8.module+el8.3.0+7638+07cf13d2.noarch

qemu cli:

    /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine pseries \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 4096 \
    -smp 80,maxcpus=80,cores=40,threads=1,sockets=2 \
    -cpu 'host' \
    -object iothread,id=iothread0 \
    -chardev socket,path=/var/tmp/monitor-qmpmonitor1,nowait,id=qmp_id_qmpmonitor1,server \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,path=/var/tmp/serial-serial0,nowait,id=chardev_serial0,server \
    -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -device usb-kbd,id=usb-kbd1,bus=usb1.0,port=2 \
    -device usb-mouse,id=usb-mouse1,bus=usb1.0,port=3 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4,iothread=iothread0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/xianwang/rhel830-ppc64le-virtio-scsi_p9.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0 \
    -device virtio-net-pci,mac=9a:58:80:27:08:7c,id=idji2KPU,netdev=idYBxx2l,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on \
    -netdev tap,id=idYBxx2l,vhost=on \
    -vnc :11 \
    -rtc base=utc,clock=host \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -monitor stdio

source side:

    (qemu) migrate -d tcp:127.0.0.1:5801
    (qemu) qemu-kvm: GLib: gmem.c:135: failed to allocate 17592186150680 bytes
After discussion with David, I'll dig some more to see if we can support a sparse bitmap for the vhost log.
Tested with the fixed qemu-kvm and kernel builds; it works well on x86 hosts.

hosts info:
kernel-4.18.0-239.1.el8.bz1883084.x86_64&qemu-img-5.1.0-9.module+el8.3.0+7652+b30e6901.bz1879349.x86_64

(1) Migration, reboot and shutdown succeed when booting a q35+seabios rhel8.3.0 guest with these network options:

    -device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2,disable-legacy=on,disable-modern=off,iommu_platform=on \
    -netdev tap,id=tap0,vhost=on \

(2) Migration, reboot and shutdown succeed when booting a pc+seabios rhel8.3.0 guest with these network options:

    -device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on \
    -netdev tap,id=tap0,vhost=on \
*** Bug 1882393 has been marked as a duplicate of this bug. ***
Now waiting for the following patch to be merged upstream: http://patchwork.ozlabs.org/project/qemu-devel/patch/160208823418.29027.15172801181796272300.stgit@bahia.lan/
Fix merged upstream as:

commit 170a6794efde98fb1ad70f59d4cd9af7decf279d
Author: Greg Kurz <groug>
Date:   Wed Oct 7 18:30:34 2020 +0200

    vhost: Don't special case vq->used_phys in vhost_get_log_size()

    The first loop in vhost_get_log_size() computes the size of the dirty log
    bitmap so that it allows to track changes in the entire guest memory, in
    terms of GPA.

    When not using a vIOMMU, the address of the vring's used structure,
    vq->used_phys, is a GPA. It is thus already covered by the first loop.

    When using a vIOMMU, vq->used_phys is a GIOVA that will be translated
    to an HVA when the vhost backend needs to update the used structure. It
    will log the corresponding GPAs into the bitmap but it certainly won't
    log the GIOVA.

    So in any case, vq->used_phys shouldn't be explicitly used to size the
    bitmap. Drop the second loop.

    This fixes a crash of the source when migrating a guest using in-kernel
    vhost-net and iommu_platform=on on POWER, because DMA regions are put
    over 0x800000000000000ULL. The resulting insanely huge log size causes
    g_malloc0() to abort.

    BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1879349
    Signed-off-by: Greg Kurz <groug>
    Message-Id: <160208823418.29027.15172801181796272300.stgit>
    Acked-by: Jason Wang <jasowang>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
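The effect of dropping the second loop can be illustrated with a small model. This is only a sketch of the logic described in the commit message, not the QEMU code: the Region and Vring classes, the log_chunks function, the include_used flag and the chunk constant are all hypothetical names and assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

LOG_CHUNK = 0x1000 * 64  # assumed: guest bytes covered by one log chunk


@dataclass
class Region:             # hypothetical stand-in for a guest memory region
    guest_phys_addr: int
    size: int


@dataclass
class Vring:              # hypothetical stand-in for a virtqueue
    used_phys: int        # a GPA without a vIOMMU, a GIOVA with one
    used_size: int


def log_chunks(regions: List[Region], vrings: List[Vring],
               include_used: bool) -> int:
    """Number of log chunks needed to cover the tracked addresses."""
    chunks = 0
    # First loop: cover the entire guest memory, in terms of GPA.
    for r in regions:
        last = r.guest_phys_addr + r.size - 1
        chunks = max(chunks, last // LOG_CHUNK + 1)
    # Second loop (the one the fix drops): also cover each used structure.
    if include_used:
        for v in vrings:
            last = v.used_phys + v.used_size - 1
            chunks = max(chunks, last // LOG_CHUNK + 1)
    return chunks


regions = [Region(0, 4 << 30)]                         # 4 GiB of guest RAM
vrings = [Vring(0x800000000000000 + 0x1000, 0x1000)]   # used ring at a GIOVA

before = log_chunks(regions, vrings, include_used=True)
after = log_chunks(regions, vrings, include_used=False)
print(before, after)
```

In this model the bitmap is sized by guest RAM alone once the second loop is gone, so a used ring placed at a very high GIOVA no longer inflates it.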
Bug verification:

Host:
[root@ibm-p9b-26 home]# uname -r
4.18.0-252.el8.dt2.ppc64le
qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9.ppc64le
SLOF-20200717-1.gite18ddad8.module+el8.4.0+8855+a9e237a9.noarch

Guest:
4.18.0-252.el8.ppc64le

Steps are the same as in comment 4.

Result:
Migration completed and the VM works well after migration.

So, this issue is fixed.
Hi Greg,
Regarding comment 15, I would like to change the ITM to ITM 6 or ITM 7. Do you think that's OK?
Referring to comment 14, this bz has passed verification. I will move it to verified once it is ON_QA, and am updating it to ITM 8 now.
Referring to comment 14, moving it to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098