Bug 1879349

Summary: Migration fails when 'iommu_platform' is enabled on a 'virtio-net-pci' device
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Zhenyu Zhang <zhenyzha>
Component: qemu-kvm
Assignee: Greg Kurz <gkurz>
qemu-kvm sub component: Live Migration
QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: bfu, bugproxy, chayang, coli, ddepaula, dgibson, gkurz, hannsj_uhl, jinzhao, juzhang, lvivier, mdeng, mrezanin, ngu, qzhang, virt-maint, xiaohli, xuma, yihyu
Version: 8.3
Flags: xianwang: needinfo-, pm-rhel: mirror+
Target Milestone: rc
Target Release: 8.4
Hardware: ppc64le
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-25 06:43:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1730194
Bug Blocks: 1789757, 1796871, 1883084

Description Zhenyu Zhang 2020-09-16 03:28:36 UTC
Description of problem:
When IOMMU is enabled on "virtio-net-pci", the migration fails.
-device virtio-net-pci,disable-legacy=on,disable-modern=off,iommu_platform=on

After the migration command is sent, the guest hangs and the following error is reported:
(qemu) qemu-kvm: GLib: gmem.c:135: failed to allocate 17592186147560 bytes


Version-Release number of selected component (if applicable):
Host name: ibm-p9b-13.ibm2.lab.eng.bos.redhat.com
Host Distro: RHEL-8.3.0-20200912.n.0 BaseOS ppc64le
Host Kernel: 4.18.0-236.el8.ppc64le
Guest Kernel: 4.18.0-236.el8.ppc64le
Qemu-kvm: qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8

How reproducible:
100%

Steps to Reproduce:
1. Boot the source guest, then the destination guest (same command line plus -incoming defer):
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine pseries  \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-m 20480  \
-smp 32,maxcpus=32,cores=16,threads=1,sockets=2  \
-cpu 'host' \
-chardev socket,server,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,nowait  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0,nowait \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kar/vt_test_images/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device virtio-net-pci,mac=9a:4c:f5:51:1a:1f,id=idI0Gfwv,netdev=idSFFpRU,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on  \
-netdev tap,id=idSFFpRU,vhost=on  \
-vnc :20  \
-rtc base=utc,clock=host  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-monitor stdio 

============================================================
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine pseries  \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-m 20480  \
-smp 32,maxcpus=32,cores=16,threads=1,sockets=2  \
-cpu 'host' \
-chardev socket,server,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor2,nowait  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial2,nowait \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kar/vt_test_images/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device virtio-net-pci,mac=9a:4c:f5:51:1a:1f,id=idI0Gfwv,netdev=idSFFpRU,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on  \
-netdev tap,id=idSFFpRU,vhost=on  \
-vnc :30  \
-rtc base=utc,clock=host  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-monitor stdio \
-incoming defer

2. Set up the incoming migration on the destination, then send the migration command on the source:
# nc -U /var/tmp/monitor-qmpmonitor2
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 5}, "package": "qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities", "id": "YQbCZ0Th"}
{"return": {}, "id": "YQbCZ0Th"}
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:[::]:5467"}, "id": "xg5eKNBA"}
{"return": {}, "id": "xg5eKNBA"}
========================================================
# nc -U /var/tmp/monitor-qmpmonitor1
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 5}, "package": "qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities", "id": "YQbCZ0Th"}
{"return": {}, "id": "YQbCZ0Th"}
{"execute": "migrate", "arguments": {"uri": "tcp:localhost:5467", "blk": false, "inc": false}, "id": "hVNuVzEv"}
{"return": {}, "id": "hVNuVzEv"}

3. The issue occurs and the guest hangs:
(qemu) qemu-kvm: GLib: gmem.c:135: failed to allocate 17592186147560 bytes
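
For anyone who wants to script the QMP exchange from step 2 instead of typing it into nc, here is a minimal Python sketch. The helper names qmp_command and qmp_session are hypothetical (they are not part of QEMU or of this report); the socket paths and commands are the ones shown above. It assumes each QMP message arrives as a single JSON line, which matches the transcripts in step 2.

```python
import json
import socket


def qmp_command(execute, arguments=None, cmd_id=None):
    """Serialize one QMP command as a newline-terminated JSON line."""
    msg = {"execute": execute}
    if arguments is not None:
        msg["arguments"] = arguments
    if cmd_id is not None:
        msg["id"] = cmd_id
    return (json.dumps(msg) + "\n").encode()


def qmp_session(sock_path, commands):
    """Send (execute, arguments) pairs over a QMP UNIX socket.

    Negotiates capabilities first, then returns the reply to each command.
    Hypothetical helper: assumes one JSON message per line.
    """
    replies = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        stream = sock.makefile("rwb")
        json.loads(stream.readline())  # consume the {"QMP": ...} greeting
        for execute, arguments in [("qmp_capabilities", None)] + list(commands):
            stream.write(qmp_command(execute, arguments))
            stream.flush()
            while True:
                reply = json.loads(stream.readline())
                # Asynchronous events carry an "event" key; skip them.
                if "return" in reply or "error" in reply:
                    replies.append(reply)
                    break
    return replies
```

With the two guests from step 1 running, the destination side would be qmp_session("/var/tmp/monitor-qmpmonitor2", [("migrate-incoming", {"uri": "tcp:[::]:5467"})]) and the source side qmp_session("/var/tmp/monitor-qmpmonitor1", [("migrate", {"uri": "tcp:localhost:5467", "blk": False, "inc": False})]).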


Actual results:
The guest hangs.

Expected results:
The migration completes normally.

Additional info:
Not sure whether the x86 platform has the same issue.
x86 test results will be added later.

Comment 1 Li Xiaohui 2020-09-16 04:30:13 UTC
I did not hit this issue when migrating a VM with the following QEMU command line on x86 hosts:
Environment:
hosts: kernel-4.18.0-234.el8.x86_64 & qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8.x86_64
guest: kernel-4.18.0-232.el8.x86_64

QEMU command line:
/usr/libexec/qemu-kvm  \
-name "mouse-vm",debug-threads=on \
-sandbox off \
-machine q35 \
-cpu Skylake-Client \
-nodefaults  \
-device VGA \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
-device nec-usb-xhci,id=usb1,bus=root0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2,disable-legacy=on,disable-modern=off,iommu_platform=on \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/nfs/rhel830-64-virtio-scsi-0818.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm  \
-qmp tcp:0:3333,server,nowait \
-serial tcp:0:4444,server,nowait \
-monitor stdio

Note: I also tried migration with a pc+seabios guest and still did not hit this issue.

Comment 2 Zhenyu Zhang 2020-09-16 06:03:21 UTC
(In reply to Li Xiaohui from comment #1)

Thanks. The issue does not occur on the x86 platform, so I am setting this bug to ppc only.

Comment 3 Greg Kurz 2020-09-16 16:42:16 UTC
The root cause is that QEMU allocates a bitmap to log pages that are
dirtied by the in-kernel vhost device during migration. This bitmap
accounts for all the guest RAM and DMA. In the case of POWER, DMA
addresses start at 0x800000000000000ULL by default. This causes the
bitmap size to be insanely big and g_malloc0() fails. The guest appears
stuck because QEMU is actually aborting.

Lowering the DMA addresses to a smaller value, e.g. adding:

-global spapr-pci-host-bridge.dma64_win_addr=0x80000000

on the QEMU command line allows the migration to succeed.
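
To put rough numbers on this, here is a small Python sketch of the log size arithmetic. The constants are my recollection of the definitions in QEMU's hw/virtio/vhost.c (one dirty bit per 4 KiB page, packed into 64-bit chunks) and should be treated as assumptions; log_bytes is a hypothetical helper, not a QEMU function.

```python
# Assumed to mirror the vhost log constants in QEMU's hw/virtio/vhost.c.
VHOST_LOG_PAGE = 0x1000                             # one dirty bit covers 4 KiB
VHOST_LOG_CHUNK_BYTES = 8                           # one chunk is a 64-bit word
VHOST_LOG_BITS = 8 * VHOST_LOG_CHUNK_BYTES          # 64 pages tracked per chunk
VHOST_LOG_CHUNK = VHOST_LOG_PAGE * VHOST_LOG_BITS   # 256 KiB of address space


def log_bytes(last_addr):
    """Bytes of bitmap needed to cover addresses 0..last_addr (hypothetical helper)."""
    return (last_addr // VHOST_LOG_CHUNK + 1) * VHOST_LOG_CHUNK_BYTES


GIB = 1 << 30
ram_only = log_bytes(20 * GIB - 1)         # 20 GiB guest from the description
default_dma = log_bytes(0x800000000000000)  # default 64-bit DMA window base
lowered_dma = log_bytes(0x80000000)         # dma64_win_addr=0x80000000

print("guest RAM only     :", ram_only, "bytes")
print("default DMA base   :", default_dma, "bytes")
print("lowered DMA base   :", lowered_dma, "bytes")
```

Under these assumptions the default DMA base alone needs about 2^44 bytes (16 TiB) of bitmap, which is the same order of magnitude as the 17592186147560 bytes in the error message; the small excess presumably reflects where the used ring actually sits inside the DMA window. With the lowered base the DMA address falls below the top of guest RAM, so the bitmap stays at the size needed for RAM.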


So this raises several questions:

- the default base for DMA addresses (0x800000000000000ULL) is
  a guest-visible setting we have had since QEMU 2.7.0 and I'm not
  sure we can/want to change it

- the vhost log bitmap as it is implemented today doesn't scale
  with huge addresses

- the virtqueue used ring address that is used to compute the
  size of the bitmap comes from the guest. It is certainly
  possible to hack an x86 guest to provide a similarly huge
  address and crash QEMU

- all the vhost code is architecture agnostic

Not sure if the POWER virt team is the best fit to address this.

Comment 4 xianwang 2020-09-17 07:24:44 UTC
Bug reproduction:
Host:
[root@ibm-p9b-27 home]# uname -r
4.18.0-236.el8.ppc64le
qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8.ppc64le
SLOF-20200717-1.gite18ddad8.module+el8.3.0+7638+07cf13d2.noarch

qemu cli:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine pseries  \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-m 4096  \
-smp 80,maxcpus=80,cores=40,threads=1,sockets=2  \
-cpu 'host' \
-object iothread,id=iothread0 \
-chardev socket,path=/var/tmp/monitor-qmpmonitor1,nowait,id=qmp_id_qmpmonitor1,server  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,path=/var/tmp/serial-serial0,nowait,id=chardev_serial0,server \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-kbd,id=usb-kbd1,bus=usb1.0,port=2 \
-device usb-mouse,id=usb-mouse1,bus=usb1.0,port=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4,iothread=iothread0 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/xianwang/rhel830-ppc64le-virtio-scsi_p9.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0 \
-device virtio-net-pci,mac=9a:58:80:27:08:7c,id=idji2KPU,netdev=idYBxx2l,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on  \
-netdev tap,id=idYBxx2l,vhost=on \
-vnc :11  \
-rtc base=utc,clock=host  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-monitor stdio

source side:
(qemu) migrate -d tcp:127.0.0.1:5801
(qemu) qemu-kvm: GLib: gmem.c:135: failed to allocate 17592186150680 bytes

Comment 5 Greg Kurz 2020-09-18 13:22:58 UTC
After discussion with David, I'll dig some more to see if we can
support a sparse bitmap for the vhost log.

Comment 9 Li Xiaohui 2020-09-28 13:40:29 UTC
Tested with the fixed qemu-kvm and kernel builds; it works well on x86 hosts.
Host info:
kernel-4.18.0-239.1.el8.bz1883084.x86_64 & qemu-img-5.1.0-9.module+el8.3.0+7652+b30e6901.bz1879349.x86_64

(1) Migration, reboot and shutdown succeed when booting a q35+seabios RHEL 8.3.0 guest with these network options:
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2,disable-legacy=on,disable-modern=off,iommu_platform=on \
-netdev tap,id=tap0,vhost=on \
(2) Migration, reboot and shutdown succeed when booting a pc+seabios RHEL 8.3.0 guest with these network options:
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on \
-netdev tap,id=tap0,vhost=on \

Comment 10 bfu 2020-09-30 01:49:25 UTC
*** Bug 1882393 has been marked as a duplicate of this bug. ***

Comment 11 Greg Kurz 2020-10-13 07:52:25 UTC
Now waiting for the following patch to be merged upstream:

http://patchwork.ozlabs.org/project/qemu-devel/patch/160208823418.29027.15172801181796272300.stgit@bahia.lan/

Comment 13 Greg Kurz 2020-11-02 07:36:58 UTC
Fix merged upstream as:

commit 170a6794efde98fb1ad70f59d4cd9af7decf279d
Author: Greg Kurz <groug>
Date:   Wed Oct 7 18:30:34 2020 +0200

    vhost: Don't special case vq->used_phys in vhost_get_log_size()
    
    The first loop in vhost_get_log_size() computes the size of the dirty log
    bitmap so that it allows to track changes in the entire guest memory, in
    terms of GPA.
    
    When not using a vIOMMU, the address of the vring's used structure,
    vq->used_phys, is a GPA. It is thus already covered by the first loop.
    
    When using a vIOMMU, vq->used_phys is a GIOVA that will be translated
    to an HVA when the vhost backend needs to update the used structure. It
    will log the corresponding GPAs into the bitmap but it certainly won't
    log the GIOVA.
    
    So in any case, vq->used_phys shouldn't be explicitly used to size the
    bitmap. Drop the second loop.
    
    This fixes a crash of the source when migrating a guest using in-kernel
    vhost-net and iommu_platform=on on POWER, because DMA regions are put
    over 0x800000000000000ULL. The resulting insanely huge log size causes
    g_malloc0() to abort.
    
    BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1879349
    Signed-off-by: Greg Kurz <groug>
    Message-Id: <160208823418.29027.15172801181796272300.stgit>
    Acked-by: Jason Wang <jasowang>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
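
As an illustration of what the commit changes, here is a simplified Python model of the two loops it describes. This is a sketch written from the commit message, not the actual C code: get_log_size, the region list and the used ring address below are all hypothetical, and the chunk size is an assumption (4 KiB pages, 64 pages per log chunk).

```python
VHOST_LOG_CHUNK = 0x1000 * 64  # assumed: 4 KiB pages, 64 pages per log chunk


def get_log_size(mem_regions, used_rings=()):
    """Return the dirty log size in chunks (simplified model).

    mem_regions: iterable of (guest_phys_addr, size) pairs (first loop).
    used_rings:  iterable of (used_phys, used_size) pairs; passing these
                 models the second loop that the commit drops.
    """
    log_size = 0
    # First loop: cover all guest memory regions, in terms of GPA.
    for gpa, size in mem_regions:
        last = gpa + size - 1
        log_size = max(log_size, last // VHOST_LOG_CHUNK + 1)
    # Second loop (pre-fix only): also cover each vring's used structure.
    # With a vIOMMU, used_phys is a GIOVA and can be far above guest RAM.
    for used_phys, used_size in used_rings:
        last = used_phys + used_size - 1
        log_size = max(log_size, last // VHOST_LOG_CHUNK + 1)
    return log_size


GIB = 1 << 30
regions = [(0, 20 * GIB)]                        # 20 GiB of guest RAM
rings = [(0x800000000000000 + 0x1000, 0x1000)]   # hypothetical GIOVA on POWER

before_fix = get_log_size(regions, rings)  # sized by the GIOVA
after_fix = get_log_size(regions)          # sized by guest RAM only
print("chunks before fix:", before_fix)
print("chunks after fix :", after_fix)
```

In this model the pre-fix result is dominated by the GIOVA (over 2^41 chunks), while the post-fix result depends only on guest RAM, which is why the allocation no longer fails.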

Comment 14 xianwang 2020-11-27 07:31:04 UTC
Bug verification:
Host:
[root@ibm-p9b-26 home]# uname -r
4.18.0-252.el8.dt2.ppc64le
qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9.ppc64le
SLOF-20200717-1.gite18ddad8.module+el8.4.0+8855+a9e237a9.noarch

Guest:
4.18.0-252.el8.ppc64le

Steps are the same as in comment 4.

Result:
Migration completed and the VM works well after migration.

So, this issue is fixed.

Comment 16 xianwang 2020-12-09 01:23:38 UTC
Hi, Greg,
Regarding comment 15, I would like to change the ITM to ITM 6 or ITM 7. Do you think that's OK?

Comment 17 xianwang 2020-12-09 09:10:04 UTC
Per comment 14, this BZ has passed verification. I will move it to VERIFIED once it is ON_QA, and I am updating the ITM to ITM 8 now.

Comment 20 xianwang 2020-12-16 01:41:00 UTC
Per comment 14, moving this bug to VERIFIED.

Comment 23 errata-xmlrpc 2021-05-25 06:43:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098