Bug 1879349 - Migration fails when 'iommu_platform' is enabled on a 'virtio-net-pci' device
Summary: Migration fails when 'iommu_platform' is enabled on a 'virtio-net-pci' device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.3
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.4
Assignee: Greg Kurz
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 1882393
Depends On: 1730194
Blocks: 1789757 1796871 1883084
 
Reported: 2020-09-16 03:28 UTC by Zhenyu Zhang
Modified: 2021-05-25 06:44 UTC (History)

Fixed In Version: qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 06:43:34 UTC
Type: Bug
Target Upstream Version:
Embargoed:
xianwang: needinfo-




Links
System: IBM Linux Technology Center, ID: 188660, Private: 0, Priority: None, Status: None, Summary: None, Last Updated: 2020-10-13 14:11:44 UTC

Description Zhenyu Zhang 2020-09-16 03:28:36 UTC
Description of problem:
When IOMMU is enabled on "virtio-net-pci", the migration fails.
-device virtio-net-pci,disable-legacy=on,disable-modern=off,iommu_platform=on

After the migration command is sent, the guest hangs and the following error is reported:
(qemu) qemu-kvm: GLib: gmem.c:135: failed to allocate 17592186147560 bytes


Version-Release number of selected component (if applicable):
Host name: ibm-p9b-13.ibm2.lab.eng.bos.redhat.com
Host Distro: RHEL-8.3.0-20200912.n.0 BaseOS ppc64le
Host Kernel: 4.18.0-236.el8.ppc64le
Guest Kernel: 4.18.0-236.el8.ppc64le
Qemu-kvm: qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8

How reproducible:
100%

Steps to Reproduce:
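1. Boot the source guest, then the destination guest (second command line, started with -incoming defer). Both turn on IOMMU via iommu_platform=on on the virtio-net-pci device: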
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine pseries  \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-m 20480  \
-smp 32,maxcpus=32,cores=16,threads=1,sockets=2  \
-cpu 'host' \
-chardev socket,server,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,nowait  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0,nowait \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kar/vt_test_images/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device virtio-net-pci,mac=9a:4c:f5:51:1a:1f,id=idI0Gfwv,netdev=idSFFpRU,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on \
-netdev tap,id=idSFFpRU,vhost=on  \
-vnc :20  \
-rtc base=utc,clock=host  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-monitor stdio 

============================================================
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine pseries  \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-m 20480  \
-smp 32,maxcpus=32,cores=16,threads=1,sockets=2  \
-cpu 'host' \
-chardev socket,server,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor2,nowait  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial2,nowait \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kar/vt_test_images/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device virtio-net-pci,mac=9a:4c:f5:51:1a:1f,id=idI0Gfwv,netdev=idSFFpRU,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on \
-netdev tap,id=idSFFpRU,vhost=on  \
-vnc :30  \
-rtc base=utc,clock=host  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-monitor stdio \
-incoming defer

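2. Send the migration commands over QMP, first on the destination (monitor-qmpmonitor2), then on the source (monitor-qmpmonitor1):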
# nc -U /var/tmp/monitor-qmpmonitor2
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 5}, "package": "qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities", "id": "YQbCZ0Th"}
{"return": {}, "id": "YQbCZ0Th"}
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:[::]:5467"}, "id": "xg5eKNBA"}
{"return": {}, "id": "xg5eKNBA"}
========================================================
# nc -U /var/tmp/monitor-qmpmonitor1
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 5}, "package": "qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities", "id": "YQbCZ0Th"}
{"return": {}, "id": "YQbCZ0Th"}
{"execute": "migrate", "arguments": {"uri": "tcp:localhost:5467", "blk": false, "inc": false}, "id": "hVNuVzEv"}
{"return": {}, "id": "hVNuVzEv"}

3. The issue occurs and the guest hangs:
(qemu) qemu-kvm: GLib: gmem.c:135: failed to allocate 17592186147560 bytes
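
The QMP exchange in step 2 can also be scripted instead of typed into nc. The following is a minimal sketch, not part of the original report: the helper names (qmp_command, qmp_open, start_migration) are hypothetical, the socket paths and port are the ones used in the steps above, and the "blk"/"inc" arguments are left out on the assumption that they are optional.

```python
import json
import socket


def qmp_command(sock_file, name, arguments=None):
    """Send one QMP command and return the first reply that is not an event."""
    msg = {"execute": name}
    if arguments is not None:
        msg["arguments"] = arguments
    sock_file.write(json.dumps(msg) + "\n")
    sock_file.flush()
    while True:
        line = sock_file.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "event" not in reply:  # skip asynchronous event notifications
            return reply


def qmp_open(path):
    """Connect to a QMP Unix socket and complete capability negotiation."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    sock_file = sock.makefile("rw", encoding="utf-8")
    json.loads(sock_file.readline())  # greeting banner: {"QMP": {...}}
    qmp_command(sock_file, "qmp_capabilities")
    return sock, sock_file


def start_migration(dst_path, src_path, port=5467):
    """Arm the destination, then start the migration on the source."""
    dst_sock, dst = qmp_open(dst_path)
    src_sock, src = qmp_open(src_path)
    try:
        return [
            qmp_command(dst, "migrate-incoming", {"uri": f"tcp:[::]:{port}"}),
            qmp_command(src, "migrate", {"uri": f"tcp:localhost:{port}"}),
        ]
    finally:
        for sock_file, sock in ((dst, dst_sock), (src, src_sock)):
            sock_file.close()
            sock.close()
```

With both guests running, calling start_migration("/var/tmp/monitor-qmpmonitor2", "/var/tmp/monitor-qmpmonitor1") should reproduce step 2. On an affected build the source QEMU is then expected to abort as shown in step 3.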


Actual results:
The guest hangs.

Expected results:
The migration completes normally.

Additional info:
Not sure whether the x86 platform has the same issue.
An x86 test result will be added later.

Comment 1 Li Xiaohui 2020-09-16 04:30:13 UTC
Did not hit this issue when migrating a VM with the following QEMU command line on x86 hosts:
Environments:
hosts: kernel-4.18.0-234.el8.x86_64 & qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8.x86_64
guest: kernel-4.18.0-232.el8.x86_64

QEMU command line:
/usr/libexec/qemu-kvm  \
-name "mouse-vm",debug-threads=on \
-sandbox off \
-machine q35 \
-cpu Skylake-Client \
-nodefaults  \
-device VGA \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pcie-root-port,port=0x10,chassis=1,id=root0,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=root1,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=root2,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=root3,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=root4,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=root5,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=root6,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=root7,bus=pcie.0,addr=0x2.0x7 \
-device nec-usb-xhci,id=usb1,bus=root0 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=root1 \
-device scsi-hd,id=image1,drive=drive_image1,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0,bootindex=0 \
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2,disable-legacy=on,disable-modern=off,iommu_platform=on \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/mnt/nfs/rhel830-64-virtio-scsi-0818.qcow2,node-name=drive_sys1 \
-blockdev driver=qcow2,node-name=drive_image1,file=drive_sys1 \
-netdev tap,id=tap0,vhost=on \
-m 4096 \
-smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
-vnc :10 \
-rtc base=utc,clock=host \
-boot menu=off,strict=off,order=cdn,once=c \
-enable-kvm  \
-qmp tcp:0:3333,server,nowait \
-serial tcp:0:4444,server,nowait \
-monitor stdio \

Note: migration with a pc+seabios guest did not hit this issue either.

Comment 2 Zhenyu Zhang 2020-09-16 06:03:21 UTC
(In reply to Li Xiaohui from comment #1)

Thanks. Since the issue is not hit on the x86 platform, setting this to ppc only.

Comment 3 Greg Kurz 2020-09-16 16:42:16 UTC
The root cause is that QEMU allocates a bitmap to log pages that are
dirtied by the in-kernel vhost device during migration. This bitmap
accounts for all the guest RAM and DMA. In the case of POWER, DMA
addresses start at 0x800000000000000ULL by default. This causes the
bitmap size to be insanely big and g_malloc0() fails. The guest appears
stuck because QEMU is actually aborting.

Lowering the DMA addresses to a smaller value, e.g. by adding:

-global spapr-pci-host-bridge.dma64_win_addr=0x80000000

on the QEMU command line allows the migration to succeed.
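
To put numbers on this, here is a small arithmetic sketch (not QEMU code). It assumes the log is a flat bitmap with one bit per 4 KiB page stored in 64-bit chunks, which is how I read VHOST_LOG_PAGE and VHOST_LOG_CHUNK in hw/virtio/vhost.c; the function names are made up for illustration.

```python
VHOST_LOG_PAGE = 0x1000                               # address space per log bit (assumed)
CHUNK_BYTES = 8                                       # one log chunk is 64 bits (assumed)
VHOST_LOG_CHUNK = VHOST_LOG_PAGE * CHUNK_BYTES * 8    # address space per chunk


def log_bytes(last_addr):
    """Bytes needed by a flat bitmap covering addresses 0..last_addr."""
    return (last_addr // VHOST_LOG_CHUNK + 1) * CHUNK_BYTES


def lowest_last_addr(nbytes):
    """Lowest 'last address' that would lead to a bitmap of nbytes bytes."""
    return (nbytes // CHUNK_BYTES - 1) * VHOST_LOG_CHUNK


# Bitmap needed just to reach the default DMA window base on POWER.
print(log_bytes(0x800000000000000))           # 17592186044424 (about 16 TiB)

# Same computation with the lowered dma64_win_addr from the workaround.
print(log_bytes(0x80000000))                  # 65544 (about 64 KiB)

# Working backwards from the size in the error message.
print(hex(lowest_last_addr(17592186147560)))  # 0x8000000c9700000
```

The last value sits a few GiB above the default window base, which is consistent with the failed allocation being driven by an address inside the 64-bit DMA window. With the lowered base, the window no longer dominates and the bitmap is sized by guest RAM instead.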


So this raises several questions:

- the default base for DMA addresses (0x800000000000000ULL) is
  a guest visible setting we have since QEMU-2.7.0 and I'm not
  sure we can/want to change it

- the vhost log bitmap as it is implemented today doesn't scale
  with huge addresses

- the virtqueue used ring address that is used to compute the
  size of the bitmap comes from the guest. It is certainly
  possible to hack an x86 guest to provide a similarly huge
  address and crash QEMU

- all the vhost code is architecture agnostic

Not sure if the POWER virt team is the best fit to address this.

Comment 4 xianwang 2020-09-17 07:24:44 UTC
Bug reproduction:
Host:
[root@ibm-p9b-27 home]# uname -r
4.18.0-236.el8.ppc64le
qemu-kvm-5.1.0-6.module+el8.3.0+8041+42ff16b8.ppc64le
SLOF-20200717-1.gite18ddad8.module+el8.3.0+7638+07cf13d2.noarch

qemu cli:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine pseries  \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-m 4096  \
-smp 80,maxcpus=80,cores=40,threads=1,sockets=2  \
-cpu 'host' \
-object iothread,id=iothread0 \
-chardev socket,path=/var/tmp/monitor-qmpmonitor1,nowait,id=qmp_id_qmpmonitor1,server  \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,path=/var/tmp/serial-serial0,nowait,id=chardev_serial0,server \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device usb-kbd,id=usb-kbd1,bus=usb1.0,port=2 \
-device usb-mouse,id=usb-mouse1,bus=usb1.0,port=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4,iothread=iothread0 \
-blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/xianwang/rhel830-ppc64le-virtio-scsi_p9.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on,bus=virtio_scsi_pci0.0,channel=0,scsi-id=0,lun=0 \
-device virtio-net-pci,mac=9a:58:80:27:08:7c,id=idji2KPU,netdev=idYBxx2l,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on  \
-netdev tap,id=idYBxx2l,vhost=on \
-vnc :11  \
-rtc base=utc,clock=host  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-monitor stdio

source side:
(qemu) migrate -d tcp:127.0.0.1:5801
(qemu) qemu-kvm: GLib: gmem.c:135: failed to allocate 17592186150680 bytes

Comment 5 Greg Kurz 2020-09-18 13:22:58 UTC
After discussion with David, I'll dig some more to see if we can
support a sparse bitmap for the vhost log.

Comment 9 Li Xiaohui 2020-09-28 13:40:29 UTC
Tested with the fixed qemu-kvm and kernel builds; it works well on x86 hosts:
hosts info:
kernel-4.18.0-239.1.el8.bz1883084.x86_64 & qemu-img-5.1.0-9.module+el8.3.0+7652+b30e6901.bz1879349.x86_64

(1) Migration, reboot and shutdown succeed when booting a q35+seabios RHEL 8.3.0 guest with these network options:
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=root2,disable-legacy=on,disable-modern=off,iommu_platform=on \
-netdev tap,id=tap0,vhost=on \
(2) Migration, reboot and shutdown succeed when booting a pc+seabios RHEL 8.3.0 guest with these network options:
-device virtio-net-pci,mac=9a:8a:8b:8c:8d:8e,id=net0,vectors=4,netdev=tap0,bus=pci.0,addr=0x5,disable-legacy=on,disable-modern=off,iommu_platform=on \
-netdev tap,id=tap0,vhost=on \

Comment 10 bfu 2020-09-30 01:49:25 UTC
*** Bug 1882393 has been marked as a duplicate of this bug. ***

Comment 11 Greg Kurz 2020-10-13 07:52:25 UTC
Now waiting for the following patch to be merged upstream:

http://patchwork.ozlabs.org/project/qemu-devel/patch/160208823418.29027.15172801181796272300.stgit@bahia.lan/

Comment 13 Greg Kurz 2020-11-02 07:36:58 UTC
Fix merged upstream as:

commit 170a6794efde98fb1ad70f59d4cd9af7decf279d
Author: Greg Kurz <groug>
Date:   Wed Oct 7 18:30:34 2020 +0200

    vhost: Don't special case vq->used_phys in vhost_get_log_size()
    
    The first loop in vhost_get_log_size() computes the size of the dirty log
    bitmap so that it allows to track changes in the entire guest memory, in
    terms of GPA.
    
    When not using a vIOMMU, the address of the vring's used structure,
    vq->used_phys, is a GPA. It is thus already covered by the first loop.
    
    When using a vIOMMU, vq->used_phys is a GIOVA that will be translated
    to an HVA when the vhost backend needs to update the used structure. It
    will log the corresponding GPAs into the bitmap but it certainly won't
    log the GIOVA.
    
    So in any case, vq->used_phys shouldn't be explicitly used to size the
    bitmap. Drop the second loop.
    
    This fixes a crash of the source when migrating a guest using in-kernel
    vhost-net and iommu_platform=on on POWER, because DMA regions are put
    over 0x800000000000000ULL. The resulting insanely huge log size causes
    g_malloc0() to abort.
    
    BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1879349
    Signed-off-by: Greg Kurz <groug>
    Message-Id: <160208823418.29027.15172801181796272300.stgit>
    Acked-by: Jason Wang <jasowang>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
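
The effect of dropping the second loop can be illustrated with a simplified Python model of the sizing logic. This is a sketch under assumptions, not the actual C code: the function name, the region and ring tuples, and the used ring address are hypothetical, and the chunk size assumes one bit per 4 KiB page in 64-bit chunks.

```python
VHOST_LOG_CHUNK = 0x1000 * 64   # address space covered by one 64-bit chunk (assumed)
CHUNK_BYTES = 8


def log_size_chunks(regions, used_rings, special_case_used=False):
    """Return the dirty log size in chunks.

    regions:           (guest_phys_addr, size) pairs describing guest memory
    used_rings:        (used_phys, used_size) pairs, one per virtqueue
    special_case_used: True models the behaviour before the fix
    """
    size = 0
    # First loop: cover the entire guest memory, in terms of GPA.
    for start, length in regions:
        last = start + length - 1
        size = max(size, last // VHOST_LOG_CHUNK + 1)
    # Second loop (dropped by the fix): also cover vq->used_phys, which
    # is a GIOVA rather than a GPA when a vIOMMU is in use.
    if special_case_used:
        for used_phys, used_size in used_rings:
            last = used_phys + used_size - 1
            size = max(size, last // VHOST_LOG_CHUNK + 1)
    return size


GiB = 1 << 30
regions = [(0, 20 * GiB)]                            # -m 20480, as in the report
used_rings = [(0x800000000000000 + 0x1000, 0x1000)]  # hypothetical GIOVA in the DMA window

before = log_size_chunks(regions, used_rings, special_case_used=True)
after = log_size_chunks(regions, used_rings)
print(before * CHUNK_BYTES)   # 17592186044424
print(after * CHUNK_BYTES)    # 655360
```

In this model the log shrinks from roughly 16 TiB to 640 KiB for a 20 GiB guest once the used ring address no longer takes part in the sizing.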

Comment 14 xianwang 2020-11-27 07:31:04 UTC
Bug verification:
Host:
[root@ibm-p9b-26 home]# uname -r
4.18.0-252.el8.dt2.ppc64le
qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9.ppc64le
SLOF-20200717-1.gite18ddad8.module+el8.4.0+8855+a9e237a9.noarch

Guest:
4.18.0-252.el8.ppc64le

Steps are the same as in comment 4.

Result:
Migration completed and the VM works well after migration.

So, this issue is fixed.

Comment 16 xianwang 2020-12-09 01:23:38 UTC
Hi, Greg,
Regarding comment 15, I would like to change the ITM to ITM 6 or ITM 7. Do you think that is OK?

Comment 17 xianwang 2020-12-09 09:10:04 UTC
As per comment 14, this BZ has passed verification. I will move it to VERIFIED once it is ON_QA, and am updating the ITM to 8 now.

Comment 20 xianwang 2020-12-16 01:41:00 UTC
As per comment 14, moving this to VERIFIED.

Comment 23 errata-xmlrpc 2021-05-25 06:43:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098

