Bug 1448813 - qemu crash when shutdown guest with '-device intel-iommu' and '-device vfio-pci'
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Assignee: Peter Xu
QA Contact: Pei Zhang
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2017-05-08 09:06 UTC by Pei Zhang
Modified: 2017-08-02 04:38 UTC

Fixed In Version: qemu-kvm-rhev-2.9.0-9.el7
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-02 04:38:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2392 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2017-08-01 20:04:36 UTC

Description Pei Zhang 2017-05-08 09:06:45 UTC
Description of problem:
Boot a guest with an emulated iommu and assigned network devices, then shut down the guest: qemu crashes.

Version-Release number of selected component (if applicable):
3.10.0-663.el7.x86_64
qemu-img-rhev-2.9.0-3.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. In host, bind network devices to vfio
# ls /sys/bus/pci/drivers/vfio-pci/
0000:04:00.0  0000:04:00.1  bind  module  new_id  remove_id  uevent  unbind

2. Boot the VM with iommu and the above network devices. Warnings are shown in the qemu terminal.
/usr/libexec/qemu-kvm -name rhel7.4 -M q35,kernel-irqchip=split \
-device intel-iommu,device-iotlb=on,intremap,caching-mode=true \
-cpu Haswell-noTSX -m 8G -numa node \
-smp 4,sockets=1,cores=4,threads=1 \
-device vfio-pci,host=0000:04:00.0 \
-device vfio-pci,host=0000:04:00.1 \
-drive file=/mnt/nfv/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0 \
-vnc :2 \
-monitor stdio \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01

(qemu) qemu-kvm: iommu has granularity incompatible with target AS
qemu-kvm: iommu map to non memory area 280000000
qemu-kvm: iommu has granularity incompatible with target AS
qemu-kvm: iommu map to non memory area 280000000


3. Shut down the guest; qemu crashes.
# shutdown -h now

(qemu) qemu-kvm: pci_get_msi_message: unknown interrupt type
Aborted
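
Step 1 above only lists the vfio-pci driver directory after the binding was done. For anyone re-running the reproducer, one way the binding could be done is sketched below. This is an assumption, not the reporter's method: it relies on the kernel's sysfs driver_override and drivers_probe files, needs root and a loaded vfio-pci module on a real host, and bind_to_vfio and SYSFS_ROOT are hypothetical names introduced here.

```shell
# Sketch only: bind one PCI function to vfio-pci through sysfs.
# SYSFS_ROOT is a hypothetical override so the function can be dry-run
# against a scratch directory; on a real host it defaults to /sys.
bind_to_vfio() {
    bdf="$1"
    root="${SYSFS_ROOT:-/sys}"
    dev="$root/bus/pci/devices/$bdf"

    [ -d "$dev" ] || { echo "no such PCI device: $bdf" >&2; return 1; }

    # Detach the device from its current driver, if it has one.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi

    # Ask the PCI core to prefer vfio-pci for this device, then re-probe it.
    echo vfio-pci > "$dev/driver_override"
    echo "$bdf" > "$root/bus/pci/drivers_probe"
}
```

On the host from this report it would be called once per function, e.g. "bind_to_vfio 0000:04:00.0" and "bind_to_vfio 0000:04:00.1", after "modprobe vfio-pci".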


Actual results:
Qemu crashes.


Expected results:
1. Qemu should quit normally without crashing.
2. No warnings should be shown in the qemu terminal when booting the guest.


Additional info:

Comment 2 Peter Xu 2017-05-08 11:23:31 UTC
Pei,

IIUC, one important piece of information for this bug is that we need to set up "iommu=pt" in the guest. Am I correct?

If so, please mention it in the steps to reproduce, and I'd suggest mentioning it in the subject as well. The current subject is too general imho.

I am investigating this.

Comment 3 Pei Zhang 2017-05-09 01:16:53 UTC
(In reply to Peter Xu from comment #2)

Peter, without "iommu=pt" in the guest, qemu still crashes, but no warning messages are shown when booting the guest.

Comment 4 Pei Zhang 2017-05-09 01:21:22 UTC
Additional info:
1. Both rebooting and shutting down the guest cause qemu to crash.

2. Host kernel command line:
# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.10.0-663.el7.x86_64 root=/dev/mapper/rhel_dell--per730--11-root ro crashkernel=auto rd.lvm.lv=rhel_dell-per730-11/root rd.lvm.lv=rhel_dell-per730-11/swap console=ttyS0,115200n81 default_hugepagesz=1G iommu=pt intel_iommu=on LANG=en_US.UTF-8

3. Guest kernel command line:
# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.10.0-663.el7.x86_64 root=/dev/mapper/rhel_bootp--73--75--117-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto rd.lvm.lv=rhel_bootp-73-75-117/root rd.lvm.lv=rhel_bootp-73-75-117/swap rhgb quiet default_hugepagesz=1G intel_iommu=on LANG=en_US.UTF-8

Comment 5 Peter Xu 2017-05-09 07:26:25 UTC
(In reply to Pei Zhang from comment #3)
> Peter, without "iommu=pt" in the guest, qemu still crashes, but no warning
> messages are shown when booting the guest.

OK. Thanks for the confirmation, Pei. Then let's keep it as it is.

Actually the warning is not related to the crash. I'll discuss them one by one.

For the crash
=============

I posted a fix for the crash problem upstream:

https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01947.html

It's an irqfd bug introduced along with interrupt remapping. It looks like it can only be triggered by this special configuration (vtd + vfio-pci + virtio devices).

For the two warnings
====================

In general, the two warnings are harmless here.

They only appear when the guest specifies "iommu=pt" (guest IOMMU is using passthrough mode). Currently the VT-d emulation does not yet support hardware passthrough. When "iommu=pt" is specified, the guest will try to build a software identity mapping for the whole guest memory address space.

"qemu-kvm: iommu has granularity incompatible with target AS" is a warning emitted when the guest wants to map the very beginning of the guest memory address space (0x0-0x1fffff). I believe that's not a memory region that will be used by a kernel driver, so that should be fine.

"qemu-kvm: iommu map to non memory area 280000000" should be an off-by-one thing in the guest IOMMU driver when building the identity mapping (e.g., when the guest has memory 0-0x1ffff, it seems to try to map up to 0x20000, which is not a real RAM address). That does not matter either.
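
The address in the second warning can be checked against the guest memory layout. A rough calculation follows, assuming (this is not stated in the report) that the q35 machine keeps the first 2 GiB of RAM below 4 GiB and places the remainder starting at 4 GiB. ram_end is a hypothetical helper, and the layout assumption is only meant for guests well above 2 GiB such as the 8G one here.

```shell
# Worked arithmetic: where does guest RAM end for "-m 8G" on q35?
# Assumption (not from this report): the first 2 GiB of RAM stay below
# 4 GiB and the rest is relocated to start at 4 GiB.
ram_end() {
    ram_bytes=$1
    lowmem=$(( 2 << 30 ))    # assumed amount of RAM kept below 4 GiB
    four_g=$(( 4 << 30 ))    # assumed start of the relocated RAM
    # First address past the last byte of RAM (exclusive end).
    printf '0x%x\n' $(( four_g + ram_bytes - lowmem ))
}

ram_end $(( 8 << 30 ))    # prints 0x280000000
```

Under that assumption, 0x280000000 is the first address past the end of guest RAM, which would be consistent with the off-by-one explanation given for the warning.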

My suggestion on the warnings is: let's open another bz to support passthrough, so that guest will use hardware passthrough mode, and we'll get rid of all these warnings.

It's already in progress upstream:

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02583.html

Besides the two warnings, a much bigger problem of not supporting VT-d passthrough is that, when using software passthrough with vfio-pci (that is, when VT-d does not support hardware passthrough but the guest specifies "iommu=pt"), we'll lock the whole guest memory at the very beginning. That's very bad, since memory thin provisioning then no longer works.

I'll open a BZ for VT-d passthrough mode support for better tracking.

Peter

Comment 6 Pei Zhang 2017-05-12 05:36:37 UTC
Update:

If both assigned network devices are bound to the vfio-pci driver in the guest, then shutting down/rebooting the guest works well; qemu does not crash.

Steps.
1. Boot the guest as in the Description.

2. In guest, load vfio
# modprobe vfio
# modprobe vfio-pci

3. In guest, bind NIC1 to vfio
# dpdk-devbind --bind=vfio-pci 0000:01:00.0

4. In guest, bind NIC2 to vfio
# dpdk-devbind --bind=vfio-pci 0000:02:00.0

5. In guest, return NIC1 to kernel ixgbe driver
# dpdk-devbind --bind=ixgbe 0000:01:00.0

6. In guest, return NIC2 to kernel ixgbe driver
# dpdk-devbind --bind=ixgbe 0000:02:00.0


Scenarios with different steps    qemu status after reboot/shutdown guest
1,2,3,4                           works well
1,2,3                             crash
1,2                               crash
1                                 crash
1,2,3,4,5                         crash
1,2,3,4,5,6                       crash


Best Regards,
Pei

Comment 7 Peter Xu 2017-05-12 06:08:42 UTC
(In reply to Pei Zhang from comment #6)
> Scenarios with different steps    qemu status after reboot/shutdown guest
> 1,2,3,4                           works well

After some thought, I think this can be explained.

When both cards are managed by vfio-pci in the guest, no interrupts are used at all by the vfio-pci devices (unless dpdk enables them, but I guess dpdk doesn't really need IRQs :). To be more specific, the interrupt resources are released when a port is unbound from the ixgbe driver.

The crash, meanwhile, should only happen when both vfio-pci and virtio devices are using interrupts (e.g., when booting with only virtio, or with only vfio-pci, we won't hit the crash).
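
The scenario table from comment 6 fits this explanation. As a toy model only (it is not how qemu decides anything), the function below takes the guest driver currently bound to each assigned NIC and reports the outcome observed with the unfixed qemu. It assumes the virtio devices from the Description are always present and using interrupts, and that dpdk does not enable interrupts on a NIC held by vfio-pci. predicts_crash is a hypothetical name.

```shell
# Toy model of the scenario table: one argument per assigned NIC, naming
# the guest driver it is bound to at shutdown/reboot time.
# Assumption: an emulated virtio device using interrupts is always present.
predicts_crash() {
    for drv in "$@"; do
        # A NIC driven by ixgbe has interrupts set up; a NIC held by
        # vfio-pci in the guest is assumed to have released them.
        if [ "$drv" = "ixgbe" ]; then
            echo "crash"
            return 0
        fi
    done
    echo "works well"
}
```

For example, "predicts_crash vfio-pci vfio-pci" corresponds to steps 1,2,3,4 and "predicts_crash ixgbe vfio-pci" to steps 1,2,3,4,5.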

Anyway, I'll backport the fix asap when upstream is ready.

> 1,2,3                             crash
> 1,2                               crash
> 1                                 crash
> 1,2,3,4,5                         crash
> 1,2,3,4,5,6                       crash
> 
> 
> Best Regards,
> Pei

Peter

Comment 10 Miroslav Rezanina 2017-06-08 16:27:10 UTC
Fix included in qemu-kvm-rhev-2.9.0-9.el7

Comment 12 Pei Zhang 2017-06-14 05:38:47 UTC
==Verification==

Versions:
3.10.0-679.el7.x86_64
qemu-kvm-rhev-2.9.0-9.el7.x86_64

Steps:
1. In host, add "iommu=pt intel_iommu=on" to kernel line

2. In host, bind network devices to vfio
# ls /sys/bus/pci/drivers/vfio-pci/
0000:04:00.0  0000:04:00.1  bind  module  new_id  remove_id  uevent  unbind

3. Boot the VM with iommu and the above network devices.
Same as in the Description.

4. Add "intel_iommu=on" to the kernel line of the guest

5. Reboot/shut down the guest several times; the guest works well.
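
The kernel-line settings in steps 1 and 4 can be double-checked from a shell before rebooting or shutting down. A minimal sketch, assuming the parameters are read from /proc/cmdline as in comment 4; has_kernel_params and CMDLINE_FILE are hypothetical names introduced here.

```shell
# Sketch: succeed only if every expected parameter is on the kernel line.
# CMDLINE_FILE is a hypothetical override for dry runs; by default the
# running kernel's /proc/cmdline is read.
has_kernel_params() {
    cmdline=$(cat "${CMDLINE_FILE:-/proc/cmdline}") || return 2
    for want in "$@"; do
        # Pad with spaces so only whole parameters match.
        case " $cmdline " in
            *" $want "*) ;;
            *) echo "missing: $want" >&2; return 1 ;;
        esac
    done
}
```

On the host this would be "has_kernel_params iommu=pt intel_iommu=on", and in the guest "has_kernel_params intel_iommu=on".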

So this bug has been fixed. Thanks, Peter.

Moving the status of this bug to 'VERIFIED'.

Comment 14 errata-xmlrpc 2017-08-02 04:38:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

