Bug 1448813
Summary: | qemu crash when shutdown guest with '-device intel-iommu' and '-device vfio-pci' | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Pei Zhang <pezhang> |
Component: | qemu-kvm-rhev | Assignee: | Peter Xu <peterx> |
Status: | CLOSED ERRATA | QA Contact: | Pei Zhang <pezhang> |
Severity: | urgent | Docs Contact: | |
Priority: | high | | |
Version: | 7.4 | CC: | atragler, chayang, hhuang, jinzhao, juzhang, lmiksik, michen, mtessun, peterx, virt-maint, yfu |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | qemu-kvm-rhev-2.9.0-9.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-08-02 04:38:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Pei Zhang
2017-05-08 09:06:45 UTC
Pei,

IIUC one important piece of information for this bug is that we need to set up "iommu=pt" in the guest, am I correct?

If so, please mention it in the procedures, and my suggestion is to mention it in the subject as well. This subject is too general imho.

I am investigating this.

(In reply to Peter Xu from comment #2)
> Pei,
>
> IIUC one important piece of information for this bug is that we need to set up
> "iommu=pt" in the guest, am I correct?
>
> If so, please mention it in the procedures, and my suggestion is to mention it
> in the subject as well. This subject is too general imho.
>
> I am investigating this.

Peter, without "iommu=pt" in the guest, qemu still crashes, but no warning messages are shown when booting the guest.

Additional info:

1. Both rebooting and shutting down the guest cause the qemu crash.

2. Host kernel command line:
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-663.el7.x86_64 root=/dev/mapper/rhel_dell--per730--11-root ro crashkernel=auto rd.lvm.lv=rhel_dell-per730-11/root rd.lvm.lv=rhel_dell-per730-11/swap console=ttyS0,115200n81 default_hugepagesz=1G iommu=pt intel_iommu=on LANG=en_US.UTF-8

3. Guest kernel command line:
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-663.el7.x86_64 root=/dev/mapper/rhel_bootp--73--75--117-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto rd.lvm.lv=rhel_bootp-73-75-117/root rd.lvm.lv=rhel_bootp-73-75-117/swap rhgb quiet default_hugepagesz=1G intel_iommu=on LANG=en_US.UTF-8

(In reply to Pei Zhang from comment #3)
> (In reply to Peter Xu from comment #2)
> > Pei,
> >
> > IIUC one important piece of information for this bug is that we need to set up
> > "iommu=pt" in the guest, am I correct?
> >
> > If so, please mention it in the procedures, and my suggestion is to mention it
> > in the subject as well. This subject is too general imho.
> >
> > I am investigating this.
>
> Peter, without "iommu=pt" in the guest, qemu still crashes, but no warning
> messages are shown when booting the guest.

Ok. Thanks for confirmation, Pei.
Then let's keep it as it is. Actually the warnings are not related to the crash. I'll discuss them one by one.

For the crash
=============

I posted a fix for the crash problem upstream:

https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg01947.html

It's an irqfd bug introduced along with interrupt remapping. It looks like it can only be triggered by this special configuration (vtd + vfio-pci + virtio devices).

For the two warnings
====================

In general, the two warnings are harmless here. They only appear when the guest specifies "iommu=pt" (guest IOMMU is using passthrough mode). Currently VT-d emulation still does not support hardware passthrough. When it is specified, the guest will try to build up a software identity mapping for the whole guest memory address space.

"qemu-kvm: iommu has granularity incompatible with target AS" is a warning emitted when the guest wants to map the very beginning of the guest memory address space (0x0-0x1fffff). I believe that's not a memory region that will be used by the kernel driver, so that should be fine.

"qemu-kvm: iommu map to non memory area 280000000" should be an off-by-one thing in the guest IOMMU driver when building up the identity mapping (e.g., when the guest has memory 0-0x1ffff, it seems like it'll try to map until 0x20000, which is actually not a real RAM address), which does not matter either.

My suggestion on the warnings is: let's open another bz to support passthrough, so that the guest will use hardware passthrough mode, and we'll get rid of all these warnings. It's already in progress upstream:

https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02583.html

Besides the two warnings, a much bigger problem of not supporting VT-d passthrough is that, when using software passthrough with vfio-pci (that is, when VT-d does not support hardware passthrough while the guest specifies "iommu=pt"), we'll lock up the whole guest memory at the very beginning. That's very bad, since memory thin provisioning then no longer works.
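As an aside on the second warning, the suspected off-by-one can be illustrated with a small sketch. This is not the guest IOMMU driver's code: `identity_map_pages` and its `off_by_one` flag are hypothetical names, and a 4 KiB page size is assumed. It only shows how mapping one page too many makes the identity mapping touch the first address past the end of RAM, using the 0-0x1ffff example from the comment.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def identity_map_pages(ram_start, ram_last, off_by_one=False):
    """Return the page base addresses an identity mapping would cover.

    ram_last is the last valid RAM address (inclusive). With
    off_by_one=True the range also covers the page starting at
    ram_last + 1, mimicking the suspected bug (hypothetical model).
    """
    end = ram_last + 1  # exclusive end of RAM
    if off_by_one:
        end += PAGE_SIZE  # one page too many
    return list(range(ram_start, end, PAGE_SIZE))


correct = identity_map_pages(0x0, 0x1ffff)
buggy = identity_map_pages(0x0, 0x1ffff, off_by_one=True)

# Pages mapped only by the buggy variant: these are not backed by RAM.
extra = sorted(set(buggy) - set(correct))
print([hex(addr) for addr in extra])  # ['0x20000']
```

In this model the only extra page starts at 0x20000, the first address after the end of RAM, which is the kind of address the "map to non memory area" warning complains about.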
I'll open a BZ for VT-d passthrough mode support for better tracking.

Peter

Update:

If the assigned network devices are bound to the vfio driver in the guest, then shutting down/rebooting the guest works well, with no qemu crash.

Steps.
1. Boot guest like Description.

2. In guest, load vfio
# modprobe vfio
# modprobe vfio-pci

3. In guest, bind NIC1 to vfio
# dpdk-devbind --bind=vfio-pci 0000:01:00.0

4. In guest, bind NIC2 to vfio
# dpdk-devbind --bind=vfio-pci 0000:02:00.0

5. In guest, return NIC1 to kernel ixgbe driver
# dpdk-devbind --bind=ixgbe 0000:01:00.0

6. In guest, return NIC2 to kernel ixgbe driver
# dpdk-devbind --bind=ixgbe 0000:02:00.0


Scenarios with different steps    qemu status after reboot/shutdown guest
1,2,3,4                           works well
1,2,3                             crash
1,2                               crash
1                                 crash
1,2,3,4,5                         crash
1,2,3,4,5,6                       crash


Best Regards,
Pei

(In reply to Pei Zhang from comment #6)
> Update:
>
> If the assigned network devices are bound to the vfio driver in the guest,
> then shutting down/rebooting the guest works well, with no qemu crash.
>
> Steps.
> 1. Boot guest like Description.
>
> 2. In guest, load vfio
> # modprobe vfio
> # modprobe vfio-pci
>
> 3. In guest, bind NIC1 to vfio
> # dpdk-devbind --bind=vfio-pci 0000:01:00.0
>
> 4. In guest, bind NIC2 to vfio
> # dpdk-devbind --bind=vfio-pci 0000:02:00.0
>
> 5. In guest, return NIC1 to kernel ixgbe driver
> # dpdk-devbind --bind=ixgbe 0000:01:00.0
>
> 6. In guest, return NIC2 to kernel ixgbe driver
> # dpdk-devbind --bind=ixgbe 0000:02:00.0
>
>
> Scenarios with different steps    qemu status after reboot/shutdown guest
> 1,2,3,4                           works well

After some thought, I think this can be explained. When both cards are managed by vfio-pci, no interrupt is used at all by the vfio-pci devices (unless dpdk further enables it, but I guess dpdk doesn't really need IRQs :). To be more specific, the interrupt resources are released when the port is unbound from the ixgbe driver.
The crash should only happen when both vfio-pci and virtio devices are using interrupts (e.g., when booting with only virtio, or with only vfio-pci, we won't hit the crash).

Anyway, I'll backport the fix asap when upstream is ready.

> 1,2,3                             crash
> 1,2                               crash
> 1                                 crash
> 1,2,3,4,5                         crash
> 1,2,3,4,5,6                       crash
>
>
> Best Regards,
> Pei

Peter

Fix included in qemu-kvm-rhev-2.9.0-9.el7

==Verification==

Versions:
3.10.0-679.el7.x86_64
qemu-kvm-rhev-2.9.0-9.el7.x86_64

Steps:
1. In host, add "iommu=pt intel_iommu=on" to the kernel command line

2. In host, bind network devices to vfio
# ls /sys/bus/pci/drivers/vfio-pci/
0000:04:00.0  0000:04:00.1  bind  module  new_id  remove_id  uevent  unbind

3. Boot VM with iommu and the above network devices. Same as Description.

4. Add "intel_iommu=on" to the kernel command line of the guest

5. Reboot/shutdown guest several times, guest works well.

So this bug has been fixed. Thanks Peter.

Move status of this bug to 'VERIFIED'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
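Looking back at the scenario table from comment 6, Peter's explanation can be checked against the reported results with a toy model. This is only a sketch of the reasoning, not QEMU behaviour: the names `STEP_EFFECTS`, `drivers_after`, and `predicts_crash` are made up for illustration, and the rule assumed is that an unfixed qemu crashes on shutdown/reboot whenever at least one assigned NIC is still bound to the guest's ixgbe driver (and therefore still uses interrupts alongside the virtio devices).

```python
# Guest-side steps from comment 6, modelled as (NIC, driver) rebinds.
# Steps 1 and 2 (boot, modprobe) leave both NICs on the kernel ixgbe driver.
STEP_EFFECTS = {
    3: ("NIC1", "vfio-pci"),
    4: ("NIC2", "vfio-pci"),
    5: ("NIC1", "ixgbe"),
    6: ("NIC2", "ixgbe"),
}


def drivers_after(steps):
    """Replay the steps and return the guest driver bound to each NIC."""
    bound = {"NIC1": "ixgbe", "NIC2": "ixgbe"}  # state right after boot
    for step in steps:
        if step in STEP_EFFECTS:
            nic, driver = STEP_EFFECTS[step]
            bound[nic] = driver
    return bound


def predicts_crash(steps):
    """Assumed rule: crash if any assigned NIC still uses interrupts.

    A NIC bound to ixgbe is assumed to use interrupts; a NIC bound to
    the guest's vfio-pci is assumed not to (dpdk not enabling IRQs).
    """
    return any(driver == "ixgbe" for driver in drivers_after(steps).values())


# Results reported in the thread: True means qemu crashed.
REPORTED = {
    (1, 2, 3, 4): False,
    (1, 2, 3): True,
    (1, 2): True,
    (1,): True,
    (1, 2, 3, 4, 5): True,
    (1, 2, 3, 4, 5, 6): True,
}

for steps, crashed in REPORTED.items():
    predicted = predicts_crash(steps)
    print(steps, "crash" if predicted else "works well", predicted == crashed)
```

Under this assumed rule, all six reported scenarios are reproduced: only the sequence that ends with both NICs bound to vfio-pci avoids the crash.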