Bug 1738440

Summary: For intel-iommu, qemu shows conflicting behavior between booting a guest with a vfio device and hot-plugging a vfio device
Product: Red Hat Enterprise Linux 8
Reporter: Pei Zhang <pezhang>
Component: qemu-kvm
Assignee: Peter Xu <peterx>
qemu-kvm sub component: General
QA Contact: Pei Zhang <pezhang>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: alex.williamson, bdas, chayang, ddepaula, jinzhao, juzhang, knoel, peterx, rbalakri, virt-maint, yanghliu
Version: 8.1
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-2.12.0-89.module+el8.2.0+4436+f3a2188d
Clone Of:
: 1738450
Last Closed: 2020-04-28 15:32:15 UTC
Type: Bug
Bug Blocks: 1738450

Description Pei Zhang 2019-08-07 07:59:12 UTC
Description of problem:

(1) Booting a guest with device assignment and a vIOMMU without "caching-mode=on" works well.

(2) However, when a guest is booted with a vIOMMU without "caching-mode=on" and a vfio device is then hot-plugged, qemu quits.

So qemu shows conflicting behavior. If caching-mode=on is required for intel-iommu to enable device assignment with IOMMU protection, qemu should refuse to boot in scenario (1). If caching-mode=on is not required, qemu should allow hot-plugging a vfio device.


Version-Release number of selected component (if applicable):
4.18.0-128.el8.x86_64
qemu-kvm-2.12.0-83.module+el8.1.0+3852+0ba8aef0.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Boot qemu with a vIOMMU without "caching-mode=on"; for the full command line, see [1].

-device intel-iommu,intremap=on,device-iotlb=on \

2. Hot-plug a vfio device; qemu quits.

(qemu) device_add vfio-pci,host=0000:5e:00.0,bus=root.3 
We need to set caching-mode=1 for intel-iommu to enable device assignment with IOMMU protection.
# 

3. Boot qemu with a vfio device and a vIOMMU without "caching-mode=on"; qemu works well.

-device intel-iommu,intremap=on,device-iotlb=on \
-device vfio-pci,host=0000:5e:00.0,bus=root.3 \


Actual results:
qemu behaves inconsistently between booting with a vfio device and hot-plugging one when a vIOMMU is used without caching-mode=on.

Expected results:
qemu should behave consistently between booting with a vfio device and hot-plugging one when a vIOMMU is used without caching-mode=on.

Additional info:
1. This bug has been closed in rhel7.
Bug 1441605 - qemu crash when attach a hostdev device to the guest with intel-iommu device enabled

2. Bug 1622209 - libvirt should notify the user if caching-mode isn't specified for iommu emulation in certain cases 


Reference:
[1]
/usr/libexec/qemu-kvm \
-name rhel8.1 \
-M q35,kernel-irqchip=split \
-cpu Haswell-noTSX \
-m 3G \
-device intel-iommu,intremap=on,device-iotlb=on \
-smp 4,sockets=1,cores=4,threads=1 \
-device pcie-root-port,id=root.1,chassis=1 \
-device pcie-root-port,id=root.2,chassis=2 \
-device pcie-root-port,id=root.3,chassis=3 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/images_nfv-virt-rt-kvm//rhel8.0.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,drive=my,id=virtio-blk0,bus=root.1 \
-vnc :1 \
-monitor stdio \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01,bus=root.2 \
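
For comparison, the sketch below shows the intel-iommu line from [1] with caching-mode=on added, the option that qemu's error message asks for. It is wrapped in a rough pre-flight check. The function name check_caching_mode is hypothetical and is not part of qemu or libvirt, and the check uses plain substring matching, so treat it as an illustration rather than a reliable parser. It is also conservative: it warns even when the guest never enables the vIOMMU.

```shell
#!/bin/sh
# Hypothetical pre-flight check (an assumption for illustration only):
# warn when a qemu command line combines intel-iommu and vfio-pci
# without caching-mode=on.
# Usage: check_caching_mode "<qemu command line as one string>"
check_caching_mode() {
    case "$1" in
        *intel-iommu*) ;;      # vIOMMU present, keep checking
        *) return 0 ;;         # no vIOMMU, nothing to check
    esac
    case "$1" in
        *vfio-pci*) ;;         # assigned device present, keep checking
        *) return 0 ;;         # no assigned device, nothing to check
    esac
    case "$1" in
        *caching-mode=on*) return 0 ;;
    esac
    echo "warning: intel-iommu with vfio-pci but without caching-mode=on" >&2
    return 1
}

# The device lines from this report, with caching-mode=on added.
FIXED='-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on -device vfio-pci,host=0000:5e:00.0,bus=root.3'
check_caching_mode "$FIXED" && echo "ok"
```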

Comment 3 Bandan Das 2019-08-08 19:01:34 UTC
(In reply to Pei Zhang from comment #0)
[...]
> Expected results:
> qemu should show consistent behavior when boot/hot-plug vfio with vIOMMU but
> without caching-mode=on.
> 
What exactly are you thinking in terms of consistent behavior ?
Caching mode can't be dynamically set as far as I understand.


Comment 4 Alex Williamson 2019-08-09 16:22:57 UTC
An unmentioned component in the configuration is that the guest is booted with the option intel_iommu=on, which is actually the key issue that causes this apparent inconsistency.  The intel-iommu device can be present, but it's only potentially enabled during guest boot, when the option intel_iommu=on is provided.  If the vIOMMU is never enabled, then specifying caching-mode is never required.  When the VM is instantiated with a vfio-pci device, the vIOMMU is not yet enabled, therefore no fault occurs.  If the VM is allowed to continue to boot, AND the guest OS enables the IOMMU, then a fault occurs due to the lack of caching-mode.  In the case of the hot-added vfio-pci device, the guest has already enabled the IOMMU, therefore the fault occurs as soon as the vfio-pci device is added.  In order to generate the fault, the following needs to be true:

 1) intel-iommu device is present
 2) vfio-pci device is present
 3) caching-mode is not specified for the intel-iommu device
 4) the vIOMMU is enabled in the VM by the guest OS

The original report is not taking 4) into account.  Please verify.

I think this should probably be closed notabug, but are there suggestions to improve the consistency?  The decision has been made not to enable caching-mode by default due to the additional overhead when used with emulated and paravirt devices rather than vfio-pci.  I believe the interrupt remapping aspects of intel-iommu might be usable without caching-mode, even with a vfio-pci device, therefore we cannot simply detect the intel-iommu and vfio-pci devices as being mutually exclusive w/o caching-mode.  Personally I think the behavior is consistent when all aspects are taken into account, but it is a complicated issue.
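
The four conditions above can be read as a single predicate. A minimal sketch in shell follows; the function name viommu_fault_expected and its yes/no arguments are hypothetical and only illustrate the logic. This is not how qemu implements the check.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 (true) only when all four conditions
# listed above hold, i.e. when the caching-mode fault is expected.
# Arguments are "yes" or "no", in this order:
#   $1  intel-iommu device is present
#   $2  vfio-pci device is present
#   $3  caching-mode=on is specified for intel-iommu
#   $4  the guest OS has enabled the vIOMMU
viommu_fault_expected() {
    [ "$1" = yes ] && [ "$2" = yes ] && [ "$3" = no ] && [ "$4" = yes ]
}

# Boot with vfio-pci attached: the guest has not enabled the vIOMMU yet.
viommu_fault_expected yes yes no no  && echo "fault" || echo "no fault"
# Hot-add vfio-pci after the guest has enabled the vIOMMU.
viommu_fault_expected yes yes no yes && echo "fault" || echo "no fault"
```

The first call models the boot-time case from the report and the second models the hot-plug case, which is why the two scenarios appear to behave differently.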

Comment 5 Pei Zhang 2019-08-12 07:40:27 UTC
(In reply to Alex Williamson from comment #4)
[...]
>  1) intel-iommu device is present
>  2) vfio-pci device is present
>  3) caching-mode is not specified for the intel-iommu device
>  4) the vIOMMU is enabled in the VM by the guest OS
> 
> The original report is not taking 4) into account.  Please verify.

Hi Alex,

Thank you very much for the thorough explanation. It's really very helpful.

About item 4): yes, I was testing with intel_iommu=on on the guest kernel command line.

I also tried without intel_iommu=on on the guest kernel command line; the vfio device can then be hot-plugged successfully without caching-mode.

Best regards,

Pei
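
Whether condition 4) can apply is visible from inside the guest. A minimal sketch, assuming a Linux guest; the function name guest_iommu_requested is hypothetical. It only checks the boot option, so a kernel built to enable the IOMMU by default would not be detected this way.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the kernel command line stored in the
# given file (default: /proc/cmdline) contains the option intel_iommu=on.
guest_iommu_requested() {
    f="${1:-/proc/cmdline}"
    [ -r "$f" ] || return 1
    # Put one option per line, then match the whole line exactly.
    tr ' ' '\n' < "$f" | grep -qx 'intel_iommu=on'
}

if guest_iommu_requested; then
    echo "intel_iommu=on is set; the guest may enable the vIOMMU"
else
    echo "intel_iommu=on is not set"
fi
```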


Comment 6 Pei Zhang 2019-08-12 07:53:44 UTC
(In reply to Bandan Das from comment #3)
> (In reply to Pei Zhang from comment #0)
[...]
> > 
> What exactly are you thinking in terms of consistent behavior ?
> Caching mode can't be dynamically set as far as I understand.
> 

I was trying to say that, without caching-mode, qemu behaves differently when booting with a vfio device than when hot-plugging one.

As Alex explained in Comment 4, I now understand that this is reasonable.

Best regards,

Pei

Comment 7 Alex Williamson 2019-08-12 16:28:58 UTC
I'd probably close this as notabug, but Peter is suggesting further configuration constraints and user protections upstream, so reassigning to him.

Comment 11 Pei Zhang 2019-12-06 09:13:14 UTC
Verified this bz with qemu-kvm-2.12.0-92.module+el8.2.0+5014+5115d99d.x86_64:

Steps follow the Description.

After step 2, qemu prints a warning and keeps running. This is expected behavior.

(qemu) device_add vfio-pci,host=0000:5e:00.0,bus=root.3 
Device assignment is not allowed without enabling caching-mode=on for Intel IOMMU.


After step 3, qemu fails to boot without "caching-mode=on". This is expected behavior.

-device intel-iommu,intremap=on,device-iotlb=on \
-device vfio-pci,host=0000:5e:00.0,bus=root.3 \

(qemu) qemu-kvm: -device vfio-pci,host=0000:5e:00.0,bus=root.3: vfio warning: 0000:5e:00.0: failed to setup resample irqfd: Invalid argument
qemu-kvm: We need to set caching-mode=on for intel-iommu to enable device assignment with IOMMU protection.


So this bug has been fixed. Moving to 'Verified'.

Comment 12 Ademar Reis 2020-02-05 23:02:08 UTC
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks

Comment 14 errata-xmlrpc 2020-04-28 15:32:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1587