Description of problem:
When migrating a RHEL 8.4 guest with iommu from an 8.3.1 host to an 8.4 host, qemu on the des host quits with an error:

(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load virtio-blk:virtio
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0:00.0/virtio-blk'
qemu-kvm: load of migration failed: Invalid argument

Version-Release number of selected component (if applicable):
src host: RHEL-AV 8.3.1:
4.18.0-240.18.1.el8_3.x86_64
qemu-kvm-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64
openvswitch2.13-2.13.0-97.el8fdp.x86_64
libvirt-6.6.0-13.1.module+el8.3.1+10185+675b2148.x86_64

des host: RHEL-AV 8.4:
4.18.0-298.el8.x86_64
qemu-kvm-5.2.0-13.module+el8.4.0+10369+fd280775.x86_64
openvswitch2.15-2.15.0-3.el8fdp.x86_64
libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64

VM: RHEL 8.4
4.18.0-298.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot qemu with iommu enabled on the rhel8.3.1 src host:

/usr/libexec/qemu-kvm \
    -name guest=rhel8.4 \
    -machine pc-q35-rhel8.3.0,kernel_irqchip=split \
    -cpu Skylake-Server-IBRS \
    -m 8192 \
    -smp 6,sockets=6,cores=1,threads=1 \
    -device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
    -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on \
    -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0 \
    -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.4.qcow2,node-name=my_file \
    -blockdev driver=qcow2,node-name=my,file=my_file \
    -device virtio-blk-pci,scsi=off,bus=pci.1,drive=my,id=virtio-disk0,bootindex=1,write-cache=on,iommu_platform=on,ats=on \
    -monitor stdio \
    -vnc :2

2. Boot qemu on the rhel8.4 des host with the same qemu command line, plus:

    -incoming tcp:0:5555

3. Migrate from src to des; the migration fails and qemu on des quits.
On src:
(qemu) migrate -d tcp:10.73.72.194:5555

On des:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load virtio-blk:virtio
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0:00.0/virtio-blk'
qemu-kvm: load of migration failed: Invalid argument

Actual results:
Migration with iommu enabled from rhel8.3.1 to rhel8.4 fails.

Expected results:
Migration with iommu enabled from rhel8.3.1 to rhel8.4 should succeed.

Additional info:
1. Without iommu enabled, this issue cannot be triggered, using a qemu command line like the one below:

/usr/libexec/qemu-kvm \
    -name guest=rhel8.4 \
    -machine pc-q35-rhel8.3.0,kernel_irqchip=split \
    -cpu Skylake-Server-IBRS \
    -m 8192 \
    -smp 6,sockets=6,cores=1,threads=1 \
    -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on \
    -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0 \
    -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.4.qcow2,node-name=my_file \
    -blockdev driver=qcow2,node-name=my,file=my_file \
    -device virtio-blk-pci,scsi=off,bus=pci.1,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
    -monitor stdio \
    -vnc :2 \
    -incoming tcp:0:5555
Can you attach a: sudo lspci -vvv from booting the same VM on 8.3.1 and 8.4.0 please.
Created attachment 1767584 [details] lspci -vvv on rhel8.4
Created attachment 1767585 [details] lspci -vvv on rhel8.3.1
(In reply to Dr. David Alan Gilbert from comment #1)
> Can you attach a:
> sudo lspci -vvv
> from booting the same VM on 8.3.1 and 8.4.0 please.

Hello David,

Please see Comment 2 and Comment 3; they are "lspci -vvv" from the same rhel8.4 VM booted on the rhel8.4 host and on the rhel8.3.1 host.

Best regards,
Pei
Interesting; I don't think that's showing me quite what I need, but it is showing me that the index range is within the 'Address Translation Service', which is the bit to do with the IOMMU; the only difference between those two lspci outputs is some link status:

8.3.1:
00:04.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
        LnkSta: Speed 16GT/s (ok), Width x32 (ok)
>>              TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
        SltCap: AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise+
                Slot #0, PowerLimit 0.000W; Interlock+ NoCompl-
        SltCtl: Enable: AttnBtn+ PwrFlt- MRL- PresDet- CmdCplt+ HPIrq+ LinkChg-
>>              Control: AttnInd On, PwrInd Off, Power+ Interlock-
        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                Changed: MRL- PresDet- LinkState-

01:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
                TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-

vs 8.4:
        LnkSta: Speed 16GT/s (ok), Width x32 (ok)
>>              TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        SltCap: AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise+
                Slot #0, PowerLimit 0.000W; Interlock+ NoCompl-
        SltCtl: Enable: AttnBtn+ PwrFlt- MRL- PresDet- CmdCplt+ HPIrq+ LinkChg-
>>              Control: AttnInd Off, PwrInd Off, Power+ Interlock-
        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                Changed: MRL- PresDet- LinkState-

01:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
        LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>              TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

It doesn't seem to be complaining about that though:

(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0

That i=0x104 is a much bigger index than the Express endpoint, which is 0x40-0x7b in that output; both show:

Capabilities: [100 v1] Address Translation Service (ATS)
        ATSCap: Invalidate Queue Depth: 00
        ATSCtl: Enable-, Smallest Translation Unit: 00

so since this starts at 0x100, and it's IOMMU related, that's probably where the 0x104 difference is; unfortunately lspci isn't decoding/displaying it.
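For context, the check that produces this error can be sketched roughly like this (an assumption-laden simplification of what qemu's get_pci_config_device does, not the actual code): for each config-space byte, any bit covered by cmask must match between the incoming migration stream and the destination device's own config space.

```shell
# Minimal sketch of the cmask comparison (simplified; the real logic lives in
# qemu's hw/pci/pci.c). Values are taken from the error message in this bug.
read_val=$(( 0x00 ))   # "read: 0"    - byte sent by the 8.3.1 source
device=$(( 0x20 ))     # "device: 20" - byte in the 8.4 destination device
cmask=$(( 0xff ))      # "cmask: ff"  - every bit of this byte is checked

mismatch=$(( (read_val ^ device) & cmask ))
printf 'mismatch bits: 0x%x\n' "$mismatch"
# -> mismatch bits: 0x20, so the load fails with "Invalid argument"
```

Since wmask and w1cmask are both 0, the byte is read-only, so the destination cannot simply accept the source's value.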
Now we know it's in ATS, looking in the ATS spec says that:

(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0

offset 04 is 'ATS Control Register | ATS Capability Register', and bit 5 of the ATS Capability Register (2^5 = 0x20) is 'Page Aligned Request'.
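The decode above can be checked mechanically (assuming, per the lspci output, that the ATS capability sits at config offset 0x100):

```shell
# Sketch: locate the failing byte inside the ATS capability and find which
# bit the mismatched value 0x20 corresponds to.
ats_base=$(( 0x100 ))   # Capabilities: [100 v1] Address Translation Service
i=$(( 0x104 ))          # failing config-space offset from the qemu error
device=$(( 0x20 ))      # the byte the destination device holds

printf 'offset within ATS cap: 0x%x\n' $(( i - ats_base ))   # -> 0x4

# bit position of the single set bit in 0x20
bit=0; v=$device
while [ $(( v >> 1 )) -gt 0 ]; do v=$(( v >> 1 )); bit=$(( bit + 1 )); done
printf 'set bit: %d\n' "$bit"                                # -> 5
```

Offset 0x4 into the capability is the ATS Capability/Control register pair, and bit 5 of the Capability register is exactly the 'Page Aligned Request' bit.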
This looks like it's qemu commit:

commit 4c70875372b821b045e84f462466a5c04b091ef5
Author: Jason Wang <jasowang>
Date:   Wed Sep 9 16:17:31 2020 +0800

    pci: advertise a page aligned ATS

    After Linux kernel commit 61363c1474b1 ("iommu/vt-d: Enable ATS only
    if the device uses page aligned address."), ATS will be only enabled
    if device advertises a page aligned request. Unfortunately, vhost-net
    is the only user and we don't advertise the aligned request capability
    in the past since both vhost IOTLB and address_space_get_iotlb_entry()
    can support non page aligned request. Though it's not clear that if
    the above kernel commit makes sense. Let's advertise a page aligned
    ATS here to make vhost device IOTLB work with Intel IOMMU again.

    Note that in the future we may extend pcie_ats_init() to accept
    parameters like queue depth and page alignment.

    Cc: qemu-stable
    Signed-off-by: Jason Wang <jasowang>
    Message-Id: <20200909081731.24688-1-jasowang>
    Reviewed-by: Peter Xu <peterx>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
I've posted a patch upstream to fix this. Please review. Thanks
Assign to Jason directly since he's working on the issue. A determination will need to be made as to "how" or "if" the backport will be necessary for RHEL-AV 8.4.0 since we're rather late in the release cycle.
Jason's qemu commit fix landed upstream:

commit d83f46d189a26fa32434139954d264326f199a45
Author: Jason Wang <jasowang>
Date:   Tue Apr 6 12:03:30 2021 +0800

    virtio-pci: compat page aligned ATS

    Commit 4c70875372b8 ("pci: advertise a page aligned ATS") advertises
    the page aligned via ATS capability (RO) to unbrek recent Linux IOMMU
    drivers since 5.2. But it forgot the compat the capability which
    breaks the migration from old machine type:

    (qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0

    This patch introduces a new parameter "x-ats-page-aligned" for
    virtio-pci device and turns it on for machine type which is newer
    than 5.1.
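For illustration, a hypothetical manual use of the new knob might look like this (an assumption based only on the parameter name quoted in the commit message; with the fix applied, older machine types such as pc-q35-rhel8.3.0 are expected to handle this automatically, so no manual flag should be needed):

```shell
# Sketch (abbreviated from the reproducer command line in the Description):
# clear the new read-only ATS bit on the newer destination so its config
# space matches a qemu-5.1-era source. x-ats-page-aligned is hypothetical
# usage of the property named in the commit above.
/usr/libexec/qemu-kvm \
    -machine pc-q35-rhel8.3.0,kernel_irqchip=split \
    -device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
    -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0 \
    -device virtio-blk-pci,bus=pci.1,drive=my,id=virtio-disk0,iommu_platform=on,ats=on,x-ats-page-aligned=off \
    -incoming tcp:0:5555
```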
Verification:

Versions:
8.3.1 host:
4.18.0-240.22.1.el8_3.x86_64
qemu-kvm-5.1.0-21.module+el8.3.1+10464+8ad18d1a.x86_64

8.4 host:
4.18.0-304.el8.x86_64
qemu-img-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64

VM: rhel8.4

Following the steps in the Description, 5 ping-pong migrations keep working well, so this bug has been fixed. Will move to Verified directly once on_qa.
Moving to Verified per Comment 22.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098