Bug 1942362

Summary: Live migration with iommu from rhel8.3.1 to rhel8.4 fails: qemu-kvm: get_pci_config_device: Bad config data
Product: Red Hat Enterprise Linux Advanced Virtualization
Component: qemu-kvm
Sub component: Live Migration
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 8.4
Keywords: Triaged
Target Milestone: rc
Target Release: 8.4
Hardware: x86_64
OS: Linux
Fixed In Version: qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207
Last Closed: 2021-05-25 06:48:26 UTC
Reporter: Pei Zhang <pezhang>
Assignee: jason wang <jasowang>
QA Contact: Pei Zhang <pezhang>
CC: chayang, dgilbert, ehadley, jasowang, jinzhao, juzhang, mrezanin, smitterl, virt-maint, yanghliu
Bug Blocks: 1948358
Attachments:
- lspci -vvv on rhel8.4
- lspci -vvv on rhel8.3.1

Description Pei Zhang 2021-03-24 09:21:34 UTC
Description of problem:
When migrating a rhel8.4 guest with iommu from an 8.3.1 host to an 8.4 host, QEMU on the destination (des) host quits with an error:

(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load virtio-blk:virtio
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0:00.0/virtio-blk'
qemu-kvm: load of migration failed: Invalid argument

Version-Release number of selected component (if applicable):
src host: RHEL-AV 8.3.1:
4.18.0-240.18.1.el8_3.x86_64
qemu-kvm-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64
openvswitch2.13-2.13.0-97.el8fdp.x86_64
libvirt-6.6.0-13.1.module+el8.3.1+10185+675b2148.x86_64

des host: RHEL-AV 8.4:
4.18.0-298.el8.x86_64
qemu-kvm-5.2.0-13.module+el8.4.0+10369+fd280775.x86_64
openvswitch2.15-2.15.0-3.el8fdp.x86_64
libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64

VM:RHEL 8.4
4.18.0-298.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot qemu with iommu enabled on rhel8.3.1 src host

/usr/libexec/qemu-kvm \
-name guest=rhel8.4 \
-machine pc-q35-rhel8.3.0,kernel_irqchip=split \
-cpu Skylake-Server-IBRS \
-m 8192 \
-smp 6,sockets=6,cores=1,threads=1 \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.4.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,bus=pci.1,drive=my,id=virtio-disk0,bootindex=1,write-cache=on,iommu_platform=on,ats=on \
-monitor stdio \
-vnc :2

2. Boot qemu on rhel8.4 des host

(same qemu cmd line)
-incoming tcp:0:5555

3. Migrate from src to des; migration fails and QEMU on des quits.

On src:
(qemu) migrate -d tcp:10.73.72.194:5555

On des:
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0
qemu-kvm: Failed to load PCIDevice:config
qemu-kvm: Failed to load virtio-blk:virtio
qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0:00.0/virtio-blk'
qemu-kvm: load of migration failed: Invalid argument
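The "Bad config data" line can be read mechanically: on load, QEMU compares each byte of the incoming PCI config space against the value the destination device computed locally, masked by cmask (the bytes required to match). A minimal Python sketch of that check (simplified for illustration; not QEMU's actual C implementation), fed with the values from this log:

```python
# Simplified sketch of the per-byte check done by get_pci_config_device:
# an incoming config byte and the locally computed one must agree on every
# bit selected by cmask ("bytes that must match exactly").

def check_config(incoming: bytes, local: bytes, cmask: bytes):
    """Return the first mismatching offset, or None if the config matches."""
    for i, (inc, loc, cm) in enumerate(zip(incoming, local, cmask)):
        if (inc & cm) != (loc & cm):
            return i  # QEMU reports this as "Bad config data: i=0x..."
    return None

# Toy reproduction of this bug: at offset 0x104 the 8.3.1 source sent 0x00,
# the 8.4 destination computed 0x20 (read: 0, device: 20), and cmask=0xff
# demands an exact match.
SIZE = 0x200
incoming = bytearray(SIZE)           # source config: bit not set
local = bytearray(SIZE)
local[0x104] = 0x20                  # destination config: bit set
cmask = bytearray(b"\xff" * SIZE)

print(hex(check_config(bytes(incoming), bytes(local), bytes(cmask))))  # prints 0x104
```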


Actual results:
Migration with iommu enabled from rhel8.3.1 to rhel8.4 fails.

Expected results:
Migration with iommu enabled from rhel8.3.1 to rhel8.4 should succeed.

Additional info:
1. Without iommu enabled, this issue cannot be triggered, e.g. with the QEMU command line below.

/usr/libexec/qemu-kvm \
-name guest=rhel8.4 \
-machine pc-q35-rhel8.3.0,kernel_irqchip=split \
-cpu Skylake-Server-IBRS \
-m 8192 \
-smp 6,sockets=6,cores=1,threads=1 \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.4.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,bus=pci.1,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-monitor stdio \
-vnc :2 \
-incoming tcp:0:5555

Comment 1 Dr. David Alan Gilbert 2021-03-29 10:02:39 UTC
Can you attach a:
   sudo lspci -vvv

from booting the same VM on 8.3.1 and 8.4.0 please.

Comment 2 Pei Zhang 2021-03-30 08:26:31 UTC
Created attachment 1767584 [details]
lspci -vvv on rhel8.4

Comment 3 Pei Zhang 2021-03-30 08:27:25 UTC
Created attachment 1767585 [details]
lspci -vvv on rhel8.3.1

Comment 4 Pei Zhang 2021-03-30 08:30:56 UTC
(In reply to Dr. David Alan Gilbert from comment #1)
> Can you attach a:
>    sudo lspci -vvv
> 
> from booting the same VM on 8.3.1 and 8.4.0 please.


Hello David,

Please see Comment 2 and Comment 3, they are "lspci -vvv" in the same rhel8.4 VM booting on rhel8.4 host and rhel8.3.1 host.

Best regards,

Pei

Comment 5 Dr. David Alan Gilbert 2021-03-30 08:44:59 UTC
Interesting; I don't think that's showing me quite what I need, but it does show that the index range is within the 'Address Translation Service' capability, which is the bit to do with the IOMMU;
the only difference between those two lspci outputs is some link status:

8.3.1:

00:04.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port (prog-if 00 [Normal decode])
		LnkSta:	Speed 16GT/s (ok), Width x32 (ok)
>>			TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
		SltCap:	AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise+
			Slot #0, PowerLimit 0.000W; Interlock+ NoCompl-
		SltCtl:	Enable: AttnBtn+ PwrFlt- MRL- PresDet- CmdCplt+ HPIrq+ LinkChg-
>>			Control: AttnInd On, PwrInd Off, Power+ Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
			Changed: MRL- PresDet- LinkState-

01:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-

vs 8.4
		LnkSta:	Speed 16GT/s (ok), Width x32 (ok)
>>			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		SltCap:	AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise+
			Slot #0, PowerLimit 0.000W; Interlock+ NoCompl-
		SltCtl:	Enable: AttnBtn+ PwrFlt- MRL- PresDet- CmdCplt+ HPIrq+ LinkChg-
>>			Control: AttnInd Off, PwrInd Off, Power+ Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
			Changed: MRL- PresDet- LinkState-

01:00.0 SCSI storage controller: Red Hat, Inc. Virtio block device (rev 01)
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
>>			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

It doesn't seem to be complaining about that though;
(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0

That i=0x104 is a much bigger index than the Express endpoint which is 0x40-0x7b in that output;

both show:
	Capabilities: [100 v1] Address Translation Service (ATS)
		ATSCap:	Invalidate Queue Depth: 00
		ATSCtl:	Enable-, Smallest Translation Unit: 00

so since this starts at 0x100, and it's IOMMU related, that's probably where the 0x104 difference is; unfortunately lspci isn't decoding/displaying it.

Comment 6 Dr. David Alan Gilbert 2021-03-30 08:53:14 UTC
Now that we know it's in ATS, looking at the ATS spec:

(qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read: 0 device: 20 cmask: ff wmask: 0 w1cmask:0

offset 04 is 'ATS Control Register | ATS Capability Register',
and bit 5 of the ATS Capability Register (2^5 = 0x20) is 'Page Aligned Request'.
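Putting the two observations together (the ATS extended capability sits at 0x100 per the lspci output, and the register decode above), the arithmetic for why i=0x104 with device value 0x20 lands exactly on the Page Aligned Request bit can be sketched as:

```python
# Decode of the mismatch offset, using the layout quoted above.
ATS_CAP_BASE = 0x100           # lspci: "Capabilities: [100 v1] ... (ATS)"
ATS_CAP_REG_OFFSET = 0x04      # 'ATS Control Register | ATS Capability Register'
PAGE_ALIGNED_REQUEST = 1 << 5  # bit 5 of the ATS Capability Register => 0x20

mismatch_offset = 0x104        # from "Bad config data: i=0x104"
device_value = 0x20            # from "device: 20"

assert mismatch_offset == ATS_CAP_BASE + ATS_CAP_REG_OFFSET
assert device_value == PAGE_ALIGNED_REQUEST
print("i=0x104 is the ATS Capability Register; 0x20 is Page Aligned Request")
```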

Comment 7 Dr. David Alan Gilbert 2021-03-30 08:55:02 UTC
This looks like it's qemu commit:

commit 4c70875372b821b045e84f462466a5c04b091ef5
Author: Jason Wang <jasowang>
Date:   Wed Sep 9 16:17:31 2020 +0800

    pci: advertise a page aligned ATS
    
    After Linux kernel commit 61363c1474b1 ("iommu/vt-d: Enable ATS only
    if the device uses page aligned address."), ATS will be only enabled
    if device advertises a page aligned request.
    
    Unfortunately, vhost-net is the only user and we don't advertise the
    aligned request capability in the past since both vhost IOTLB and
    address_space_get_iotlb_entry() can support non page aligned request.
    
    Though it's not clear that if the above kernel commit makes
    sense. Let's advertise a page aligned ATS here to make vhost device
    IOTLB work with Intel IOMMU again.
    
    Note that in the future we may extend pcie_ats_init() to accept
    parameters like queue depth and page alignment.
    
    Cc: qemu-stable
    Signed-off-by: Jason Wang <jasowang>
    Message-Id: <20200909081731.24688-1-jasowang>
    Reviewed-by: Peter Xu <peterx>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>

Comment 8 jason wang 2021-04-02 07:57:11 UTC
I've posted a patch upstream to fix this.

Please review.

Thanks

Comment 9 John Ferlan 2021-04-05 12:32:21 UTC
Assigning to Jason directly since he's working on the issue. A determination will need to be made as to "how" or "if" the backport will be necessary for RHEL-AV 8.4.0, since we're rather late in the release cycle.

Comment 11 Dr. David Alan Gilbert 2021-04-20 11:21:30 UTC
Jason's qemu commit fix landed upstream:

commit d83f46d189a26fa32434139954d264326f199a45
Author: Jason Wang <jasowang>
Date:   Tue Apr 6 12:03:30 2021 +0800

    virtio-pci: compat page aligned ATS
    
    Commit 4c70875372b8 ("pci: advertise a page aligned ATS") advertises
    the page aligned via ATS capability (RO) to unbrek recent Linux IOMMU
    drivers since 5.2. But it forgot the compat the capability which
    breaks the migration from old machine type:
    
    (qemu) qemu-kvm: get_pci_config_device: Bad config data: i=0x104 read:
    0 device: 20 cmask: ff wmask: 0 w1cmask:0
    
    This patch introduces a new parameter "x-ats-page-aligned" for
    virtio-pci device and turns it on for machine type which is newer than
    5.1.
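Per the commit message, the fixed qemu-kvm turns the new "x-ats-page-aligned" property off automatically for machine types older than 5.1 (such as pc-q35-rhel8.3.0 here) via machine-type compat properties, so no command-line change is needed. Purely as an illustrative sketch (exact property availability depends on the qemu-kvm build), forcing it by hand on the device from the reproducer would look like:

```shell
# Illustrative only: the fixed packages apply this compat setting
# automatically for pre-5.1 machine types.
/usr/libexec/qemu-kvm \
  -machine pc-q35-rhel8.3.0,kernel_irqchip=split \
  ... \
  -device virtio-blk-pci,scsi=off,bus=pci.1,drive=my,id=virtio-disk0,bootindex=1,write-cache=on,iommu_platform=on,ats=on,x-ats-page-aligned=off \
  -incoming tcp:0:5555
```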

Comment 22 Pei Zhang 2021-04-30 02:06:29 UTC
Verification:

Versions:
8.3.1 host:
4.18.0-240.22.1.el8_3.x86_64
qemu-kvm-5.1.0-21.module+el8.3.1+10464+8ad18d1a.x86_64

8.4 host:
4.18.0-304.el8.x86_64
qemu-img-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64

VM: rhel8.4

Following the steps in the Description, 5 ping-pong migrations keep working well.

So this bug has been fixed. Will move to Verified directly once ON_QA.

Comment 27 Pei Zhang 2021-05-08 01:36:22 UTC
Moving to Verified per Comment 22.

Comment 29 errata-xmlrpc 2021-05-25 06:48:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098