Bug 2149211 - VFIO_MAP_DMA failed: Bad address [NEEDINFO]
Summary: VFIO_MAP_DMA failed: Bad address
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Peter Xu
QA Contact: Yanghang Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-11-29 08:26 UTC by Yanghang Liu
Modified: 2023-08-10 05:33 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Target Upstream Version:
Embargoed:
peterx: needinfo? (yanghliu)



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-140747 0 None None None 2022-11-29 08:33:35 UTC

Description Yanghang Liu 2022-11-29 08:26:25 UTC
Description of problem:
After running the dpdk tests against the PFs in the domain, qemu-kvm throws the "VFIO_MAP_DMA failed: Bad address" error

Version-Release number of selected component (if applicable):
host:
           5.14.0-201.rt14.202.el9.x86_64
           qemu-kvm-7.1.0-5.el9.x86_64
           tuned-2.19.0-1.el9.noarch
           libvirt-8.9.0-2.el9.x86_64
           python3-libvirt-8.9.0-1.el9.x86_64
           openvswitch2.17-2.17.0-57.el9fdp.x86_64
           dpdk-21.11.2-1.el9_1.x86_64
           edk2-ovmf-20220826gitba0e0e4c6a-2.el9.noarch
           seabios-bin-1.16.0-4.el9.noarch
guest:
           5.14.0-201.rt14.202.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. prepare the test environment
# echo isolated_cores=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11 > /etc/tuned/realtime-virtual-host-variables.conf
# echo isolate_managed_irq=Y >> /etc/tuned/realtime-virtual-host-variables.conf
# /usr/sbin/tuned-adm profile realtime-virtual-host
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G hugepages=20" --update-kernel=`grubby --default-kernel` 
# reboot

2. start an NFV domain with two PFs
                    
                          <domain type='kvm'>
                            <name>rhel9.2</name>
                            <uuid>b2df2c5c-6f95-11ed-b4bb-20040fec000c</uuid>
                            <memory unit='KiB'>8388608</memory>
                            <currentMemory unit='KiB'>8388608</currentMemory>
                            <memoryBacking>
                              <hugepages>
                                <page size='1048576' unit='KiB'/>
                              </hugepages>
                              <locked/>
                            </memoryBacking>
                            <vcpu placement='static'>6</vcpu>
                            <cputune>
                              <vcpupin vcpu='0' cpuset='30'/>
                              <vcpupin vcpu='1' cpuset='28'/>
                              <vcpupin vcpu='2' cpuset='26'/>
                              <vcpupin vcpu='3' cpuset='24'/>
                              <vcpupin vcpu='4' cpuset='22'/>
                              <vcpupin vcpu='5' cpuset='20'/>
                              <emulatorpin cpuset='25,27,29,31'/>
                              <emulatorsched scheduler='fifo' priority='1'/>
                              <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
                              <vcpusched vcpus='2' scheduler='fifo' priority='1'/>
                              <vcpusched vcpus='3' scheduler='fifo' priority='1'/>
                              <vcpusched vcpus='4' scheduler='fifo' priority='1'/>
                              <vcpusched vcpus='5' scheduler='fifo' priority='1'/>
                            </cputune>
                            <numatune>
                              <memory mode='strict' nodeset='0'/>
                              <memnode cellid='0' mode='strict' nodeset='0'/>
                            </numatune>
                            <os>
                              <type arch='x86_64' machine='pc-q35-rhel9.0.0'>hvm</type>
                              <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
                              <nvram template='/usr/share/edk2/ovmf/OVMF_VARS.fd'>/var/lib/libvirt/qemu/nvram/rhel9.2_VARS.fd</nvram>
                              <boot dev='hd'/>
                            </os>
                            <features>
                              <acpi/>
                              <pmu state='off'/>
                              <vmport state='off'/>
                              <smm state='on'/>
                              <ioapic driver='qemu'/>
                            </features>
                            <cpu mode='host-model' check='partial'>
                              <topology sockets='3' dies='1' cores='1' threads='2'/>
                              <feature policy='require' name='tsc-deadline'/>
                              <numa>
                                <cell id='0' cpus='0-5' memory='8388608' unit='KiB' memAccess='shared'/>
                              </numa>
                            </cpu>
                            <clock offset='utc'>
                              <timer name='rtc' tickpolicy='catchup'/>
                              <timer name='pit' tickpolicy='delay'/>
                              <timer name='hpet' present='no'/>
                            </clock>
                            <on_poweroff>destroy</on_poweroff>
                            <on_reboot>restart</on_reboot>
                            <on_crash>restart</on_crash>
                            <devices>
                              <emulator>/usr/libexec/qemu-kvm</emulator>
                              <disk type='file' device='disk'>
                                <driver name='qemu' type='qcow2' cache='none' io='threads' iommu='on' ats='on'/>
                                <source file='/home/images_nfv-virt-rt-kvm/rhel9.2.qcow2'/>
                                <target dev='vda' bus='virtio'/>
                                <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
                              </disk>
                              <controller type='usb' index='0' model='none'/>
                              <controller type='pci' index='0' model='pcie-root'/>
                              <controller type='pci' index='1' model='pcie-root-port'>
                                <model name='pcie-root-port'/>
                                <target chassis='1' port='0x10'/>
                                <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
                              </controller>
                              <controller type='pci' index='2' model='pcie-root-port'>
                                <model name='pcie-root-port'/>
                                <target chassis='2' port='0x11'/>
                                <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
                              </controller>
                              <controller type='pci' index='3' model='pcie-root-port'>
                                <model name='pcie-root-port'/>
                                <target chassis='3' port='0x12'/>
                                <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
                              </controller>
                              <controller type='pci' index='4' model='pcie-root-port'>
                                <model name='pcie-root-port'/>
                                <target chassis='4' port='0x13'/>
                                <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
                              </controller>
                              <controller type='pci' index='5' model='pcie-root-port'>
                                <model name='pcie-root-port'/>
                                <target chassis='5' port='0x14'/>
                                <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
                              </controller>
                              <controller type='pci' index='6' model='pcie-root-port'>
                                <model name='pcie-root-port'/>
                                <target chassis='6' port='0x15'/>
                                <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
                              </controller>
                              <controller type='pci' index='7' model='pcie-root-port'>
                                <model name='pcie-root-port'/>
                                <target chassis='7' port='0x16'/>
                                <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
                              </controller>
                              <controller type='sata' index='0'>
                                <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
                              </controller>
                              <interface type='bridge'>
                                <mac address='88:66:da:5f:dd:11'/>
                                <source bridge='switch'/>
                                <model type='virtio'/>
                                <driver name='vhost' iommu='on' ats='on'/>
                                <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
                              </interface>
                              <serial type='pty'>
                                <target type='isa-serial' port='0'>
                                  <model name='isa-serial'/>
                                </target>
                              </serial>
                              <console type='pty'>
                                <target type='serial' port='0'/>
                              </console>
                              <input type='mouse' bus='ps2'/>
                              <input type='keyboard' bus='ps2'/>
                              <tpm model='tpm-crb'>
                                <backend type='emulator' version='2.0'/>
                              </tpm>
                              <audio id='1' type='none'/>
                              <hostdev mode='subsystem' type='pci' managed='yes'>
                                <driver name='vfio'/>
                                <source>
                                  <address domain='0x0000' bus='0x5e' slot='0x00' function='0x0'/>
                                </source>
                                <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
                              </hostdev>
                              <hostdev mode='subsystem' type='pci' managed='yes'>
                                <driver name='vfio'/>
                                <source>
                                  <address domain='0x0000' bus='0x5e' slot='0x00' function='0x1'/>
                                </source>
                                <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
                              </hostdev>
                              <memballoon model='virtio'>
                                <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
                                <driver iommu='on' ats='on'/>
                              </memballoon>
                              <iommu model='intel'>
                                <driver intremap='on' caching_mode='on' iotlb='on'/>
                              </iommu>
                            </devices>
                          </domain>


3. setup the domain env
# echo isolated_cores=1,2,3,4,5 > /etc/tuned/realtime-virtual-guest-variables.conf
# echo isolate_managed_irq=Y >> /etc/tuned/realtime-virtual-guest-variables.conf
# /usr/sbin/tuned-adm profile realtime-virtual-guest
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=2" --update-kernel=`grubby --default-kernel` 
# reboot

4. bind two PFs' driver to vfio-pci and then run dpdk-testpmd
# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci 0000:06:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:07:00.0
# dpdk-testpmd -l 1,2,3,4,5 -n 4  -d /usr/lib64/librte_net_ixgbe.so  -- --nb-cores=4 -i --disable-rss --rxd=512 --txd=512 --rxq=1 --txq=1 
    testpmd> start

5. run MoonGen outside the domain
# MoonGen opnfv-vsperf.lua

Actual results:
qemu-kvm throws the "VFIO_MAP_DMA failed: Bad address" error:
2022-11-29T03:45:02.552271Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2022-11-29T03:45:02.552436Z qemu-kvm: vfio_dma_map(0x55793ea09e10, 0x800000000, 0x200000, 0x7fb741000000) = -2 (No such file or directory)
2022-11-29T03:45:02.552457Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2022-11-29T03:45:02.552476Z qemu-kvm: vfio_dma_map(0x55793ea09e10, 0x800201000, 0x3000, 0x7fb758012000) = -14 (Bad address)
2022-11-29T03:45:02.552494Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2022-11-29T03:45:02.552508Z qemu-kvm: vfio_dma_map(0x55793ea09e10, 0x800400000, 0x200000, 0x7fb740e00000) = -14 (Bad address)
2022-11-29T03:45:02.552529Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2022-11-29T03:45:02.552542Z qemu-kvm: vfio_dma_map(0x55793ea09e10, 0x800601000, 0x3000, 0x7fb75800e000) = -14 (Bad address)
2022-11-29T03:45:02.634389Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2022-11-29T03:45:02.634421Z qemu-kvm: vfio_dma_map(0x55793ecbc800, 0x800000000, 0x200000, 0x7fb741000000) = -14 (Bad address)
2022-11-29T03:45:02.634440Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2022-11-29T03:45:02.634454Z qemu-kvm: vfio_dma_map(0x55793ecbc800, 0x800201000, 0x3000, 0x7fb758012000) = -14 (Bad address)
2022-11-29T03:45:02.634474Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2022-11-29T03:45:02.634488Z qemu-kvm: vfio_dma_map(0x55793ecbc800, 0x800400000, 0x200000, 0x7fb740e00000) = -14 (Bad address)
2022-11-29T03:45:02.634505Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2022-11-29T03:45:02.634519Z qemu-kvm: vfio_dma_map(0x55793ecbc800, 0x800601000, 0x3000, 0x7fb75800e000) = -14 (Bad address)
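(Editor's note: the negative return values in the log above are standard errno codes negated by QEMU's vfio_dma_map() error path. A minimal illustrative Python sketch, not part of the reproducer, decodes them:)

```python
import errno
import os

# The log shows one -2 and seven -14 return values from vfio_dma_map().
for rc in (-2, -14):
    code = -rc
    print(f"{rc}: {errno.errorcode[code]} ({os.strerror(code)})")
# -2  decodes to ENOENT (No such file or directory)
# -14 decodes to EFAULT (Bad address)
```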

Expected results:
No errors are reported.

Additional info:
(1) This bug can always be reproduced in my automated tests.
Related automation commands:
[1]
# python3 /home/nfv-virt-rt-kvm/job_guest_passthrough.py --realtime=yes/no --hugepage_size=1G/2M --is_vf=no --upstream=no --upstream_type=latest --install_kernel=no --setup_kernel_options=no --os_type=rhel9 --iommu_support=yes --ovmf_support=yes

[2]
# python3 /home/nfv-virt-rt-kvm/job_passthrough_hotplug.py --realtime=no --hugepage_size=1G --upstream=no --upstream_type=latest --setup_kernel_options=yes --os_type=rhel9 --os_version=9 --iommu_support=yes --ovmf_support=yes

(2)
related test log in rt-kernel: http://10.73.72.41/log/2022_11_28/nfv_guest_dpdk_PF_1G
related test log in non-rt kernel: http://10.73.72.41/log/2022_11_29_1/nfv_guest_dpdk_PF_1G

Comment 1 Laurent Vivier 2022-11-29 17:27:23 UTC
Jason,

any idea of the cause of the problem?

Thanks

Comment 2 jason wang 2022-11-30 06:06:53 UTC
(In reply to Laurent Vivier from comment #1)
> Jason,
> 
> any idea of the cause of the problem?
> 
> Thanks

Not much of an idea. It might be something related to vIOMMU.

Adding Peter for more thoughts.

Thanks

Comment 3 Peter Xu 2022-11-30 14:57:29 UTC
(In reply to Yanghang Liu from comment #0)
> Description of problem:
> After doing the dpdk tests against the PF in the domain ,the qemu-kvm throws
> "VFIO_MAP_DMA failed: Bad address" error

Is the error reported after the test, or during the test, or when starting the test?

[...]

> Actual results:
> The qemu-kvm throws "VFIO_MAP_DMA failed: Bad address" error:
> 2022-11-29T03:45:02.552271Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> 2022-11-29T03:45:02.552436Z qemu-kvm: vfio_dma_map(0x55793ea09e10,
> 0x800000000, 0x200000, 0x7fb741000000) = -2 (No such file or directory)
> 2022-11-29T03:45:02.552457Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> 2022-11-29T03:45:02.552476Z qemu-kvm: vfio_dma_map(0x55793ea09e10,
> 0x800201000, 0x3000, 0x7fb758012000) = -14 (Bad address)
> 2022-11-29T03:45:02.552494Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> 2022-11-29T03:45:02.552508Z qemu-kvm: vfio_dma_map(0x55793ea09e10,
> 0x800400000, 0x200000, 0x7fb740e00000) = -14 (Bad address)
> 2022-11-29T03:45:02.552529Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> 2022-11-29T03:45:02.552542Z qemu-kvm: vfio_dma_map(0x55793ea09e10,
> 0x800601000, 0x3000, 0x7fb75800e000) = -14 (Bad address)
> 2022-11-29T03:45:02.634389Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> 2022-11-29T03:45:02.634421Z qemu-kvm: vfio_dma_map(0x55793ecbc800,
> 0x800000000, 0x200000, 0x7fb741000000) = -14 (Bad address)
> 2022-11-29T03:45:02.634440Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> 2022-11-29T03:45:02.634454Z qemu-kvm: vfio_dma_map(0x55793ecbc800,
> 0x800201000, 0x3000, 0x7fb758012000) = -14 (Bad address)
> 2022-11-29T03:45:02.634474Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> 2022-11-29T03:45:02.634488Z qemu-kvm: vfio_dma_map(0x55793ecbc800,
> 0x800400000, 0x200000, 0x7fb740e00000) = -14 (Bad address)
> 2022-11-29T03:45:02.634505Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> 2022-11-29T03:45:02.634519Z qemu-kvm: vfio_dma_map(0x55793ecbc800,
> 0x800601000, 0x3000, 0x7fb75800e000) = -14 (Bad address)

These 8 errors are trying to map 4 ranges into the two containers.  The initial failure is suspicious as it is the only -2 (ENOENT); that error is rare in this path, unlike the common -14 (EFAULT).

I'd be curious where it comes from; I had a quick look at the dma map path in common non-rt code and didn't quickly spot a possible path.  Maybe that's RT-tree specific, or maybe I just overlooked something.
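(Editor's note: the "4 ranges into two containers" reading can be checked mechanically. A small illustrative Python sketch, not part of the bug report tooling, groups the vfio_dma_map() log lines from the Description by container pointer:)

```python
import re
from collections import defaultdict

# Stripped-down copy of the 8 failing lines from the Description.
LOG = """\
vfio_dma_map(0x55793ea09e10, 0x800000000, 0x200000, 0x7fb741000000) = -2
vfio_dma_map(0x55793ea09e10, 0x800201000, 0x3000, 0x7fb758012000) = -14
vfio_dma_map(0x55793ea09e10, 0x800400000, 0x200000, 0x7fb740e00000) = -14
vfio_dma_map(0x55793ea09e10, 0x800601000, 0x3000, 0x7fb75800e000) = -14
vfio_dma_map(0x55793ecbc800, 0x800000000, 0x200000, 0x7fb741000000) = -14
vfio_dma_map(0x55793ecbc800, 0x800201000, 0x3000, 0x7fb758012000) = -14
vfio_dma_map(0x55793ecbc800, 0x800400000, 0x200000, 0x7fb740e00000) = -14
vfio_dma_map(0x55793ecbc800, 0x800601000, 0x3000, 0x7fb75800e000) = -14
"""

# Capture (container, iova, size) from each vfio_dma_map() line.
pat = re.compile(r"vfio_dma_map\((0x[0-9a-f]+), (0x[0-9a-f]+), (0x[0-9a-f]+),")
ranges_per_container = defaultdict(set)
for container, iova, size in pat.findall(LOG):
    ranges_per_container[container].add((int(iova, 16), int(size, 16)))

print(len(ranges_per_container))                        # number of containers
print({len(v) for v in ranges_per_container.values()})  # distinct ranges each
```

Both vfio-pci devices get their own container here, and the same four IOVA ranges fail against each, which matches the 2 x 4 = 8 error lines.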

Since this reproduces 100%, is this a regression (did any old RHEL9 RT kernel work before)?

Is it reproducible on RHEL9 non-rt kernels?

Comment 4 Yanghang Liu 2022-12-02 02:55:08 UTC
(In reply to Peter Xu from comment #3)

Hi Peter,

> > Description of problem:
> > After doing the dpdk tests against the PF in the domain ,the qemu-kvm throws
> > "VFIO_MAP_DMA failed: Bad address" error
> 
> Is the error reported after the test, or during the test, or when starting
> the test?

qemu-kvm reports this error when the dpdk tests finish.


> [...]
> 
> > Actual results:
> > The qemu-kvm throws "VFIO_MAP_DMA failed: Bad address" error:
> > 2022-11-29T03:45:02.552271Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> > 2022-11-29T03:45:02.552436Z qemu-kvm: vfio_dma_map(0x55793ea09e10,
> > 0x800000000, 0x200000, 0x7fb741000000) = -2 (No such file or directory)
> > 2022-11-29T03:45:02.552457Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> > 2022-11-29T03:45:02.552476Z qemu-kvm: vfio_dma_map(0x55793ea09e10,
> > 0x800201000, 0x3000, 0x7fb758012000) = -14 (Bad address)
> > 2022-11-29T03:45:02.552494Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> > 2022-11-29T03:45:02.552508Z qemu-kvm: vfio_dma_map(0x55793ea09e10,
> > 0x800400000, 0x200000, 0x7fb740e00000) = -14 (Bad address)
> > 2022-11-29T03:45:02.552529Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> > 2022-11-29T03:45:02.552542Z qemu-kvm: vfio_dma_map(0x55793ea09e10,
> > 0x800601000, 0x3000, 0x7fb75800e000) = -14 (Bad address)
> > 2022-11-29T03:45:02.634389Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> > 2022-11-29T03:45:02.634421Z qemu-kvm: vfio_dma_map(0x55793ecbc800,
> > 0x800000000, 0x200000, 0x7fb741000000) = -14 (Bad address)
> > 2022-11-29T03:45:02.634440Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> > 2022-11-29T03:45:02.634454Z qemu-kvm: vfio_dma_map(0x55793ecbc800,
> > 0x800201000, 0x3000, 0x7fb758012000) = -14 (Bad address)
> > 2022-11-29T03:45:02.634474Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> > 2022-11-29T03:45:02.634488Z qemu-kvm: vfio_dma_map(0x55793ecbc800,
> > 0x800400000, 0x200000, 0x7fb740e00000) = -14 (Bad address)
> > 2022-11-29T03:45:02.634505Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
> > 2022-11-29T03:45:02.634519Z qemu-kvm: vfio_dma_map(0x55793ecbc800,
> > 0x800601000, 0x3000, 0x7fb75800e000) = -14 (Bad address)
> 
> These 8 errors are trying to map 4 ranges to the two containers.  The
> initial failure is suspecious as that's the only -2 (ENOENT) and that error
> is rare in this path, not as common as an -14 (EFAULT).
> 
> I'd be curious where does it come from; I had a quick look on the dma map
> path in common non-rt code and I didn't quickly spot the possible path. 
> Maybe that's RT tree specific or maybe I just overlooked.
> 
> Since this is reproducing 100%, is this a regression (any old RHEL9 RT
> kernel worked before)?

I still need some time to try this, and I will update my test results in a later comment.

> Is it reproduceable on RHEL9 non-rt kernels?

Yes. This bug can be reproduced on the 5.14.0-202.el9.x86_64 kernel as well.

I have uploaded the detailed log in the Description.

Comment 5 Laurent Vivier 2022-12-05 09:57:39 UTC
Peter,

can I assign this BZ to you?

Comment 6 Peter Xu 2022-12-05 14:58:47 UTC
Done.  Copy Alex.

Comment 7 Yanghang Liu 2023-02-17 06:38:33 UTC
This issue can still be reproduced in the following test environment:
qemu-kvm-7.2.0-8.el9.x86_64
tuned-2.19.0-1.el9.noarch
libvirt-9.0.0-5.el9.x86_64
python3-libvirt-9.0.0-1.el9.x86_64
openvswitch2.17-2.17.0-65.el9fdp.x86_64
dpdk-21.11.2-1.el9_1.x86_64
edk2-ovmf-20221207gitfff6d81270b5-5.el9.noarch
seabios-bin-1.16.1-1.el9.noarch

Comment 8 Yanghang Liu 2023-04-18 03:35:20 UTC
This issue can still be reproduced in the following test environment:
qemu-kvm-7.2.0-14.el9_2.x86_64
tuned-2.20.0-1.el9.noarch
libvirt-9.2.0-1.el9.x86_64
python3-libvirt-9.0.0-1.el9.x86_64
openvswitch3.1-3.1.0-17.el9fdp.x86_64
dpdk-22.11-3.el9_2.x86_64
edk2-ovmf-20230301gitf80f052277c8-2.el9.noarch
seabios-bin-1.16.1-1.el9.noarch
5.14.0-297.el9.x86_64

Comment 9 Peter Xu 2023-04-18 14:23:47 UTC
(In reply to Yanghang Liu from comment #4)
> > Since this is reproducing 100%, is this a regression (any old RHEL9 RT
> > kernel worked before)?
> 
> I still need sometime to try and I will update my test result in the comment
> later

Any update on this question?  Do you know the latest working kernel on RHEL9?  Is RHEL8 affected?  Thanks.

Comment 11 Yanghang Liu 2023-08-10 05:15:32 UTC
Hi Peter,

My test results show this issue is related to vIOMMU + ixgbe PF:


When I remove the vIOMMU configuration (such as the intel-iommu device) from the VM, the issue is gone.


Test log with viommu:
http://10.73.72.41/log/2023-08-02_13:50/nfv_guest_dpdk_PF_2M
http://10.73.72.41/log/2023-08-02_13:50/nfv_guest_dpdk_PF_1G

Test log without viommu:
http://10.73.72.41/log/2023-08-07_20:17/nfv_guest_dpdk_PF_2M
http://10.73.72.41/log/2023-08-07_20:17/nfv_guest_dpdk_PF_1G

Comment 13 Yanghang Liu 2023-08-10 05:33:00 UTC
Simplified reproducer:

[1] start a VM with two 82599ES PFs + vIOMMU

# virt-install --machine=q35 --noreboot --name=rhel93 --memory=4096 --vcpus=4 --graphics type=vnc,port=5993,listen=0.0.0.0 --osinfo detect=on,require=off --check all=off --memtune hard_limit=12582912 --memballoon virtio,driver.iommu=on,driver.ats=on --import --noautoconsole --check all=off --network bridge=switch,model=virtio,mac=52:54:00:03:93:93,driver.iommu=on,driver.ats=on --disk path=/home/images/RHEL93.qcow2,bus=virtio,cache=none,format=qcow2,io=threads,size=20,driver.iommu=on,driver.ats=on --features ioapic.driver=qemu --iommu model=intel,driver.intremap=on,driver.caching_mode=on,driver.iotlb=on --boot=uefi --hostdev pci_0000_d8_00_0  --hostdev pci_0000_d8_00_1 

[2] reboot the VM

# virsh reboot $VM

[3] check the qemu-kvm log:

/usr/libexec/qemu-kvm \
-name guest=rhel93,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-4-rhel93/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel93_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.2.0,usb=off,smm=on,kernel_irqchip=split,dump-guest-core=off,memory-backend=pc.ram,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,hpet=off,acpi=on \
-accel kvm \
-cpu host,migratable=on \
-global driver=cfi.pflash01,property=secure,value=on \
-m size=4194304k \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":4294967296}' \
-overcommit mem-lock=off \
-smp 4,sockets=4,cores=1,threads=1 \
-uuid f7f5f2ad-4c56-406a-9b8a-3c4c91b5f4bb \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=22,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device '{"driver":"intel-iommu","id":"iommu0","intremap":"on","caching-mode":true,"device-iotlb":true}' \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \
-device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \
-device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \
-device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \
-device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \
-device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \
-device '{"driver":"ich9-usb-ehci1","id":"usb","bus":"pcie.0","addr":"0x1d.0x7"}' \
-device '{"driver":"ich9-usb-uhci1","masterbus":"usb.0","firstport":0,"bus":"pcie.0","multifunction":true,"addr":"0x1d"}' \
-device '{"driver":"ich9-usb-uhci2","masterbus":"usb.0","firstport":2,"bus":"pcie.0","addr":"0x1d.0x1"}' \
-device '{"driver":"ich9-usb-uhci3","masterbus":"usb.0","firstport":4,"bus":"pcie.0","addr":"0x1d.0x2"}' \
-blockdev '{"driver":"file","filename":"/home/images/RHEL93.qcow2","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","iommu_platform":true,"ats":true,"bus":"pci.2","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"}' \
-netdev '{"type":"tap","fd":"23","vhost":true,"vhostfd":"25","id":"hostnet0"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"netdev":"hostnet0","id":"net0","mac":"52:54:00:02:93:93","bus":"pci.1","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev socket,id=chrtpm,path=/run/libvirt/qemu/swtpm/4-rhel93-swtpm.sock \
-tpmdev emulator,id=tpm-tpm0,chardev=chrtpm \
-device '{"driver":"tpm-crb","tpmdev":"tpm-tpm0","id":"tpm0"}' \
-device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 0.0.0.0:93,audiodev=audio1 \
-device '{"driver":"bochs-display","id":"video0","vgamem":16777216,"bus":"pcie.0","addr":"0x1"}' \
-global ICH9-LPC.noreboot=off \
-watchdog-action reset \
-device '{"driver":"vfio-pci","host":"0000:d8:00.1","id":"hostdev0","bus":"pci.4","addr":"0x0"}' \
-device '{"driver":"vfio-pci","host":"0000:d8:00.0","id":"hostdev1","bus":"pci.5","addr":"0x0"}' \
-device '{"driver":"virtio-balloon-pci","iommu_platform":true,"ats":true,"id":"balloon0","bus":"pci.3","addr":"0x0"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2023-08-10T05:24:40.721119Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2023-08-10T05:24:40.721275Z qemu-kvm: vfio_dma_map(0x5564a4a9d070, 0x82200000, 0x80000, 0x7f12d647c000) = -2 (No such file or directory)
2023-08-10T05:24:40.721287Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2023-08-10T05:24:40.721294Z qemu-kvm: vfio_dma_map(0x5564a4a9d070, 0x82281000, 0x3000, 0x7f13e8c4b000) = -14 (Bad address)
2023-08-10T05:24:40.721304Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2023-08-10T05:24:40.721311Z qemu-kvm: vfio_dma_map(0x5564a4a9d070, 0x82400000, 0x80000, 0x7f13d7e7f000) = -14 (Bad address)
2023-08-10T05:24:40.721320Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2023-08-10T05:24:40.721327Z qemu-kvm: vfio_dma_map(0x5564a4a9d070, 0x82481000, 0x3000, 0x7f13e8c4f000) = -14 (Bad address)
2023-08-10T05:24:40.845351Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2023-08-10T05:24:40.845377Z qemu-kvm: vfio_dma_map(0x5564a46f7a50, 0x82200000, 0x80000, 0x7f12d647c000) = -14 (Bad address)
2023-08-10T05:24:40.845388Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2023-08-10T05:24:40.845395Z qemu-kvm: vfio_dma_map(0x5564a46f7a50, 0x82281000, 0x3000, 0x7f13e8c4b000) = -14 (Bad address)
2023-08-10T05:24:40.845404Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2023-08-10T05:24:40.845411Z qemu-kvm: vfio_dma_map(0x5564a46f7a50, 0x82400000, 0x80000, 0x7f13d7e7f000) = -14 (Bad address)
2023-08-10T05:24:40.845419Z qemu-kvm: VFIO_MAP_DMA failed: Bad address
2023-08-10T05:24:40.845425Z qemu-kvm: vfio_dma_map(0x5564a46f7a50, 0x82481000, 0x3000, 0x7f13e8c4f000) = -14 (Bad address)


Now that the reproducer is simplified, it becomes possible to check for the regressing kernel version manually.

