Bug 1809978

Summary: Boot guest with device assignment, choose PC machine and "pci=nomsi" in kernel line, guest kernel will show "Call Trace" when booting testpmd
Product: Red Hat Enterprise Linux 7 Reporter: Pei Zhang <pezhang>
Component: qemu-kvm-rhev Assignee: Ariel Adam <aadam>
Status: CLOSED WONTFIX QA Contact: Pei Zhang <pezhang>
Severity: low Docs Contact:
Priority: low    
Version: 7.9 CC: chayang, jinzhao, juzhang, peterx, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1811863 Environment:
Last Closed: 2020-05-11 07:46:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1811863, 1811885    

Description Pei Zhang 2020-03-04 10:34:19 UTC
Description of problem:
Boot a guest with device assignment and the PC machine type, and add "pci=nomsi" to the guest kernel command line. Then start DPDK's testpmd in the guest. A "Call Trace" appears in the guest kernel log (dmesg).

Version-Release number of selected component (if applicable):
3.10.0-1127.el7.x86_64
qemu-kvm-rhev-2.12.0-44.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Boot qemu with device assignment using PC machine type

/usr/libexec/qemu-kvm -name rhel7.8 \
-M pc \
-cpu host -m 8G \
-smp 4 \
-drive file=/home/images_nfv-virt-rt-kvm/rhel7.8.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,bootindex=1 \
-vnc :2 \
-monitor stdio \
-nodefaults \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16 \
-serial unix:/tmp/monitor1,server,nowait \
-device vfio-pci,host=0000:5e:00.0,id=hostdev0 \
-device vfio-pci,host=0000:5e:00.1,id=hostdev1 \
-boot menu=on \

2. Add pci=nomsi to the guest kernel command line and reboot the guest

# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-3.10.0-1127.el7.x86_64 root=/dev/mapper/rhel_bootp--73--75--21-root ro console=tty0 console=ttyS0,115200n8 biosdevname=0 crashkernel=auto spectre_v2=retpoline rd.lvm.lv=rhel_bootp-73-75-21/root rd.lvm.lv=rhel_bootp-73-75-21/swap rhgb quiet LANG=en_US.UTF-8 default_hugepagesz=1G iommu=pt intel_iommu=on skew_tick=1 nohz=on nohz_full=1,2,3,4,5 rcu_nocbs=1,2,3,4,5 tuned.non_isolcpus=00000001 intel_pstate=disable nosoftlockup pci=nomsi
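
To confirm after the reboot that the parameter really is on the running kernel's command line, it can be matched as a whole word in /proc/cmdline. The helper below is only a sketch. The name has_kernel_param is made up for illustration, and it matches the exact word only, so a combined form such as "pci=nomsi,..." would not be detected.

```shell
#!/bin/bash
# has_kernel_param PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
# CMDLINE defaults to the running kernel's /proc/cmdline; passing it
# explicitly allows checking a pasted command line instead.
# (has_kernel_param is a hypothetical helper name, not an existing tool.)
has_kernel_param() {
    local param=$1
    local cmdline=${2-$(cat /proc/cmdline)}
    local words word
    # Split on whitespace without triggering glob expansion.
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        if [ "$word" = "$param" ]; then
            return 0
        fi
    done
    return 1
}

# Example:
#   has_kernel_param pci=nomsi && echo "pci=nomsi is active"
```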

3. In the guest, load VFIO in no-IOMMU mode, reserve a 1G hugepage, and bind the assigned devices to vfio-pci

# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci

# echo 1 >  /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages

# dpdk-devbind --bind=vfio-pci 0000:00:05.0
# dpdk-devbind --bind=vfio-pci 0000:00:06.0
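
Before starting testpmd it can be worth checking that both devices really ended up on vfio-pci. One way, sketched here, is to read the driver symlink that sysfs exposes for each PCI device. The helper name bound_driver and its optional second argument are assumptions made for this example.

```shell
#!/bin/bash
# bound_driver BDF [SYSFS_ROOT]
# Prints the name of the kernel driver currently bound to PCI device BDF,
# or "none" if no driver is bound. SYSFS_ROOT defaults to /sys and exists
# only so the helper can be pointed at a test directory.
# (bound_driver is a hypothetical helper name.)
bound_driver() {
    local bdf=$1
    local root=${2:-/sys}
    local link="$root/bus/pci/devices/$bdf/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Example, using the guest addresses from this report:
#   for dev in 0000:00:05.0 0000:00:06.0; do
#       echo "$dev: $(bound_driver "$dev")"
#   done
```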

4. In the guest, start testpmd. The guest kernel logs a Call Trace.

/usr/bin/testpmd \
        -l 1,2,3 \
        -n 4 \
        -d /usr/lib64/librte_pmd_ixgbe.so \
        -w 0000:00:05.0 -w 0000:00:06.0 \
        -- \
        --nb-cores=2 \
        -i \
        --disable-rss \
        --rxd=512 --txd=512 \
        --rxq=1 --txq=1

# dmesg
[  296.454061] iommu: Adding device 0000:00:05.0 to group 0
[  296.454066] vfio-pci 0000:00:05.0: Adding kernel taint for vfio-noiommu group on device
[  296.597968] ixgbe 0000:00:06.0: complete
[  296.653289] iommu: Adding device 0000:00:06.0 to group 1
[  296.653294] vfio-pci 0000:00:06.0: Adding kernel taint for vfio-noiommu group on device
[  299.352214] vfio-pci 0000:00:05.0: vfio-noiommu device opened by user (testpmd:1723)
[  299.998101] vfio-pci 0000:00:06.0: vfio-noiommu device opened by user (testpmd:1723)
[  309.462440] irq 10: nobody cared (try booting with the "irqpoll" option)
[  309.463894] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G     U         ------------   3.10.0-1127.el7.x86_64 #1
[  309.463896] Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
[  309.463898] Call Trace:
[  309.463914]  <IRQ>  [<ffffffffae57ff85>] dump_stack+0x19/0x1b
[  309.463994]  [<ffffffffadf52ed2>] __report_bad_irq+0x32/0xd0
[  309.463998]  [<ffffffffadf532f2>] note_interrupt+0x132/0x1f0
[  309.464002]  [<ffffffffadf50995>] handle_irq_event_percpu+0x55/0x80
[  309.464005]  [<ffffffffadf509fc>] handle_irq_event+0x3c/0x60
[  309.464008]  [<ffffffffadf5409d>] handle_fasteoi_irq+0x5d/0x110
[  309.464023]  [<ffffffffade2f5f4>] handle_irq+0xe4/0x1a0
[  309.464032]  [<ffffffffae59786d>] do_IRQ+0x4d/0xf0
[  309.464040]  [<ffffffffae58936a>] common_interrupt+0x16a/0x16a
[  309.464057]  [<ffffffffadea563a>] ? __do_softirq+0x9a/0x280
[  309.464062]  [<ffffffffae59642c>] call_softirq+0x1c/0x30
[  309.464065]  [<ffffffffade2f715>] do_softirq+0x65/0xa0
[  309.464068]  [<ffffffffadea5a15>] irq_exit+0x105/0x110
[  309.464071]  [<ffffffffae5979c8>] smp_apic_timer_interrupt+0x48/0x60
[  309.464075]  [<ffffffffae593efa>] apic_timer_interrupt+0x16a/0x170
[  309.464077]  <EOI>  [<ffffffffae587c20>] ? __cpuidle_text_start+0x8/0x8
[  309.464083]  [<ffffffffae587e6b>] ? native_safe_halt+0xb/0x20
[  309.464086]  [<ffffffffae587c3e>] default_idle+0x1e/0xc0
[  309.464094]  [<ffffffffade37c80>] arch_cpu_idle+0x20/0xc0
[  309.464105]  [<ffffffffadf01c2a>] cpu_startup_entry+0x14a/0x1e0
[  309.464109]  [<ffffffffae56e687>] rest_init+0x77/0x80
[  309.464136]  [<ffffffffaeb8b1cf>] start_kernel+0x44b/0x46c
[  309.464140]  [<ffffffffaeb8ab84>] ? repair_env_string+0x5c/0x5c
[  309.464144]  [<ffffffffaeb8a120>] ? early_idt_handler_array+0x120/0x120
[  309.464147]  [<ffffffffaeb8a738>] x86_64_start_reservations+0x24/0x26
[  309.464150]  [<ffffffffaeb8a88e>] x86_64_start_kernel+0x154/0x177
[  309.464160]  [<ffffffffade000d5>] start_cpu+0x5/0x14
[  309.464162] handlers:
[  309.464837] [<ffffffffc0478500>] vp_interrupt [virtio_pci]
[  309.466018] Disabling IRQ #10
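
The log above names IRQ 10 and lists vp_interrupt [virtio_pci] as the only registered handler. To see which devices are registered on an affected line, the IRQ number can be pulled out of the "nobody cared" message and looked up in /proc/interrupts. This is a sketch only, and spurious_irqs is a made-up helper name.

```shell
#!/bin/bash
# spurious_irqs
# Reads kernel log text on stdin and prints the number of every IRQ that was
# reported as "nobody cared", one per line, without duplicates.
# (spurious_irqs is a hypothetical helper name.)
spurious_irqs() {
    sed -n 's/.*irq \([0-9][0-9]*\): nobody cared.*/\1/p' | sort -un
}

# Example: show the /proc/interrupts entry for each affected IRQ.
#   dmesg | spurious_irqs | while read -r irq; do
#       grep -E "^ *${irq}:" /proc/interrupts
#   done
```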

Actual results:
The guest kernel logs a Call Trace ("irq 10: nobody cared") and disables IRQ #10.

Expected results:
The guest kernel should not log a Call Trace.

Additional info:
1. With the Q35 machine type, the issue does not occur.

2. Without "pci=nomsi" on the guest kernel command line, the issue does not occur.

3. This bug was found while handling https://bugzilla.redhat.com/show_bug.cgi?id=1786404#c26.
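
When comparing a boot with "pci=nomsi" against one without it, it may help to record whether MSI or MSI-X is actually enabled on a device in each case. The sketch below filters `lspci -vv` output for the capability enable flags. The helper name msi_flags is an assumption for illustration, and the guest address in the example is the one used in the steps above.

```shell
#!/bin/bash
# msi_flags
# Reads `lspci -vv` output for one device on stdin and prints the enable flag
# of each MSI / MSI-X capability, e.g. "MSI: Enable-" or "MSI-X: Enable+".
# Prints nothing if the device advertises neither capability.
# (msi_flags is a hypothetical helper name.)
msi_flags() {
    grep -oE 'MSI(-X)?: Enable[+-]'
}

# Example (run as root so the full capability list is readable):
#   lspci -vv -s 00:05.0 | msi_flags
```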