Bug 1814017 - qemu-kvm: segfault with kata sr-iov hotplug
Summary: qemu-kvm: segfault with kata sr-iov hotplug
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 32
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-16 18:36 UTC by Adrián Moreno
Modified: 2020-04-02 00:31 UTC (History)

Fixed In Version: qemu-4.2.0-7.fc32
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-02 00:31:19 UTC
Type: Bug
Embargoed:


Attachments
system logs (1006.36 KB, text/plain)
2020-03-16 18:36 UTC, Adrián Moreno

Description Adrián Moreno 2020-03-16 18:36:06 UTC
Created attachment 1670636
system logs

Description of problem:

When running Kata Containers with qemu-kvm and the SR-IOV device plugin on Fedora 32, qemu crashes. The crash also reproduces with upstream qemu 4.2.0.

Non-SRIOV containers work well.

Version-Release number of selected component (if applicable):

Packages tested:
kata-ksm-throttler.x86_64             1.10.0-2.fc32
kata-osbuilder.x86_64                 1.10.0-8.fc32
kata-proxy.x86_64                     1.10.0-2.fc32
kata-runtime.x86_64                   1.10.0-3.fc32 
kata-shim.x86_64                      1.10.0-3.fc32

qemu-kvm.x86_64                       2:4.2.0-5.fc32

How reproducible:
Always
Reproducible with podman and with kubernetes + crio

Steps to Reproduce:
1. Set up a bare-metal system with an SR-IOV capable NIC

- Add "iommu=pt intel_iommu=on" to the kernel's command line
$ grubby --update-kernel=ALL --args="iommu=pt intel_iommu=on"
$ reboot
- Add VFs
$ echo 4 > /sys/class/net/ens2f0/device/sriov_numvfs
- Insert vfio module
$ modprobe vfio
- Bind the devices to the vfio-pci driver
$ driverctl set-override 0000:65:02.1 vfio-pci
- Select a vfio device
$ ls /dev/vfio
40  41  42  43  vfio

2. Start podman
$ sudo podman --runtime=/usr/bin/kata-runtime run --device /dev/vfio/40:/dev/vfio/40 --security-opt label=disable -it fedora /bin/bash
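
The nodes under /dev/vfio are named after IOMMU group numbers, so picking the right one for a given VF means looking up that VF's group. A minimal sketch of that lookup follows; the function name iommu_group_of and the SYSFS_ROOT override are hypothetical and exist only for illustration and testing, and the address in the usage note is the one from the steps above.

```shell
#!/bin/sh
# Print the IOMMU group number of a PCI device. Once the device is bound
# to vfio-pci, this number is also the name of its node under /dev/vfio.
# SYSFS_ROOT is a hypothetical override (default /sys) used for testing.
iommu_group_of() {
    addr=$1
    root=${SYSFS_ROOT:-/sys}
    link="$root/bus/pci/devices/$addr/iommu_group"
    # The kernel exposes iommu_group as a symlink; without it there is no group.
    [ -L "$link" ] || { echo "no iommu_group for $addr" >&2; return 1; }
    basename "$(readlink "$link")"
}
```

For example, `iommu_group_of 0000:65:02.1` prints the group number to pass to podman as /dev/vfio/<group>.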


Actual results:

The container is created (the previous command gives you a "bash" prompt in the container), but a second later it freezes, qemu segfaults, and podman hangs.

After that you get errors from kata saying "Proxy is not running".


Expected results:

qemu should not segfault

Additional info:

The crash occurs just after the vfio device is hotplugged into qemu. The guest kernel's last messages are:

shpchp 0000:00:02.0: Device 0000:01:01.0 already exists at 0000:01:01, cannot hot-add
shpchp 0000:00:02.0: Cannot add device at 0000:01:01


Sometimes the backtrace shows a corrupted stack, such as:
Mar 16 13:24:32 virtlab700.virt.lab.eng.bos.redhat.com systemd-coredump[637001]: Process 636670 (qemu-system-x86) of user 0 dumped core.
                                                                                  
                Stack trace of thread 636676:
                #0  0x000055ff00482b69 notifier_remove (/usr/bin/qemu-system-x86_64 + 0x74ab69)

                Stack trace of thread 636670:
                #0  0x00007fee879fd750 n/a (n/a + 0x0)

But other times, a full backtrace is available:

           PID: 664514 (qemu-system-x86)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Mon 2020-03-16 14:22:03 EDT (6min ago)
  Command Line: /usr/bin/qemu-system-x86_64 -machine accel=kvm -name sandbox-feb6baad5e6aa1d3dfb5eb9e1d1071a5e1897d574de9a33d89971de9954860b5 -uuid a94a946b-cf03-4ae9-88ef-b9221c4e78c8 -machine q35,accel=kvm,kernel_irqchip -cpu host -qmp unix:/run/vc/vm/feb6baad5e6aa1d3dfb5eb9e1d1071a5e1897d574de9a33d89971de9954860b5/qmp.sock,server,nowait -m 2048M,slots=10,maxmem=47762M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/feb6baad5e6aa1d3dfb5eb9e1d1071a5e1897d574de9a33d89971de9954860b5/console.sock,server,nowait -device virtio-scsi-pci,id=scsi0,disable-modern=false,romfile= -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng,rng=rng0,romfile= -device vhost-vsock-pci,disable-modern=false,vhostfd=3,id=vsock-2119658267,guest-cid=2119658267,romfile= -chardev socket,id=char-a0e6a5384ea6fc14,path=/run/vc/vm/feb6baad5e6aa1d3dfb5eb9e1d1071a5e1897d574de9a33d89971de9954860b5/vhost-fs.sock -device vhost-user-fs-pci,chardev=char-a0e6a5384ea6fc14,tag=kataShared -netdev tap,id=network-0,vhost=on,vhostfds=4,fds=5 -device driver=virtio-net-pci,netdev=network-0,mac=ea:fb:72:d2:45:4c,disable-modern=false,mq=on,vectors=4,romfile= -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -numa node,memdev=dimm1 -kernel /usr/lib/modules/5.6.0-0.rc3.git0.1.fc32.x86_64/vmlinuz -initrd /var/cache/kata-containers/osbuilder-images/5.6.0-0.rc3.git0.1.fc32.x86_64/fedora-kata-5.6.0-0.rc3.git0.1.fc32.x86_64.initrd -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 debug panic=1 nr_cpus=28 agent.use_vsock=true agent.log=debug systemd.unified_cgroup_hierarchy=0 agent.log=debug initcall_debug -pidfile /run/vc/vm/feb6baad5e6aa1d3dfb5eb9e1d1071a5e1897d574de9a33d89971de9954860b5/pid -D /run/vc/vm/feb6baad5e6aa1d3dfb5eb9e1d1071a5e1897d574de9a33d89971de9954860b5/qemu.log -smp 1,cores=1,threads=1,sockets=28,maxcpus=28
    Executable: /usr/bin/qemu-system-x86_64
 Control Group: /machine.slice/libpod-conmon-feb6baad5e6aa1d3dfb5eb9e1d1071a5e1897d574de9a33d89971de9954860b5.scope
          Unit: libpod-conmon-feb6baad5e6aa1d3dfb5eb9e1d1071a5e1897d574de9a33d89971de9954860b5.scope
         Slice: machine.slice
       Boot ID: 74f66e0abeea4caf9aa504566efc9b49
    Machine ID: 569fcde0ad25486993785471494b2b6a
      Hostname: virtlab700.virt.lab.eng.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.qemu-system-x86.0.74f66e0abeea4caf9aa504566efc9b49.664514.1584382923000000000000.lz4 (truncated)
       Message: Process 664514 (qemu-system-x86) of user 0 dumped core.
                 
                Stack trace of thread 664520:
                #0  0x000055818e3efb69 notifier_remove (/usr/bin/qemu-system-x86_64 + 0x74ab69)
                #1  0x000055818e057ecd vfio_exitfn (/usr/bin/qemu-system-x86_64 + 0x3b2ecd)
                #2  0x000055818e208e1b pci_qdev_unrealize (/usr/bin/qemu-system-x86_64 + 0x563e1b)
                #3  0x000055818e17b6ef device_set_realized (/usr/bin/qemu-system-x86_64 + 0x4d66ef)
                #4  0x000055818e2eae8b property_set_bool (/usr/bin/qemu-system-x86_64 + 0x645e8b)
                #5  0x000055818e2f01c4 object_property_set_qobject (/usr/bin/qemu-system-x86_64 + 0x64b1c4)
                #6  0x000055818e2edbca object_property_set_bool (/usr/bin/qemu-system-x86_64 + 0x648bca)
                #7  0x000055818e20e230 shpc_free_devices_in_slot (/usr/bin/qemu-system-x86_64 + 0x569230)
                #8  0x000055818e20e363 shpc_slot_command (/usr/bin/qemu-system-x86_64 + 0x569363)
                #9  0x000055818e20e5bf shpc_command (/usr/bin/qemu-system-x86_64 + 0x5695bf)
                #10 0x000055818dff532b memory_region_write_accessor (/usr/bin/qemu-system-x86_64 + 0x35032b)
                #11 0x000055818dff414e access_with_adjusted_size (/usr/bin/qemu-system-x86_64 + 0x34f14e)
                #12 0x000055818dff7fe4 memory_region_dispatch_write (/usr/bin/qemu-system-x86_64 + 0x352fe4)
                #13 0x000055818dfa21b0 flatview_write_continue (/usr/bin/qemu-system-x86_64 + 0x2fd1b0)
                #14 0x000055818dfa6257 flatview_write (/usr/bin/qemu-system-x86_64 + 0x301257)
                #15 0x000055818e007570 kvm_cpu_exec (/usr/bin/qemu-system-x86_64 + 0x362570)
                #16 0x000055818dfeb91c qemu_kvm_cpu_thread_fn (/usr/bin/qemu-system-x86_64 + 0x34691c)
                #17 0x000055818e3e2ab3 qemu_thread_start (/usr/bin/qemu-system-x86_64 + 0x73dab3)
                #18 0x00007f798721b432 n/a (n/a + 0x0)
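
Read bottom-up, the trace above shows a guest write reaching shpc_command, which frees the devices in the slot and unrealizes the vfio device, ending in notifier_remove. Because the addresses differ between runs, it can help to reduce each trace to its function names before comparing crashes. The helper below is a small sketch of that; frame_names is a hypothetical name, and it assumes the frame layout shown in this report (frame number, address, symbol).

```shell
#!/bin/sh
# Reduce systemd-coredump stack-trace text (on stdin) to one function
# name per line. Frames are expected to look like:
#   #7  0x000055818e20e230 shpc_free_devices_in_slot (/usr/bin/... + 0x569230)
frame_names() {
    # Keep only lines whose first field is "#<n>" and second is an address,
    # then print the symbol in the third field.
    awk '$1 ~ /^#[0-9]+$/ && $2 ~ /^0x/ { print $3 }'
}
```

For example, `coredumpctl info 664514 | frame_names` lists the frames of the dump above, and the output of two runs can be compared with diff.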


A full core dump is available if needed.

Comment 1 Adrián Moreno 2020-03-16 18:38:17 UTC
I have not had time to reproduce it on RHEL yet.

Comment 6 Fabiano Fidêncio 2020-03-17 10:37:31 UTC
So, this bug is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1782678 (which is a downstream bug).

I've opened the following PR to qemu: https://src.fedoraproject.org/rpms/qemu/pull-request/8. It's opened against rawhide, but the patch should also be backported to f32.

Adrian, thanks a lot for reporting the issue and for the really quick test on the patch provided by Peter Xu (https://lists.gnu.org/archive/html/qemu-devel/2019-12/msg05493.html).

Comment 7 Fedora Update System 2020-03-23 21:46:41 UTC
FEDORA-2020-e24e4f44e6 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-e24e4f44e6

Comment 8 Fedora Update System 2020-03-24 01:52:50 UTC
FEDORA-2020-e24e4f44e6 has been pushed to the Fedora 32 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-e24e4f44e6`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-e24e4f44e6

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
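
Before retesting, it may be worth confirming that the installed qemu-kvm is at least the fixed build named in this report (qemu-4.2.0-7.fc32). The sketch below compares two version-release strings with GNU `sort -V`; version_at_least is a hypothetical helper, and `sort -V` only approximates RPM's own comparison rules, which is adequate for strings of this shape.

```shell
#!/bin/sh
# Succeed when version-release string $1 sorts at or after $2 under
# GNU `sort -V`. This approximates RPM version comparison; it is not
# the same algorithm.
version_at_least() {
    # If the smaller of the two strings is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A possible use is `version_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}' qemu-kvm)" 4.2.0-7.fc32 && echo "fixed build installed"`.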

Comment 9 Fedora Update System 2020-04-02 00:31:19 UTC
FEDORA-2020-e24e4f44e6 has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make a note of it in this bug report.

