Description of problem:
Hot-plugging and unplugging a VF to a guest many times makes the guest stop running; qemu crashes.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.2.0-8.el7.x86_64
kernel-3.10.0-234.el7.x86_64

How reproducible:
90%

Steps to Reproduce:
1. Prepare an SR-IOV environment and find the VF.

# lspci | grep Eth
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
86:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
86:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
86:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
86:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
86:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
86:10.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

2. Prepare a healthy guest and a hostdev XML like the following:

# cat vf.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x86' slot='0x10' function='0x1'/>
  </source>
</hostdev>

3. Start the guest, then hot-plug/unplug the VF to the guest.

# for i in {1..50};do virsh attach-device r7.2 vf.xml ;sleep 3;virsh detach-device r7.2 vf.xml;done
......
Device attached successfully

error: Failed to detach device from vf.xml
error: operation failed: domain is no longer running

Actual results:
The guest stops running; qemu crashes.

Expected results:
The guest keeps running and the VF can be detached successfully.

Additional info:
1. Sometimes it gives an error like the following:

error: Failed to detach device from vf.xml
error: Unable to read from monitor: Connection reset by peer

2. Backtrace info:

(gdb) t a a bt

Thread 3 (Thread 0x7f3b07fff700 (LWP 16965)):
#0 0x00007f3b16fd5b7d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f3b17f2ad37 in poll (__timeout=<optimized out>, __nfds=20, __fds=0x7f3ab40008f8) at /usr/include/bits/poll2.h:46
#2 red_worker_main (arg=<optimized out>) at red_worker.c:12200
#3 0x00007f3b1d12edf5 in start_thread (arg=0x7f3b07fff700) at pthread_create.c:308
#4 0x00007f3b16fe01ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f3b1e523a40 (LWP 16932)):
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f3b1d130d32 in _L_lock_791 () from /usr/lib64/libpthread-2.17.so
#2 0x00007f3b1d130c38 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f3b1ecf9660 <qemu_global_mutex>) at pthread_mutex_lock.c:64
#3 0x00007f3b1e860ad9 in qemu_mutex_lock (mutex=mutex@entry=0x7f3b1ecf9660 <qemu_global_mutex>) at util/qemu-thread-posix.c:76
#4 0x00007f3b1e613c00 in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.2.0/cpus.c:1123
#5 0x00007f3b1e7f58eb in os_host_main_loop_wait (timeout=6739783) at main-loop.c:242
#6 main_loop_wait (nonblocking=<optimized out>) at main-loop.c:494
#7 0x00007f3b1e5ec0fe in main_loop () at vl.c:1882
#8 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4411

Thread 1 (Thread 0x7f3b0d62e700 (LWP 16946)):
#0 object_finalize_child_property (obj=0x7f3b21d56e20, name=0x7f3b21c48fe0 "VFIO 0000:86:10.1 BAR 3 mmap msix-hi[0]", opaque=0x7f3b21930530) at qom/object.c:1078
#1 0x00007f3b1e7bbc28 in object_property_del_all (obj=0x7f3b21d56e20) at qom/object.c:367
#2 object_finalize (data=0x7f3b21d56e20) at qom/object.c:412
#3 object_unref (obj=0x7f3b21d56e20) at qom/object.c:720
#4 0x00007f3b1e7bbf9c in object_property_del (obj=0x7f3b20fd3e30, name=0x7f3b2124d070 "hostdev0", errp=errp@entry=0x0) at qom/object.c:800
#5 0x00007f3b1e7bc061 in object_property_del_child (errp=0x0, child=0x7f3b21d56e20, obj=<optimized out>) at qom/object.c:383
#6 object_unparent (obj=obj@entry=0x7f3b21d56e20) at qom/object.c:392
#7 0x00007f3b1e725196 in acpi_pcihp_eject_slot (s=<optimized out>, bsel=<optimized out>, slots=<optimized out>) at hw/acpi/pcihp.c:139
#8 0x00007f3b1e62516a in access_with_adjusted_size (addr=addr@entry=8, value=value@entry=0x7f3b0d62daf0, size=size@entry=4, access_size_min=<optimized out>, access_size_max=<optimized out>, access=0x7f3b1e6252e0 <memory_region_write_accessor>, mr=0x7f3b211992c8) at /usr/src/debug/qemu-2.2.0/memory.c:480
#9 0x00007f3b1e629e67 in memory_region_dispatch_write (size=4, data=33554432, addr=8, mr=0x7f3b211992c8) at /usr/src/debug/qemu-2.2.0/memory.c:1122
#10 io_mem_write (mr=mr@entry=0x7f3b211992c8, addr=8, val=<optimized out>, size=4) at /usr/src/debug/qemu-2.2.0/memory.c:1973
#11 0x00007f3b1e5f3de3 in address_space_rw (as=0x7f3b1ec9d3e0 <address_space_io>, addr=addr@entry=44552, buf=0x7f3b1e558000 <Address 0x7f3b1e558000 out of bounds>, len=len@entry=4, is_write=is_write@entry=true) at /usr/src/debug/qemu-2.2.0/exec.c:2155
#12 0x00007f3b1e624610 in kvm_handle_io (count=1, size=4, direction=<optimized out>, data=<optimized out>, port=44552) at /usr/src/debug/qemu-2.2.0/kvm-all.c:1635
#13 kvm_cpu_exec (cpu=cpu@entry=0x7f3b2113b1c0) at /usr/src/debug/qemu-2.2.0/kvm-all.c:1792
#14 0x00007f3b1e612ab2 in qemu_kvm_cpu_thread_fn (arg=0x7f3b2113b1c0) at /usr/src/debug/qemu-2.2.0/cpus.c:953
#15 0x00007f3b1d12edf5 in start_thread (arg=0x7f3b0d62e700) at pthread_create.c:308
#16 0x00007f3b16fe01ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

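The one-line loop in step 3 keeps attaching and detaching after the first failure, which makes it hard to tell on which iteration the guest died. A minimal sketch of the same loop that stops at the first failed attach or detach is shown here. The function name hotplug_loop, the VIRSH override variable and the hl_ variable names are assumptions made for illustration; only virsh attach-device and virsh detach-device come from the report.

```shell
#!/bin/sh
# VIRSH is a hypothetical override (not a virsh feature) so the loop can be
# dry-run against a stub; it defaults to the real virsh binary.
VIRSH=${VIRSH:-virsh}

# hotplug_loop DOMAIN XML COUNT [DELAY_SECONDS]
# Attach and detach the device COUNT times, sleeping DELAY_SECONDS
# (default 3, as in the report) between attach and detach.
# Stops at the first failure and prints the iteration and the failing step.
hotplug_loop() {
    hl_domain=$1
    hl_xml=$2
    hl_count=$3
    hl_delay=${4:-3}
    hl_i=1
    while [ "$hl_i" -le "$hl_count" ]; do
        if ! "$VIRSH" attach-device "$hl_domain" "$hl_xml"; then
            echo "attach failed at iteration $hl_i"
            return 1
        fi
        sleep "$hl_delay"
        if ! "$VIRSH" detach-device "$hl_domain" "$hl_xml"; then
            echo "detach failed at iteration $hl_i"
            return 1
        fi
        hl_i=$((hl_i + 1))
    done
    echo "completed $hl_count iterations"
}
```

With the domain and XML file from the report, the equivalent of step 3 would be: hotplug_loop r7.2 vf.xml 50
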
I also hit this issue on the first hot-unplug of the VF after the host boots.

Steps:
1. Boot the host.
2. Hot-plug the VF to the guest.
3. Hot-unplug the VF from the guest; the guest then crashes.

For RHEL7.2, I believe this is already fixed in QEMU-2.3. Please retest with QEMU-2.3 packages.

However, a QEMU segfault that is this likely to occur when hot-unplugging a common device is a pretty significant functional regression from RHEL7.0. In my case, the first attempt to hot-unplug an 82576 VF resulted in the segfault. I'm therefore setting the Regression keyword and proposing for z-stream. The necessary upstream patch is:

commit 3a4dbe6aa934370a92372528c1255ee1504965ee
Author: Alex Williamson <alex.williamson>
Date:   Wed Feb 4 11:45:32 2015 -0700

    vfio-pci: Fix missing unparent of dynamically allocated MemoryRegion

    Commit d8d95814609e added explicit object_unparent() calls for
    dynamically allocated MemoryRegions. The VFIOMSIXInfo structure also
    contains such a MemoryRegion, covering the mmap'd region of a PCI BAR
    above the MSI-X table. This structure is freed as part of the class
    exit function and therefore also needs an explicit object_unparent().
    Failing to do this results in random segfaults due to fields within
    the structure, often the class pointer, being reclaimed and corrupted
    by the time object_finalize_child_property() is called for the object.

    Signed-off-by: Alex Williamson <alex.williamson>
    Reviewed-by: Paolo Bonzini <pbonzini>
    Cc: qemu-stable # 2.2

This applies cleanly to our tree and my testing indicates that it resolves the problem. Moving this bz to MODIFIED for re-test. If z-stream is approved, I can post the above patch against the z-stream bz.

Test with 7.1.z:

Test version:

host:
kernel-3.10.0-229.2.1.el7.x86_64
qemu-kvm-rhev-2.1.2-23.el7_1.1.x86_64
libvirt-1.2.8-16.el7_1.3.x86_64

guest kernel: kernel-3.10.0-229.el7.x86_64

Test result:
Hot-plug/unplug of the VF succeeds, qemu does not crash and the guest is still running.

(In reply to Pei Zhang from comment #4)
> test with 7.1.z :
>
> test version :
>
> host :
> kernel-3.10.0-229.2.1.el7.x86_64
> qemu-kvm-rhev-2.1.2-23.el7_1.1.x86_64
> libvirt-1.2.8-16.el7_1.3.x86_64
>
> guest kernel : kernel-3.10.0-229.el7.x86_64
>
> test result :
> hot-plug/unplug VF successfully , qemu won't crash and guest is still
> running .

I'm confused, what was tested in comment 0? Are we not shipping qemu-kvm-rhev-2.2.0-8.el7.x86_64? Where does the 2.2 version come from if 2.1 is in the 7.1.z stream?

(In reply to Alex Williamson from comment #5)
> (In reply to Pei Zhang from comment #4)
> > test with 7.1.z :
> >
> > test version :
> >
> > host :
> > kernel-3.10.0-229.2.1.el7.x86_64
> > qemu-kvm-rhev-2.1.2-23.el7_1.1.x86_64
> > libvirt-1.2.8-16.el7_1.3.x86_64
> >
> > guest kernel : kernel-3.10.0-229.el7.x86_64
> >
> > test result :
> > hot-plug/unplug VF successfully , qemu won't crash and guest is still
> > running .
>
> I'm confused, what was tested in comment 0? Are we not shipping
> qemu-kvm-rhev-2.2.0-8.el7.x86_64? Where does the 2.2 version come from if
> 2.1 is in the 7.1.z stream?

Hi Alex,

The RHEL7.1 qemu-kvm-rhev GA version is qemu-kvm-rhev-2.1.2-23.el7. Technically speaking, qemu-2.2 belongs to rhel7.2, although the final rhel7.2 rhev build might be based on qemu-2.3. In addition, according to comment 0 and comment 4, it seems this regression comes from rhel7.2 (qemu-2.2.x) and rhel7.1 does not hit this issue.

Best Regards,
Junyi

Retried with qemu-kvm directly: hot-plug then unplug in a loop for 50 iterations.

Host kernel: 3.10.0-229.2.1.el7.x86_64
Guest kernel: 3.10.0-229.2.1.el7.x86_64

Here is the test result:

qemu-kvm-1.5.3-87.el7             not reproducible
qemu-kvm-rhev-2.1.2-23.el7        not reproducible
qemu-kvm-rhev-2.1.2-23.el7_1.1    not reproducible
qemu-kvm-rhev-2.2.0-8.el7         reproducible 2/2

For qemu-kvm-rhev-2.2.0-8.el7, I tried twice: the first run crashed at the 3rd attempt, the second at the 5th attempt.

CLI:
/usr/libexec/qemu-kvm -name test -S \
  -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off \
  -m 2G -realtime mlock=on \
  -smp 2,sockets=2,cores=1,threads=1 \
  -no-user-config -nodefaults \
  -monitor stdio \
  -rtc base=utc,driftfix=slew \
  -global kvm-pit.lost_tick_policy=discard \
  -no-hpet \
  -global PIIX4_PM.disable_s3=1 \
  -global PIIX4_PM.disable_s4=1 \
  -boot menu=on,strict=on \
  -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 \
  -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 \
  -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 \
  -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 \
  -drive file=RHEL-7.1-20141111.0.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 \
  -net none \
  -device usb-tablet,id=input0 \
  -vnc :1 -vga cirrus \
  -msg timestamp=on \
  -qmp unix:/home/chayang/qmp,server,nowait

Backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe37fe700 (LWP 27380)]
object_finalize_child_property (obj=0x55555975d440, name=0x5555571a4df0 "VFIO 0000:03:10.3 BAR 3 mmap msix-hi[0]", opaque=0x5555575e01b0) at qom/object.c:1078
1078        if (child->class->unparent) {
(gdb) bt
#0 object_finalize_child_property (obj=0x55555975d440, name=0x5555571a4df0 "VFIO 0000:03:10.3 BAR 3 mmap msix-hi[0]", opaque=0x5555575e01b0) at qom/object.c:1078
#1 0x00005555557abc28 in object_property_del_all (obj=0x55555975d440) at qom/object.c:367
#2 object_finalize (data=0x55555975d440) at qom/object.c:412
#3 object_unref (obj=0x55555975d440) at qom/object.c:720
#4 0x00005555557abf9c in object_property_del (obj=0x5555561a8630, name=0x5555564062e0 "hostnet_VF", errp=<optimized out>) at qom/object.c:800
#5 0x00005555557ac061 in object_property_del_child (errp=0x0, child=<optimized out>, obj=<optimized out>) at qom/object.c:383
#6 object_unparent (obj=<optimized out>) at qom/object.c:392
#7 0x0000555555715196 in acpi_pcihp_eject_slot (s=<optimized out>, bsel=<optimized out>, slots=<optimized out>) at hw/acpi/pcihp.c:139
#8 0x000055555561516a in access_with_adjusted_size (addr=addr@entry=8, value=value@entry=0x7fffe37fdaf0, size=size@entry=4, access_size_min=<optimized out>, access_size_max=<optimized out>, access=0x5555556152e0 <memory_region_write_accessor>, mr=0x5555562ffe28) at /usr/src/debug/qemu-2.2.0/memory.c:480
#9 0x0000555555619e67 in memory_region_dispatch_write (size=4, data=8, addr=8, mr=0x5555562ffe28) at /usr/src/debug/qemu-2.2.0/memory.c:1122
#10 io_mem_write (mr=mr@entry=0x5555562ffe28, addr=8, val=<optimized out>, size=4) at /usr/src/debug/qemu-2.2.0/memory.c:1973
#11 0x00005555555e3de3 in address_space_rw (as=0x555555c8d3e0 <address_space_io>, addr=addr@entry=44552, buf=0x7ffff7fef000 "\b", len=len@entry=4, is_write=is_write@entry=true) at /usr/src/debug/qemu-2.2.0/exec.c:2155
#12 0x0000555555614610 in kvm_handle_io (count=1, size=4, direction=<optimized out>, data=<optimized out>, port=44552) at /usr/src/debug/qemu-2.2.0/kvm-all.c:1635
#13 kvm_cpu_exec (cpu=cpu@entry=0x555556319900) at /usr/src/debug/qemu-2.2.0/kvm-all.c:1792
#14 0x0000555555602ab2 in qemu_kvm_cpu_thread_fn (arg=0x555556319900) at /usr/src/debug/qemu-2.2.0/cpus.c:953
#15 0x00007ffff6bc9df5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007ffff0a7c1ad in clone () from /lib64/libc.so.6

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-2546.html