Description of problem:
qemu core dump when unplugging a 16T GPT disk from a Windows 2019 guest.

Version-Release number of selected component (if applicable):
kernel-4.18.0-80.el8.x86_64
qemu-kvm-3.1.0-22.module+el8.0.1+3032+a09688b9.x86_64
edk2-ovmf-20180508gitee3198e672e2-9.el8.noarch
virtio-win-prewhql-171

How reproducible:
100%

Steps to Reproduce:
1. Create a 16T image with qemu-img:
# qemu-img create -f qcow2 /home/data1.qcow2 16T

2. Boot the guest:
==============================================================================
MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -device pvpanic,ioport=0x505,id=idZcGD6F \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object secret,id=sec0,data=backing \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pcie.0-root-port-3,addr=0x0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=luks,file=/dev/vgtest/srv-msql3-sys,key-secret=sec0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-blk-pci,id=stg,drive=drive_stg,bootindex=1,bus=pcie.0-root-port-4,addr=0x0 \
    -drive id=drive_stg,if=none,snapshot=off,aio=threads,cache=none,format=luks,file=/dev/vgtest-data/srv-msql3-data,key-secret=sec0 \
    -device pcie-root-port,id=pcie.0-root-port-5,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:55:56:57:58:59,id=id18Xcuo,vectors=4,netdev=idGRsMas,bus=pcie.0-root-port-5,addr=0x0 \
    -netdev tap,id=idGRsMas,vhost=on \
    -m 13312 \
    -smp 24,maxcpus=24,cores=12,threads=1,sockets=2 \
    -cpu 'Skylake-Server',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt \
    -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/ISO/Win2019/en_windows_server_2019_x64_dvd_4cb967d8.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bootindex=2,bus=ide.0,unit=0 \
    -drive id=drive_virtio,if=none,snapshot=off,aio=threads,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/virtio-win-prewhql-0.1-171.iso \
    -device ide-cd,id=virtio,drive=drive_virtio,bootindex=3,bus=ide.1,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.secboot.fd \
    -drive if=pflash,format=raw,file=/home/kvm_autotest_root/images/win2019-64-virtio.qcow2.fd \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -monitor stdio \
    -qmp tcp:0:4445,server,nowait
===============================================================================

3. Hot-plug the newly created 16T disk into the guest:
(qemu) drive_add auto id=drive_stg0,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/data1.qcow2
OK
(qemu) device_add driver=virtio-blk-pci,id=stg0,drive=drive_stg0,bus=pcie_extra_root_port_0

4. In the guest, initialize the disk as a GPT disk, create a new simple volume spanning the whole disk, and format it with the NTFS file system.

5. (qemu) device_del stg0

Actual results:
qemu core dumped

Expected results:
No core dump

Additional info:
1. Tested with a 20G GPT disk; cannot reproduce this issue.
2. Tested with a 16T MBR disk; cannot reproduce this issue.
Hit the same issue on ws2012-64 (q35 + seabios):
kernel-4.18.0-80.el8.x86_64
qemu-kvm-3.1.0-22.module+el8.0.1+3032+a09688b9.x86_64
seabios-1.12.0-1.module+el8+2706+3c6581b6.x86_64
virtio-win-prewhql-171
I now got a backtrace from the core dump which looks more related to virtio:

(gdb) bt
#0  0x0000558b4c37ad6d in virtio_pci_notify_write ()
#1  0x0000558b4c1af4c3 in memory_region_write_accessor ()
#2  0x0000558b4c1ad676 in access_with_adjusted_size ()
#3  0x0000558b4c1b1440 in memory_region_dispatch_write ()
#4  0x0000558b4c15b513 in flatview_write_continue ()
#5  0x0000558b4c15b739 in flatview_write ()
#6  0x0000558b4c15f843 in address_space_write ()
#7  0x0000558b4c1c3058 in kvm_cpu_exec ()
#8  0x0000558b4c19c6f6 in qemu_kvm_cpu_thread_fn ()
#9  0x0000558b4c4a6a24 in qemu_thread_start ()
#10 0x00007fd6127c52de in start_thread (arg=<optimized out>) at pthread_create.c:486
#11 0x00007fd6124f5a63 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Disassembly extract:
   0x0000558b4c37ad66 <+70>: lea  rsi,[rip+0x1c6d00]  # 0x558b4c541a6d
=> 0x0000558b4c37ad6d <+77>: mov  rdi,QWORD PTR [rax+0x28]
   0x0000558b4c37ad71 <+81>: call 0x558b4c3de260 <object_dynamic_cast_assert>

So I guess DEVICE(vdev)->parent_bus is NULL.

Max
Now I got debug symbols working, and indeed:

#0  0x0000558b4c37ad6d in virtio_pci_notify_write (opaque=0x558b4e0b1170, addr=0, val=<optimized out>, size=<optimized out>) at hw/virtio/virtio-pci.c:1361
1361        VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
(gdb) print vdev->parent_obj->parent_bus
$4 = (BusState *) 0x0
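Since the crash is a read through a NULL parent_bus while a guest MMIO write is still being dispatched, here is a minimal Python model of that suspected failure mode. This is not QEMU code: the names (BusState, VirtIODevice, virtio_pci_notify_write) merely mirror the structures in the backtrace above, and the AttributeError stands in for the segfault.

```python
class BusState:
    def __init__(self, parent):
        self.parent = parent  # the VirtIOPCIProxy, in QEMU terms

class VirtIODevice:
    def __init__(self, bus):
        self.parent_bus = bus

def virtio_pci_notify_write(vdev):
    # Mirrors hw/virtio/virtio-pci.c:1361:
    #   VirtIOPCIProxy *proxy = VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
    return vdev.parent_bus.parent

proxy = object()
vdev = VirtIODevice(BusState(proxy))
assert virtio_pci_notify_write(vdev) is proxy  # fine while the device is plugged

vdev.parent_bus = None  # device_unparent() has run after device_del
crashed = False
try:
    virtio_pci_notify_write(vdev)  # a late guest notify write arrives
except AttributeError:
    crashed = True  # Python's analogue of the NULL-pointer segfault
print("crashed:", crashed)
```

The point of the model is only that the notify path must never run after unparenting; how the memory region can still be reachable at that point is exactly the open question in this report.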
So I tried to debug this problem and I didn’t find anything obvious (I can’t reproduce it with a Linux guest, and I don’t have access to a Windows guest). Without the regions registered, a memory write will not end up at virtio_pci_notify_write(); and the regions are unmapped before the device is removed from the bus. But then again I’m no virtio expert. Max
Try enabling #define DEBUG_PCIE in hw/pci/pcie.c. Does this show anything?
(In reply to Michael S. Tsirkin from comment #14)
> Try to change #define DEBUG_PCIE in hw/pci/pcie.c
> does this show anything?

Upstream and your pci branch both look the same:

QEMU 4.0.50 monitor - type 'help' for more information
(qemu) pcie_cap_slot_plug_common:398 :0 hotplug state: 0x0
pcie_cap_slot_plug_common:398 :0 hotplug state: 0x0
pcie_cap_slot_plug_common:398 :0 hotplug state: 0x0
pcie_cap_slot_plug_common:398 :0 hotplug state: 0x0
pcie_cap_slot_reset:557 pcie-root-port:30 reset
pcie_cap_slot_reset:557 pcie-root-port:28 reset
pcie_cap_slot_reset:557 pcie-root-port:20 reset
pcie_cap_slot_reset:557 pcie-root-port:18 reset
pcie_cap_slot_reset:557 pcie-root-port:10 reset
(qemu) drive_add auto id=drive_stg0,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/vrozenfe/work/images/disk16T.qcow2
OK
(qemu) device_add driver=virtio-blk-pci,id=stg0,drive=drive_stg0,bus=pcie_extra_root_port_0
pcie_cap_slot_plug_common:398 :ffffffff hotplug state: 0x0
(qemu) device_del stg0
pcie_cap_slot_plug_common:398 virtio-blk-pci:0 hotplug state: 0x40
(qemu)

QEMU and the VM still seem to be alive, but the Windows storage subsystem is broken.
On qemu-kvm-3.1.0-22.module+el8.0.1+3032+a09688b9:

QEMU 3.1.0 monitor - type 'help' for more information
(qemu) pcie_cap_slot_hotplug_common:325 qemu-xhci:0 hotplug state: 0x0
pcie_cap_slot_hotplug_common:325 virtio-blk-pci:0 hotplug state: 0x0
pcie_cap_slot_hotplug_common:325 virtio-blk-pci:0 hotplug state: 0x0
pcie_cap_slot_hotplug_common:325 virtio-net-pci:0 hotplug state: 0x0
pcie_cap_slot_reset:458 pcie-root-port:30 reset
pcie_cap_slot_reset:458 pcie-root-port:28 reset
pcie_cap_slot_reset:458 pcie-root-port:20 reset
pcie_cap_slot_reset:458 pcie-root-port:18 reset
pcie_cap_slot_reset:458 pcie-root-port:10 reset
(qemu) drive_add auto id=drive_stg0,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/vrozenfe/work/images/disk16T.qcow2
OK
(qemu) device_add driver=virtio-blk-pci,id=stg0,drive=drive_stg0,bus=pcie_extra_root_port_0
pcie_cap_slot_hotplug_common:325 virtio-blk-pci:0 hotplug state: 0x0
(qemu) device_del stg0
pcie_cap_slot_hotplug_common:325 virtio-blk-pci:0 hotplug state: 0x40

And qemu crashed after that.
So this doesn’t look specific to virtio-blk? (I personally can’t see a difference whether I’m using virtio-blk or e.g. virtio-serial.)

The problem is that I don’t see any issue with a Linux guest. As far as I understand, when you delete a PCI device, the guest receives an unplug request. Linux responds after a couple of seconds (during which the device is present both in lspci in the guest and in "info pci" in qemu’s monitor), and then the device disappears from both lspci and info pci. dmesg has this to say:

[   29.720969] pciehp 0000:00:06.0:pcie004: Slot(6): Attention button pressed
[   29.721457] pciehp 0000:00:06.0:pcie004: Slot(6): Powering off due to button press
[   35.872602] pciehp 0000:00:06.0:pcie004: Slot(6): Card not present
[   35.872630] pciehp 0000:00:06.0:pcie004: Slot(6): Already disabled

Note that I’m really not a virtio, PCI or Windows expert, so I’m really outside of my comfort zone here.

I get the same result on upstream master and 3.1.0-30.module+el8.0.1+3755+6782b0ed. The problem is that I can’t even ask what to inspect on a crash (because I don’t know anything about this code). As I said in comment 7, I have absolutely no idea how we can have a memory access to a region of a removed device. (I’m just assuming the backtrace I pasted in comment 5 is the same one Vadim sees when his 3.1.0 instance crashes.) I looked through all code paths and didn’t find anything that would allow that. (Clearly I must be overlooking something, but I don’t know what.)

That said, I only looked upstream. The code looks pretty clear to me there: pcie_unplug_device() first invokes pcie_cap_slot_unplug_cb() through hotplug_handler_unplug(), which then sets the "realized" property to false, which runs pci_qdev_unrealize(). This removes all MMIO regions. Only then will pcie_unplug_device() call object_unparent(), thus setting parent_bus to NULL (through device_unparent()).
3.1.0 does not have this explicit call to unrealize the device, but device_unparent() contains the same call right at its beginning. So here, too, I have no idea how the device’s memory region can still be hooked up (so that we get to virtio_pci_notify_write()) while its parent_bus is already NULL.

The only interesting thing I see from testing is that if device_del is called twice in such short succession that the second invocation arrives before Linux actually unplugs the device, the unplug operation is canceled. But I suppose that’s just due to implementation quirks. (device_del does not really request unplugging the device; it just emulates pressing the attention button. If Linux sees that again before it removes the device, it will cancel the operation.)

Max
I just found out that the virtio device is different from the PCI device... So this isn’t about pci_qdev_unrealize(), but virtio_device_unrealize(), which calls virtio_bus_device_unplugged(), which calls something like virtio_pci_device_unplugged(), which then unmaps the memory regions.

(But the device’s class’s .unparent still points to device_unparent(), and that always unrealizes the device before setting parent_bus to NULL. I can also see in the core dump that .realized is false.)

Max
Also hit it on qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93. Details:

Host: kernel-4.18.0-131.el8.x86_64
      qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93
Guest: windows2019 with virtio-win-prewhql-0.1-172.iso

1. Boot the guest with the command line below:
/usr/libexec/qemu-kvm \
    -S \
    -name 'avocado-vt-vm1' \
    -machine q35 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_e7dd8bt1/monitor-qmpmonitor1-20190815-213901-wPBLwObh,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_e7dd8bt1/monitor-catch_monitor-20190815-213901-wPBLwObh,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id5iZYDr \
    -chardev socket,server,path=/var/tmp/avocado_e7dd8bt1/serial-serial0-20190815-213901-wPBLwObh,id=chardev_serial0,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20190815-213901-wPBLwObh,path=/var/tmp/avocado_e7dd8bt1/seabios-20190815-213901-wPBLwObh,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190815-213901-wPBLwObh,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 \
    -drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:82:31:9d:ee:b5,id=idqsUmFe,netdev=idq2XyuZ,bus=pcie.0-root-port-4,addr=0x0 \
    -netdev tap,id=idq2XyuZ,vhost=on \
    -m 14336 \
    -smp 12,maxcpus=12,cores=6,threads=1,sockets=2 \
    -cpu 'SandyBridge',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt \
    -drive id=drive_cd1,if=none,snapshot=off,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso \
    -device scsi-cd,id=cd1,drive=drive_cd1 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -qmp tcp:0:4444,server,nowait

2. Hot-plug a blk disk with QMP:
# telnet localhost 4444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 4}, "package": "qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93"}, "capabilities": ["oob"]}}
{"execute": "qmp_capabilities", "id": "ExmD8nws"}
{"return": {}, "id": "ExmD8nws"}
{"execute": "cont", "id": "YBO02kfw"}
{"return": {}, "id": "YBO02kfw"}
{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add auto id=drive_stg0,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/storage0.qcow2"}, "id": "5QU1d5ov"}
{"return": "OK\r\n", "id": "5QU1d5ov"}
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg0", "drive": "drive_stg0", "bus": "pcie_extra_root_port_0"}, "id": "oraTG8Ej"}
{"return": {}, "id": "oraTG8Ej"}

3. Create a partition on the newly added disk and run an iozone test on it:
DISKPART> list disk
DISKPART> select disk 1
DISKPART> attributes disk clear readonly
DISKPART> online disk
DISKPART> create partition primary
DISKPART> assign letter=I
DISKPART> format fs=ntfs quick
DISKPART> exit
D:\Iozone\iozone.exe -azR -r 64k -n 125M -g 512M -M -i 0 -i 1 -b I:\iozone_test -f I:\testfile

4. Unplug the disk after the iozone test:
{"execute": "device_del", "arguments": {"id": "stg0"}, "id": "ykdZ52t3"}
{"return": {}, "id": "ykdZ52t3"}
{"timestamp": {"seconds": 1566384380, "microseconds": 318855}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/stg0/virtio-backend"}}
{"timestamp": {"seconds": 1566384380, "microseconds": 373946}, "event": "DEVICE_DELETED", "data": {"device": "stg0", "path": "/machine/peripheral/stg0"}}
{"execute": "human-monitor-command", "arguments": {"command-line": "info qtree"}, "id": "4MBmot5F"}

After step 4, qemu core dumped:
(qemu) case-181.sh: line 38: 27576 Segmentation fault (core dumped) /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -machine q35 -nodefaults -device VGA,bus=pcie.0,addr=0x1 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_e7dd8bt1/monitor-qmpmonitor1-20190815-213901-wPBLwObh,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_e7dd8bt1/monitor-catch_monitor-20190815-213901-wPBLwObh,server,nowait -mon chardev=qmp_id_catch_monitor,mode=control -device pvpanic,ioport=0x505,id=id5iZYDr -chardev socket,server,path=/var/tmp/avocado_e7dd8bt1/serial-serial0-20190815-213901-wPBLwObh,id=chardev_serial0,nowait -device isa-serial,id=serial0,chardev=chardev_serial0 -chardev socket,id=seabioslog_id_20190815-213901-wPBLwObh,path=/var/tmp/avocado_e7dd8bt1/seabios-20190815-213901-wPBLwObh,server,nowait -device isa-debugcon,chardev=seabioslog_id_20190815-213901-wPBLwObh,iobase=0x402 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0-root-port-3,addr=0x0 -drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bootindex=0 -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 -device virtio-net-pci,mac=9a:82:31:9d:ee:b5,id=idqsUmFe,netdev=idq2XyuZ,bus=pcie.0-root-port-4,addr=0x0 -netdev tap,id=idq2XyuZ,vhost=on -m 14336 -smp 12,maxcpus=12,cores=6,threads=1,sockets=2 -cpu 'SandyBridge',hv_stimer,hv_synic,hv_vpindex,hv_reset,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv-tlbflush,+kvm_pv_unhalt -drive id=drive_cd1,if=none,snapshot=off,cache=none,media=cdrom,file=/home/kvm_autotest_root/iso/windows/winutils.iso -device scsi-cd,id=cd1,drive=drive_cd1 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :0 -rtc base=localtime,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -enable-kvm -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 -monitor stdio -qmp tcp:0:4444,server,nowait

Core dump file: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1706759/core.qemu-kvm.0.7f1e6a0741294df2aca8e6917070cf19.27576.1566386006000000.lz4
gdb log: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/bug1706759/gdb.txt
It seems that while upstream isn't working flawlessly, it has some patches that at least avoid the crash:

pcie: don't skip multi-mask events
pcie: check that slt ctrl changed before deleting
pcie: work around for racy guest init
Also hit it on qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.

Versions:
Host: kernel-4.18.0-131.el8.x86_64
      qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc
Guest: win10 x86_64 with virtio-win-prewhql-0.1-173.iso
QEMU didn't wait for in-flight I/O to complete and ended up accessing fields of the deleted device. I've sent a patch upstream to fix this:

[PATCH] virtio-blk: Add blk_drain() to virtio_blk_device_unrealize()
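The mechanism the patch addresses can be sketched with a small Python model. This is purely illustrative, not the actual QEMU patch: all names here (Disk, blk_drain, unrealize) are stand-ins. The idea is that if in-flight request completions are drained before the device state is torn down, no completion can touch freed fields afterwards.

```python
class Disk:
    def __init__(self):
        self.pending = []            # queued I/O completion callbacks
        self.vq = "virtqueue state"  # device state freed on unrealize

    def submit(self, completion):
        self.pending.append(completion)

    def blk_drain(self):
        # Complete every in-flight request before teardown (synchronously here).
        while self.pending:
            self.pending.pop(0)(self)

    def unrealize(self, drain_first):
        if drain_first:
            self.blk_drain()
        self.vq = None               # device fields are gone

def completion(disk):
    assert disk.vq is not None, "completion touched a deleted device"

# Without draining: the completion fires after unrealize and hits freed state.
buggy_disk = Disk()
buggy_disk.submit(completion)
buggy_disk.unrealize(drain_first=False)
use_after_free = False
try:
    buggy_disk.pending.pop(0)(buggy_disk)  # a late completion arrives
except AssertionError:
    use_after_free = True

# With draining first (the patch's approach): completions run while state is valid.
fixed_disk = Disk()
fixed_disk.submit(completion)
fixed_disk.unrealize(drain_first=True)
print(use_after_free, fixed_disk.pending)
```

In QEMU the drain blocks until the backend has finished all outstanding requests; this model collapses that into running the callbacks synchronously, which is enough to show the ordering argument.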
Hit this issue on qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4.x86_64. Tested with a q35 + ovmf config on a Windows 2019 guest.

Used versions:
kernel-4.18.0-147.el8.x86_64
qemu-kvm-4.1.0-14.module+el8.1.0+4548+ed1300f4.x86_64
seabios-1.12.0-5.module+el8.1.0+4022+29a53beb.x86_64
virtio-win-prewhql-172

Best Regards~
Peixiu
Dropped from advisory
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.
Moving to modified, even though it didn't require a patch/build.
Verified on qemu-kvm-common-4.2.0-24.module+el8.2.1+6959+9b840e7c.x86_64 following the steps in comment 45. No qemu crash was found. However, it hits a similar issue to Bug 1833187 ([virtio-win][viostor] Disk still display in guest after hotunplug).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3172