Bug 2215192
| Summary: | qemu crash on virtio_blk_set_status: Assertion `!s->dataplane_started' failed when hotplug/unplug virtio disks repeatedly [RHEL-8] | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | qing.wang <qinwang> |
| Component: | qemu-kvm | Assignee: | Stefan Hajnoczi <stefanha> |
| qemu-kvm sub component: | virtio-blk,scsi | QA Contact: | qing.wang <qinwang> |
| Status: | CLOSED MIGRATED | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | aliang, chayang, coli, jinzhao, juzhang, kwolf, lijin, qizhu, stefanha, virt-maint, xuwei, ymankad, zhenyzha |
| Version: | 9.4 | Keywords: | CustomerScenariosInitiative, MigratedToJIRA, Reopened, Triaged |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-09-22 16:27:36 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
qing.wang
2023-06-15 03:50:50 UTC
BT:

#0  0x00007fde88adeacf raise (libc.so.6)
#1  0x00007fde88ab1ea5 abort (libc.so.6)
#2  0x00007fde88ab1d79 __assert_fail_base.cold.0 (libc.so.6)
#3  0x00007fde88ad7426 __assert_fail (libc.so.6)
#4  0x000055bd7a1175c8 virtio_blk_set_status (qemu-kvm)
#5  0x000055bd7a1474e4 virtio_set_status (qemu-kvm)
#6  0x000055bd7a05b243 virtio_pci_common_write (qemu-kvm)
#7  0x000055bd7a0f6777 memory_region_write_accessor (qemu-kvm)
#8  0x000055bd7a0f320e access_with_adjusted_size (qemu-kvm)
#9  0x000055bd7a0f62a3 memory_region_dispatch_write (qemu-kvm)
#10 0x000055bd7a0e7f2e flatview_write_continue (qemu-kvm)
#11 0x000055bd7a0e8093 flatview_write (qemu-kvm)
#12 0x000055bd7a0ebc6f address_space_write (qemu-kvm)
#13 0x000055bd7a1a28b9 kvm_cpu_exec (qemu-kvm)
#14 0x000055bd7a1a36e5 kvm_vcpu_thread_fn (qemu-kvm)
#15 0x000055bd7a2dfdd4 qemu_thread_start (qemu-kvm)
#16 0x00007fde88e5d1ca start_thread (libpthread.so.0)
#17 0x00007fde88ac9e73 __clone (libc.so.6)

coredump file:
http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/section2/images_backup/qbugs/2215192/2023-06-14/core.qemu-kvm.0.5f7588420b954f1782fcbcedcfac907b.910272.1686786489000000.lz4

Stefan, another virtio-blk one, can you have a look?

I tried reproducing this on qemu-kvm-8.0.0-4.el9 but couldn't trigger the assertion failure on my host. Next I'll look at the coredump.

(In reply to qing.wang from comment #2)
> coredump file:
> http://fileshare.hosts.qa.psi.pek2.redhat.com/pub/section2/images_backup/qbugs/2215192/2023-06-14/core.qemu-kvm.0.5f7588420b954f1782fcbcedcfac907b.910272.1686786489000000.lz4

The permissions on this coredump file prevent me from downloading it. Please make the file readable. Thanks!

This issue is not hit with the latest version:

Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
4.18.0-500.el8.x86_64
qemu-kvm-6.2.0-36.module+el8.9.0+19222+f46ac890.x86_64
seabios-bin-1.16.0-3.module+el8.9.0+18724+20190c23.noarch
edk2-ovmf-20220126gitbb1bba3d77-5.el8.noarch
virtio-win-prewhql-0.1-239.iso

python ConfigTest.py --testcase=multi_disk_wild_hotplug.without_delay --platform=x86_64 --guestname=RHEL.8.9.0 --driveformat=virtio_blk --imageformat=qcow2 --machines=q35 --firmware=default_bios --netdst=virbr0 --iothread_scheme=roundrobin --nr_iothreads=2 --customsparams="vm_mem_limit = 8G" --nrepeat=20

I think this bug still exists upstream and downstream but is difficult to trigger. It occurs when a QMP command dispatch BH is scheduled just before a vcpu writes to the VIRTIO Device Status Register. Then another vcpu must write to the same VIRTIO Device Status Register in order to reach this assertion failure. I have written about the scenario here: https://lore.kernel.org/qemu-devel/20230713194226.GA335220@fedora/

The best solution is not clear to me yet, so I started the qemu-devel discussion (see link above) to reach consensus on how to deal with this situation.

Do you want to keep this bug closed because it's too hard to reproduce/verify? (I will still pursue a fix upstream.)
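To make the ordering above concrete, here is a small standalone C model of the general pattern; the names (set_status, dataplane_started, stop_requested, run_main_loop_bh) are illustrative only and this is not the actual QEMU code path. The idea it sketches: the dataplane stop triggered by clearing DRIVER_OK only completes in deferred main-loop work, so a second status write that arrives before that work has run still observes dataplane_started set and trips the assertion.

```c
/* Simplified, self-contained model of the ordering described above.
 * Hypothetical names; not the QEMU implementation. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define DRIVER_OK 0x4                /* VIRTIO_CONFIG_S_DRIVER_OK */

static bool dataplane_started;       /* set while the dataplane is running */
static bool stop_requested;          /* deferred stop pending in the main loop */

/* Device Status Register write as seen by the device model. */
static void set_status(unsigned status)
{
    if (status & DRIVER_OK) {
        dataplane_started = true;    /* DRIVER_OK starts the dataplane */
        return;
    }
    if (dataplane_started && !stop_requested) {
        /* First writer: the stop is only scheduled here; it completes later
         * in main-loop (BH) context, not synchronously in the vcpu thread. */
        stop_requested = true;
        return;
    }
    /* Any later writer assumes the stop has already completed. */
    assert(!dataplane_started);      /* fails if the deferred work hasn't run */
}

/* Deferred main-loop work that actually stops the dataplane. */
static void run_main_loop_bh(void)
{
    if (stop_requested) {
        dataplane_started = false;
        stop_requested = false;
    }
}

int main(void)
{
    set_status(DRIVER_OK);           /* guest driver brings the device up */
    set_status(0);                   /* vCPU 1 clears the status: stop deferred */
    /* run_main_loop_bh() has not run yet, modeling the window in which the
     * deferred work is still queued behind other main-loop activity. */
    set_status(0);                   /* vCPU 2 writes the status again: aborts */
    run_main_loop_bh();
    printf("not reached\n");
    return 0;
}
```

Running the program aborts on the second status=0 write; calling run_main_loop_bh() between the two writes lets the deferred stop finish first and the assertion holds, which is the ordering the assertion assumes.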
I cannot reproduce this issue on the latest version; that is why I marked it as fixed in the current release. Tested over 50 times on the latest version:

Red Hat Enterprise Linux release 8.9 Beta (Ootpa)
4.18.0-500.el8.x86_64
qemu-kvm-6.2.0-36.module+el8.9.0+19222+f46ac890.x86_64

It may reproduce on the previous version or earlier (reproduction rate 5%):

qemu-kvm-6.2.0-35.module+el8.9.0+19166+e262ca96.x86_64

I would like to keep it open since you mentioned it still exists upstream. (BTW, I am not sure whether the "QMP command dispatch" issue you mentioned is related to Bug 2214985 - [qemu-kvm] no response with QMP command device_add when repeatedly hotplug/unplug virtio disks [RHEL-8]. It produces a different result, but both use the same test, and the most common failure is no response to the QMP command.)

Sounds good to me. I have moved it to RHEL 9.

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.