Bug 1950192
Summary: | RHEL9: when ioeventfd=off and 8.4guest, (qemu) qemu-kvm: ../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
---|---
Product: | Red Hat Enterprise Linux 9
Reporter: | bfu <bfu>
Component: | qemu-kvm
qemu-kvm sub component: | General
Assignee: | Thomas Huth <thuth>
QA Contact: | bfu <bfu>
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | high
CC: | cohuck, coli, dhorak, fweimer, hreitz, jinzhao, juzhang, knoel, kwolf, ngu, pbonzini, qzhang, ribarry, smitterl, stefanha, thuth, tstaudt, tstellar, virt-maint, virt-qe-z, xuwei, yiwei, zhenyzha
Version: | 9.0
Keywords: | Triaged
Target Milestone: | beta
Target Release: | 9.0 Beta
Hardware: | s390x
OS: | Linux
Fixed In Version: | qemu-kvm-6.0.0-13.el9
Doc Type: | If docs needed, set a value
Last Closed: | 2021-12-03 07:41:37 UTC
Type: | Bug
Description
bfu
2021-04-16 02:22:35 UTC
*** Bug 1967051 has been marked as a duplicate of this bug. ***

I wonder whether this is somehow related to BZ 1952483 ... though it's a different assertion, it somehow sounds similar. If that's the case, this problem might reproduce on all architectures except x86...

It's a completely different part of the code, but it's similar in that a check fails that should clearly succeed. The backtrace shows that we are in a coroutine, so the assertion doesn't uncover a bug, but the assertion failure itself is the buggy part. What is common between both cases is that both qemu_in_coroutine() and qemu_get_current_aio_context() access variables in TLS.

Crash reproduced 100% with:
host:
  kernel-5.13.0-0.rc2.19.el9.s390x
  qemu-kvm-6.0.0-4.el9.s390x
guest:
  kernel-5.13.0-0.rc2.19.el9.s390x
disk driver: virtio-blk-ccw

But with a different error message:
qemu-kvm: ../block/aio_task.c:64: aio_task_pool_wait_one: Assertion `qemu_coroutine_self() == pool->main_co' failed.

Full qemu cmdline attached.

Also confirmed that it could not be reproduced with:
host:
  qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb.s390x
  kernel-4.18.0-308.el8.s390x
guest:
  kernel-4.18.0-308.el8.s390x
disk driver: virtio-blk-ccw

I used libvirt to reproduce. Boot disk xml:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' ioeventfd='off'/>
  <source file='/var/lib/libvirt/images/vm.qcow2'/>
  <target dev='vda' bus='virtio'/>
  <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
</disk>

Created attachment 1788826 [details]
qemu cmdline and crash log
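
The TLS remark above can be illustrated with a small Python analogy. This is only a sketch of one plausible mechanism, namely that a thread-local lookup made before a coroutine yields is reused after the coroutine resumes on another thread. That mechanism is an assumption here; the comments above only state that both failing checks read variables in TLS. None of this is QEMU code, and the names current_slot, coroutine and run_demo are hypothetical.

```python
import threading

_tls = threading.local()


def current_slot():
    """Return this thread's private slot, creating it on first use."""
    if not hasattr(_tls, "slot"):
        _tls.slot = {"owner": threading.current_thread().name}
    return _tls.slot


def coroutine():
    # Lookup done once, before the yield (stands in for a cached TLS address).
    cached = current_slot()
    yield  # the coroutine may be resumed on a different thread
    # Lookup repeated after the resume, on whichever thread runs us now.
    fresh = current_slot()
    yield cached["owner"], fresh["owner"]


def run_demo():
    """Start the coroutine on the calling thread, resume it on a worker."""
    co = coroutine()
    next(co)  # runs up to the first yield on the calling thread
    result = []
    worker = threading.Thread(
        target=lambda: result.append(next(co)), name="worker"
    )
    worker.start()
    worker.join()
    return result[0]


if __name__ == "__main__":
    cached_owner, fresh_owner = run_demo()
    print("cached lookup belongs to:", cached_owner)
    print("fresh lookup belongs to: ", fresh_owner)
```

The cached value still belongs to the thread that started the coroutine, while the fresh lookup belongs to the worker. A check built on the cached value would therefore describe the wrong thread even though the program logic is correct.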
The bug could be reproduced on qemu-kvm-core-6.0.0-5.el9.s390x in the following test scenario (tp-qemu autotest case blockdev_mirror_vm_reboot):

1. Boot up a guest with system image drive_image1:
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \

2. In qmp, add a disk image drive_mirror1, do system_reset, then do blockdev-mirror from drive_image1 to drive_mirror1. QEMU was found to abort:

{"execute": "blockdev-create", "arguments": {"options": {"driver": "file", "filename": "/home/bfu/kar/workspace/root/avocado/data/avocado-vt/mirror1.qcow2", "size": 21474836480}, "job-id": "file_mirror1"}, "id": "KIwmrp0d"}
{"execute": "job-dismiss", "arguments": {"id": "file_mirror1"}, "id": "Yoruzcub"}
{"execute": "blockdev-add", "arguments": {"node-name": "file_mirror1", "driver": "file", "filename": "/home/bfu/kar/workspace/root/avocado/data/avocado-vt/mirror1.qcow2", "aio": "threads", "auto-read-only": true, "discard": "unmap"}, "id": "91V05wPo"}
{"execute": "blockdev-create", "arguments": {"options": {"driver": "qcow2", "file": "file_mirror1", "size": 21474836480}, "job-id": "drive_mirror1"}, "id": "4tYvitCx"}
{"execute": "job-dismiss", "arguments": {"id": "drive_mirror1"}, "id": "Xv3w6pE4"}
{"execute": "blockdev-add", "arguments": {"node-name": "drive_mirror1", "driver": "qcow2", "file": "file_mirror1", "read-only": false}, "id": "5SV5MLeM"}
{"execute": "system_reset", "id": "fZAq6Vpq"}
{"execute": "blockdev-mirror", "arguments": {"sync": "full", "device": "drive_image1", "target": "drive_mirror1", "job-id": "drive_image1_uSXY"}, "id": "rSAyUJ2H"}
qemu-kvm: ../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
/tmp/aexpect_VaM2J0Np/aexpect-emwb0j6h.sh: line 1: 261292 Aborted (core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on -machine s390-ccw-virtio,memory-backend=mem-machine_mem -nodefaults -vga none -m 11264 -object memory-backend-ram,size=11264M,id=mem-machine_mem -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 -cpu 'host' -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/avocado_9492hyna/monitor-qmpmonitor1-20210611-061624-y4SOL72Y,server=on,wait=off -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=qmp_id_catch_monitor,path=/tmp/avocado_9492hyna/monitor-catch_monitor-20210611-061624-y4SOL72Y,server=on,wait=off -mon chardev=qmp_id_catch_monitor,mode=control -chardev socket,id=chardev_serial0,path=/tmp/avocado_9492hyna/serial-serial0-20210611-061624-y4SOL72Y,server=on,wait=off -device sclpconsole,id=serial0,chardev=chardev_serial0 -device virtio-scsi-ccw,id=virtio_scsi_ccw0 -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on -device virtio-net-ccw,mac=9a:b8:e2:5d:d5:f1,id=idKiEl61,netdev=idivdTM1 -netdev tap,id=idivdTM1,vhost=on,vhostfd=18,fd=14 -nographic -rtc base=utc,clock=host,driftfix=slew -boot strict=on -enable-kvm -device virtio-mouse-ccw,id=input_mouse1 -device virtio-keyboard-ccw,id=input_keyboard1

The problem seems to be related to the "-flto=auto" compiler flag. I can reproduce the problem by configuring the build with:

.../configure --target-list=s390x-softmmu --extra-cflags=-flto=auto

and then by running "make check-unit -j8". So a quick work-around could be to remove -flto=auto from the extra-cflags in the build for s390x...
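
For anyone re-running the blockdev_mirror_vm_reboot scenario by hand, the QMP sequence from the log above can be sketched as a small Python helper that builds the commands in order. This is a sketch only: the function name build_mirror_commands, its parameters and the mirror job-id suffix are hypothetical, and the per-command "id" fields from the log are omitted.

```python
import json


def build_mirror_commands(image_path, size_bytes,
                          source="drive_image1",
                          file_node="file_mirror1",
                          target="drive_mirror1"):
    """Return the QMP commands of the reproduction, in the order they were sent."""
    return [
        # Create and attach the raw file that backs the mirror target.
        {"execute": "blockdev-create",
         "arguments": {"options": {"driver": "file", "filename": image_path,
                                   "size": size_bytes},
                       "job-id": file_node}},
        {"execute": "job-dismiss", "arguments": {"id": file_node}},
        {"execute": "blockdev-add",
         "arguments": {"node-name": file_node, "driver": "file",
                       "filename": image_path, "aio": "threads",
                       "auto-read-only": True, "discard": "unmap"}},
        # Format the file as qcow2 and attach the qcow2 node.
        {"execute": "blockdev-create",
         "arguments": {"options": {"driver": "qcow2", "file": file_node,
                                   "size": size_bytes},
                       "job-id": target}},
        {"execute": "job-dismiss", "arguments": {"id": target}},
        {"execute": "blockdev-add",
         "arguments": {"node-name": target, "driver": "qcow2",
                       "file": file_node, "read-only": False}},
        # Reset the guest, then start the mirror job.
        {"execute": "system_reset"},
        {"execute": "blockdev-mirror",
         "arguments": {"sync": "full", "device": source, "target": target,
                       "job-id": source + "_mirror"}},  # hypothetical job-id
    ]


if __name__ == "__main__":
    # Hypothetical path; the size matches the 20 GiB image from the log.
    for cmd in build_mirror_commands("/tmp/mirror1.qcow2", 21474836480):
        print(json.dumps(cmd))
```

Each printed line can be sent to the QMP monitor socket. A real session also has to negotiate with qmp_capabilities first and wait for each blockdev-create job to conclude before dismissing it, which the log excerpt does not show.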
This definitely needs to be investigated by someone who has experience with the coroutines in QEMU ... Stefan, could you please have a look at this? (let me know if you need help with getting access to a non-x86 machine where this could be reproduced)

It may be possible to reproduce this on x86-64 as well when building with -mtls-dialect=gnu2 (assuming that my hunch is correct).

Drat, sorry, I commented on the wrong BZ - I was able to easily reproduce BZ 1952483 when compiling with -flto, but so far I have not been able to reproduce this BZ here (though I think that it is very likely the same root cause). Big sorry for the confusion.

I have summarized the current state of coroutine TLS issues in bz1952483. When we have a solution then this bz should be solved too.

According to the test results of ctc2 on RHEL9, this bug can be reproduced with both virtio-blk-ccw and virtio-scsi-ccw; when the guest is booted with virtio-serial-ccw, the bug is not reproduced.

Tomas, I'll update the test result with the Clang builds (http://batcave.lab.eng.brq.redhat.com/repos/test/RHEL-9/WRB_CLANG/) ASAP.
With a RHEL9 guest, the test result is:

ERROR| virttest.qemu_vm.QemuSegFaultError: Qemu crashed: /tmp/aexpect_ou7nio15/aexpect-py_2vfr_.sh: line 1: 92236 Aborted (core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on -machine s390-ccw-virtio,memory-backend=mem-machine_mem -nodefaults -vga none -m 11264 -object memory-backend-ram,size=11264M,id=mem-machine_mem -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 -cpu 'host' -chardev socket,server=on,id=qmp_id_qmpmonitor1,wait=off,path=/tmp/avocado_4a2vhbaq/monitor-qmpmonitor1-20210727-100941-tRXWqaXP -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,server=on,id=qmp_id_catch_monitor,wait=off,path=/tmp/avocado_4a2vhbaq/monitor-catch_monitor-20210727-100941-tRXWqaXP -mon chardev=qmp_id_catch_monitor,mode=control -chardev socket,server=on,id=chardev_serial0,wait=off,path=/tmp/avocado_4a2vhbaq/serial-serial0-20210727-100941-tRXWqaXP -device sclpconsole,id=serial0,chardev=chardev_serial0 -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 -device virtio-blk-ccw,id=image1,drive=drive_image1,bootindex=0,write-cache=on,ioeventfd=off -device virtio-net-ccw,mac=9a:84:f7:da:96:03,id=idKH3ieD,netdev=id8OU0V6 -netdev tap,id=id8OU0V6,vhost=on,vhostfd=18,fd=14 -nographic -rtc base=utc,clock=host,driftfix=slew -boot strict=on -enable-kvm -device virtio-mouse-ccw,id=input_mouse1 -device virtio-keyboard-ccw,id=input_keyboard1

@thuth

Version-Release number of selected component:
kernel: 5.14.0-0.rc2.23.el9.s390x
qemu version: qemu-kvm-6.0.0-9.el9.clang.safestacktest.s390x

Steps to Reproduce:
boot up guest with
1. -device virtio-blk-ccw,id=image1,drive=drive_image1,bootindex=0,write-cache=on,ioeventfd=off \
2.
or -device virtio-scsi-ccw,id=virtio_scsi_ccw0,ioeventfd=off \

Result:
guest could boot up successfully

(In reply to bfu from comment #21)
> Result:
> guest could boot up successfully

Great, thanks for testing! ... then let's wait for the full switch to Clang to happen, then we should be fine.

(In reply to Thomas Huth from comment #22)
> (In reply to bfu from comment #21)
> > Result:
> > guest could boot up successfully
>
> Great, thanks for testing! ... then let's wait for the full switch to Clang
> to happen, then we should be fine.

I think this is just a result of missed optimizations in Clang. Given that Clang is rebased frequently in the buildroot, you cannot keep using the current version indefinitely. So I do not think this is a long-term solution.

(In reply to Florian Weimer from comment #23)
> (In reply to Thomas Huth from comment #22)
> > (In reply to bfu from comment #21)
> > > Result:
> > > guest could boot up successfully
> >
> > Great, thanks for testing! ... then let's wait for the full switch to Clang
> > to happen, then we should be fine.
>
> I think this is just a result of missed optimizations in Clang. Given that
> Clang is rebased frequently in the buildroot, you cannot keep using the
> current version indefinitely. So I do not think this is a long-term solution.

I agree. But I'd like to use this BZ here now just for the RHEL9-Beta, to make sure that we at least don't run into this problem there. For the long-term solution, I think we should track that in BZ 1952483 instead, which is already assigned to Stefan who has way more experience with the iothread/block layer than I do. Fortunately, this problem seems to reproduce quite easily in QE's test suite, so we should also notice quite quickly once it starts to fail with Clang, too.

Version-Release number of selected component (if applicable):
host kernel:5.14.0-0.rc4.35.el9.s390x
qemu version: qemu-kvm-6.0.0-12.el9.s390x
guest kernel: 5.14.0-0.rc4.35.el9.s390x

Steps to Reproduce:
1.
boot up guest with cli:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine s390-ccw-virtio \
-nodefaults \
-vga none \
-m 11264 \
-smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
-cpu 'host' \
-chardev socket,id=chardev_serial0,server,path=/tmp/bfu,nowait \
-device sclpconsole,id=serial0,chardev=chardev_serial0 \
-device virtio-scsi-ccw,id=virtio_scsi_ccw0,ioeventfd=off \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device virtio-net-ccw,mac=9a:c8:15:8f:ee:31,id=idhaGnoq,netdev=idcxZ1G7 \
-netdev tap,id=idcxZ1G7,vhost=on \
-nographic \
-rtc base=utc,clock=host,driftfix=slew \
-boot strict=on \
-enable-kvm \
-device virtio-mouse-ccw,id=input_mouse1 \
-device virtio-keyboard-ccw,id=input_keyboard1 \
-monitor stdio \

Actual result:
guest could boot up successfully

Also reference to PASS 2-Host_RHEL.m9.u0.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.0.0.s390x.io-github-autotest-qemu.ioeventfd.under_stress.s390-virtio, set this bug to verified

*** Bug 1991913 has been marked as a duplicate of this bug. ***

Version-Release number of selected component (if applicable):
host kernel:5.14.0-0.rc4.35.el9.s390x
qemu version: qemu-kvm-6.0.0-13.el9.s390x
guest kernel: 5.14.0-0.rc4.35.el9.s390x

Steps to Reproduce:
1.
boot up guest with cli:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine s390-ccw-virtio \
-nodefaults \
-vga none \
-m 11264 \
-smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
-cpu 'host' \
-chardev socket,id=chardev_serial0,server,path=/tmp/bfu,nowait \
-device sclpconsole,id=serial0,chardev=chardev_serial0 \
-device virtio-scsi-ccw,id=virtio_scsi_ccw0,ioeventfd=off \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device virtio-net-ccw,mac=9a:c8:15:8f:ee:31,id=idhaGnoq,netdev=idcxZ1G7 \
-netdev tap,id=idcxZ1G7,vhost=on \
-nographic \
-rtc base=utc,clock=host,driftfix=slew \
-boot strict=on \
-enable-kvm \
-device virtio-mouse-ccw,id=input_mouse1 \
-device virtio-keyboard-ccw,id=input_keyboard1 \
-monitor stdio \

Actual result:
guest could boot up successfully

Also reference to PASS 1-Host_RHEL.m9.u0.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.0.0.s390x.io-github-autotest-qemu.ioeventfd.under_stress.s390-virtio, set this bug to verified