Bug 1950192

Summary: RHEL9: when ioeventfd=off and 8.4guest, (qemu) qemu-kvm: ../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm
qemu-kvm sub component: General
Reporter: bfu <bfu>
Assignee: Thomas Huth <thuth>
QA Contact: bfu <bfu>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
CC: cohuck, coli, dhorak, fweimer, hreitz, jinzhao, juzhang, knoel, kwolf, ngu, pbonzini, qzhang, ribarry, smitterl, stefanha, thuth, tstaudt, tstellar, virt-maint, virt-qe-z, xuwei, yiwei, zhenyzha
Version: 9.0
Keywords: Triaged
Target Milestone: beta
Target Release: 9.0 Beta
Hardware: s390x
OS: Linux
Fixed In Version: qemu-kvm-6.0.0-13.el9
Last Closed: 2021-12-03 07:41:37 UTC
Type: Bug
Attachments: qemu cmdline and crash log (flags: none)

Description bfu 2021-04-16 02:22:35 UTC
Description of problem:

error message:
(qemu) qemu-kvm: ../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
test.sh: line 25: 444475 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -sandbox on -machine s390-ccw-virtio -nodefaults -vga none -m 11264 -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 -cpu 'host' -chardev socket,id=chardev_serial0,server,path=/tmp/bfu,nowait -device sclpconsole,id=serial0,chardev=chardev_serial0 -device virtio-scsi-ccw,id=virtio_scsi_ccw0,ioeventfd=off -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel840-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on -device virtio-net-ccw,mac=9a:c8:15:8f:ee:31,id=idhaGnoq,netdev=idcxZ1G7 -netdev tap,id=idcxZ1G7,vhost=on -nographic -rtc base=utc,clock=host,driftfix=slew -boot strict=on -enable-kvm -device virtio-mouse-ccw,id=input_mouse1 -device virtio-keyboard-ccw,id=input_keyboard1 -monitor stdio

Version-Release number of selected component (if applicable):
host kernel:5.11.0-2.el9.s390x
qemu version: qemu-kvm-5.2.0-14.el9.s390x
guest kernel: 4.18.0-293.el8.s390x

How reproducible:
4/5

Steps to Reproduce:
1. Boot up a RHEL 8.4 guest as follows:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine s390-ccw-virtio  \
    -nodefaults  \
    -vga none \
    -m 11264  \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2  \
    -cpu 'host' \
    -chardev socket,id=chardev_serial0,server,path=/tmp/bfu,nowait \
    -device sclpconsole,id=serial0,chardev=chardev_serial0 \
    -device virtio-scsi-ccw,id=virtio_scsi_ccw0,ioeventfd=off \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel840-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-ccw,mac=9a:c8:15:8f:ee:31,id=idhaGnoq,netdev=idcxZ1G7  \
    -netdev tap,id=idcxZ1G7,vhost=on  \
    -nographic  \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot strict=on \
    -enable-kvm \
    -device virtio-mouse-ccw,id=input_mouse1 \
    -device virtio-keyboard-ccw,id=input_keyboard1 \
    -monitor stdio

2. QEMU aborts with:
(qemu) qemu-kvm: ../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.

Actual results:
qemu-kvm: ../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.

Expected results:
The guest boots up without QEMU crashing.

Additional info:

Comment 2 CongLi 2021-06-02 11:43:40 UTC
*** Bug 1967051 has been marked as a duplicate of this bug. ***

Comment 4 Thomas Huth 2021-06-02 12:00:30 UTC
I wonder whether this is somehow related to BZ 1952483 ... though it's a different assertion, it somehow sounds similar. If that's the case, this problem might reproduce on all architectures except x86...

Comment 5 Kevin Wolf 2021-06-02 14:23:26 UTC
It's a completely different part of the code, but it's similar in that a check fails that should clearly succeed. The backtrace shows that we are in a coroutine, so the assertion doesn't uncover a bug, but the assertion failure itself is the buggy part.

What is common between both cases is that both qemu_in_coroutine() and qemu_get_current_aio_context() access variables in TLS.
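
As a loose analogy only (this is Python, not QEMU code), the sketch below shows how a check based on thread-local state can give a stale answer. It assumes the failure mode is a thread-local value being read once and then reused after the coroutine has been resumed on a different thread. The thread does not confirm that mechanism at this point. The names `tls`, `worker` and `run_in_thread` are made up for illustration.

```python
import threading

# Hypothetical stand-in for a per-thread slot such as "current coroutine".
tls = threading.local()


def worker():
    """Generator playing the role of a coroutine that may change threads."""
    cached = tls.name  # thread-local read once, before yielding
    yield
    # After being resumed, pair the cached value with a fresh lookup.
    yield cached, tls.name


def run_in_thread(name, fn):
    """Run fn() on a new thread whose thread-local 'name' is set to name."""
    result = []

    def body():
        tls.name = name
        result.append(fn())

    t = threading.Thread(target=body)
    t.start()
    t.join()
    return result[0]


co = worker()
run_in_thread("thread-A", lambda: next(co))                   # start on thread A
cached, fresh = run_in_thread("thread-B", lambda: next(co))   # resume on thread B
print(cached, fresh)
```

The cached value still names the thread the generator started on, while the fresh lookup names the thread it was resumed on. A check that relied on the cached value would be wrong even though the surrounding logic is correct.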

Comment 10 smitterl 2021-06-03 11:28:10 UTC
Crash reproduced 100% with:
host:
kernel-5.13.0-0.rc2.19.el9.s390x
qemu-kvm-6.0.0-4.el9.s390x
guest:
kernel-5.13.0-0.rc2.19.el9.s390x
disk driver virtio-blk-ccw

But with different error message:
qemu-kvm: ../block/aio_task.c:64: aio_task_pool_wait_one: Assertion `qemu_coroutine_self() == pool->main_co' failed.


Full qemu cmdline attached.

Also confirmed that it could not be reproduced with
host:
qemu-kvm-6.0.0-17.module+el8.5.0+11173+c9fce0bb.s390x
kernel-4.18.0-308.el8.s390x
guest:
kernel-4.18.0-308.el8.s390x
disk driver virtio-blk-ccw

I used libvirt to reproduce. Boot disk xml:
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' ioeventfd='off'/>
      <source file='/var/lib/libvirt/images/vm.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
 </disk>
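
For anyone preparing a similar libvirt reproducer, here is a minimal sketch of setting the `ioeventfd` attribute on a disk's `<driver>` element with Python's standard `xml.etree.ElementTree`. The helper name `set_ioeventfd` and the sample `DISK_XML` (the XML above without the address line and without the attribute) are assumptions for illustration, not part of the original report.

```python
import xml.etree.ElementTree as ET

# Sample disk definition modelled on the XML above, without ioeventfd set.
DISK_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/vm.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
"""


def set_ioeventfd(disk_xml, value):
    """Return disk_xml with the <driver> ioeventfd attribute set to value."""
    disk = ET.fromstring(disk_xml)
    driver = disk.find("driver")
    if driver is None:
        raise ValueError("disk element has no <driver> child")
    driver.set("ioeventfd", value)
    return ET.tostring(disk, encoding="unicode")


patched = set_ioeventfd(DISK_XML, "off")
print(patched)
```

The resulting string could then be pasted into the domain definition. Applying it through libvirt is left out here.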

Comment 11 smitterl 2021-06-03 11:30:45 UTC
Created attachment 1788826 [details]
qemu cmdline and crash log

Comment 12 Gu Nini 2021-06-16 04:02:21 UTC
The bug could be reproduced on qemu-kvm-core-6.0.0-5.el9.s390x in the following test scenario (tp-qemu autotest case blockdev_mirror_vm_reboot):

1. Boot up a guest with system image drive_image1:

    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \

2. In QMP, add a disk image drive_mirror1, do system_reset, then do blockdev-mirror from drive_image1 to drive_mirror1. QEMU then aborts with the assertion failure shown after these commands:
 
{"execute": "blockdev-create", "arguments": {"options": {"driver": "file", "filename": "/home/bfu/kar/workspace/root/avocado/data/avocado-vt/mirror1.qcow2", "size": 21474836480}, "job-id": "file_mirror1"}, "id": "KIwmrp0d"}
{"execute": "job-dismiss", "arguments": {"id": "file_mirror1"}, "id": "Yoruzcub"}
{"execute": "blockdev-add", "arguments": {"node-name": "file_mirror1", "driver": "file", "filename": "/home/bfu/kar/workspace/root/avocado/data/avocado-vt/mirror1.qcow2", "aio": "threads", "auto-read-only": true, "discard": "unmap"}, "id": "91V05wPo"}
{"execute": "blockdev-create", "arguments": {"options": {"driver": "qcow2", "file": "file_mirror1", "size": 21474836480}, "job-id": "drive_mirror1"}, "id": "4tYvitCx"}
{"execute": "job-dismiss", "arguments": {"id": "drive_mirror1"}, "id": "Xv3w6pE4"}
{"execute": "blockdev-add", "arguments": {"node-name": "drive_mirror1", "driver": "qcow2", "file": "file_mirror1", "read-only": false}, "id": "5SV5MLeM"}
{"execute": "system_reset", "id": "fZAq6Vpq"}
{"execute": "blockdev-mirror", "arguments": {"sync": "full", "device": "drive_image1", "target": "drive_mirror1", "job-id": "drive_image1_uSXY"}, "id": "rSAyUJ2H"}

qemu-kvm: ../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
/tmp/aexpect_VaM2J0Np/aexpect-emwb0j6h.sh: line 1: 261292 Aborted                 (core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on -machine s390-ccw-virtio,memory-backend=mem-machine_mem -nodefaults -vga none -m 11264 -object memory-backend-ram,size=11264M,id=mem-machine_mem -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 -cpu 'host' -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/avocado_9492hyna/monitor-qmpmonitor1-20210611-061624-y4SOL72Y,server=on,wait=off -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=qmp_id_catch_monitor,path=/tmp/avocado_9492hyna/monitor-catch_monitor-20210611-061624-y4SOL72Y,server=on,wait=off -mon chardev=qmp_id_catch_monitor,mode=control -chardev socket,id=chardev_serial0,path=/tmp/avocado_9492hyna/serial-serial0-20210611-061624-y4SOL72Y,server=on,wait=off -device sclpconsole,id=serial0,chardev=chardev_serial0 -device virtio-scsi-ccw,id=virtio_scsi_ccw0 -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on -device virtio-net-ccw,mac=9a:b8:e2:5d:d5:f1,id=idKiEl61,netdev=idivdTM1 -netdev tap,id=idivdTM1,vhost=on,vhostfd=18,fd=14 -nographic -rtc base=utc,clock=host,driftfix=slew -boot strict=on -enable-kvm -device virtio-mouse-ccw,id=input_mouse1 -device virtio-keyboard-ccw,id=input_keyboard1
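
The QMP sequence in this comment can be rebuilt programmatically when scripting a reproducer. The following is a sketch only: the function name, its parameters and the generated job id are hypothetical, the per-command "id" fields are omitted, and a real client would have to wait for each blockdev-create job to conclude before sending job-dismiss.

```python
import json


def mirror_after_reset_commands(target_path, size_bytes,
                                source_node="drive_image1",
                                target_node="drive_mirror1",
                                file_node="file_mirror1"):
    """Build the QMP commands from this comment as a list of dicts."""
    return [
        # Create and open the raw file that backs the mirror target.
        {"execute": "blockdev-create",
         "arguments": {"options": {"driver": "file", "filename": target_path,
                                   "size": size_bytes},
                       "job-id": file_node}},
        {"execute": "job-dismiss", "arguments": {"id": file_node}},
        {"execute": "blockdev-add",
         "arguments": {"node-name": file_node, "driver": "file",
                       "filename": target_path, "aio": "threads",
                       "auto-read-only": True, "discard": "unmap"}},
        # Format it as qcow2 and open the qcow2 node on top.
        {"execute": "blockdev-create",
         "arguments": {"options": {"driver": "qcow2", "file": file_node,
                                   "size": size_bytes},
                       "job-id": target_node}},
        {"execute": "job-dismiss", "arguments": {"id": target_node}},
        {"execute": "blockdev-add",
         "arguments": {"node-name": target_node, "driver": "qcow2",
                       "file": file_node, "read-only": False}},
        # Reset the guest, then start the mirror job.
        {"execute": "system_reset"},
        {"execute": "blockdev-mirror",
         "arguments": {"sync": "full", "device": source_node,
                       "target": target_node,
                       "job-id": "mirror_" + source_node}},  # made-up job id
    ]


cmds = mirror_after_reset_commands("/tmp/mirror1.qcow2", 21474836480)
for cmd in cmds:
    print(json.dumps(cmd))
```

The path `/tmp/mirror1.qcow2` is a placeholder. The size matches the 21474836480 bytes used in the commands above.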

Comment 13 Thomas Huth 2021-07-11 10:36:16 UTC
The problem seems to be related to the "-flto=auto" compiler flag. I can reproduce the problem by configuring the build with:

 .../configure --target-list=s390x-softmmu --extra-cflags=-flto=auto

and then running "make check-unit -j8".

So a quick work-around could be to remove -flto=auto from the extra-cflags in build for s390x...

Comment 14 Thomas Huth 2021-07-13 05:18:01 UTC
This definitely needs to be investigated by someone who has experience with the coroutines in QEMU ... Stefan, could you please have a look at this? (let me know if you need help with getting access to a non-x86 machine where this could be reproduced)

Comment 15 Florian Weimer 2021-07-13 11:47:21 UTC
It may be possible to reproduce this on x86-64 as well when building with -mtls-dialect=gnu2 (assuming that my hunch is correct).

Comment 16 Thomas Huth 2021-07-15 09:10:38 UTC
Drat, sorry, I commented on the wrong BZ - I was able to easily reproduce BZ 1952483 when compiling with -flto, but so far I was not able to reproduce this BZ here yet (though I think that it is very likely the same root cause). Big sorry for the confusion.

Comment 17 Stefan Hajnoczi 2021-07-20 14:36:29 UTC
I have summarized the current state of coroutine TLS issues in bz1952483. When we have a solution then this bz should be solved too.

Comment 19 bfu 2021-08-06 06:58:44 UTC
According to the ctc2 test results on RHEL9,
this bug could be reproduced with both virtio-blk-ccw and virtio-scsi-ccw; when the guest is booted with virtio-serial-ccw, the bug is not reproduced.

Tomas, I'll update the test results with the Clang builds (http://batcave.lab.eng.brq.redhat.com/repos/test/RHEL-9/WRB_CLANG/) ASAP.

Comment 20 bfu 2021-08-06 07:04:28 UTC
With a RHEL9 guest, the test result is:
ERROR| virttest.qemu_vm.QemuSegFaultError: Qemu crashed: /tmp/aexpect_ou7nio15/aexpect-py_2vfr_.sh: line 1: 92236 Aborted                 (core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on -machine s390-ccw-virtio,memory-backend=mem-machine_mem -nodefaults -vga none -m 11264 -object memory-backend-ram,size=11264M,id=mem-machine_mem -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 -cpu 'host' -chardev socket,server=on,id=qmp_id_qmpmonitor1,wait=off,path=/tmp/avocado_4a2vhbaq/monitor-qmpmonitor1-20210727-100941-tRXWqaXP -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,server=on,id=qmp_id_catch_monitor,wait=off,path=/tmp/avocado_4a2vhbaq/monitor-catch_monitor-20210727-100941-tRXWqaXP -mon chardev=qmp_id_catch_monitor,mode=control -chardev socket,server=on,id=chardev_serial0,wait=off,path=/tmp/avocado_4a2vhbaq/serial-serial0-20210727-100941-tRXWqaXP -device sclpconsole,id=serial0,chardev=chardev_serial0 -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 -device virtio-blk-ccw,id=image1,drive=drive_image1,bootindex=0,write-cache=on,ioeventfd=off -device virtio-net-ccw,mac=9a:84:f7:da:96:03,id=idKH3ieD,netdev=id8OU0V6 -netdev tap,id=id8OU0V6,vhost=on,vhostfd=18,fd=14 -nographic -rtc base=utc,clock=host,driftfix=slew -boot strict=on -enable-kvm -device virtio-mouse-ccw,id=input_mouse1 -device virtio-keyboard-ccw,id=input_keyboard1

Comment 21 bfu 2021-08-09 05:58:40 UTC
@thuth 
Version-Release number of selected component:

kernel: 5.14.0-0.rc2.23.el9.s390x
qemu version: qemu-kvm-6.0.0-9.el9.clang.safestacktest.s390x


Steps to Reproduce:
Boot up the guest with either of the following devices:
1. -device virtio-blk-ccw,id=image1,drive=drive_image1,bootindex=0,write-cache=on,ioeventfd=off
2. -device virtio-scsi-ccw,id=virtio_scsi_ccw0,ioeventfd=off


Result:
guest could boot up successfully
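
When comparing the long command lines pasted in this report, it can help to pull out which devices were started with ioeventfd=off. Below is a minimal sketch that assumes a plain shell-style command line. The function name is made up, and the simple comma split does not handle QEMU's ",," escaping inside option values.

```python
import shlex


def devices_with_ioeventfd_off(cmdline):
    """Return the driver names of all -device options that set ioeventfd=off."""
    args = shlex.split(cmdline)
    found = []
    # Walk over (option, value) pairs and look only at "-device <value>".
    for opt, value in zip(args, args[1:]):
        if opt != "-device":
            continue
        driver, *props = value.split(",")
        if "ioeventfd=off" in props:
            found.append(driver)
    return found


example = (
    "/usr/libexec/qemu-kvm -machine s390-ccw-virtio "
    "-device virtio-scsi-ccw,id=virtio_scsi_ccw0,ioeventfd=off "
    "-device scsi-hd,id=image1,drive=drive_image1,write-cache=on "
    "-device virtio-net-ccw,mac=9a:c8:15:8f:ee:31,id=idhaGnoq,netdev=idcxZ1G7"
)
print(devices_with_ioeventfd_off(example))
```

For the shortened example command line, only the virtio-scsi-ccw controller is reported.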

Comment 22 Thomas Huth 2021-08-09 06:29:43 UTC
(In reply to bfu from comment #21)
> Result:
> guest could boot up successfully

Great, thanks for testing! ... then let's wait for the full switch to Clang to happen, then we should be fine.

Comment 23 Florian Weimer 2021-08-09 07:45:23 UTC
(In reply to Thomas Huth from comment #22)
> (In reply to bfu from comment #21)
> > Result:
> > guest could boot up successfully
> 
> Great, thanks for testing! ... then let's wait for the full switch to Clang
> to happen, then we should be fine.

I think this is just a result of missed optimizations in Clang. Given that Clang is rebased frequently in the buildroot, you cannot keep using the current version indefinitely. So I do not think this is a long-term solution.

Comment 24 Thomas Huth 2021-08-09 09:13:12 UTC
(In reply to Florian Weimer from comment #23)
> (In reply to Thomas Huth from comment #22)
> > (In reply to bfu from comment #21)
> > > Result:
> > > guest could boot up successfully
> > 
> > Great, thanks for testing! ... then let's wait for the full switch to Clang
> > to happen, then we should be fine.
> 
> I think this is just a result of missed optimizations in Clang. Given that
> Clang is rebased frequently in the buildroot, you cannot keep using the
> current version indefinitely. So I do not think this is a long-term solution.

I agree. But I'd like to use this BZ here now just for the RHEL9-Beta, to make sure that we at least don't run into this problem there. For the long-term solution, I think we should track that in BZ 1952483 instead, which is already assigned to Stefan who has way more experience with the iothread/block layer than I do.

Fortunately, this problem seems to reproduce quite easily in QE's test suite, so we should also notice quite quickly once it starts to fail with Clang, too.

Comment 25 bfu 2021-08-24 10:21:54 UTC
Version-Release number of selected component (if applicable):
host kernel:5.14.0-0.rc4.35.el9.s390x
qemu version: qemu-kvm-6.0.0-12.el9.s390x
guest kernel: 5.14.0-0.rc4.35.el9.s390x

Steps to Reproduce:
1. boot up guest with cli:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine s390-ccw-virtio  \
    -nodefaults  \
    -vga none \
    -m 11264  \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2  \
    -cpu 'host' \
    -chardev socket,id=chardev_serial0,server,path=/tmp/bfu,nowait \
    -device sclpconsole,id=serial0,chardev=chardev_serial0 \
    -device virtio-scsi-ccw,id=virtio_scsi_ccw0,ioeventfd=off \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-ccw,mac=9a:c8:15:8f:ee:31,id=idhaGnoq,netdev=idcxZ1G7  \
    -netdev tap,id=idcxZ1G7,vhost=on  \
    -nographic  \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot strict=on \
    -enable-kvm \
    -device virtio-mouse-ccw,id=input_mouse1 \
    -device virtio-keyboard-ccw,id=input_keyboard1 \
    -monitor stdio

Actual result:
guest could boot up successfully

See also the PASS result for 2-Host_RHEL.m9.u0.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.0.0.s390x.io-github-autotest-qemu.ioeventfd.under_stress.s390-virtio; setting this bug to verified.

Comment 26 bfu 2021-08-24 10:23:49 UTC
*** Bug 1991913 has been marked as a duplicate of this bug. ***

Comment 28 bfu 2021-08-30 09:43:57 UTC
Version-Release number of selected component (if applicable):
host kernel:5.14.0-0.rc4.35.el9.s390x
qemu version: qemu-kvm-6.0.0-13.el9.s390x
guest kernel: 5.14.0-0.rc4.35.el9.s390x

Steps to Reproduce:
1. boot up guest with cli:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine s390-ccw-virtio  \
    -nodefaults  \
    -vga none \
    -m 11264  \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2  \
    -cpu 'host' \
    -chardev socket,id=chardev_serial0,server,path=/tmp/bfu,nowait \
    -device sclpconsole,id=serial0,chardev=chardev_serial0 \
    -device virtio-scsi-ccw,id=virtio_scsi_ccw0,ioeventfd=off \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/bfu/kar/vt_test_images/rhel900-s390x-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device virtio-net-ccw,mac=9a:c8:15:8f:ee:31,id=idhaGnoq,netdev=idcxZ1G7  \
    -netdev tap,id=idcxZ1G7,vhost=on  \
    -nographic  \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot strict=on \
    -enable-kvm \
    -device virtio-mouse-ccw,id=input_mouse1 \
    -device virtio-keyboard-ccw,id=input_keyboard1 \
    -monitor stdio

Actual result:
guest could boot up successfully

See also the PASS result for 1-Host_RHEL.m9.u0.nographic.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.0.0.s390x.io-github-autotest-qemu.ioeventfd.under_stress.s390-virtio; setting this bug to verified.