Bug 1817621 - Crash and deadlock with block jobs when using io-threads
Summary: Crash and deadlock with block jobs when using io-threads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.2
Assignee: Kevin Wolf
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-26 16:29 UTC by John Ferlan
Modified: 2020-05-05 09:59 UTC
CC List: 8 users

Fixed In Version: qemu-kvm-4.2.0-18.module+el8.2.0+6278+dfae3426
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:59:02 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2020:2017 (last updated 2020-05-05 09:59:45 UTC)

Description John Ferlan 2020-03-26 16:29:18 UTC
Description of problem:

IOThread crash reported upstream; see (and follow-ups):

https://lists.nongnu.org/archive/html/qemu-devel/2020-03/msg07225.html


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce (from the upstream report):

# qemu-img create disk1.raw 100M
# qemu-img create disk2.raw 100M
#./x86_64-softmmu/qemu-system-x86_64 -chardev 'socket,id=qmp,path=/var/run/qemu-test.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/108.pid  -m 512 -object 'iothread,id=iothread-virtioscsi0' -object 'iothread,id=iothread-virtioscsi1'  -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 'file=disk1.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0' -device 'virtio-scsi-pci,id=virtioscsi1,iothread=iothread-virtioscsi1' -drive 'file=disk2.raw,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1'

Then connect socat to the QMP socket:
# socat /var/run/qemu-test.qmp -

And run the following QMP commands:

{ "execute": "qmp_capabilities", "arguments": {} }
{ "execute": "transaction", "arguments":  { "actions": [{ "type": "drive-backup", "data": { "device": "drive-scsi0", "sync": "full", "target": "backup-sysi0.raw" }}, { "type": "drive-backup", "data": { "device": "drive-scsi1", "sync": "full", "target": "backup-scsi1.raw" }}], "properties": { "completion-mode": "grouped" } } }


Actual results:
The VM will core dump:

qemu: qemu_mutex_unlock_impl: Operation not permitted
Aborted (core dumped)
(gdb) bt
#0  0x00007f099d5037bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f099d4ee535 in __GI_abort () at abort.c:79
#2  0x000055c04e39525e in error_exit (err=<optimized out>, msg=msg@entry=0x55c04e5122e0 <__func__.16544> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
#3  0x000055c04e395813 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x7f09903154e0, file=file@entry=0x55c04e51129f "util/async.c", line=line@entry=601)
    at util/qemu-thread-posix.c:108
#4  0x000055c04e38f8e5 in aio_context_release (ctx=ctx@entry=0x7f0990315480) at util/async.c:601
#5  0x000055c04e299073 in bdrv_set_aio_context_ignore (bs=0x7f0929a76500, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
    at block.c:6238
#6  0x000055c04e2990cc in bdrv_set_aio_context_ignore (bs=bs@entry=0x7f092af47900, new_context=new_context@entry=0x7f0990315000, ignore=ignore@entry=0x7ffe08fa7400)
    at block.c:6211
#7  0x000055c04e299443 in bdrv_child_try_set_aio_context (bs=bs@entry=0x7f092af47900, ctx=0x7f0990315000, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x0)
    at block.c:6324
#8  0x000055c04e299576 in bdrv_try_set_aio_context (errp=0x0, ctx=<optimized out>, bs=0x7f092af47900) at block.c:6333
#9  0x000055c04e299576 in bdrv_replace_child (child=child@entry=0x7f09902ef5e0, new_bs=new_bs@entry=0x0) at block.c:2551
#10 0x000055c04e2995ff in bdrv_detach_child (child=0x7f09902ef5e0) at block.c:2666
#11 0x000055c04e299ec9 in bdrv_root_unref_child (child=<optimized out>) at block.c:2677
#12 0x000055c04e29f3fe in block_job_remove_all_bdrv (job=job@entry=0x7f0927c18800) at blockjob.c:191
#13 0x000055c04e29f429 in block_job_free (job=0x7f0927c18800) at blockjob.c:88
#14 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:359
#15 0x000055c04e2a0909 in job_unref (job=0x7f0927c18800) at job.c:351
#16 0x000055c04e2a0b68 in job_conclude (job=0x7f0927c18800) at job.c:620
#17 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:688
#18 0x000055c04e2a0b68 in job_finalize_single (job=0x7f0927c18800) at job.c:660
#19 0x000055c04e2a14fc in job_txn_apply (txn=<optimized out>, fn=0x55c04e2a0a50 <job_finalize_single>) at job.c:145
#20 0x000055c04e2a14fc in job_do_finalize (job=0x7f0927c1c200) at job.c:781
#21 0x000055c04e2a1751 in job_completed_txn_success (job=0x7f0927c1c200) at job.c:831
#22 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:844
#23 0x000055c04e2a1751 in job_completed (job=0x7f0927c1c200) at job.c:835
#24 0x000055c04e2a17b0 in job_exit (opaque=0x7f0927c1c200) at job.c:863
#25 0x000055c04e38ee75 in aio_bh_call (bh=0x7f098ec52000) at util/async.c:164
#26 0x000055c04e38ee75 in aio_bh_poll (ctx=ctx@entry=0x7f0990315000) at util/async.c:164
#27 0x000055c04e3924fe in aio_dispatch (ctx=0x7f0990315000) at util/aio-posix.c:380
#28 0x000055c04e38ed5e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
#29 0x00007f099f020f2e in g_main_context_dispatch () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#30 0x000055c04e391768 in glib_pollfds_poll () at util/main-loop.c:219
#31 0x000055c04e391768 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#32 0x000055c04e391768 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#33 0x000055c04e032329 in qemu_main_loop () at /home/dietmar/pve5-devel/mirror_qemu/softmmu/vl.c:1665
#34 0x000055c04df36a8e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/dietmar/pve5-devel/mirror_qemu/softmmu/main.c:49
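
For reference, a backtrace like the one above can be obtained by loading the core file into gdb; an illustrative invocation (binary and core file paths are hypothetical):

# gdb ./x86_64-softmmu/qemu-system-x86_64 core
(gdb) bt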

Expected results:
No core

Additional info:

Partial fix with:

https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07249.html

but more issues remain and the hang persists.

Possible solution posted:

https://lists.nongnu.org/archive/html/qemu-devel/2020-03/msg07994.html

Comment 2 aihua liang 2020-03-27 07:12:52 UTC
Tested on qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec; this issue is also hit with multiple iothreads.

Test Steps:
 1. Start guest with two iothreads
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 4096  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'EPYC',+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idy8YPXp \
    -chardev socket,path=/var/tmp/serial-serial0-20200203-033416-61dmcn92,server,nowait,id=chardev_serial0 \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20200203-033416-61dmcn92,path=/var/tmp/seabios-20200203-033416-61dmcn92,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200203-033416-61dmcn92,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3,iothread=iothread0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6,iothread=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:6c:ca:b7:36:85,id=idz4QyVp,netdev=idNnpx5D,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idNnpx5D,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

 2. Create two backup targets
      {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':21474836480},'job-id':'job1'}}
      {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn$i','filename':'/root/sn$i'}}
      {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn$i','size':21474836480},'job-id':'job2'}}
      {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn$i','file':'drive_sn$i'}}
      {'execute':'job-dismiss','arguments':{'id':'job1'}}
      {'execute':'job-dismiss','arguments':{'id':'job2'}}

 3. Do backup in transaction mode with completion mode "grouped"
      { "execute": "transaction", "arguments": { "actions": [{"type": "blockdev-backup", "data": { "device": "drive_data1", "target": "sn1", "sync": "full", "job-id":"j1" } },{"type": "blockdev-backup", "data": { "device": "drive_image1", "target": "sn2", "sync": "full", "job-id":"j2" } }], "properties": { "completion-mode": "grouped" } } }
{"timestamp": {"seconds": 1585292687, "microseconds": 532336}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1585292687, "microseconds": 544091}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j2"}}
{"timestamp": {"seconds": 1585292687, "microseconds": 544143}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"timestamp": {"seconds": 1585292687, "microseconds": 544200}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j2"}}
{"timestamp": {"seconds": 1585292687, "microseconds": 544229}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "j1"}}
{"timestamp": {"seconds": 1585292687, "microseconds": 544303}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "j2"}}
{"timestamp": {"seconds": 1585292687, "microseconds": 544329}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1585292687, "microseconds": 544471}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j2"}}
{"timestamp": {"seconds": 1585292687, "microseconds": 985379}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1585292933, "microseconds": 804160}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j2"}}
{"timestamp": {"seconds": 1585292933, "microseconds": 804323}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j2"}}
{"timestamp": {"seconds": 1585292933, "microseconds": 804370}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1585292933, "microseconds": 804477}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j2", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "backup"}}
{"timestamp": {"seconds": 1585292933, "microseconds": 804537}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j2"}}
{"timestamp": {"seconds": 1585292933, "microseconds": 804577}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j2"}}
{"timestamp": {"seconds": 1585292933, "microseconds": 804725}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 2147483648, "offset": 2147483648, "speed": 0, "type": "backup"}}
{"timestamp": {"seconds": 1585292933, "microseconds": 804770}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1585292933, "microseconds": 804809}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}

 After step 3, QEMU core dumps with:
   (qemu) qemu: qemu_mutex_unlock_impl: Operation not permitted
bug.txt: line 42: 19122 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -sandbox on -machine q35 -nodefaults ...

 gdb info:
  (gdb) bt
#0  0x00007f52a822770f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f52a8211b25 in __GI_abort () at abort.c:79
#2  0x000055d8590e6bde in error_exit
    (err=<optimized out>, msg=msg@entry=0x55d85928c8f0 <__func__.16350> "qemu_mutex_unlock_impl")
    at util/qemu-thread-posix.c:36
#3  0x000055d8590e6eca in qemu_mutex_unlock_impl
    (mutex=mutex@entry=0x55d85a4d5cb0, file=file@entry=0x55d85928bc9f "util/async.c", line=line@entry=526)
    at util/qemu-thread-posix.c:108
#4  0x000055d8590e1e39 in aio_context_release (ctx=ctx@entry=0x55d85a4d5c50) at util/async.c:526
#5  0x000055d8590198b3 in bdrv_set_aio_context_ignore
    (bs=0x55d85b20ac10, new_context=new_context@entry=0x55d85a4cfd60, ignore=ignore@entry=0x7ffd03bb30a0) at block.c:6180
#6  0x000055d85901990c in bdrv_set_aio_context_ignore
    (bs=bs@entry=0x55d85a88feb0, new_context=new_context@entry=0x55d85a4cfd60, ignore=ignore@entry=0x7ffd03bb30a0)
    at block.c:6153
#7  0x000055d859019c83 in bdrv_child_try_set_aio_context
    (bs=bs@entry=0x55d85a88feb0, ctx=0x55d85a4cfd60, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x0) at block.c:6266
#8  0x000055d859019db6 in bdrv_try_set_aio_context (errp=0x0, ctx=<optimized out>, bs=0x55d85a88feb0) at block.c:6275
#9  0x000055d859019db6 in bdrv_replace_child (child=child@entry=0x55d85a4e7e50, new_bs=new_bs@entry=0x0) at block.c:2484
#10 0x000055d859019e37 in bdrv_detach_child (child=0x55d85a4e7e50) at block.c:2602
#11 0x000055d85901a6ed in bdrv_root_unref_child (child=<optimized out>) at block.c:2613
#12 0x000055d85901fc06 in block_job_remove_all_bdrv (job=job@entry=0x55d85b3afc40) at blockjob.c:191
#13 0x000055d85901fc2d in block_job_free (job=0x55d85b3afc40) at blockjob.c:88
#14 0x000055d859020f6d in job_unref (job=0x55d85b3afc40) at job.c:359
#15 0x000055d859020f6d in job_unref (job=0x55d85b3afc40) at job.c:351
#16 0x000055d8590211b8 in job_conclude (job=0x55d85b3afc40) at job.c:620
#17 0x000055d8590211b8 in job_finalize_single (job=0x55d85b3afc40) at job.c:688
#18 0x000055d8590211b8 in job_finalize_single (job=0x55d85b3afc40) at job.c:660
#19 0x000055d859021bcc in job_txn_apply (txn=<optimized out>, fn=0x55d8590210a0 <job_finalize_single>) at job.c:145
#20 0x000055d859021bcc in job_do_finalize (job=0x55d85b56ec00) at job.c:781
#21 0x000055d859021e14 in job_exit (opaque=0x55d85b56ec00) at job.c:863
#22 0x000055d8590e1676 in aio_bh_call (bh=0x7f529400c040) at util/async.c:117
#23 0x000055d8590e1676 in aio_bh_poll (ctx=ctx@entry=0x55d85a4cfd60) at util/async.c:117
--Type <RET> for more, q to quit, c to continue without paging--
#24 0x000055d8590e4a64 in aio_dispatch (ctx=0x55d85a4cfd60) at util/aio-posix.c:459
#25 0x000055d8590e1552 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>)
    at util/async.c:260
#26 0x00007f52acaaf67d in g_main_dispatch (context=0x55d85a4d0110) at gmain.c:3176
#27 0x00007f52acaaf67d in g_main_context_dispatch (context=context@entry=0x55d85a4d0110) at gmain.c:3829
#28 0x000055d8590e3b18 in glib_pollfds_poll () at util/main-loop.c:219
#29 0x000055d8590e3b18 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#30 0x000055d8590e3b18 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
#31 0x000055d858ec5ed1 in main_loop () at vl.c:1828
#32 0x000055d858d71e62 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504

Comment 3 aihua liang 2020-03-27 07:37:26 UTC
The issue is only hit with "completion-mode": "grouped".

Comment 4 John Ferlan 2020-03-31 19:42:12 UTC
I don't see a way that libvirt uses "completion-mode": "grouped", and the default is @individual (from qapi/transaction.json and get_transaction_properties() in blockdev.c), so this does not appear to (yet) affect RHV 4.4 negatively.

Given that, I'm moving resolution into RHEL AV 8.2.1, changing the prio/sev to high, and removing the blocker.
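
For reference, the equivalent transaction under the default @individual completion mode (what a libvirt-driven flow would effectively use) simply omits the "properties" member; a sketch based on the step-3 command from comment 2, which per comment 3 should not trigger the crash:

{ "execute": "transaction", "arguments": { "actions": [{"type": "blockdev-backup", "data": { "device": "drive_data1", "target": "sn1", "sync": "full", "job-id":"j1" } },{"type": "blockdev-backup", "data": { "device": "drive_image1", "target": "sn2", "sync": "full", "job-id":"j2" } }] } }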

Comment 5 John Ferlan 2020-04-08 13:11:51 UTC
Martin - Another bz that should be added to/for the 8.2.0 respin. 

I'll let Kevin explain in more detail as necessary, but the patches that are now pushed upstream resolve a number of issues.

The patches are up through here: https://github.com/qemu/qemu/commit/2f37b0222cf9274d014fcb1f211b14ee626561c9

Comment 6 Kevin Wolf 2020-04-08 15:49:28 UTC
The backup transaction case (which apparently would not be relevant for RHV) mentioned in the original report is only one special case of the crashes and deadlocks that were discovered in the context of the upstream discussion. It is possible to reproduce crashes and hangs without transactions, and potentially even with block jobs other than backup.

Below I'll share the script that I used for reproducing the problem. The script keeps starting and cancelling backup block jobs in the background while the VM is running. To reproduce the problems, start some I/O load in the guest. Parts of the problem can be seen even with just a 'dd if=/dev/zero of=/dev/sda' (while booting from a CD), while other parts of the problem were only reproduced with "stress-ng -d 5". If the bug is reproduced, QEMU will either crash or the guest will hang. You can watch the QMP output on the console, and as soon as it stops, the guest hangs.


#!/bin/bash
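# Keep starting a drive-backup job on drive_image1 and cancelling it half a
# second later, in an endless loop; the QMP commands emitted by qmp() are
# piped into QEMU's stdio QMP monitor (-qmp stdio) below.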
  
qmp() {
cat <<EOF
{'execute':'qmp_capabilities'}
EOF

while true; do
cat <<EOF
{ "execute": "drive-backup", "arguments": {
  "job-id":"drive_image1","device": "drive_image1", "sync": "full", "target": "/tmp/backup.raw" } }
EOF
sleep 0.5
cat <<EOF
{ "execute": "block-job-cancel", "arguments": { "device": "drive_image1"} }
EOF
done
}

qmp | x86_64-softmmu/qemu-system-x86_64 \
    -enable-kvm \
    -machine pc \
    -m 1G \
    -object 'iothread,id=iothread-virtioscsi0' \
    -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' \
    -blockdev node-name=my_drive,driver=file,filename=/tmp/overlay.qcow2 \
    -blockdev driver=qcow2,node-name=drive_image1,file=my_drive \
    -device scsi-hd,drive=drive_image1,id=image1 \
    -qmp stdio -monitor vc
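
For completeness: /tmp/overlay.qcow2 has to exist before running the script; a possible setup, assuming a bootable base image so the guest can generate I/O load (the base image path is only illustrative):

qemu-img create -f qcow2 -b /path/to/guest-base.qcow2 -F qcow2 /tmp/overlay.qcow2

Then start the I/O load in the guest (the dd or stress-ng commands mentioned above) and watch the QMP output on the console.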

Comment 8 Danilo de Paula 2020-04-08 18:00:37 UTC
Can QE ack this, please?
This is a blocker for AV and has to be built.

Comment 13 aihua liang 2020-04-10 07:15:33 UTC
Tested on qemu-kvm-4.2.0-18.module+el8.2.0+6278+dfae3426; the issue is not hit any more, so setting the bug's status to "Verified".
 
Test Steps:
 1. Start guest with two iothreads
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 4096  \
    -smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
    -cpu 'EPYC',+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20200203-033416-61dmcn92,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idy8YPXp \
    -chardev socket,path=/var/tmp/serial-serial0-20200203-033416-61dmcn92,server,nowait,id=chardev_serial0 \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20200203-033416-61dmcn92,path=/var/tmp/seabios-20200203-033416-61dmcn92,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20200203-033416-61dmcn92,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kvm_autotest_root/images/rhel820-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,write-cache=on,bus=pcie.0-root-port-3,iothread=iothread0 \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -blockdev node-name=file_data1,driver=file,aio=threads,filename=/home/data.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_data1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_data1 \
    -device virtio-blk-pci,id=data1,drive=drive_data1,write-cache=on,bus=pcie.0-root-port-6,iothread=iothread1 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:6c:ca:b7:36:85,id=idz4QyVp,netdev=idNnpx5D,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idNnpx5D,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,order=cdn,once=c,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -qmp tcp:0:3000,server,nowait \

 2. Create two backup targets
      {'execute':'blockdev-create','arguments':{'options': {'driver':'file','filename':'/root/sn$i','size':21474836480},'job-id':'job1'}}
      {'execute':'blockdev-add','arguments':{'driver':'file','node-name':'drive_sn$i','filename':'/root/sn$i'}}
      {'execute':'blockdev-create','arguments':{'options': {'driver': 'qcow2','file':'drive_sn$i','size':21474836480},'job-id':'job2'}}
      {'execute':'blockdev-add','arguments':{'driver':'qcow2','node-name':'sn$i','file':'drive_sn$i'}}
      {'execute':'job-dismiss','arguments':{'id':'job1'}}
      {'execute':'job-dismiss','arguments':{'id':'job2'}}

 3. Do backup in transaction mode with completion mode "grouped"
      { "execute": "transaction", "arguments": { "actions": [{"type": "blockdev-backup", "data": { "device": "drive_data1", "target": "sn1", "sync": "full", "job-id":"j1" } },{"type": "blockdev-backup", "data": { "device": "drive_image1", "target": "sn2", "sync": "full", "job-id":"j2" } }], "properties": { "completion-mode": "grouped" } } }
      {"timestamp": {"seconds": 1586502529, "microseconds": 717268}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1586502529, "microseconds": 723463}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j2"}}
{"timestamp": {"seconds": 1586502529, "microseconds": 723516}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"timestamp": {"seconds": 1586502529, "microseconds": 723562}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j2"}}
{"timestamp": {"seconds": 1586502529, "microseconds": 723581}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "j1"}}
{"timestamp": {"seconds": 1586502529, "microseconds": 723696}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"timestamp": {"seconds": 1586502529, "microseconds": 723703}, "event": "JOB_STATUS_CHANGE", "data": {"status": "paused", "id": "j2"}}
{"return": {}}
{"timestamp": {"seconds": 1586502529, "microseconds": 723901}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j2"}}
{"timestamp": {"seconds": 1586502533, "microseconds": 653166}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1586502778, "microseconds": 577884}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j2"}}
{"timestamp": {"seconds": 1586502778, "microseconds": 578034}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j2"}}
{"timestamp": {"seconds": 1586502778, "microseconds": 578089}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1586502778, "microseconds": 578207}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j2", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "backup"}}
{"timestamp": {"seconds": 1586502778, "microseconds": 578273}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j2"}}
{"timestamp": {"seconds": 1586502778, "microseconds": 578323}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j2"}}
{"timestamp": {"seconds": 1586502778, "microseconds": 578447}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "backup"}}
{"timestamp": {"seconds": 1586502778, "microseconds": 578538}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1586502778, "microseconds": 578579}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}

Comment 15 errata-xmlrpc 2020-05-05 09:59:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

