Bug 2058459

Summary: Qemu core dump when mirror before "STOP" event received that caused by no space left error(iothread enabled)
Product: Red Hat Enterprise Linux 8
Reporter: aihua liang <aliang>
Component: qemu-kvm
Assignee: Hanna Czenczek <hreitz>
qemu-kvm sub component: Block Jobs
QA Contact: aihua liang <aliang>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: medium
CC: coli, hreitz, jinzhao, mdeng, ngu, virt-maint
Version: 8.6
Keywords: Triaged
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Story Points: ---
Clone Of: 2058457
Last Closed: 2023-06-25 01:47:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Bug Depends On: 2058457

Comment 1 aihua liang 2022-02-25 03:38:55 UTC
qemu-kvm-6.2.0-8.module+el8.6.0+14324+050a5215 also hits this issue; core dump info as below:
(gdb) bt
#0  bdrv_parent_can_set_aio_context (errp=0x7fff6f3e2c98, ignore=0x7fff6f3e2c20, ctx=0x559542cafe60, c=0x101010101010101) at ../block.c:7161
#1  bdrv_can_set_aio_context (bs=0x5595435f1800, ctx=0x559542cafe60, ignore=0x7fff6f3e2c20, errp=0x7fff6f3e2c98) at ../block.c:7196
#2  0x0000559541da7913 in bdrv_child_try_set_aio_context (bs=bs@entry=0x5595435f1800, ctx=ctx@entry=0x559542cafe60, ignore_child=ignore_child@entry=0x0, 
    errp=errp@entry=0x7fff6f3e2c98) at ../block.c:7216
#3  0x0000559541da7a37 in bdrv_try_set_aio_context (errp=0x7fff6f3e2c98, ctx=0x559542cafe60, bs=0x5595435f1800) at ../block.c:7233
#4  bdrv_attach_child_common (child_bs=child_bs@entry=0x5595435f1800, child_name=child_name@entry=0x559541f78a07 "root", 
    child_class=child_class@entry=0x559542653160 <child_root>, child_role=child_role@entry=20, perm=perm@entry=0, shared_perm=shared_perm@entry=31, opaque=0x559543a83260, 
    child=0x7fff6f3e2d30, tran=0x559543a83650, errp=0x559542765858 <error_abort>) at ../block.c:2931
#5  0x0000559541da8c7d in bdrv_root_attach_child (child_bs=child_bs@entry=0x5595435f1800, child_name=child_name@entry=0x559541f78a07 "root", 
    child_class=child_class@entry=0x559542653160 <child_root>, child_role=child_role@entry=20, perm=0, shared_perm=31, opaque=0x559543a83260, 
    errp=0x559542765858 <error_abort>) at ../block.c:3055
#6  0x0000559541dc1575 in blk_insert_bs (blk=0x559543a83260, bs=bs@entry=0x5595435f1800, errp=0x559542765858 <error_abort>) at ../block/block-backend.c:862
#7  0x0000559541dd2dfe in mirror_exit_common (job=0x559543955670) at ../block/mirror.c:779
#8  0x0000559541db0121 in job_prepare (job=0x559543955670) at ../job.c:828
#9  0x0000559541db0b61 in job_txn_apply (job=job@entry=0x559543955670, fn=fn@entry=0x559541db0100 <job_prepare>) at ../job.c:158
#10 0x0000559541db15af in job_do_finalize (job=0x559543955670) at ../job.c:845
#11 0x0000559541db1795 in job_exit (opaque=0x559543955670) at ../job.c:932
#12 0x0000559541ea621d in aio_bh_call (bh=0x7f1ea4010200) at ../util/async.c:169
#13 aio_bh_poll (ctx=ctx@entry=0x559542b26170) at ../util/async.c:169
#14 0x0000559541e94562 in aio_dispatch (ctx=0x559542b26170) at ../util/aio-posix.c:381
#15 0x0000559541ea60c2 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:311
#16 0x00007f1eb57a095d in g_main_dispatch (context=0x559542b1e610) at gmain.c:3193
#17 g_main_context_dispatch (context=context@entry=0x559542b1e610) at gmain.c:3873
#18 0x0000559541eb0d60 in glib_pollfds_poll () at ../util/main-loop.c:232
#19 os_host_main_loop_wait (timeout=<optimized out>) at ../util/main-loop.c:255
#20 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:531
#21 0x0000559541cab1e9 in qemu_main_loop () at ../softmmu/runstate.c:726
#22 0x0000559541addd82 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50

Comment 4 aihua liang 2022-02-25 07:31:35 UTC
qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949 also hit this issue.

Comment 5 Klaus Heinrich Kiwi 2022-02-25 15:02:35 UTC
Note: We will focus debugging on the RHEL9.0 clone: Bug 2058457

Comment 6 aihua liang 2022-03-14 03:45:55 UTC
The key condition to reproduce is the same as for bz2058457: iothread enabled, and the "STOP" event sent out later than the "blockdev_mirror" execution.
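
The ordering condition above can be checked mechanically against a recorded QMP timeline. The following is a minimal sketch in Python, assuming a hypothetical log format of (timestamp, kind, name) tuples. The tuple layout and the helper names `first_time` and `mirror_raced_stop` are illustrative only and are not part of QEMU or of the test suite. The QMP command is spelled `blockdev-mirror` here, on the assumption that this is the command the test's "blockdev_mirror" step issues.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical log entry: (timestamp in seconds, "command" or "event", QMP name).
Entry = Tuple[float, str, str]


def first_time(log: Iterable[Entry], kind: str, name: str) -> Optional[float]:
    """Return the earliest timestamp of the given command/event, or None."""
    times = [t for (t, k, n) in log if k == kind and n == name]
    return min(times) if times else None


def mirror_raced_stop(log: Iterable[Entry]) -> bool:
    """True if blockdev-mirror was issued before the STOP event was seen."""
    entries = list(log)
    mirror = first_time(entries, "command", "blockdev-mirror")
    stop = first_time(entries, "event", "STOP")
    if mirror is None or stop is None:
        # Without both points on the timeline the ordering cannot be judged.
        return False
    return mirror < stop
```

A run whose timeline makes `mirror_raced_stop` return True matches the ordering described in this comment; whether that run also crashes still depends on iothread being enabled and on timing.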

Comment 8 aihua liang 2022-05-18 07:13:43 UTC
Checked this issue on RHEL8.6, RHEL8.5-av, and RHEL8.4-av; all of them hit it.
   qemu-kvm-6.2.0-11.module+el8.6.0+14707+5aa4b42d: 1/20 (the 20th run hit the coredump issue)
   qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949: 1/89  (the 89th run hit the coredump issue)
   qemu-kvm-6.0.0-33.module+el8.5.0+13514+2c386966.1: 1/29 (the 29th run hit the coredump issue)
   qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d: 1/45 (the 45th run hit the coredump issue)
   qemu-kvm-5.2.0-16.module+el8.4.0+11721+c8bbc1be.3: 1/28 (the 28th run hit the coredump issue)
   qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9: 1/58 (the 58th run hit the coredump issue)

So this is not a regression.
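
The hit counts above can be turned into rough per-run rates, which also helps judge how many clean runs a later verification needs. This is a sketch only, assuming that each run is an independent trial and that "the Nth run hit the coredump issue" means one failure in N runs. With a single failure per build the estimates are very coarse. The function names `hit_rate` and `prob_all_pass` are illustrative.

```python
# Runs needed to hit the core dump once, per build, as listed in this comment.
runs_until_hit = {
    "qemu-kvm-6.2.0-11.module+el8.6.0+14707+5aa4b42d": 20,
    "qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949": 89,
    "qemu-kvm-6.0.0-33.module+el8.5.0+13514+2c386966.1": 29,
    "qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d": 45,
    "qemu-kvm-5.2.0-16.module+el8.4.0+11721+c8bbc1be.3": 28,
    "qemu-kvm-5.2.0-0.module+el8.4.0+8855+a9e237a9": 58,
}


def hit_rate(runs: int) -> float:
    """Coarse per-run failure rate: one observed failure in `runs` runs."""
    return 1.0 / runs


def prob_all_pass(rate: float, n_runs: int) -> float:
    """Chance that n_runs independent runs all pass at the given failure rate."""
    return (1.0 - rate) ** n_runs


rates = {build: hit_rate(n) for build, n in runs_until_hit.items()}

if __name__ == "__main__":
    for build, rate in rates.items():
        # 200 is the repeat count used for the verification runs in this bug.
        print(f"{build}: rate={rate:.4f}, P(200 clean runs)={prob_all_pass(rate, 200):.6f}")
```

The lower the estimated rate, the more likely a long clean streak is even with the bug still present, so the slowest-reproducing build is the one that sets how many repeats a verification run needs.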



Comment 9 Hanna Czenczek 2022-05-30 10:04:29 UTC
As with BZ 2058457, I set the ITR to --- because we don’t have clear-cut plans for how to fix this at this point.

(I hope it’ll be fixed by introducing some form of locks for graph-change operations, but we’ll need to see that.)

Comment 10 aihua liang 2023-06-25 01:47:42 UTC
Tested qemu-kvm-6.2.0-35.module+el8.9.0+19024+8193e2ac+scsi+iothread with case blockdev_mirror_after_block_error 200 times; all runs passed.
 (199/200) repeat199.Host_RHEL.m8.u9.product_rhel.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.9.0.x86_64.io-github-autotest-qemu.blockdev_mirror_after_block_error.q35: PASS (99.16 s)
 (200/200) repeat200.Host_RHEL.m8.u9.product_rhel.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.9.0.x86_64.io-github-autotest-qemu.blockdev_mirror_after_block_error.q35: PASS (98.61 s)
RESULTS    : PASS 200 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML   : /root/avocado/job-results/job-2023-06-20T22.06-146d4eb/results.html
JOB TIME   : 19720.71 s

Tested qemu-kvm-6.2.0-35.module+el8.9.0+19024+8193e2ac+virtio_blk+iothread with case blockdev_mirror_after_block_error 200 times; all runs passed.
 (199/200) repeat199.Host_RHEL.m8.u9.product_rhel.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.8.9.0.x86_64.io-github-autotest-qemu.blockdev_mirror_after_block_error.q35: PASS (97.41 s)
 (200/200) repeat200.Host_RHEL.m8.u9.product_rhel.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.8.9.0.x86_64.io-github-autotest-qemu.blockdev_mirror_after_block_error.q35: PASS (98.77 s)
RESULTS    : PASS 200 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML   : /root/avocado/job-results/job-2023-06-21T05.47-b163a53/results.html
JOB TIME   : 19718.78 s
 

So I will close this bug as CURRENTRELEASE.