Bug 2155112
Summary: | Qemu coredump after do snapshot of mirrored top image and its converted base image(iothread enabled) | |
---|---|---|---
Product: | Red Hat Enterprise Linux 9 | Reporter: | aihua liang <aliang>
Component: | qemu-kvm | Assignee: | Stefano Garzarella <sgarzare>
qemu-kvm sub component: | Block Jobs | QA Contact: | aihua liang <aliang>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | high | |
Priority: | high | CC: | coli, hreitz, jinzhao, juzhang, kwolf, lijin, mdeng, pbonzini, sgarzare, stefanha, vgoyal, virt-maint, zhguo
Version: | 9.2 | Keywords: | Regression, Triaged
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | qemu-kvm-7.2.0-5.el9 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2023-05-09 07:20:55 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

Description
aihua liang
2022-12-20 02:49:30 UTC
Tested with qemu-kvm-7.1.0-7.el9: the issue does not occur there. So this is a regression; I have set the keyword.

We have low coverage during the holidays, so Stefano will take a quick look and try to reassign it so that it can be addressed in early January. Paolo and Stefan, FYI.

I'm able to reproduce this with upstream QEMU v7.2.0. The issue seems to be related to the following commit:

    commit ace5a161ea1c09d8eaa8b2a717528457dc924e83
    Author: Hanna Reitz <hreitz>
    Date:   Mon Nov 7 16:13:21 2022 +0100

        block: Start/end drain on correct AioContext

        bdrv_parent_drained_{begin,end}_single() are supposed to operate on the
        parent, not on the child, so they should not attempt to get the context
        to poll from the child but the parent instead. BDRV_POLL_WHILE(c->bs)
        does get the context from the child, so we should replace it with
        AIO_WAIT_WHILE() on the parent's context instead.

        This problem becomes apparent when bdrv_replace_child_noperm() invokes
        bdrv_parent_drained_end_single() after removing a child from a subgraph
        that is in an I/O thread. By the time bdrv_parent_drained_end_single()
        is called, child->bs is NULL, and so BDRV_POLL_WHILE(c->bs, ...) will
        poll the main loop instead of the I/O thread; but anything that
        bdrv_parent_drained_end_single_no_poll() may have scheduled is going to
        want to run in the I/O thread, but because we poll the main loop, the
        I/O thread is never unpaused, and nothing is run, resulting in a
        deadlock.

        Closes: https://gitlab.com/qemu-project/qemu/-/issues/1215
        Reviewed-by: Kevin Wolf <kwolf>
        Signed-off-by: Hanna Reitz <hreitz>
        Message-Id: <20221107151321.211175-4-hreitz>
        Signed-off-by: Kevin Wolf <kwolf>

Reverting it fixes this issue, but I'm not sure that is the right thing to do, since Hanna was fixing another issue with it.

In addition, I tried the latest QEMU master branch (700ce3b1bb52da4acbbf1ad8f6256baaf52c7953), and the issue seems to be solved there. It contains some rework by Kevin around bdrv_parent_drained_begin_single() for QEMU v8.0. I'm not sure whether we can backport all of it.

I would like some advice from Hanna and Kevin before proceeding. They are on PTO, and I will be on PTO from Dec 24 to Jan 8, but I think we can solve this problem as soon as we get back.

Yes, that commit is a bug fix, so I wouldn’t just revert it. Kevin’s series (“block: Simplify drain”) does revert it, because it happens to fix the issue in another way. Backporting Kevin’s series doesn’t seem like the worst idea to me.

On the issue at hand: I find the description in commit 0 not quite correct, specifically “After mirror complted” in step 8. If I let the mirror job complete, the guest device will point to drive_convert1sn, and blockdev-snapshot (step 10) will fail (“The overlay is already in use”). Steps 8 through 10 must be run before the mirror job has completed. (It would also be wrong to let the mirror complete before step 10, because drive_convert1sn lacks the base image before that point, so having the mirror complete would mean the guest would no longer see the data in the base image.)

I think the problem is that callers of bdrv_append() should (before Kevin’s series) lock all AioContexts that are involved in the operation. Right now, the caller only locks the base image’s context, but then we quiesce the overlay’s parent (namely the mirror job), and I believe it is correct to poll the mirror job’s context for this (which is what my patch changed). But for this to work, the caller (external_snapshot_prepare()) should have locked the mirror job’s context, i.e. the overlay image’s context, which it doesn’t do.

I think this all becomes moot with Kevin’s series (thankfully), but if we need a quick fix, I think it should be sufficient to have external_snapshot_prepare() lock new_bs’s AioContext, too, if it differs from old_bs’s; I’ll attach a diff to that effect. (It seems to fix the issue for me at least...)

(Sorry for the typo in comment 5, I meant “description in comment 0”, not “description in commit 0”.)

Created attachment 1935332
Have external_snapshot_prepare() lock the overlay’s AioContext

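The locking rule argued for above can be illustrated with a toy model. This is only a sketch in Python, not QEMU code: `ToyContext`, `wait_while` and `snapshot_prepare` are hypothetical names, and the model assumes that the waiting primitive temporarily releases the lock of the context it polls for, so the caller must already hold that lock.

```python
import threading


class ToyContext:
    """Hypothetical stand-in for an AioContext: a name plus a recursive lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.RLock()


def wait_while(ctx):
    """Toy polling wait that drops ctx's lock while it would poll.

    Assumption of this sketch: the caller already holds ctx.lock.
    Releasing an RLock the current thread does not hold raises RuntimeError.
    """
    ctx.lock.release()      # give the other thread a chance to run
    try:
        pass                # a real implementation would poll here
    finally:
        ctx.lock.acquire()  # restore the caller's lock state


def snapshot_prepare(old_ctx, new_ctx, lock_both):
    """Lock old_ctx (and new_ctx if requested and different), then wait on new_ctx."""
    held = [old_ctx]
    if lock_both and new_ctx is not old_ctx:
        held.append(new_ctx)
    for ctx in held:
        ctx.lock.acquire()
    try:
        wait_while(new_ctx)
        return "ok"
    except RuntimeError:
        return "unlock of a lock we do not hold"
    finally:
        for ctx in reversed(held):
            ctx.lock.release()


base = ToyContext("main-loop")    # context of the base image in this toy
overlay = ToyContext("iothread")  # context of the overlay / mirror job

print(snapshot_prepare(base, overlay, lock_both=False))
print(snapshot_prepare(base, overlay, lock_both=True))
```

In this model the first call fails because only the base context is locked while the wait targets the overlay's context; the second call succeeds once both contexts are locked, which mirrors the idea behind the attached quick fix.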
So what's next? Hanna, should QE test again with your patch and see whether it fixes the issue? I feel that a quick fix is probably better. Kevin's series could then go on top and revert the fix, so that we don't have to rely on backporting Kevin's patch series to fix this issue. (Kevin's patches have yet to be merged upstream, IIUC.)

I’d prefer to first wait on Stefano’s and Kevin’s opinions before continuing. Kevin’s patches are upstream in 8.0.

Tested with automation; it works on qemu-kvm-7.2.0-5.el9.

    (1/2) Host_RHEL.m9.u2.ovmf.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.9.2.0.x86_64.io-github-autotest-qemu.blockdev_mirror_sync_top.q35: PASS (136.45 s)
    (2/2) Host_RHEL.m9.u2.ovmf.qcow2.virtio_blk.up.virtio_net.Guest.RHEL.9.2.0.x86_64.io-github-autotest-qemu.blockdev_mirror_sync_top.q35: PASS (153.05 s)

QE bot (pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

As per comment 17 and comment 18, setting the bug's status to "VERIFIED".

Hi Stefano,

When running tests on qemu-kvm-7.2.0-6.el9, I still hit this issue, with a reproduction rate lower than 20%. (I ran the case 50 times and hit the issue 7 times.)

Coredump info:

    Executable: /usr/libexec/qemu-kvm
    Control Group: /user.slice/user-0.slice/session-9.scope
    Unit: session-9.scope
    Slice: user-0.slice
    Session: 9
    Owner UID: 0 (root)
    Boot ID: 52c869c21ef64de49877ac0eed7aeb06
    Machine ID: 3919555703fd4043b7f3cc2611ad4d18
    Hostname: dell-per740xd-01.lab.eng.pek2.redhat.com
    Storage: /var/lib/systemd/coredump/core.qemu-kvm.0.52c869c21ef64de49877ac0eed7aeb06.363458.1675305507000000.zst (present)
    Size on Disk: 304.5M
    Message: Process 363458 (qemu-kvm) of user 0 dumped core.

    Stack trace of thread 363458:
    #0 0x00007f03026a154c __pthread_kill_implementation (libc.so.6 + 0xa154c)
    #1 0x00007f0302654d46 raise (libc.so.6 + 0x54d46)
    #2 0x00007f03026287f3 abort (libc.so.6 + 0x287f3)
    #3 0x000055d3bc22dff2 qemu_mutex_unlock_impl (qemu-kvm + 0x9bdff2)
    #4 0x000055d3bc08cda7 bdrv_do_drained_begin (qemu-kvm + 0x81cda7)
    #5 0x000055d3bc055e1e bdrv_replace_node_noperm (qemu-kvm + 0x7e5e1e)
    #6 0x000055d3bc055c92 bdrv_append (qemu-kvm + 0x7e5c92)
    #7 0x000055d3bc03c62c external_snapshot_prepare (qemu-kvm + 0x7cc62c)
    #8 0x000055d3bc03aedd qmp_transaction (qemu-kvm + 0x7caedd)
    #9 0x000055d3bc14e826 qmp_marshal_blockdev_snapshot (qemu-kvm + 0x8de826)
    #10 0x000055d3bc21e3f2 do_qmp_dispatch_bh (qemu-kvm + 0x9ae3f2)
    #11 0x000055d3bc22a3f1 aio_dispatch (qemu-kvm + 0x9ba3f1)
    #12 0x000055d3bc2450a2 aio_ctx_dispatch (qemu-kvm + 0x9d50a2)
    #13 0x00007f0302d1ae2f g_main_context_dispatch (libglib-2.0.so.0 + 0x54e2f)
    #14 0x000055d3bc2469c4 main_loop_wait (qemu-kvm + 0x9d69c4)
    #15 0x000055d3bbd4f8e7 qemu_main_loop (qemu-kvm + 0x4df8e7)
    #16 0x000055d3bbbd592a qemu_default_main (qemu-kvm + 0x36592a)
    #17 0x00007f030263feb0 __libc_start_call_main (libc.so.6 + 0x3feb0)
    #18 0x00007f030263ff60 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x3ff60)
    #19 0x000055d3bbbd5085 _start (qemu-kvm + 0x365085)

    Stack trace of thread 363465:
    #0 0x00007f03027429bf __poll (libc.so.6 + 0x1429bf)
    #1 0x00007f0302d6f49c g_main_context_iterate.constprop.0 (libglib-2.0.so.0 + 0xa949c)
    #2 0x00007f0302d1a483 g_main_loop_run (libglib-2.0.so.0 + 0x54483)
    #3 0x000055d3bc043e2f iothread_run (qemu-kvm + 0x7d3e2f)
    #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea)
    #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802)
    #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450)

Stack trace of thread 363470: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a 
kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363468: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363476: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363570: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363460: #0 0x00007f0302742abe ppoll (libc.so.6 + 0x142abe) #1 0x000055d3bc22b8de fdmon_poll_wait (qemu-kvm + 0x9bb8de) #2 0x000055d3bc22ab1e aio_poll (qemu-kvm + 0x9bab1e) #3 0x000055d3bc043e12 iothread_run (qemu-kvm + 0x7d3e12) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 
0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363548: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363466: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363572: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363571: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start 
(qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363459: #0 0x00007f030263ee5d syscall (libc.so.6 + 0x3ee5d) #1 0x000055d3bc22eb3f qemu_event_wait (qemu-kvm + 0x9beb3f) #2 0x000055d3bc23ac75 call_rcu_thread (qemu-kvm + 0x9cac75) #3 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #4 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #5 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363475: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363472: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363580: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363477: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 
0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363577: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363474: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363584: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363573: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 
0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363479: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eba0 pthread_cond_wait@@GLIBC_2.3.2 (libc.so.6 + 0x9eba0) #2 0x000055d3bc22e39f qemu_cond_wait_impl (qemu-kvm + 0x9be39f) #3 0x000055d3bbc0bc76 vnc_worker_thread (qemu-kvm + 0x39bc76) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363583: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363473: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363469: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 
0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363581: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363619: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363587: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) 
Stack trace of thread 363586: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363578: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363471: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450) Stack trace of thread 363579: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 
start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450)
Stack trace of thread 363585: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450)
Stack trace of thread 363582: #0 0x00007f030269c39a __futex_abstimed_wait_common (libc.so.6 + 0x9c39a) #1 0x00007f030269eea4 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x9eea4) #2 0x000055d3bc22e53c qemu_cond_timedwait_ts (qemu-kvm + 0x9be53c) #3 0x000055d3bc22e4e0 qemu_cond_timedwait_impl (qemu-kvm + 0x9be4e0) #4 0x000055d3bc2492a7 worker_thread (qemu-kvm + 0x9d92a7) #5 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #6 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #7 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450)
Stack trace of thread 363467: #0 0x00007f030263ec6b ioctl (libc.so.6 + 0x3ec6b) #1 0x000055d3bbfe998b kvm_vcpu_ioctl (qemu-kvm + 0x77998b) #2 0x000055d3bbfef191 kvm_cpu_exec (qemu-kvm + 0x77f191) #3 0x000055d3bbff178a kvm_vcpu_thread_fn (qemu-kvm + 0x78178a) #4 0x000055d3bc22edea qemu_thread_start (qemu-kvm + 0x9bedea) #5 0x00007f030269f802 start_thread (libc.so.6 + 0x9f802) #6 0x00007f030263f450 __clone3 (libc.so.6 + 0x3f450)
ELF object binary architecture: AMD x86-64

I also tried qemu-kvm-7.2.0-5.el9 and can still reproduce the issue there, at a rate of 4/50.

Then I ran the case 100 times on qemu-kvm-7.1.0-7.el9, and all runs passed.

So the test results show that the code fix reduces the reproduction rate to a low level, but it is not a complete fix.

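The reproduction rates reported above can be put side by side with a few lines of arithmetic. This is a rough sketch that assumes the test runs are independent, which the report does not state; the helper names are hypothetical.

```python
def failure_rate(failures, runs):
    """Observed fraction of runs that hit the core dump."""
    return failures / runs


def chance_of_clean_run(rate, runs):
    """Probability of seeing zero failures in `runs` independent runs
    if each run fails with probability `rate`."""
    return (1.0 - rate) ** runs


rate_7_2_0_6 = failure_rate(7, 50)  # qemu-kvm-7.2.0-6.el9
rate_7_2_0_5 = failure_rate(4, 50)  # qemu-kvm-7.2.0-5.el9

# How likely would 100 clean runs on qemu-kvm-7.1.0-7.el9 be if it
# failed as often as qemu-kvm-7.2.0-5.el9 does?
p_clean = chance_of_clean_run(rate_7_2_0_5, 100)

print(f"7.2.0-6: {rate_7_2_0_6:.0%}, 7.2.0-5: {rate_7_2_0_5:.0%}")
print(f"chance of 100 clean runs at the 7.2.0-5 rate: {p_clean:.4%}")
```

Under that independence assumption, 100 clean runs would be very unlikely if 7.1.0-7 had the same failure rate as 7.2.0-5, which supports treating the remaining crashes as a regression rather than test noise.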
As this is still a regression, only with a lower reproduction rate, I'm not sure whether we need to file a new bug to track it or whether we can keep using this one.

BR,
Aliang

(In reply to aihua liang from comment #23)
> So from the test result, we can see that the code fix reduces the reproduce
> ratio to a low level, but not a complete fix.

Yep, Hanna and Kevin suggested on IRC to take a look at the locking in bdrv_append(). I'll do that in the next few days.

> As it's still a regression bug but with a lower reproduce ratio, I'm not
> sure if we need to file a new bug to track, or just use this existing one is
> ok?

Just to update this BZ: we agreed to create a new BZ, since we need to fix QEMU upstream and then backport the patch downstream. The new BZ is BZ2168209. We can continue the discussion there.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162