Bug 2167838 - qemu: qemu_mutex_unlock_impl: Operation not permitted
Summary: qemu: qemu_mutex_unlock_impl: Operation not permitted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Emanuele Giuseppe Esposito
QA Contact: qing.wang
URL:
Whiteboard:
Depends On: 2180898
Blocks:
 
Reported: 2023-02-07 16:46 UTC by Lukáš Doktor
Modified: 2023-11-07 09:16 UTC
CC List: 9 users

Fixed In Version: qemu-kvm-8.0.0-1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:26:38 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
/var/log/libvirt/qemu/$hostname.log file containing the crash (5.34 KB, text/plain), 2023-02-07 16:46 UTC, Lukáš Doktor
TunedLibvirt libvirt machine XML (6.96 KB, application/xml), 2023-02-07 17:52 UTC, Lukáš Doktor


Links
Red Hat Issue Tracker RHELPLAN-147803 (last updated 2023-02-07 16:48:02 UTC)
Red Hat Product Errata RHSA-2023:6368 (last updated 2023-11-07 08:27:17 UTC)

Description Lukáš Doktor 2023-02-07 16:46:04 UTC
Created attachment 1942743 [details]
/var/log/libvirt/qemu/$hostname.log file containing the crash

Description of problem:
I'm running an upstream qemu CI based on RHEL-9.1.0-20221027.3, and somewhere between upstream qemu commits deabea6e88f7c4c3c12a36ee30051c6209561165 and 5736527050cfcc5b92521d79fe87b4883059d864 the guests started crashing just before the fio test execution.

Version-Release number of selected component (if applicable):
* upstream qemu 5736527050cfcc5b92521d79fe87b4883059d864
* host&guest: RHEL-9.1.0-20221027.3 as well as RHEL-8.7.0-20221013.1

How reproducible:
Always in the CI

Steps to Reproduce:
1. run-perf --host-setup-script host_script --host-setup-script-reboot --worker-setup-script worker_script -v --hosts virtlab506.virt.lab.eng.bos.redhat.com --distro RHEL-8.7.0-20221013.1 --default-password PASSWORD --profiles 'Localhost:{"qemu_bin": "/usr/local/bin/qemu-system-x86_64", "__KEEP_ASSETS__": "yes", "RUNPERF_TESTS": [["fio", {"runtime": "60", "targets": "/fio", "block-sizes": "1024", "test-types": "read,write", "samples": "5", "numjobs": "1", "iodepth": "1", "__NAME__": "fio-rot-1j-1i"}]]}' 'DefaultLibvirt:{"qemu_bin": "/usr/local/bin/qemu-system-x86_64", "__KEEP_ASSETS__": "yes", "virt-install-extra": "--vsock cid.address=42", "RUNPERF_TESTS": [["fio", {"runtime": "60", "targets": "/fio", "block-sizes": "1024", "test-types": "read,write", "samples": "5", "numjobs": "1", "iodepth": "1", "__NAME__": "fio-rot-1j-1i"}]]}' 'TunedLibvirt:{"qemu_bin": "/usr/local/bin/qemu-system-x86_64", "__KEEP_ASSETS__": "yes", "__none__": "__none__"}' --log run.log --paths ../runperf_config/ -- 'fio:{"runtime": "60", "targets": "/fio", "block-sizes": "4,1024", "test-types": "read,write", "samples": "5", "numjobs": "1", "iodepth": "1", "__NAME__": "fio-rot-1j-1i"}' 'fio:{"runtime": "60", "targets": "/fio", "block-sizes": "4,1024", "test-types": "read,write", "samples": "5", "numjobs": "__PER_WORKER_CPUS__", "iodepth": "8", "__NAME__": "fio-rot-Aj-8i"}' 'uperf:{"runtime": "60", "protocols": "tcp", "__NAME__": "uperf-tcp", "message-sizes": "1,16384", "instances": "1,64", "max-stddev": "20"}' 'uperf:{"runtime": "60", "protocols": "udp", "test-types": "rr", "__NAME__": "uperf-udp", "message-sizes": "1,16384", "instances": "1,64", "max-stddev": "20"}' 'linpack:{"threads": "1,128"}'

Actual results:
The first test executed in the TunedLibvirt profile stalls and gets interrupted. The logs show that the VM crashed when pbench-fio SSH-ed into it.

Expected results:
The VM should not crash.

Additional info:
I'll try to simplify the reproducer, but so far I have not been successful when running it without the first DefaultLibvirt profile (which is odd, as it uses a completely different image and the host reboots before the TunedLibvirt profile is applied).

Comment 1 Lukáš Doktor 2023-02-07 17:52:02 UTC
Created attachment 1942748 [details]
TunedLibvirt libvirt machine XML

OK, so simply starting the VM and SSH-ing into it is sufficient to reproduce this. Let me attach the machine XML so we can see which feature is responsible for the crash.

Comment 2 Lukáš Doktor 2023-02-07 18:05:06 UTC
So the problem is the iothreads. When I remove the iothread from /dev/vda, everything works well. I'll try to bisect the commit (probably tomorrow).

Comment 3 Lukáš Doktor 2023-02-08 06:40:46 UTC
The bisection pointed to 1e97be915697fff198e9922321066cf9b44ef4b9 "block: Convert bdrv_is_inserted() to co_wrapper". Emanuele, could you please take a look at this?
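
For context, a rough sketch of what such a co_wrapper conversion looks like at the declaration level (illustrative only; the exact names and annotations in the actual commit may differ):

    /* Illustrative sketch, not copied from the commit. */

    /* The real implementation becomes a coroutine_fn ... */
    bool coroutine_fn bdrv_co_is_inserted(BlockDriverState *bs);

    /* ... and the public entry point is declared with the co_wrapper
     * annotation, so the build generates a non-coroutine wrapper that
     * spawns a coroutine running bdrv_co_is_inserted() and polls until
     * it completes. */
    bool co_wrapper bdrv_is_inserted(BlockDriverState *bs);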

Comment 4 Emanuele Giuseppe Esposito 2023-02-08 09:48:31 UTC
Hi Lukáš, could you run `coredumpctl` after QEMU fails? All I need is the stack trace of the crash.

This is an "expected" mistake: bdrv_is_inserted() can be called from both coroutine and non-coroutine context, and in the latter case the `co_wrapper` annotation will automatically create a new coroutine and wait until it finishes before continuing. The "wait" part is where we unfortunately must still call AIO_WAIT_WHILE(), which releases the bs AioContext lock and reacquires it afterwards. Now, if the caller of bdrv_is_inserted() did not take the same AioContext lock beforehand, it will crash because we are trying to unlock something that was not locked.

The current issue with the code is that it is not clear which functions run in coroutine context and which don't, and bdrv/blk_is_inserted() are used basically everywhere, so mistakes like this can sometimes happen.
To fix this specific case, I just need to find which caller forgot to take the lock; that's why I need the full stack trace. In the meantime I will discuss with Kevin whether we can find a general solution for bdrv/blk_is_inserted().

Thank you,
Emanuele
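
A minimal sketch of the locking rule described above (illustrative only, using the standard QEMU AioContext API; this is not the actual fix):

    /* A non-coroutine caller is expected to hold the AioContext lock of
     * the BDS before calling a co_wrapper function such as
     * bdrv_is_inserted(). */
    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_context_acquire(ctx);              /* caller takes the lock ...      */
    bool inserted = bdrv_is_inserted(bs);  /* ... so AIO_WAIT_WHILE() inside
                                            * the generated wrapper can drop
                                            * it while polling and re-take it
                                            * afterwards */
    aio_context_release(ctx);

    /* Without the aio_context_acquire() above, the wrapper ends up unlocking
     * a mutex this thread never locked, and qemu_mutex_unlock_impl() aborts
     * with "Operation not permitted". */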

Comment 5 Lukáš Doktor 2023-02-08 10:56:59 UTC
Hopefully this suffices (executed on RHEL 8, but the failure is the same):

           PID: 132555 (qemu-system-x86)
           UID: 107 (qemu)
           GID: 107 (qemu)
        Signal: 6 (ABRT)
     Timestamp: Wed 2023-02-08 05:53:57 EST (47s ago)
  Command Line: /usr/local/bin/qemu-system-x86_64 -name guest=virtlab506.TunedLibvirt0,debug-threads=on -S -object {"qom-type":"secret","id":"master>
    Executable: /usr/local/bin/qemu-system-x86_64
 Control Group: /
         Slice: -.slice
       Boot ID: 88a9619c80da4263b415e4b41a1ad80c
    Machine ID: 3a81340da6e241ddbb6e281d4b5b4dfc
      Hostname: virtlab506.virt.lab.eng.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.qemu-system-x86.107.88a9619c80da4263b415e4b41a1ad80c.132555.1675853637000000.lz4
       Message: Process 132555 (qemu-system-x86) of user 107 dumped core.
                
                Stack trace of thread 132580:
                #0  0x00007fa8fbb20aff raise (libc.so.6)
                #1  0x00007fa8fbaf3ea5 abort (libc.so.6)
                #2  0x000055dab20f532e error_exit (qemu-system-x86_64)
                #3  0x000055dab20f5b3d qemu_mutex_unlock_impl (qemu-system-x86_64)
                #4  0x000055dab1fc9d38 bdrv_poll_co (qemu-system-x86_64)
                #5  0x000055dab1ff0499 blk_is_available (qemu-system-x86_64)
                #6  0x000055dab1ff04c9 blk_getlength (qemu-system-x86_64)
                #7  0x000055dab1e96f71 virtio_blk_update_config (qemu-system-x86_64)
                #8  0x000055dab1ed8f44 virtio_config_readb (qemu-system-x86_64)
                #9  0x000055dab1d5bb05 virtio_pci_device_read (qemu-system-x86_64)
                #10 0x000055dab1f0a8dc memory_region_read_accessor (qemu-system-x86_64)
                #11 0x000055dab1f06f6e access_with_adjusted_size (qemu-system-x86_64)
                #12 0x000055dab1f0a341 memory_region_dispatch_read1 (qemu-system-x86_64)
                #13 0x000055dab1f14ce8 flatview_read_continue (qemu-system-x86_64)
                #14 0x000055dab1f14e80 flatview_read (qemu-system-x86_64)
                #15 0x000055dab1f14fc6 address_space_read_full (qemu-system-x86_64)
                #16 0x000055dab1f8e9c9 kvm_cpu_exec (qemu-system-x86_64)
                #17 0x000055dab1f8fcfd kvm_vcpu_thread_fn (qemu-system-x86_64)
                #18 0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #19 0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #20 0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132587:
                #0  0x00007fa8fbbf70e1 __poll (libc.so.6)
                #1  0x00007fa8fe783c86 g_main_context_iterate.isra.20 (libglib-2.0.so.0)
                #2  0x00007fa8fe784042 g_main_loop_run (libglib-2.0.so.0)
                #3  0x00007fa8ff9178db red_worker_main (libspice-server.so.1)
                #4  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #5  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132555:
                #0  0x00007fa8fbbf71d6 ppoll (libc.so.6)
                #1  0x000055dab2109471 ppoll (qemu-system-x86_64)
                #2  0x000055dab2106bd5 os_host_main_loop_wait (qemu-system-x86_64)
                #3  0x000055dab1d78999 qemu_main_loop (qemu-system-x86_64)
                #4  0x000055dab1bc8fa6 qemu_default_main (qemu-system-x86_64)
                #5  0x00007fa8fbb0cd85 __libc_start_main (libc.so.6)
                #6  0x000055dab1bc8ede _start (qemu-system-x86_64)
                
                Stack trace of thread 132566:
                #0  0x00007fa8fbbf71d6 ppoll (libc.so.6)
                #1  0x000055dab21094b9 ppoll (qemu-system-x86_64)
                #2  0x000055dab20f34d9 fdmon_poll_wait (qemu-system-x86_64)
                #3  0x000055dab20f28d3 aio_poll (qemu-system-x86_64)
                #4  0x000055dab1fc6106 iothread_run (qemu-system-x86_64)
                #5  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #6  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #7  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132572:
                #0  0x00007fa8fbbf70e1 __poll (libc.so.6)
                #1  0x00007fa8fe783c86 g_main_context_iterate.isra.20 (libglib-2.0.so.0)
                #2  0x00007fa8fe784042 g_main_loop_run (libglib-2.0.so.0)
                #3  0x000055dab1fc6139 iothread_run (qemu-system-x86_64)
                #4  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #5  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #6  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132564:
                #0  0x00007fa8fbb0b9bd syscall (libc.so.6)
                #1  0x000055dab20f631b qemu_futex_wait (qemu-system-x86_64)
                #2  0x000055dab20fe5fa call_rcu_thread (qemu-system-x86_64)
                #3  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #4  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #5  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132575:
                #0  0x00007fa8fbb0b7cb ioctl (libc.so.6)
                #1  0x000055dab1f8e63f kvm_vcpu_ioctl (qemu-system-x86_64)
                #2  0x000055dab1f8e7b1 kvm_cpu_exec (qemu-system-x86_64)
                #3  0x000055dab1f8fcfd kvm_vcpu_thread_fn (qemu-system-x86_64)
                #4  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #5  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #6  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132577:
                #0  0x00007fa8fbb0b7cb ioctl (libc.so.6)
                #1  0x000055dab1f8e63f kvm_vcpu_ioctl (qemu-system-x86_64)
                #2  0x000055dab1f8e7b1 kvm_cpu_exec (qemu-system-x86_64)
                #3  0x000055dab1f8fcfd kvm_vcpu_thread_fn (qemu-system-x86_64)
                #4  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #5  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #6  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132631:
                #0  0x00007fa8fbea67aa pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0)
                #1  0x000055dab20f5501 qemu_cond_timedwait_ts (qemu-system-x86_64)
                #2  0x000055dab20f5f99 qemu_cond_timedwait_impl (qemu-system-x86_64)
                #3  0x000055dab21088ac worker_thread (qemu-system-x86_64)
                #4  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #5  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #6  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132576:
                #0  0x00007fa8fbb0b7cb ioctl (libc.so.6)
                #1  0x000055dab1f8e63f kvm_vcpu_ioctl (qemu-system-x86_64)
                #2  0x000055dab1f8e7b1 kvm_cpu_exec (qemu-system-x86_64)
                #3  0x000055dab1f8fcfd kvm_vcpu_thread_fn (qemu-system-x86_64)
                #4  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #5  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #6  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132578:
                #0  0x00007fa8fbb0b7cb ioctl (libc.so.6)
                #1  0x000055dab1f8e63f kvm_vcpu_ioctl (qemu-system-x86_64)
                #2  0x000055dab1f8e7b1 kvm_cpu_exec (qemu-system-x86_64)
                #3  0x000055dab1f8fcfd kvm_vcpu_thread_fn (qemu-system-x86_64)
                #4  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #5  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #6  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132574:
                #0  0x00007fa8fbb0b7cb ioctl (libc.so.6)
                #1  0x000055dab1f8e63f kvm_vcpu_ioctl (qemu-system-x86_64)
                #2  0x000055dab1f8e7b1 kvm_cpu_exec (qemu-system-x86_64)
                #3  0x000055dab1f8fcfd kvm_vcpu_thread_fn (qemu-system-x86_64)
                #4  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #5  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #6  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132573:
                #0  0x00007fa8fbb0b7cb ioctl (libc.so.6)
                #1  0x000055dab1f8e63f kvm_vcpu_ioctl (qemu-system-x86_64)
                #2  0x000055dab1f8e7b1 kvm_cpu_exec (qemu-system-x86_64)
                #3  0x000055dab1f8fcfd kvm_vcpu_thread_fn (qemu-system-x86_64)
                #4  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #5  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #6  0x00007fa8fbb0be73 __clone (libc.so.6)
                
                Stack trace of thread 132579:
                #0  0x00007fa8fbb0b7cb ioctl (libc.so.6)
                #1  0x000055dab1f8e63f kvm_vcpu_ioctl (qemu-system-x86_64)
                #2  0x000055dab1f8e7b1 kvm_cpu_exec (qemu-system-x86_64)
                #3  0x000055dab1f8fcfd kvm_vcpu_thread_fn (qemu-system-x86_64)
                #4  0x000055dab20f543a qemu_thread_start (qemu-system-x86_64)
                #5  0x00007fa8fbea01ca start_thread (libpthread.so.0)
                #6  0x00007fa8fbb0be73 __clone (libc.so.6)

Comment 6 Emanuele Giuseppe Esposito 2023-02-08 11:14:10 UTC
Exactly what I needed. 
I also managed to reproduce the bug myself, with the help of Kevin.
Patch sent upstream: https://patchew.org/QEMU/20230208111148.1040083-1-eesposit@redhat.com/

Comment 7 Lukáš Doktor 2023-02-08 12:39:54 UTC
Thank you, the patch works for me (at least with the simplified reproducer plus a bunch of fio tests, just in case). Looking forward to the patch being applied.

Comment 19 Yanan Fu 2023-04-24 10:34:17 UTC
QE bot (pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 22 qing.wang 2023-05-19 06:27:06 UTC
Passed regression test on

Red Hat Enterprise Linux release 9.3 Beta (Plow)
5.14.0-312.el9.x86_64
qemu-kvm-8.0.0-2.el9.x86_64
seabios-bin-1.16.1-1.el9.noarch
edk2-ovmf-20230301gitf80f052277c8-3.el9.noarch
libvirt-9.0.0-10.1.el9_2.x86_64
virtio-win-prewhql-0.1-236.iso

Comment 24 errata-xmlrpc 2023-11-07 08:26:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368

