Description of problem: Boot up a guest with iothread object, unplug the data disk, then hotplug it again, core dump will happen. Version-Release number of selected component (if applicable): Host: RHEL-8.0-20181118.n.0 kernel-4.18.0-40.el8.x86_64 Version-Release number of selected component (if applicable): Host: RHEL-8.0-20181118.n.0 kernel-4.18.0-40.el8.x86_64 qemu-kvm-3.1.0-0.module+el8+2266+616cf026.next.candidate.x86_64 Guest: RHEL-8.0-20181101.1 kernel-4.18.0-45.el8.x86_64 How reproducible: 100% Steps to Reproduce: 1. Boot up the guest with below command line: /usr/libexec/qemu-kvm -M q35 \ -S \ -cpu SandyBridge \ -enable-kvm \ -m 4G \ -smp 4 \ -object iothread,id=iothread0 \ -rtc base=utc,clock=host,driftfix=slew \ -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \ -device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \ -blockdev driver=raw,cache.direct=off,cache.no-flush=on,file.filename=/home/kvmtest/win2.raw,node-name=win_disk,file.driver=file \ -device scsi-hd,drive=win_disk \ -blockdev driver=raw,cache.direct=off,cache.no-flush=on,file.filename=/home/kvmtest/ddd.raw,node-name=data_disk1,file.driver=file \ -device scsi-hd,drive=data_disk1,id=data1 \ -device virtio-net-pci,mac=3c:0b:65:a6:90:66,id=netdev1,vectors=4,netdev=net1,bus=pcie.0-root-port-3 -netdev tap,id=net1,vhost=on \ -qmp tcp:0:4446,server,nowait \ -vga qxl \ -vnc :4 \ -monitor stdio \ -boot menu=on 2. Unplug the data disk: {"execute":"device_del","arguments":{"id":"data1"}} {"timestamp": {"seconds": 1543988703, "microseconds": 565756}, "event": "DEVICE_DELETED", "data": {"device": "data1", "path": "/machine/peripheral/data1"}} {"return": {}} 3. Hotplug the delated disk: { 'execute':'device_add','arguments':{'driver':'scsi-hd','drive':'data_disk1','id':'data1'}} Actual results: For step 3, the disk could be pluged successfully. Expected results: For step 3, core dump happened. Additional info: 1) Check the coredump file with gdb, the following info could be get: Core was generated by `/usr/libexec/qemu-kvm -M q35 -S -cpu SandyBridge -enable-kvm -m 4G -smp 4 -obje'. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 return ret; [Current thread is 1 (Thread 0x7ff869d4ee80 (LWP 10277))] (gdb) (gdb) (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007ff864ddcc95 in __GI_abort () at abort.c:79 #2 0x00005607c296d51e in error_exit (err=<optimized out>, msg=msg@entry=0x5607c2af7270 <__func__.18942> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36 #3 0x00005607c296d76a in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5607c348e180, file=file@entry=0x5607c2af67ff "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96 #4 0x00005607c2968dd9 in aio_context_release (ctx=ctx@entry=0x5607c348e120) at util/async.c:516 #5 0x00005607c28daa88 in blk_prw (blk=blk@entry=0x5607c358d6f0, offset=offset@entry=0, buf=buf@entry=0x7ffe58c49fa0 "\002", bytes=bytes@entry=512, co_entry=co_entry@entry=0x5607c28dc5f0 <blk_read_entry>, flags=flags@entry=0) at block/block-backend.c:1260 #6 0x00005607c28dc83b in blk_pread (count=512, buf=0x7ffe58c49fa0, offset=0, blk=0x5607c358d6f0) at block/block-backend.c:1422 #7 blk_pread_unthrottled (blk=blk@entry=0x5607c358d6f0, offset=offset@entry=0, buf=buf@entry=0x7ffe58c49fa0 "\002", count=count@entry=512) at block/block-backend.c:1277 #8 0x00005607c2785b9f in guess_disk_lchs (blk=blk@entry=0x5607c358d6f0, pcylinders=pcylinders@entry=0x7ffe58c4a1e4, pheads=pheads@entry=0x7ffe58c4a1e8, psectors=psectors@entry=0x7ffe58c4a1ec) at hw/block/hd-geometry.c:71 #9 0x00005607c2785cff in hd_geometry_guess (blk=0x5607c358d6f0, pcyls=pcyls@entry=0x5607c3417cac, pheads=pheads@entry=0x5607c3417cb0, psecs=psecs@entry=0x5607c3417cb4, ptrans=ptrans@entry=0x0) at hw/block/hd-geometry.c:136 #10 0x00005607c27858db in blkconf_geometry (conf=conf@entry=0x5607c3417c90, ptrans=ptrans@entry=0x0, cyls_max=cyls_max@entry=65535, heads_max=heads_max@entry=255, secs_max=secs_max@entry=255, errp=errp@entry=0x7ffe58c4a2c0) at hw/block/block.c:126 #11 0x00005607c2800248 in scsi_realize (dev=0x5607c3417c00, errp=0x7ffe58c4a2c0) at hw/scsi/scsi-disk.c:2393 #12 0x00005607c2807757 in scsi_device_realize (errp=0x7ffe58c4a2c0, s=0x5607c3417c00) at hw/scsi/scsi-bus.c:54 #13 scsi_qdev_realize (qdev=<optimized out>, errp=0x7ffe58c4a320) at hw/scsi/scsi-bus.c:204 #14 0x00005607c2797390 in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7ffe58c4a448) at hw/core/qdev.c:826 #15 0x00005607c288f08b in property_set_bool (obj=0x5607c3417c00, v=<optimized out>, name=<optimized out>, opaque=0x5607c34192d0, errp=0x7ffe58c4a448) at qom/object.c:1984 #16 0x00005607c2893253 in object_property_set_qobject (obj=0x5607c3417c00, value=<optimized out>, name=0x5607c2a386bd "realized", errp=0x7ffe58c4a448) at qom/qom-qobject.c:27 #17 0x00005607c2890c49 in object_property_set_bool (obj=0x5607c3417c00, value=<optimized out>, name=0x5607c2a386bd "realized", errp=0x7ffe58c4a448) at qom/object.c:1242 #18 0x00005607c273dda4 in qdev_device_add (opts=opts@entry=0x5607c385b170, errp=errp@entry=0x7ffe58c4a520) at qdev-monitor.c:626 #19 0x00005607c273e27b in qmp_device_add (qdict=<optimized out>, ret_data=<optimized out>, errp=0x7ffe58c4a568) at qdev-monitor.c:806 #20 0x00005607c295dba3 in do_qmp_dispatch (errp=0x7ffe58c4a560, allow_oob=<optimized out>, request=<optimized out>, cmds=0x5607c31a6e70 <qmp_commands>) at qapi/qmp-dispatch.c:130 #21 qmp_dispatch (cmds=0x5607c31a6e70 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:172 #22 0x00005607c264d4f3 in monitor_qmp_dispatch (mon=0x5607c341c400, req=<optimized out>, id=0x0) at /usr/src/debug/qemu-kvm-3.0.0-2.module+el8+2208+e41b12e0.x86_64/monitor.c:4158 #23 0x00005607c2652ee0 in monitor_qmp_bh_dispatcher (data=<optimized out>) at /usr/src/debug/qemu-kvm-3.0.0-2.module+el8+2208+e41b12e0.x86_64/monitor.c:4227 ... 2) Host cpu info: # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 24 On-line CPU(s) list: 0-23 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 2 NUMA node(s): 2 Vendor ID: GenuineIntel CPU family: 6 Model: 45 Model name: Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz Stepping: 7 CPU MHz: 2416.091 CPU max MHz: 2500.0000 CPU min MHz: 1200.0000 BogoMIPS: 3999.76 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 15360K NUMA node0 CPU(s): 0-5,12-17 NUMA node1 CPU(s): 6-11,18-23 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts flush_l1d
Reproduced upstream with this simplified reproducer: qemu-system-x86_64 -nodefaults -display none -M q35 -S -object iothread,id=iothread0 -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 -device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 -blockdev node-name=data_disk1,driver=file,filename=tmp.img -device scsi-hd,drive=data_disk1,id=data1 -qmp stdio {"QMP": {"version": {"qemu": {"micro": 95, "minor": 0, "major": 3}, "package": "v3.1.0-88-gc3ec0fa1a8"}, "capabilities": ["oob"]}} {"execute": "qmp_capabilities"} {"return": {}} {"execute":"device_del","arguments":{"id":"data1"}} {"timestamp": {"seconds": 1544772971, "microseconds": 141232}, "event": "DEVICE_DELETED", "data": {"device": "data1", "path": "/machine/peripheral/data1"}} {"return": {}} {"execute":"device_add","arguments":{"driver":"scsi-hd","drive":"data_disk1","id":"data1"}} qemu: qemu_mutex_unlock_impl: Operation not permitted Aborted (core dumped)
We release an AioContext we didn't acquire: {"execute":"device_del","arguments":{"id":"data1"}} ### aio_context_acquire 0x5585f0c644d0 ### aio_context_release 0x5585f0c644d0 ### aio_context_acquire 0x5585f0c644d0 ### aio_context_release 0x5585f0c644d0 {"timestamp": {"seconds": 1544777807, "microseconds": 966713}, "event": "DEVICE_DELETED", "data": {"device": "data1", "path": "/machine/peripheral/data1"}} {"return": {}} {"execute":"device_add","arguments":{"driver":"scsi-hd","drive":"data_disk1","id":"data1"}} --> ### aio_context_release 0x5585f0c644d0 qemu: qemu_mutex_unlock_impl: Operation not permitted ### aio_context_acquire 0x5585f0c644d0 ### aio_context_release 0x5585f0c644d0 Aborted (core dumped) Output is from the obvious debugging patch diff --git a/util/async.c b/util/async.c index c10642a385..a6625e75f3 100644 --- a/util/async.c +++ b/util/async.c @@ -508,10 +508,12 @@ void aio_context_unref(AioContext *ctx) void aio_context_acquire(AioContext *ctx) { + printf("### %s %p\n", __func__, ctx); qemu_rec_mutex_lock(&ctx->lock); } void aio_context_release(AioContext *ctx) { + printf("### %s %p\n", __func__, ctx); qemu_rec_mutex_unlock(&ctx->lock); }
*** Bug 1658974 has been marked as a duplicate of this bug. ***
Possible upstream patch: Subject: [PATCH 0/6] Acquire the AioContext during _realize() Message-Id: <cover.1547132561.git.berto> https://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg01967.html The patch also fixes bug 1662508 for me. The two bugs are definitely related, but I'm not yet sure they're actually duplicates.
Update: 1. Reproduced with kernel-4.18.0-60.el8.x86_64 + qemu-kvm-3.1.0-5.module+el8+2708+fbd828c6.x86_64; 2. Verified with kernel-4.18.0-60.el8.x86_64 + qemu-kvm-3.1.0-7.el8bz1656276.armbru1.x86_64 1) Boot guest with the following command line: /usr/libexec/qemu-kvm -M q35 \ -S \ -cpu SandyBridge \ -enable-kvm \ -m 4G \ -smp 4 \ -object iothread,id=iothread0 \ -rtc base=utc,clock=host,driftfix=slew \ -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \ ***-device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \ *** -blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/rhel-7.6.z.qcow2,node-name=win_disk \ -blockdev driver=qcow2,node-name=drive_win,file=win_disk \ -device scsi-hd,drive=drive_win,id=win1,write-cache=on \ ***-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/data.qcow2,node-name=data_disk1 \ -blockdev driver=qcow2,node-name=drive_stg1,file=data_disk1 \ -device scsi-hd,drive=drive_stg1,id=data1,write-cache=on \ *** -device virtio-net-pci,mac=6c:ae:8b:20:80:59,id=netdev1,vectors=4,netdev=net1,bus=pcie.0-root-port-3 -netdev tap,id=net1,vhost=on \ -qmp tcp:0:4446,server,nowait \ -vga qxl \ -vnc :4 \ -monitor stdio \ -boot menu=on 2) Unplug the data disk: {"execute":"device_del","arguments":{"id":"data1"}} {"timestamp": {"seconds": 1548736583, "microseconds": 557905}, "event": "DEVICE_DELETED", "data": {"device": "data1", "path": "/machine/peripheral/data1"}} {"return": {}} 3. Hotplug the deleted disk successfully: { 'execute':'device_add','arguments':{'driver':'scsi-hd','drive':'drive_stg1','id':'data1'}} {"return": {}} 4. Run IO test on the data disk, it worked normally: # lsblk sdb 8:16 0 10G disk # dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct P.S. The following error info existed in guest dmesg log after step3: device-mapper: table: 253:2: multipath: error getting device
Fix included in qemu-kvm-3.1.0-13.module+el8+2783+15cec5ae
Update: 1. Reproduced with kernel-4.18.0-60.el8.x86_64 + qemu-kvm-3.1.0-10.module+el8+2732+3228f155.x86_64 2. Verified with kernel-4.18.0-60.el8.x86_64 + qemu-kvm-3.1.0-13.module+el8+2783+15cec5ae 1) Boot guest with the following command line: /usr/libexec/qemu-kvm -M q35 \ -S \ -cpu SandyBridge \ -enable-kvm \ -m 4G \ -smp 4 \ -object iothread,id=iothread0 \ -rtc base=utc,clock=host,driftfix=slew \ -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \ ***-device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \ *** -blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/rhel-7.6.z.qcow2,node-name=win_disk \ -blockdev driver=qcow2,node-name=drive_win,file=win_disk \ -device scsi-hd,drive=drive_win,id=win1,write-cache=on \ ***-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/data.qcow2,node-name=data_disk1 \ -blockdev driver=qcow2,node-name=drive_stg1,file=data_disk1 \ -device scsi-hd,drive=drive_stg1,id=data1,write-cache=on \ *** -device virtio-net-pci,mac=6c:ae:8b:20:80:59,id=netdev1,vectors=4,netdev=net1,bus=pcie.0-root-port-3 -netdev tap,id=net1,vhost=on \ -qmp tcp:0:4446,server,nowait \ -vga qxl \ -vnc :4 \ -monitor stdio \ -boot menu=on 2) Unplug the data disk: {"execute":"device_del","arguments":{"id":"data1"}} {"timestamp": {"seconds": 1548736583, "microseconds": 557905}, "event": "DEVICE_DELETED", "data": {"device": "data1", "path": "/machine/peripheral/data1"}} {"return": {}} 3) Hotplug the deleted disk successfully: { 'execute':'device_add','arguments':{'driver':'scsi-hd','drive':'drive_stg1','id':'data1'}} {"return": {}} 4) Run IO test on the data disk, it worked normally: # lsblk sdb 8:16 0 10G disk # dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct 1000+0 records in 1000+0 records out 1048576000 bytes (1.0 GB) copied, 26.2287 s, 40.0 MB/s P.S. The following dmesg log could be found when execute step 2&3, no fail/error happened # dmesg -w [ 347.311563] sd 6:0:1:0: [sdb] Synchronizing SCSI cache [ 347.311804] sd 6:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [ 362.104033] scsi 6:0:1:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5 [ 362.109293] sd 6:0:1:0: Attached scsi generic sg1 type 0 [ 362.109569] sd 6:0:1:0: [sdb] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB) [ 362.109788] sd 6:0:1:0: [sdb] Write Protect is off [ 362.109791] sd 6:0:1:0: [sdb] Mode Sense: 63 00 00 08 [ 362.109828] sd 6:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 362.113706] sd 6:0:1:0: [sdb] Attached SCSI disk
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1293