Bug 1656276
Summary: | qemu-kvm core dumped after hotplug the deleted disk with iothread parameter | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | lchai <lchai> | |
Component: | qemu-kvm | Assignee: | Markus Armbruster <armbru> | |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 8.0 | CC: | armbru, chayang, coli, juzhang, knoel, ngu, qzhang, rbalakri, virt-maint | |
Target Milestone: | rc | |||
Target Release: | 8.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-3.1.0-13.module+el8+2783+15cec5ae | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1673396 1673397 | Environment: | |
Last Closed: | 2019-05-29 16:04:52 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1673396, 1673397, 1718992, 1722710 |
Description
lchai
2018-12-05 06:52:36 UTC
Reproduced upstream with this simplified reproducer:

    qemu-system-x86_64 -nodefaults -display none -M q35 -S \
        -object iothread,id=iothread0 \
        -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
        -device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \
        -blockdev node-name=data_disk1,driver=file,filename=tmp.img \
        -device scsi-hd,drive=data_disk1,id=data1 \
        -qmp stdio
    {"QMP": {"version": {"qemu": {"micro": 95, "minor": 0, "major": 3}, "package": "v3.1.0-88-gc3ec0fa1a8"}, "capabilities": ["oob"]}}
    {"execute": "qmp_capabilities"}
    {"return": {}}
    {"execute":"device_del","arguments":{"id":"data1"}}
    {"timestamp": {"seconds": 1544772971, "microseconds": 141232}, "event": "DEVICE_DELETED", "data": {"device": "data1", "path": "/machine/peripheral/data1"}}
    {"return": {}}
    {"execute":"device_add","arguments":{"driver":"scsi-hd","drive":"data_disk1","id":"data1"}}
    qemu: qemu_mutex_unlock_impl: Operation not permitted
    Aborted (core dumped)

We release an AioContext we didn't acquire:

    {"execute":"device_del","arguments":{"id":"data1"}}
    ### aio_context_acquire 0x5585f0c644d0
    ### aio_context_release 0x5585f0c644d0
    ### aio_context_acquire 0x5585f0c644d0
    ### aio_context_release 0x5585f0c644d0
    {"timestamp": {"seconds": 1544777807, "microseconds": 966713}, "event": "DEVICE_DELETED", "data": {"device": "data1", "path": "/machine/peripheral/data1"}}
    {"return": {}}
    {"execute":"device_add","arguments":{"driver":"scsi-hd","drive":"data_disk1","id":"data1"}}
    --> ### aio_context_release 0x5585f0c644d0
    qemu: qemu_mutex_unlock_impl: Operation not permitted
    ### aio_context_acquire 0x5585f0c644d0
    ### aio_context_release 0x5585f0c644d0
    Aborted (core dumped)

Output is from the obvious debugging patch:

    diff --git a/util/async.c b/util/async.c
    index c10642a385..a6625e75f3 100644
    --- a/util/async.c
    +++ b/util/async.c
    @@ -508,10 +508,12 @@ void aio_context_unref(AioContext *ctx)
     
     void aio_context_acquire(AioContext *ctx)
     {
    +    printf("### %s %p\n", __func__, ctx);
         qemu_rec_mutex_lock(&ctx->lock);
     }
     
     void aio_context_release(AioContext *ctx)
     {
    +    printf("### %s %p\n", __func__, ctx);
         qemu_rec_mutex_unlock(&ctx->lock);
     }

*** Bug 1658974 has been marked as a duplicate of this bug. ***

Possible upstream patch:

    Subject: [PATCH 0/6] Acquire the AioContext during _realize()
    Message-Id: <cover.1547132561.git.berto>
    https://lists.nongnu.org/archive/html/qemu-devel/2019-01/msg01967.html

The patch also fixes bug 1662508 for me. The two bugs are definitely related, but I'm not yet sure they're actually duplicates.

Update:
1. Reproduced with kernel-4.18.0-60.el8.x86_64 + qemu-kvm-3.1.0-5.module+el8+2708+fbd828c6.x86_64
2. Verified with kernel-4.18.0-60.el8.x86_64 + qemu-kvm-3.1.0-7.el8bz1656276.armbru1.x86_64

1) Boot guest with the following command line:

    /usr/libexec/qemu-kvm -M q35 \
    -S \
    -cpu SandyBridge \
    -enable-kvm \
    -m 4G \
    -smp 4 \
    -object iothread,id=iothread0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    ***-device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \ ***
    -blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/rhel-7.6.z.qcow2,node-name=win_disk \
    -blockdev driver=qcow2,node-name=drive_win,file=win_disk \
    -device scsi-hd,drive=drive_win,id=win1,write-cache=on \
    ***-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/data.qcow2,node-name=data_disk1 \
    -blockdev driver=qcow2,node-name=drive_stg1,file=data_disk1 \
    -device scsi-hd,drive=drive_stg1,id=data1,write-cache=on \ ***
    -device virtio-net-pci,mac=6c:ae:8b:20:80:59,id=netdev1,vectors=4,netdev=net1,bus=pcie.0-root-port-3 -netdev tap,id=net1,vhost=on \
    -qmp tcp:0:4446,server,nowait \
    -vga qxl \
    -vnc :4 \
    -monitor stdio \
    -boot menu=on

2) Unplug the data disk:

    {"execute":"device_del","arguments":{"id":"data1"}}
    {"timestamp": {"seconds": 1548736583, "microseconds": 557905}, "event": "DEVICE_DELETED", "data": {"device": "data1", "path": "/machine/peripheral/data1"}}
    {"return": {}}

3) Hotplug the deleted disk successfully:

    { 'execute':'device_add','arguments':{'driver':'scsi-hd','drive':'drive_stg1','id':'data1'}}
    {"return": {}}

4) Run an I/O test on the data disk; it worked normally:

    # lsblk
    sdb 8:16 0 10G disk
    # dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct

P.S. The following error appeared in the guest dmesg log after step 3:

    device-mapper: table: 253:2: multipath: error getting device

Fix included in qemu-kvm-3.1.0-13.module+el8+2783+15cec5ae

Update:
1. Reproduced with kernel-4.18.0-60.el8.x86_64 + qemu-kvm-3.1.0-10.module+el8+2732+3228f155.x86_64
2. Verified with kernel-4.18.0-60.el8.x86_64 + qemu-kvm-3.1.0-13.module+el8+2783+15cec5ae

1) Boot guest with the following command line:

    /usr/libexec/qemu-kvm -M q35 \
    -S \
    -cpu SandyBridge \
    -enable-kvm \
    -m 4G \
    -smp 4 \
    -object iothread,id=iothread0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    ***-device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \ ***
    -blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/rhel-7.6.z.qcow2,node-name=win_disk \
    -blockdev driver=qcow2,node-name=drive_win,file=win_disk \
    -device scsi-hd,drive=drive_win,id=win1,write-cache=on \
    ***-blockdev driver=file,cache.direct=on,cache.no-flush=off,filename=/home/data.qcow2,node-name=data_disk1 \
    -blockdev driver=qcow2,node-name=drive_stg1,file=data_disk1 \
    -device scsi-hd,drive=drive_stg1,id=data1,write-cache=on \ ***
    -device virtio-net-pci,mac=6c:ae:8b:20:80:59,id=netdev1,vectors=4,netdev=net1,bus=pcie.0-root-port-3 -netdev tap,id=net1,vhost=on \
    -qmp tcp:0:4446,server,nowait \
    -vga qxl \
    -vnc :4 \
    -monitor stdio \
    -boot menu=on

2) Unplug the data disk:

    {"execute":"device_del","arguments":{"id":"data1"}}
    {"timestamp": {"seconds": 1548736583, "microseconds": 557905}, "event": "DEVICE_DELETED", "data": {"device": "data1", "path": "/machine/peripheral/data1"}}
    {"return": {}}

3) Hotplug the deleted disk successfully:

    { 'execute':'device_add','arguments':{'driver':'scsi-hd','drive':'drive_stg1','id':'data1'}}
    {"return": {}}

4) Run an I/O test on the data disk; it worked normally:

    # lsblk
    sdb 8:16 0 10G disk
    # dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct
    1000+0 records in
    1000+0 records out
    1048576000 bytes (1.0 GB) copied, 26.2287 s, 40.0 MB/s

P.S. The following dmesg output appeared while executing steps 2 and 3; no failure or error occurred:

    # dmesg -w
    [ 347.311563] sd 6:0:1:0: [sdb] Synchronizing SCSI cache
    [ 347.311804] sd 6:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
    [ 362.104033] scsi 6:0:1:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
    [ 362.109293] sd 6:0:1:0: Attached scsi generic sg1 type 0
    [ 362.109569] sd 6:0:1:0: [sdb] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
    [ 362.109788] sd 6:0:1:0: [sdb] Write Protect is off
    [ 362.109791] sd 6:0:1:0: [sdb] Mode Sense: 63 00 00 08
    [ 362.109828] sd 6:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [ 362.113706] sd 6:0:1:0: [sdb] Attached SCSI disk

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1293