Bug 1631227
| | | | |
|---|---|---|---|
| Summary: | Qemu Core dump when quit vm that's in status "paused(io-error)" with data plane enabled | | |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | aihua liang <aliang> |
| Component: | qemu-kvm-rhev | Assignee: | Kevin Wolf <kwolf> |
| Status: | CLOSED ERRATA | QA Contact: | yujie ma <yujma> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.6 | CC: | aliang, chayang, coli, jomurphy, juzhang, lijin, ngu, phou, qzhang, timao, virt-maint, xuwei, yhong |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-rhev-2.12.0-28.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1716347 (view as bug list) | Environment: | |
| Last Closed: | 2019-08-22 09:18:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1649160, 1651787, 1716347 | | |
| Attachments: | | | |
Description
aihua liang
2018-09-20 09:12:10 UTC
Looking at the list of commits introduced in 2.12.0-7, my initial suspicion would be this commit:
commit d53f85c0fbaf3b7936e7a4b0f019be986a39fe7a
Author: Kevin Wolf <kwolf>
Date: Mon Jul 2 15:40:07 2018 +0200
qcow2: Free allocated clusters on write error
Here's the full list of commits:
$ git shortlog -n qemu-kvm-rhev-2.12.0-6.el7..qemu-kvm-rhev-2.12.0-7.el7
Dr. David Alan Gilbert (18):
migration: stop compressing page in migration thread
migration: stop compression to allocate and free memory frequently
migration: stop decompression to allocate and free memory frequently
migration: detect compression and decompression errors
migration: introduce control_save_page()
migration: move some code to ram_save_host_page
migration: move calling control_save_page to the common place
migration: move calling save_zero_page to the common place
migration: introduce save_normal_page()
migration: remove ram_save_compressed_page()
migration/block-dirty-bitmap: fix memory leak in dirty_bitmap_load_bits
migration: fix saving normal page even if it's been compressed
migration: update index field when delete or qsort RDMALocalBlock
Migration+TLS: Fix crash due to double cleanup
migration: introduce decompress-error-check
migration: Don't activate block devices if using -S
migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
migration/block-dirty-bitmap: fix dirty_bitmap_load
Fam Zheng (13):
block: Introduce API for copy offloading
raw: Check byte range uniformly
raw: Implement copy offloading
qcow2: Implement copy offloading
file-posix: Implement bdrv_co_copy_range
iscsi: Query and save device designator when opening
iscsi: Create and use iscsi_co_wait_for_task
iscsi: Implement copy offloading
block-backend: Add blk_co_copy_range
qemu-img: Convert with copy offloading
qcow2: Fix src_offset in copy offloading
iscsi: Don't blindly use designator length in response for memcpy
file-posix: Fix EINTR handling
plai (10):
vhost-user: add Net prefix to internal state structure
virtio: support setting memory region based host notifier
vhost-user: support receiving file descriptors in slave_read
osdep: add wait.h compat macros
vhost-user-bridge: support host notifier
vhost: allow backends to filter memory sections
vhost-user: allow slave to send fds via slave channel
vhost-user: introduce shared vhost-user state
vhost-user: support registering external host notifiers
libvhost-user: support host notifier
Eduardo Habkost (4):
i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)
i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639)
pc: Add rhel7.6.0 machine-types
qemu-options: Add missing newline to -accel help text
Kevin Wolf (4):
usb-storage: Add rerror/werror properties
qemu-iotests: Update 026.out.nocache reference output
qcow2: Free allocated clusters on write error
qemu-iotests: Test qcow2 not leaking clusters on write error
Max Reitz (3):
block/file-posix: Pass FD to locking helpers
block/file-posix: File locking during creation
iotests: Add creation test to 153
Gerd Hoffmann (2):
Add qemu-keymap to qemu-kvm-tools
usb-host: skip open on pending postload bh
Alex Williamson (1):
vfio/pci: Default display option to "off"
Cornelia Huck (1):
s390x/cpumodel: default enable bpb and ppa15 for z196 and later
Igor Mammedov (1):
numa: clarify error message when node index is out of range in -numa dist, ...
Miroslav Rezanina (1):
Update to qemu-kvm-ma-2.12.0-7.el7 / qemu-kvm-rhev-2.12.0-7.el7
Posted a patch upstream that should fix this: https://lists.gnu.org/archive/html/qemu-block/2019-04/msg00490.html

Fix included in qemu-kvm-rhev-2.12.0-28.el7

Reproduced this bug as below:
Tested with:
kernel-3.10.0-1048.el7.x86_64
qemu-kvm-rhev-2.12.0-16.el7
Steps:
1. Mount the gluster volume on the local host, and write it until it is full;
# mount.glusterfs dhcp-8-206.nay.redhat.com:/vol1 /home/gluster/
# dd if=/dev/zero of=/home/gluster/test.bin bs=1M oflag=direct
dd: error writing ‘/home/gluster/test.bin’: No space left on device
21350+0 records in
21349+0 records out
22386311168 bytes (22 GB) copied, 2508.5 s, 8.9 MB/s
2. Boot the vm with the image stored on the gluster volume;
/usr/libexec/qemu-kvm \
-name 'rhel7.7' \
-machine q35 \
-nodefaults \
-vga qxl \
-object iothread,id=iothread0 \
-rtc base=utc,clock=host,driftfix=slew \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2 \
-device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \
-drive if=none,cache=none,format=qcow2,id=drive_image1,aio=native,file=gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol1/rhel77-64-virtio.qcow2 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device virtio-blk-pci,id=image2,drive=drive_image1,write-cache=on,iothread=iothread0,bus=pcie.0-root-port-3,bootindex=0 \
-blockdev driver=raw,cache.direct=off,cache.no-flush=on,file.filename=/home/IOtest/data.qcow2,node-name=data_disk1,file.driver=file \
-device scsi-hd,drive=data_disk1,id=data1,bootindex=1 \
-vnc :0 \
-monitor stdio \
-m 4096 \
-smp 8 \
-device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0,addr=0x9 \
-netdev tap,id=idxgXAlm \
-qmp tcp:localhost:5902,server,nowait \
-device nec-usb-xhci,id=usb1,bus=pcie.0,addr=0x5 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
3. Start some apps on the guest, and check the VM status via the QMP monitor;
# telnet localhost 5902
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1558420428, "microseconds": 418941}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block247", "reason": "No space left on device", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1558420429, "microseconds": 696417}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block247", "reason": "No space left on device", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1558420429, "microseconds": 697998}, "event": "STOP"}
4. After we get the "STOP" event via the QMP monitor, check the VM status in HMP:
(qemu) info status
VM status: paused (io-error)
5. Quit the VM;
(qemu) quit
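The QMP interaction in step 3 can be scripted instead of typed over telnet. A minimal Python sketch, assuming the QMP server from the command line above listens on localhost:5902 (the host, port, and helper name `wait_for_stop` are illustrative, not part of QEMU):

```python
import json
import socket

def wait_for_stop(host="localhost", port=5902):
    """Connect to a QMP socket, negotiate capabilities, and block until
    the VM emits a STOP event (e.g. after a werror=stop I/O error)."""
    s = socket.create_connection((host, port))
    f = s.makefile("rw")
    json.loads(f.readline())  # consume the QMP greeting banner
    f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    f.flush()
    for line in f:
        msg = json.loads(line)
        if msg.get("event") == "BLOCK_IO_ERROR":
            data = msg["data"]
            print("I/O error on", data["device"], "action:", data["action"])
        if msg.get("event") == "STOP":
            return msg  # VM is now "paused (io-error)"
```

With the reproducer above, this would print the BLOCK_IO_ERROR events and return once the VM pauses, at which point step 5 ("quit") triggers the crash.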
Actual results:
After step 5, qemu core dumps with the following info:
(qemu) quit
qemu: qemu_mutex_unlock_impl: Operation not permitted
test.gluster.sh: line 23: 21985 Aborted (core dumped) /usr/libexec/qemu-kvm -name 'rhel7.7' -machine q35 -nodefaults -vga qxl -object iothread,id=iothread0 -rtc base=utc,clock=host,driftfix=slew -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2 -device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 -drive if=none,cache=none,format=qcow2,id=drive_image1,aio=native,file=gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol1/rhel77-64-virtio.qcow2 -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 -device virtio-blk-pci,id=image2,drive=drive_image1,write-cache=on,iothread=iothread0,bus=pcie.0-root-port-3,bootindex=0 -blockdev driver=raw,cache.direct=off,cache.no-flush=on,file.filename=/home/IOtest/data.qcow2,node-name=data_disk1,file.driver=file -device scsi-hd,drive=data_disk1,id=data1,bootindex=1 -vnc :0 -monitor stdio -m 4096 -smp 8 -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0,addr=0x9 -netdev tap,id=idxgXAlm -qmp tcp:localhost:5902,server,nowait -device nec-usb-xhci,id=usb1,bus=pcie.0,addr=0x5 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1
Checking the core dump file with gdb gives the following backtrace:
(gdb) bt
#0 0x00007f6aba407377 in raise () at /lib64/libc.so.6
#1 0x00007f6aba408a68 in abort () at /lib64/libc.so.6
#2 0x0000561d26dbde0f in error_exit (err=<optimized out>, msg=msg@entry=0x561d272b5500 <__func__.18621> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
#3 0x0000561d271212df in qemu_mutex_unlock_impl (mutex=mutex@entry=0x561d29741960, file=file@entry=0x561d272b4adf "util/async.c", line=line@entry=507) at util/qemu-thread-posix.c:97
#4 0x0000561d2711cb05 in aio_context_release (ctx=ctx@entry=0x561d29741900) at util/async.c:507
#5 0x0000561d27098d18 in bdrv_flush (bs=<optimized out>) at block/io.c:2669
#6 0x0000561d27078483 in qcow2_cache_flush (bs=bs@entry=0x561d2981e800, c=<optimized out>)
at block/qcow2-cache.c:262
#7 0x0000561d2706996c in qcow2_inactivate (bs=bs@entry=0x561d2981e800) at block/qcow2.c:2124
#8 0x0000561d27069a3f in qcow2_close (bs=0x561d2981e800) at block/qcow2.c:2153
#9 0x0000561d270491c2 in bdrv_unref (bs=0x561d2981e800) at block.c:3358
#10 0x0000561d270491c2 in bdrv_unref (bs=0x561d2981e800) at block.c:3542
#11 0x0000561d270491c2 in bdrv_unref (bs=0x561d2981e800) at block.c:4598
#12 0x0000561d2708adf1 in blk_remove_bs (blk=blk@entry=0x561d29830000)
at block/block-backend.c:785
#13 0x0000561d2708ae4b in blk_remove_all_bs () at block/block-backend.c:483
#14 0x0000561d2704675f in bdrv_close_all () at block.c:3412
#15 0x0000561d26dc18db in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
at vl.c:4776
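The abort in frame #3 comes from qemu_mutex_unlock_impl() getting EPERM back from pthreads: QEMU's mutexes are error-checking, and on this shutdown path the main thread releases the iothread's AioContext lock without holding it. As a rough analogy only (Python, not QEMU code), an owner-tracked lock likewise refuses a release from a thread that never acquired it:

```python
import threading

lock = threading.RLock()   # RLock tracks its owning thread, similar to
lock.acquire()             # an error-checking pthread mutex; held by main

errors = []

def worker():
    try:
        lock.release()     # not the owner -> RuntimeError (EPERM analogue)
    except RuntimeError as e:
        errors.append(str(e))

t = threading.Thread(target=worker)
t.start()
t.join()
print(errors)              # the failed release, caught instead of aborting
lock.release()             # the owner may release normally
```

In QEMU the corresponding pthread error is not catchable in this way; error_exit() calls abort(), which produces the core dump shown above.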
Verified this bug as below:
Tested with:
kernel-3.10.0-1040.el7.x86_64
qemu-kvm-rhev-2.12.0-29.el7
Steps:
1. Mount the gluster volume on the local host, and write it until it is full;
# mount.glusterfs dhcp-8-206.nay.redhat.com:/vol1 /home/gluster/
# dd if=/dev/zero of=/home/gluster/test.bin bs=1M oflag=direct
dd: error writing ‘/home/gluster/test.bin’: No space left on device
21350+0 records in
21349+0 records out
22386311168 bytes (22 GB) copied, 2508.5 s, 8.9 MB/s
2. Boot the vm with the image stored on the gluster volume;
/usr/libexec/qemu-kvm \
-name 'rhel7.7' \
-machine q35 \
-nodefaults \
-vga qxl \
-object iothread,id=iothread0 \
-rtc base=utc,clock=host,driftfix=slew \
-device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2 \
-device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \
-drive if=none,cache=none,format=qcow2,id=drive_image1,aio=native,file=gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol1/rhel77-64-virtio.qcow2 \
-device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
-device virtio-blk-pci,id=image2,drive=drive_image1,write-cache=on,iothread=iothread0,bus=pcie.0-root-port-3,bootindex=0 \
-blockdev driver=raw,cache.direct=off,cache.no-flush=on,file.filename=/home/IOtest/data.qcow2,node-name=data_disk1,file.driver=file \
-device scsi-hd,drive=data_disk1,id=data1,bootindex=1 \
-vnc :0 \
-monitor stdio \
-m 4096 \
-smp 8 \
-device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0,addr=0x9 \
-netdev tap,id=idxgXAlm \
-qmp tcp:localhost:5902,server,nowait \
-device nec-usb-xhci,id=usb1,bus=pcie.0,addr=0x5 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
3. Start some apps on the guest, and check the VM status via the QMP monitor;
# telnet localhost 5902
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1558421625, "microseconds": 697088}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block212", "reason": "No space left on device", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1558421626, "microseconds": 74598}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block212", "reason": "No space left on device", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1558421626, "microseconds": 76967}, "event": "STOP"}
4. After we get the "STOP" event via the QMP monitor, check the VM status in HMP:
(qemu) info status
VM status: paused (io-error)
5. Quit the VM;
(qemu) quit
Actual result:
No core dump; qemu quits successfully even though the VM is in "paused (io-error)" status.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:2553