Bug 1631227
Summary: QEMU core dump when quitting a VM that is in "paused (io-error)" status with data plane enabled

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 7 |
| Component | qemu-kvm-rhev |
| Version | 7.6 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Keywords | Regression |
| Reporter | aihua liang <aliang> |
| Assignee | Kevin Wolf <kwolf> |
| QA Contact | yujie ma <yujma> |
| CC | aliang, chayang, coli, jomurphy, juzhang, lijin, ngu, phou, qzhang, timao, virt-maint, xuwei, yhong |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | qemu-kvm-rhev-2.12.0-28.el7 |
| Doc Type | If docs needed, set a value |
| Cloned to | 1716347 (view as bug list) |
| Bug Blocks | 1649160, 1651787, 1716347 |
| Last Closed | 2019-08-22 09:18:53 UTC |
| Type | Bug |
Description (aihua liang, 2018-09-20 09:12:10 UTC)
Looking at the list of commits introduced in 2.12.0-7, my initial suspicion would be this commit:

```
commit d53f85c0fbaf3b7936e7a4b0f019be986a39fe7a
Author: Kevin Wolf <kwolf>
Date:   Mon Jul 2 15:40:07 2018 +0200

    qcow2: Free allocated clusters on write error
```

Here's the full list of commits:

```
$ git shortlog -n qemu-kvm-rhev-2.12.0-6.el7..qemu-kvm-rhev-2.12.0-7.el7
Dr. David Alan Gilbert (18):
      migration: stop compressing page in migration thread
      migration: stop compression to allocate and free memory frequently
      migration: stop decompression to allocate and free memory frequently
      migration: detect compression and decompression errors
      migration: introduce control_save_page()
      migration: move some code to ram_save_host_page
      migration: move calling control_save_page to the common place
      migration: move calling save_zero_page to the common place
      migration: introduce save_normal_page()
      migration: remove ram_save_compressed_page()
      migration/block-dirty-bitmap: fix memory leak in dirty_bitmap_load_bits
      migration: fix saving normal page even if it's been compressed
      migration: update index field when delete or qsort RDMALocalBlock
      Migration+TLS: Fix crash due to double cleanup
      migration: introduce decompress-error-check
      migration: Don't activate block devices if using -S
      migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect
      migration/block-dirty-bitmap: fix dirty_bitmap_load

Fam Zheng (13):
      block: Introduce API for copy offloading
      raw: Check byte range uniformly
      raw: Implement copy offloading
      qcow2: Implement copy offloading
      file-posix: Implement bdrv_co_copy_range
      iscsi: Query and save device designator when opening
      iscsi: Create and use iscsi_co_wait_for_task
      iscsi: Implement copy offloading
      block-backend: Add blk_co_copy_range
      qemu-img: Convert with copy offloading
      qcow2: Fix src_offset in copy offloading
      iscsi: Don't blindly use designator length in response for memcpy
      file-posix: Fix EINTR handling

plai (10):
      vhost-user: add Net prefix to internal state structure
      virtio: support setting memory region based host notifier
      vhost-user: support receiving file descriptors in slave_read
      osdep: add wait.h compat macros
      vhost-user-bridge: support host notifier
      vhost: allow backends to filter memory sections
      vhost-user: allow slave to send fds via slave channel
      vhost-user: introduce shared vhost-user state
      vhost-user: support registering external host notifiers
      libvhost-user: support host notifier

Eduardo Habkost (4):
      i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)
      i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639)
      pc: Add rhel7.6.0 machine-types
      qemu-options: Add missing newline to -accel help text

Kevin Wolf (4):
      usb-storage: Add rerror/werror properties
      qemu-iotests: Update 026.out.nocache reference output
      qcow2: Free allocated clusters on write error
      qemu-iotests: Test qcow2 not leaking clusters on write error

Max Reitz (3):
      block/file-posix: Pass FD to locking helpers
      block/file-posix: File locking during creation
      iotests: Add creation test to 153

Gerd Hoffmann (2):
      Add qemu-keymap to qemu-kvm-tools
      usb-host: skip open on pending postload bh

Alex Williamson (1):
      vfio/pci: Default display option to "off"

Cornelia Huck (1):
      s390x/cpumodel: default enable bpb and ppa15 for z196 and later

Igor Mammedov (1):
      numa: clarify error message when node index is out of range in -numa dist, ...

Miroslav Rezanina (1):
      Update to qemu-kvm-ma-2.12.0-7.el7 / qemu-kvm-rhev-2.12.0-7.el7
```

Posted a patch upstream that should fix this:
https://lists.gnu.org/archive/html/qemu-block/2019-04/msg00490.html

Fix included in qemu-kvm-rhev-2.12.0-28.el7

Reproduced this bug as below:

Tested with:
kernel-3.10.0-1048.el7.x86_64
qemu-kvm-rhev-2.12.0-16.el7

Steps:
1. Mount the gluster volume on the local host and write it full:

```
# mount.glusterfs dhcp-8-206.nay.redhat.com:/vol1 /home/gluster/
# dd if=/dev/zero of=/home/gluster/test.bin bs=1M oflag=direct
dd: error writing ‘/home/gluster/test.bin’: No space left on device
21350+0 records in
21349+0 records out
22386311168 bytes (22 GB) copied, 2508.5 s, 8.9 MB/s
```

2. Boot the VM with the image stored on the gluster volume:

```
/usr/libexec/qemu-kvm \
    -name 'rhel7.7' \
    -machine q35 \
    -nodefaults \
    -vga qxl \
    -object iothread,id=iothread0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2 \
    -device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \
    -drive if=none,cache=none,format=qcow2,id=drive_image1,aio=native,file=gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol1/rhel77-64-virtio.qcow2 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-blk-pci,id=image2,drive=drive_image1,write-cache=on,iothread=iothread0,bus=pcie.0-root-port-3,bootindex=0 \
    -blockdev driver=raw,cache.direct=off,cache.no-flush=on,file.filename=/home/IOtest/data.qcow2,node-name=data_disk1,file.driver=file \
    -device scsi-hd,drive=data_disk1,id=data1,bootindex=1 \
    -vnc :0 \
    -monitor stdio \
    -m 4096 \
    -smp 8 \
    -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0,addr=0x9 \
    -netdev tap,id=idxgXAlm \
    -qmp tcp:localhost:5902,server,nowait \
    -device nec-usb-xhci,id=usb1,bus=pcie.0,addr=0x5 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1
```
3. Start some apps in the guest, and watch the VM status via the QMP monitor:

```
# telnet localhost 5902
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1558420428, "microseconds": 418941}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block247", "reason": "No space left on device", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1558420429, "microseconds": 696417}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block247", "reason": "No space left on device", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1558420429, "microseconds": 697998}, "event": "STOP"}
```

4. After the "STOP" event arrives on the QMP monitor, check the VM status in HMP:

```
(qemu) info status
VM status: paused (io-error)
```

5. Quit the VM:

```
(qemu) quit
```

Actual results:

After step 5, qemu core dumps with:

```
(qemu) quit
qemu: qemu_mutex_unlock_impl: Operation not permitted
test.gluster.sh: line 23: 21985 Aborted (core dumped) /usr/libexec/qemu-kvm -name 'rhel7.7' -machine q35 ... (same command line as in step 2)
```

Checking the core dump file with gdb gives the following backtrace:

```
(gdb) bt
#0  0x00007f6aba407377 in raise () at /lib64/libc.so.6
#1  0x00007f6aba408a68 in abort () at /lib64/libc.so.6
#2  0x0000561d26dbde0f in error_exit (err=<optimized out>, msg=msg@entry=0x561d272b5500 <__func__.18621> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
#3  0x0000561d271212df in qemu_mutex_unlock_impl (mutex=mutex@entry=0x561d29741960, file=file@entry=0x561d272b4adf "util/async.c", line=line@entry=507) at util/qemu-thread-posix.c:97
#4  0x0000561d2711cb05 in aio_context_release (ctx=ctx@entry=0x561d29741900) at util/async.c:507
#5  0x0000561d27098d18 in bdrv_flush (bs=<optimized out>) at block/io.c:2669
#6  0x0000561d27078483 in qcow2_cache_flush (bs=bs@entry=0x561d2981e800, c=<optimized out>) at block/qcow2-cache.c:262
#7  0x0000561d2706996c in qcow2_inactivate (bs=bs@entry=0x561d2981e800) at block/qcow2.c:2124
#8  0x0000561d27069a3f in qcow2_close (bs=0x561d2981e800) at block/qcow2.c:2153
#9  0x0000561d270491c2 in bdrv_unref (bs=0x561d2981e800) at block.c:3358
#10 0x0000561d270491c2 in bdrv_unref (bs=0x561d2981e800) at block.c:3542
#11 0x0000561d270491c2 in bdrv_unref (bs=0x561d2981e800) at block.c:4598
#12 0x0000561d2708adf1 in blk_remove_bs (blk=blk@entry=0x561d29830000) at block/block-backend.c:785
#13 0x0000561d2708ae4b in blk_remove_all_bs () at block/block-backend.c:483
#14 0x0000561d2704675f in bdrv_close_all () at block.c:3412
#15 0x0000561d26dc18db in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776
```

Verified this bug as below:

Tested with:
kernel-3.10.0-1040.el7.x86_64
qemu-kvm-rhev-2.12.0-29.el7

Steps:
1. Mount the gluster volume on the local host and write it full:

```
# mount.glusterfs dhcp-8-206.nay.redhat.com:/vol1 /home/gluster/
# dd if=/dev/zero of=/home/gluster/test.bin bs=1M oflag=direct
dd: error writing ‘/home/gluster/test.bin’: No space left on device
21350+0 records in
21349+0 records out
22386311168 bytes (22 GB) copied, 2508.5 s, 8.9 MB/s
```

2. Boot the VM with the image stored on the gluster volume:

```
/usr/libexec/qemu-kvm \
    -name 'rhel7.7' \
    -machine q35 \
    -nodefaults \
    -vga qxl \
    -object iothread,id=iothread0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2 \
    -device virtio-scsi-pci,id=scsi0,iothread=iothread0,bus=pcie.0-root-port-2,addr=0x0 \
    -drive if=none,cache=none,format=qcow2,id=drive_image1,aio=native,file=gluster://gluster-virt-qe-01.lab.eng.pek2.redhat.com/vol1/rhel77-64-virtio.qcow2 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-blk-pci,id=image2,drive=drive_image1,write-cache=on,iothread=iothread0,bus=pcie.0-root-port-3,bootindex=0 \
    -blockdev driver=raw,cache.direct=off,cache.no-flush=on,file.filename=/home/IOtest/data.qcow2,node-name=data_disk1,file.driver=file \
    -device scsi-hd,drive=data_disk1,id=data1,bootindex=1 \
    -vnc :0 \
    -monitor stdio \
    -m 4096 \
    -smp 8 \
    -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0,addr=0x9 \
    -netdev tap,id=idxgXAlm \
    -qmp tcp:localhost:5902,server,nowait \
    -device nec-usb-xhci,id=usb1,bus=pcie.0,addr=0x5 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1
```
3. Start some apps in the guest, and watch the VM status via the QMP monitor:

```
# telnet localhost 5902
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1558421625, "microseconds": 697088}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block212", "reason": "No space left on device", "operation": "read", "action": "report"}}
{"timestamp": {"seconds": 1558421626, "microseconds": 74598}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block212", "reason": "No space left on device", "operation": "write", "action": "stop"}}
{"timestamp": {"seconds": 1558421626, "microseconds": 76967}, "event": "STOP"}
```

4. After the "STOP" event arrives on the QMP monitor, check the VM status in HMP:

```
(qemu) info status
VM status: paused (io-error)
```

5. Quit the VM:

```
(qemu) quit
```

Actual result:

No core dump; qemu quits successfully even though the VM is in "paused (io-error)" status.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2553
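Background on the abort message: `qemu: qemu_mutex_unlock_impl: Operation not permitted` is what QEMU prints when `pthread_mutex_unlock()` fails, here because a thread released the AioContext lock it did not own during shutdown (`aio_context_release` in the backtrace). The sketch below is not QEMU code; it is a minimal Python analogy of the same ownership rule, using `threading.RLock`, which, like an error-checking pthread mutex, rejects a release from a thread other than the one that acquired it.

```python
import threading

lock = threading.RLock()

# A worker thread acquires the lock (standing in for the iothread that
# holds the AioContext lock while the VM is paused on an I/O error).
t = threading.Thread(target=lock.acquire)
t.start()
t.join()

# The main thread now tries to release it during teardown, like
# aio_context_release() in the backtrace. RLock refuses, just as an
# error-checking pthread mutex returns EPERM (which QEMU's wrapper
# escalates to abort()).
try:
    lock.release()
    result = "released"
except RuntimeError as e:
    result = f"error: {e}"

print(result)
```

In QEMU the corresponding error is fatal: `qemu_mutex_unlock_impl()` passes any nonzero return from `pthread_mutex_unlock()` to `error_exit()`, which calls `abort()`, producing the core dump seen in step 5.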
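The QMP stream used in step 3 of both runs is newline-delimited JSON, so the transition into "paused (io-error)" can be detected mechanically: it is the `BLOCK_IO_ERROR` event whose `action` is `"stop"`. A minimal sketch of that check (the event lines are copied verbatim from the report; socket plumbing is omitted and the helper name is illustrative, not a QEMU API):

```python
import json

def vm_stopped_on_io_error(qmp_line: str) -> bool:
    """Return True if this QMP event line is a BLOCK_IO_ERROR that
    pauses the guest (action == "stop")."""
    event = json.loads(qmp_line)
    return (event.get("event") == "BLOCK_IO_ERROR"
            and event.get("data", {}).get("action") == "stop")

# Event lines copied from step 3 of the reproduction run:
report_line = '{"timestamp": {"seconds": 1558420428, "microseconds": 418941}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block247", "reason": "No space left on device", "operation": "read", "action": "report"}}'
stop_line = '{"timestamp": {"seconds": 1558420429, "microseconds": 696417}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive_image1", "nospace": true, "__com.redhat_reason": "enospc", "node-name": "#block247", "reason": "No space left on device", "operation": "write", "action": "stop"}}'

print(vm_stopped_on_io_error(report_line))  # read error: action "report" does not pause the guest
print(vm_stopped_on_io_error(stop_line))    # write error: action "stop" pauses it
```

The read error with `"action": "report"` is only logged to the guest, while the write error with `"action": "stop"` is what triggers the subsequent `STOP` event and the `paused (io-error)` status seen in `info status`.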