Red Hat Bugzilla – Bug 1114889
drive-mirror cause qemu-kvm process segfaults
Last modified: 2015-03-05 04:47:37 EST
Description of problem:
Boot a qemu-kvm process from a snapshot image, then use the drive-mirror QMP command to copy the snapshot chain to a file. qemu-kvm segfaults.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-60.el7ev.x86_64
3.10.0-123.4.2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. qemu-img create -f qcow2 base.img 10M
2. qemu-img create -f qcow2 -b base.img -o backing_fmt=raw snap1.img
3. qemu-img create -f qcow2 -b snap1.img -o backing_fmt=qcow2 snap2.img
4. # truncate --size 20480000 copy.img
5. qemu-img info copy.img
   image: copy.img
   file format: raw
   virtual size: 20M (20480000 bytes)
   disk size: 0
6. /usr/libexec/qemu-kvm \
   -machine accel=kvm -name testvm1 -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 5a74eeb4-09c5-4fc2-869d-0e04c13f9db0 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=testvm1.monitor,server,nowait \
   -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi \
   -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
   -drive file=snap2.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
   -vnc :0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on -monitor stdio -qmp tcp:0:4444,server,nowait
7. QMP commands:
   7.1 {"execute":"qmp_capabilities"}
       {"return": {}}
   7.2 {"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"copy.img","format":"raw", "mode":"existing","sync":"full"}}
       {"return": {}}

Actual results:
(gdb) bt
#0 0x00007ffff2dac82b in __memcpy_ssse3_back () from /lib64/libc.so.6
#1 0x0000555555635ed9 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2 handle_aiocb_rw (aiocb=0x5555564f24c0) at block/raw-posix.c:746
#3 0x0000555555636975 in aio_worker (arg=0x5555564f24c0) at block/raw-posix.c:912
#4 0x000055555572aa7b in worker_thread (opaque=0x5555564f1ee0) at thread-pool.c:109
#5 0x00007ffff604ddf3 in start_thread () from /lib64/libpthread.so.0
#6 0x00007ffff2d593dd in clone () from /lib64/libc.so.6

Expected results:
The copy job should complete without crashing qemu-kvm.

Additional info:
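For anyone who wants to script the QMP part of the reproducer instead of typing it by hand, here is a minimal Python sketch. It assumes the QMP socket from the command line above (tcp, port 4444) and a host of 127.0.0.1. The helper names build_drive_mirror, read_reply and send_qmp are made up for illustration and are not part of any QEMU tooling.

```python
import json
import socket


def build_drive_mirror(device, target, fmt="raw", mode="existing", sync="full"):
    """Build the drive-mirror command with the same arguments as the report."""
    return {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            "target": target,
            "format": fmt,
            "mode": mode,
            "sync": sync,
        },
    }


def read_reply(stream):
    """Skip asynchronous events and return the next command reply."""
    while True:
        line = stream.readline()
        if not line:
            # An empty read means the peer went away, e.g. qemu-kvm crashed.
            raise ConnectionError("QMP connection closed")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def send_qmp(commands, host="127.0.0.1", port=4444):
    """Hypothetical helper: send each command in order, collect the replies."""
    replies = []
    with socket.create_connection((host, port)) as sock:
        stream = sock.makefile("rw", encoding="utf-8")
        json.loads(stream.readline())  # consume the QMP greeting banner
        for cmd in commands:
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
            replies.append(read_reply(stream))
    return replies


commands = [
    {"execute": "qmp_capabilities"},
    build_drive_mirror("drive-virtio-disk0", "copy.img"),
]
# With the guest running, call: send_qmp(commands)
```

On an affected build the drive-mirror reply is expected to arrive before the crash, so the segfault would show up as a closed connection on a later read rather than as an error reply.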
I've also tripped on a variant of this bug while trying to port active block commit to RHEL 7.0.z. My reproducer, using qemu-kvm-rhev-1.5.3-60.el7_0_0.5.x86_64:

#!/bin/sh
cd /tmp
rm -f base.img snap1.img snap2.img copy.img
virsh destroy testvm1 2>/dev/null

# base.img <- snap1.img <- snap2.img, with qcow2 data in base.img
# but explicitly treating it as raw
qemu-img create -f qcow2 -o compat=0.10 base.img 10M
qemu-img create -f qcow2 -o compat=0.10 -b base.img \
    -o backing_fmt=raw snap1.img
qemu-img create -f qcow2 -o compat=0.10 -b snap1.img \
    -o backing_fmt=qcow2 snap2.img

virsh create /dev/stdin <<EOF
<domain type='kvm'>
  <name>testvm1</name>
  <memory unit='MiB'>256</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='$PWD/snap2.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <graphics type='vnc'/>
  </devices>
</domain>
EOF

virsh blockcopy testvm1 vda --raw /tmp/copy.img
I think this upstream qemu patch is relevant:

commit 5a0f6fd5c84573387056e0464a7fc0c6fb70b2dc
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Tue Jul 1 16:52:21 2014 +0200

    mirror: Fix qiov size for short requests

    When mirroring an image of a size that is not a multiple of the
    mirror job granularity, the last request would have the right
    nb_sectors argument, but a qiov that is rounded up to the next
    multiple of the granularity. Don't do this.

    This fixes a segfault that is caused by raw-posix being confused by
    this and allocating a buffer with request length, but operating on it
    with qiov length.

    [s/Driver/Drive/ in qemu-iotests 041 as suggested by Eric
    --Stefan]

    Reported-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Tested-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
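To make the size mismatch in that commit message concrete, here is a rough Python sketch of the arithmetic. It is only an illustration and not the QEMU code. It takes 197120 bytes as the mirrored length (the "len" value in the BLOCK_JOB_READY event quoted elsewhere in this bug) and assumes a mirror granularity of 64 KiB, which this report does not state. The function name mirror_requests is invented for the example.

```python
def mirror_requests(image_bytes, granularity):
    """Yield (offset, request_len, rounded_len) per granularity-sized chunk.

    request_len is what the request actually covers; rounded_len is the
    length of a qiov that was rounded up to the granularity.
    """
    offset = 0
    while offset < image_bytes:
        request_len = min(granularity, image_bytes - offset)
        yield offset, request_len, granularity
        offset += granularity


IMAGE_BYTES = 197120  # "len" reported by BLOCK_JOB_READY in this bug
GRANULARITY = 65536   # ASSUMPTION: 64 KiB granularity, not stated in the report

chunks = list(mirror_requests(IMAGE_BYTES, GRANULARITY))
last_offset, last_request, last_rounded = chunks[-1]

# Bytes by which a qiov-sized copy would exceed a request-sized buffer.
overrun = last_rounded - last_request
print(len(chunks), last_offset, last_request, overrun)
```

Under those assumptions the final request covers only 512 bytes while the rounded-up qiov describes 65536, so a copy driven by the qiov length would run 65024 bytes past a buffer sized for the request. That matches the memcpy crash in handle_aiocb_rw.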
I can't reproduce this on current qemu-kvm-rhev. The commit that Eric mentions was in the qemu 2.1 release, so the rebase picked it up. The fix for bug 1132806 backported this commit to RHEL 7.1/7.0.z, so that should be covered as well. The worrying part is that even with that commit reverted, I still can't reproduce it. I'll move this to ON_QA anyway so that it gets retested with a current build.
Reproduced on qemu-kvm-rhev-1.5.3-60.el7ev.x86_64 with the same steps as the reporter.

Actual result:
After the command below:
{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"copy.img","format":"raw", "mode":"existing","sync":"full"}}
{"return": {}}{"return": {}}

qemu-kvm crashed:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2ee556f700 (LWP 24279)]
0x00007f2eecdff45b in __memcpy_ssse3_back () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f2eecdff45b in __memcpy_ssse3_back () from /lib64/libc.so.6
#1 0x00007f2ef2143ed9 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>) at /usr/include/bits/string3.h:51
#2 handle_aiocb_rw (aiocb=0x7f2ef4c92cc0) at block/raw-posix.c:746
#3 0x00007f2ef2144975 in aio_worker (arg=0x7f2ef4c92cc0) at block/raw-posix.c:912
#4 0x00007f2ef2238a7b in worker_thread (opaque=0x7f2ef4c92430) at thread-pool.c:109
#5 0x00007f2ef00cddf5 in start_thread () from /lib64/libpthread.so.0
#6 0x00007f2eecdac01d in clone () from /lib64/libc.so.6
(gdb)

CLI:
/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -drive file=snap2.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -monitor stdio -qmp tcp:0:4444,server,nowait -nodefaults

Verified as passing on qemu-kvm-rhev-2.1.2-8.el7.x86_64. After drive-mirror, the guest ran well.
{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"copy.img","format":"raw", "mode":"existing","sync":"full"}}
{"return": {}}
{"timestamp": {"seconds": 1416749282, "microseconds": 57853}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", "len": 197120, "offset": 197120, "speed": 0, "type": "mirror"}}

The CLI is almost the same, except that I used -M pc-i440fx-rhel7.1.0 for the fixed-version test.

Based on the above, this issue has been fixed correctly.
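If the verification is ever automated, the pass condition used here (a BLOCK_JOB_READY event for the mirrored device with offset equal to len) could be checked along these lines. This is only a sketch; mirror_is_ready is a hypothetical helper name, and the sample event is the one quoted in this comment.

```python
import json


def mirror_is_ready(event, device):
    """Return True if event is BLOCK_JOB_READY for a fully synced mirror of device."""
    if event.get("event") != "BLOCK_JOB_READY":
        return False
    data = event.get("data", {})
    return (
        data.get("device") == device
        and data.get("type") == "mirror"
        and data.get("offset") == data.get("len")
    )


# Event copied from the verification run on qemu-kvm-rhev-2.1.2-8.el7.x86_64.
sample = json.loads(
    '{"timestamp": {"seconds": 1416749282, "microseconds": 57853}, '
    '"event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", '
    '"len": 197120, "offset": 197120, "speed": 0, "type": "mirror"}}'
)
print(mirror_is_ready(sample, "drive-virtio-disk0"))
```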
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0624.html