Bug 1114889 - drive-mirror cause qemu-kvm process segfaults
Summary: drive-mirror cause qemu-kvm process segfaults
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.0
Hardware: x86_64
OS: All
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Kevin Wolf
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-07-01 08:26 UTC by FuXiangChun
Modified: 2015-03-05 09:47 UTC (History)
CC List: 8 users

Fixed In Version: qemu 2.1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-05 09:47:37 UTC



Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0624 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2015-03-05 14:37:36 UTC

Description FuXiangChun 2014-07-01 08:26:35 UTC
Description of problem:
Boot a qemu-kvm process from a snapshot image, then use the drive-mirror QMP command to copy the snapshot chain to a file. qemu-kvm segfaults.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-60.el7ev.x86_64
3.10.0-123.4.2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. qemu-img create -f qcow2 base.img 10M

2. qemu-img create -f qcow2 -b base.img -o backing_fmt=raw snap1.img

3. qemu-img create -f qcow2 -b snap1.img -o backing_fmt=qcow2 snap2.img

4. truncate --size 20480000 copy.img

5. qemu-img info copy.img
image: copy.img
file format: raw
virtual size: 20M (20480000 bytes)
disk size: 0


6. /usr/libexec/qemu-kvm \
-machine accel=kvm -name testvm1 -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 4096  -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1  -uuid 5a74eeb4-09c5-4fc2-869d-0e04c13f9db0 -no-user-config  -nodefaults -chardev socket,id=charmonitor,path=testvm1.monitor,server,nowait \
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi \
-boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-drive file=snap2.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-vnc :0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on -monitor stdio -qmp tcp:0:4444,server,nowait 

7. QMP commands
7.1 {"execute":"qmp_capabilities"}
{"return": {}}

7.2 {"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"copy.img","format":"raw", "mode":"existing","sync":"full"}}
{"return": {}}
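
The QMP exchange above can also be scripted. The following is a minimal sketch, assuming the monitor is reachable on TCP port 4444 as set by the -qmp option in the command line above. The helper names drive_mirror_command and qmp_session are hypothetical, and reading one JSON message per line is a simplification rather than a complete QMP client.

```python
import json
import socket


def drive_mirror_command(device, target):
    """Build the drive-mirror command shown in the report."""
    return {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            "target": target,
            "format": "raw",
            "mode": "existing",
            "sync": "full",
        },
    }


def qmp_session(commands, host="127.0.0.1", port=4444, timeout=10.0):
    """Send QMP commands in order and return one decoded reply per command.

    Hypothetical helper: asynchronous events are skipped, and a closed
    connection (for example because qemu-kvm crashed) raises ConnectionError.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(stream.readline())  # consume the QMP greeting banner
        for command in commands:
            stream.write(json.dumps(command) + "\n")
            stream.flush()
            while True:
                line = stream.readline()
                if not line:
                    raise ConnectionError("QMP connection closed")
                message = json.loads(line)
                if "event" not in message:  # keep replies, skip events
                    replies.append(message)
                    break
    return replies


if __name__ == "__main__":
    # Requires a running qemu-kvm started as in the steps above.
    print(qmp_session([
        {"execute": "qmp_capabilities"},
        drive_mirror_command("drive-virtio-disk0", "copy.img"),
    ]))
```

On an affected build the process dies shortly after the drive-mirror reply, so a later read on the same connection would be expected to hit the ConnectionError path.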


Actual results:
(gdb) bt
#0  0x00007ffff2dac82b in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x0000555555635ed9 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>)
    at /usr/include/bits/string3.h:51
#2  handle_aiocb_rw (aiocb=0x5555564f24c0) at block/raw-posix.c:746
#3  0x0000555555636975 in aio_worker (arg=0x5555564f24c0) at block/raw-posix.c:912
#4  0x000055555572aa7b in worker_thread (opaque=0x5555564f1ee0) at thread-pool.c:109
#5  0x00007ffff604ddf3 in start_thread () from /lib64/libpthread.so.0
#6  0x00007ffff2d593dd in clone () from /lib64/libc.so.6


Expected results:
The copy job should complete without crashing qemu-kvm.

Additional info:

Comment 3 Eric Blake 2014-08-07 19:57:38 UTC
I've also tripped over a variant of this bug while trying to port active block commit to RHEL 7.0.z. My reproducer, using qemu-kvm-rhev-1.5.3-60.el7_0_0.5.x86_64:

#!/bin/sh
cd /tmp
rm -f base.img snap1.img snap2.img copy.img
virsh destroy testvm1 2>/dev/null
# base.img <- snap1.img <- snap2.img, with qcow2 data in base.img
# but explicitly treating it as raw
qemu-img create -f qcow2 -o compat=0.10 base.img 10M
qemu-img create -f qcow2 -o compat=0.10 -b base.img \
  -o backing_fmt=raw snap1.img
qemu-img create -f qcow2 -o compat=0.10 -b snap1.img \
  -o backing_fmt=qcow2 snap2.img

virsh create /dev/stdin <<EOF
<domain type='kvm'>
 <name>testvm1</name>
 <memory unit='MiB'>256</memory>
 <vcpu>1</vcpu>
 <os>
   <type arch='x86_64'>hvm</type>
 </os>
 <devices>
   <disk type='file' device='disk'>
     <driver name='qemu' type='qcow2'/>
     <source file='$PWD/snap2.img'/>
     <target dev='vda' bus='virtio'/>
   </disk>
   <graphics type='vnc'/>
 </devices>
</domain>
EOF
virsh blockcopy testvm1 vda --raw /tmp/copy.img

Comment 4 Eric Blake 2014-08-07 20:00:27 UTC
I think this upstream qemu patch is relevant:

commit 5a0f6fd5c84573387056e0464a7fc0c6fb70b2dc
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Tue Jul 1 16:52:21 2014 +0200

    mirror: Fix qiov size for short requests
    
    When mirroring an image of a size that is not a multiple of the
    mirror job granularity, the last request would have the right nb_sectors
    argument, but a qiov that is rounded up to the next multiple of the
    granularity. Don't do this.
    
    This fixes a segfault that is caused by raw-posix being confused by this
    and allocating a buffer with request length, but operating on it with
    qiov length.
    
    [s/Driver/Drive/ in qemu-iotests 041 as suggested by Eric
    --Stefan]
    
    Reported-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Tested-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
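
The size mismatch described in the commit message can be illustrated with a little arithmetic. This is a simplified model, not QEMU code: the device length of 197120 bytes is taken from the "len" field of the BLOCK_JOB_READY event later in this report, while the 65536-byte granularity is an assumption, and the function name last_chunk is hypothetical.

```python
def last_chunk(image_bytes, granularity):
    """Return (request_bytes, rounded_qiov_bytes) for the final mirror chunk.

    request_bytes is what the request length says should be copied;
    rounded_qiov_bytes is the same chunk rounded up to the granularity,
    which is what the commit message says the qiov wrongly described.
    """
    tail = image_bytes % granularity
    request_bytes = tail if tail else granularity
    # Ceiling division, then scale back up to a multiple of the granularity.
    rounded_qiov_bytes = -(-request_bytes // granularity) * granularity
    return request_bytes, rounded_qiov_bytes


if __name__ == "__main__":
    IMAGE_BYTES = 197120   # "len" from the BLOCK_JOB_READY event
    GRANULARITY = 65536    # assumed mirror granularity
    request, qiov = last_chunk(IMAGE_BYTES, GRANULARITY)
    print("request length:", request)
    print("rounded qiov length:", qiov)
    print("bytes past the buffer:", qiov - request)
```

Under these assumptions the final request covers 512 bytes while the rounded-up qiov describes 65536, so a buffer sized from the request length is overrun by 65024 bytes when it is accessed with the qiov length. That is consistent with the memcpy crash in the backtraces.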

Comment 5 Kevin Wolf 2014-09-09 12:47:24 UTC
I can't reproduce this on current qemu-kvm-rhev. The commit that Eric mentions
was in the qemu 2.1 release, so the rebase picked it up.  The fix for bug
1132806 backported this commit to RHEL 7.1/7.0.z, so that should be covered as
well.

The worrying part is that even with that commit reverted, I still can't
reproduce it. I'll move this to ON_QA anyway so that it gets retested with a
current build.

Comment 9 Chao Yang 2014-11-23 13:38:29 UTC
Reproduced on qemu-kvm-rhev-1.5.3-60.el7ev.x86_64 with the same steps as the reporter.

Actual Result:
After the command below:
{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"copy.img","format":"raw", "mode":"existing","sync":"full"}}
{"return": {}}{"return": {}}

qemu-kvm crashed with:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f2ee556f700 (LWP 24279)]
0x00007f2eecdff45b in __memcpy_ssse3_back () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f2eecdff45b in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x00007f2ef2143ed9 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>)
    at /usr/include/bits/string3.h:51
#2  handle_aiocb_rw (aiocb=0x7f2ef4c92cc0) at block/raw-posix.c:746
#3  0x00007f2ef2144975 in aio_worker (arg=0x7f2ef4c92cc0) at block/raw-posix.c:912
#4  0x00007f2ef2238a7b in worker_thread (opaque=0x7f2ef4c92430) at thread-pool.c:109
#5  0x00007f2ef00cddf5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007f2eecdac01d in clone () from /lib64/libc.so.6
(gdb) 

CLI:
/usr/libexec/qemu-kvm -M pc-i440fx-rhel7.0.0 -drive file=snap2.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -monitor stdio -qmp tcp:0:4444,server,nowait -nodefaults


Verified as fixed on qemu-kvm-rhev-2.1.2-8.el7.x86_64. After drive-mirror, the guest ran well.

{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"copy.img","format":"raw", "mode":"existing","sync":"full"}}
{"return": {}}
{"timestamp": {"seconds": 1416749282, "microseconds": 57853}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", "len": 197120, "offset": 197120, "speed": 0, "type": "mirror"}}
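
A verification script could check for this event instead of reading the output by eye. The following is a minimal sketch: the function name mirror_is_ready is hypothetical, and treating "offset" equal to "len" as the success condition is an assumption based on the event shown above.

```python
import json


def mirror_is_ready(event_line, device):
    """Return True if event_line is a BLOCK_JOB_READY event for a fully
    synchronised mirror job on the given device."""
    message = json.loads(event_line)
    if message.get("event") != "BLOCK_JOB_READY":
        return False
    data = message.get("data", {})
    return (
        data.get("device") == device
        and data.get("type") == "mirror"
        and data.get("offset") == data.get("len")
    )


if __name__ == "__main__":
    # Event line copied from the verification output in this comment.
    line = (
        '{"timestamp": {"seconds": 1416749282, "microseconds": 57853}, '
        '"event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk0", '
        '"len": 197120, "offset": 197120, "speed": 0, "type": "mirror"}}'
    )
    print(mirror_is_ready(line, "drive-virtio-disk0"))
```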

The CLI is almost the same, except that I used -M pc-i440fx-rhel7.1.0 for the fixed-version test.

As per the above, this issue has been fixed correctly.

Comment 12 errata-xmlrpc 2015-03-05 09:47:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html

