+++ This bug was initially created as a clone of Bug #1233115 +++

Description of problem:
drive-mirror fails after it starts copying data if a 'granularity' value that is invalid for the given image type is used. HDC contains a raw image.

QMP conversation (L - libvirt, Q - qemu):

L: {"execute":"drive-mirror","arguments":{"device":"drive-ide0-1-0","target":"/tmp/r7.s1","granularity":8192,"sync":"full","mode":"absolute-paths","format":"raw"},"id":"libvirt-52"}
Q: {"return": {}, "id": "libvirt-52"}
Q: {"timestamp": {"seconds": 1434721148, "microseconds": 853384}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-ide0-1-0", "operation": "read", "action": "report"}}
L: {"execute":"query-block-jobs","id":"libvirt-53"}
Q: {"timestamp": {"seconds": 1434721148, "microseconds": 859673}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-ide0-1-0", "len": 406003712, "offset": 0, "speed": 0, "type": "mirror", "error": "Invalid argument"}}
Q: {"return": [], "id": "libvirt-53"}

How reproducible:
Always

Steps to Reproduce:
See above.

Actual results:
The drive-mirror command succeeds and the image file is created, but it is left blank (see the hexdump below). Additionally, two events are fired.

# hexdump /tmp/img.raw
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
18331000

Expected results:
The command itself fails, since it did not copy any data.

Additional notes:
- Tested on upstream qemu
I've played with block copy when using the --granularity option a bit more.

With the following image:

# file /var/lib/libvirt/images/jeos-20-64.qcow2
/var/lib/libvirt/images/jeos-20-64.qcow2: QEMU QCOW Image (v2), 10737418240 bytes
# qemu-img info /var/lib/libvirt/images/jeos-20-64.qcow2
image: /var/lib/libvirt/images/jeos-20-64.qcow2
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 3.2G
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16

I got the following behavior:

$ virsh blockcopy --verbose --wait srcd-trans-qcow vda /tmp/target.qcow --granularity 8192
Block Copy: [ 25 %]error: Block Copy unexpectedly failed

$ virsh blockcopy --verbose --wait srcd-trans-qcow vda /tmp/target.qcow --granularity 8192
Block Copy: [ 68 %]error: Block Copy unexpectedly failed

$ virsh blockjob srcd-trans-qcow vda
No current block job for vda

$ virsh suspend srcd-trans-qcow
Domain srcd-trans-qcow suspended

$ virsh blockcopy --verbose --wait srcd-trans-qcow vda /tmp/target.qcow --granularity 8192
Block Copy: [100 %]
Now in mirroring phase

$ virsh blockjob srcd-trans-qcow vda
Block Copy: [100 %]

If the guest writes to the disk that is being drive-mirrored, the job instantly fails. If the guest OS is paused, the block copy finishes just fine. It looks like the unsupported persistent bitmap update gets detected at the point where it has to be written, and thus the job fails midway.
(In reply to Peter Krempa from comment #2)

All of the above happens to me with upstream qemu-2.3.0. qemu-kvm-rhev-2.3.0-5.el7.x86_64 does not exhibit the problem.
A VM with the following disk image as source:

# qemu-img info /tmp/target1.qcow
image: /tmp/target1.qcow
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 196K
cluster_size: 65536
backing file: target.qcow (actual path: /tmp/target)
backing file format: raw
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img info /tmp/target
image: /tmp/target
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 3.2G

The VM does not return from the following command:

{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"/tmp/asdf.qcow","granularity":8192,"sync":"full","mode":"absolute-paths","format":"qcow2"},"id":"libvirt-10"}

and qemu-kvm keeps consuming 100% CPU while the guest OS is frozen.

Backtrace after a while of eating 100% CPU (from upstream 2.3.0):

Thread 4 (Thread 0x7f35f60d6700 (LWP 1164033)):
#0  0x00007f3600881f89 in syscall () from /lib64/libc.so.6
#1  0x00007f36065d58c1 in qemu_event_wait ()
#2  0x00007f36065e3af6 in call_rcu_thread ()
#3  0x00007f36034a62c4 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f36008862dd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f35f48cd700 (LWP 1164035)):
#0  0x00007f36034ae05c in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f36034a8815 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f36065d54e9 in qemu_mutex_lock ()
#3  0x00007f36063729bc in qemu_mutex_lock_iothread ()
#4  0x00007f3606384324 in kvm_cpu_exec ()
#5  0x00007f360637157a in qemu_kvm_cpu_thread_fn ()
#6  0x00007f36034a62c4 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f36008862dd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f35eefff700 (LWP 1164037)):
#0  0x00007f360087d46d in poll () from /lib64/libc.so.6
#1  0x00007f360129f887 in red_worker_main () from /usr/lib64/libspice-server.so.1
#2  0x00007f36034a62c4 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f36008862dd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f360689dac0 (LWP 1164032)):
#0  0x00007f36034ae90d in lseek64 () from /lib64/libpthread.so.0
#1  0x00007f360659c358 in raw_co_get_block_status ()
#2  0x00007f360656458c in bdrv_co_get_block_status ()
#3  0x00007f3606564747 in bdrv_get_block_status_co_entry ()
#4  0x00007f3606564808 in bdrv_get_block_status ()
#5  0x00007f3606564662 in bdrv_co_get_block_status ()
#6  0x00007f3606564747 in bdrv_get_block_status_co_entry ()
#7  0x00007f3606564808 in bdrv_get_block_status ()
#8  0x00007f36065648d4 in bdrv_is_allocated_above ()
#9  0x00007f360659e8d5 in mirror_run ()
#10 0x00007f3606571efa in coroutine_trampoline ()
#11 0x00007f36007e0820 in ?? () from /lib64/libc.so.6
#12 0x00007ffe9d311e50 in ?? ()
#13 0x0000000000000000 in ?? ()

Backtrace from qemu-kvm-rhev-2.3.0-5.el7.x86_64:

Thread 4 (Thread 0x7ff6df350700 (LWP 4301)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007ff6ef00c872 in futex_wait (val=4294967295, ev=0x7ff6ef8d0984 <rcu_call_ready_event>) at util/qemu-thread-posix.c:301
#2  qemu_event_wait (ev=ev@entry=0x7ff6ef8d0984 <rcu_call_ready_event>) at util/qemu-thread-posix.c:399
#3  0x00007ff6ef01aae6 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:233
#4  0x00007ff6ed8dedf5 in start_thread (arg=0x7ff6df350700) at pthread_create.c:308
#5  0x00007ff6e77871ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7ff6dca5e700 (LWP 4323)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007ff6ed8e0d32 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007ff6ed8e0c38 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7ff6ef4b3e40 <qemu_global_mutex>) at pthread_mutex_lock.c:64
#3  0x00007ff6ef00c499 in qemu_mutex_lock (mutex=mutex@entry=0x7ff6ef4b3e40 <qemu_global_mutex>) at util/qemu-thread-posix.c:73
#4  0x00007ff6eedcc1ac in qemu_mutex_lock_iothread () at /usr/src/debug/qemu-2.3.0/cpus.c:1123
#5  0x00007ff6eeddcf94 in kvm_cpu_exec (cpu=cpu@entry=0x7ff6f182f5a0) at /usr/src/debug/qemu-2.3.0/kvm-all.c:1811
#6  0x00007ff6eedcadfa in qemu_kvm_cpu_thread_fn (arg=0x7ff6f182f5a0) at /usr/src/debug/qemu-2.3.0/cpus.c:947
#7  0x00007ff6ed8dedf5 in start_thread (arg=0x7ff6dca5e700) at pthread_create.c:308
#8  0x00007ff6e77871ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7ff6d69ff700 (LWP 4349)):
#0  0x00007ff6e777cb7d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ff6e86d1d37 in poll (__timeout=<optimized out>, __nfds=20, __fds=0x7ff6ac0008f8) at /usr/include/bits/poll2.h:46
#2  red_worker_main (arg=<optimized out>) at red_worker.c:12200
#3  0x00007ff6ed8dedf5 in start_thread (arg=0x7ff6d69ff700) at pthread_create.c:308
#4  0x00007ff6e77871ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7ff6eecb1c00 (LWP 4293)):
#0  0x00007ff6ed8e58dd in lseek64 () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ff6eefd0444 in find_allocation (bs=<optimized out>, hole=<synthetic pointer>, data=<synthetic pointer>, start=2894340096) at block/raw-posix.c:1745
#2  raw_co_get_block_status (bs=<optimized out>, sector_num=<optimized out>, nb_sectors=16, pnum=0x7ff6f57cf574) at block/raw-posix.c:1839
#3  0x00007ff6eef966e9 in bdrv_co_get_block_status (bs=0x7ff6f182a200, sector_num=5653008, nb_sectors=16, pnum=0x7ff6f57cf574) at block.c:4109
#4  0x00007ff6eef968a7 in bdrv_get_block_status_co_entry (opaque=opaque@entry=0x7ff6f57cf450) at block.c:4170
#5  0x00007ff6eef96968 in bdrv_get_block_status (bs=0x7ff6f182a200, sector_num=sector_num@entry=5653008, nb_sectors=<optimized out>, pnum=pnum@entry=0x7ff6f57cf574) at block.c:4194
#6  0x00007ff6eef96792 in bdrv_co_get_block_status (bs=0x7ff6f1824e60, sector_num=5653008, nb_sectors=<optimized out>, pnum=0x7ff6f57cf574) at block.c:4117
#7  0x00007ff6eef968a7 in bdrv_get_block_status_co_entry (opaque=opaque@entry=0x7ff6f57cf500) at block.c:4170
#8  0x00007ff6eef96968 in bdrv_get_block_status (bs=bs@entry=0x7ff6f1824e60, sector_num=sector_num@entry=5653008, nb_sectors=nb_sectors@entry=16, pnum=pnum@entry=0x7ff6f57cf574) at block.c:4194
#9  0x00007ff6eef96a34 in bdrv_is_allocated (pnum=0x7ff6f57cf574, nb_sectors=16, sector_num=5653008, bs=0x7ff6f1824e60) at block.c:4210
#10 bdrv_is_allocated_above (top=top@entry=0x7ff6f16b51f0, base=base@entry=0x0, sector_num=sector_num@entry=5653008, nb_sectors=nb_sectors@entry=16, pnum=pnum@entry=0x7ff6f57cf640) at block.c:4240
#11 0x00007ff6eefd28ed in mirror_run (opaque=0x7ff6f2d58670) at block/mirror.c:440
#12 0x00007ff6eefa3cca in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:80
#13 0x00007ff6e76d80f0 in ?? () from /lib64/libc.so.6
#14 0x00007fffdb806b60 in ?? ()
#15 0x0000000000000000 in ?? ()

This issue happens both in the latest upstream release and in the current RHEV build. I was not able to confirm it with current upstream since the git tree is not buildable at the moment.
Stefan: you fixed another issue [1] with the granularity error handling not long ago, so I'm assigning this one to you as well, hoping the fix is not too complex.

1. commit 3cbbe9fd1feaf3264f745fccb0bf5f62c583078f
   Author: Stefan Hajnoczi <stefanha>
   Date:   Wed Aug 27 14:29:59 2014 +0100

       blockdev: fix drive-mirror 'granularity' error message

       Name the 'granularity' parameter and give its expected value range.
       Previously the device name was mistakenly reported as the parameter
       name.

       Note that the error class is unchanged from ERROR_CLASS_GENERIC_ERROR.
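As background (my summary, not stated in this report): the check that commit touches rejects granularity values that are outside [512 B, 64 MB] or not a power of two. A minimal sketch of that kind of validation, assuming a plain boolean helper instead of QEMU's Error API:

#include <stdbool.h>
#include <stdint.h>

/* Sketch of a drive-mirror granularity check: zero means "let QEMU pick
 * a default"; anything else must be a power of two in [512 B, 64 MB].
 * This follows the range named in the commit message above; the real
 * QEMU code reports failures through its Error API rather than a bool. */
static bool granularity_is_valid(uint32_t granularity)
{
    if (granularity == 0) {
        return true;                               /* use the default */
    }
    if (granularity < 512 || granularity > 64 * 1024 * 1024) {
        return false;                              /* out of range */
    }
    return (granularity & (granularity - 1)) == 0; /* power of two */
}

Note that 8192 passes this kind of up-front check, which is consistent with what the report shows: the drive-mirror command itself returns success, and the job only fails later, while copying.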
The issue that I've described in comment 2 is reproducible also with qemu-kvm-rhev-2.3.0-5.el7.x86_64 (but only in approximately half the cases, and on a different host) and the following image:

# qemu-img convert -O qcow2 /home/pipo/jeos-20-64.qcow2 target1.qcow
# qemu-img info target1.qcow
image: target1.qcow
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 3.2G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

The log file reports:

QEMU_MONITOR_RECV_EVENT: mon=0x7fd0e0000c00 event={"timestamp": {"seconds": 1435250286, "microseconds": 913844}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 3458793472, "offset": 938147840, "speed": 0, "type": "mirror", "error": "Invalid argument"}}

and the block job vanishes.
(In reply to Peter Krempa from comment #4)
> This issue happens both in the last upstream release and in the current
> RHEV build. I was not able to confirm it with current upstream since the
> git tree is not buildable at the moment.

There is a patch queued for this but it is not upstream yet:
https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02433.html

It will make it into QEMU 2.4.

I used the following script to reproduce the hang:

$ cd qemu-kvm
$ cp disk.img /tmp/target.raw
$ ./qemu-img create -f qcow2 -o backing_file=/tmp/target.raw /tmp/target1.qcow2
$ cat test.py
#!/usr/bin/env python2
import sys
sys.path.append('scripts/qmp')
import qmp

# Connect to the QMP socket given on the command line and issue a
# drive-mirror with the problematic 8192-byte granularity.
mon = qmp.QEMUMonitorProtocol(sys.argv[1])
mon.connect()
print mon.command('drive-mirror', device='virtio0', target='/tmp/asdf.qcow',
                  granularity=8192, sync='full', mode='absolute-paths',
                  format='qcow2')

Run the test as follows:

(terminal1)$ x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu host -m 1024 -drive if=virtio,cache=writeback,format=qcow2,aio=threads,file=/tmp/target1.qcow2 -qmp unix:/tmp/qmp,server,nowait
(terminal2)$ python2 test.py /tmp/qmp

The virtual machine will be unresponsive until the QMP command completes. After applying the patch the virtual machine is more responsive.
It looks like you are encountering several bugs. I have assigned this BZ to Fam for his periodic sleep patch. That will solve the unresponsive VM issue (Comment #4). The failure when the guest writes to the disk is another bug. I am currently trying to reproduce it and will post updates.
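To illustrate the periodic sleep approach (a generic, self-contained sketch of the technique only, not Fam's actual patch; scan_one_chunk() and the chunk loop are hypothetical stand-ins for the expensive bdrv_is_allocated_above() scan in mirror_run()):

#include <stdint.h>
#include <time.h>

/* A long-running scan loop that voluntarily sleeps whenever it has run
 * for more than SLICE_NS, so that other work (in QEMU: the event loop,
 * monitor, and vCPU threads) can make progress instead of the loop
 * monopolizing the CPU and the global mutex. */
#define SLICE_NS 100000000LL /* 100 ms */

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void scan_one_chunk(int64_t chunk) { (void)chunk; /* placeholder */ }

void scan_all_chunks(int64_t nb_chunks)
{
    int64_t last_pause = now_ns();

    for (int64_t i = 0; i < nb_chunks; i++) {
        scan_one_chunk(i); /* the costly per-chunk allocation probe */
        if (now_ns() - last_pause > SLICE_NS) {
            struct timespec pause = { 0, 1000000 }; /* sleep 1 ms */
            nanosleep(&pause, NULL); /* QEMU would sleep via its block job API */
            last_pause = now_ns();
        }
    }
}

The key design point is that the sleep is bounded and periodic: the scan still completes, but the rest of the process gets scheduled in between, which matches the improved responsiveness reported above once the patch is applied.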
Okay, I have reproduced the random EINVAL errors. They happen because block/mirror.c does not check IOV_MAX when putting together the scatter-gather list. I have posted a patch to qemu-devel. Once it is ready for RHEL I'll clone this bug and submit a backport.
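For context on the EINVAL (my explanation, not text from the patch): vectored I/O requests are limited to IOV_MAX iovec entries (1024 on Linux), and the kernel rejects larger requests with EINVAL. With an 8192-byte granularity, coalescing many small dirty chunks into one request can cross that limit. A hedged sketch of the kind of cap such a fix adds, not the actual block/mirror.c change:

#include <limits.h>   /* IOV_MAX, where the platform exposes it */
#include <stddef.h>
#include <sys/uio.h>  /* struct iovec */

#ifndef IOV_MAX
#define IOV_MAX 1024  /* Linux default if <limits.h> does not define it */
#endif

/* Sketch: stop coalescing dirty chunks into a single scatter-gather list
 * once adding another entry would exceed IOV_MAX. The caller submits the
 * first n entries and starts a new request for the remainder; the real
 * mirror code also tracks sector counts and in-flight operations. */
static int build_sg_list(struct iovec *iov, int nb_dirty_chunks,
                         void *bufs[], size_t chunk_size)
{
    int n = 0;
    while (n < nb_dirty_chunks && n < IOV_MAX) {
        iov[n].iov_base = bufs[n];
        iov[n].iov_len  = chunk_size;
        n++;
    }
    return n;
}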
The patch mentioned in comment 8 for periodic sleep is posted to rhvirt-patches list.
Fix included in qemu-kvm-rhev-2.3.0-13.el7
Reproduced on qemu-kvm-rhev-2.3.0-12.el7.x86_64:

1. Simply start a mirror job with an unsupported granularity:

{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk0","target":"/tmp/asdf.qcow","granularity":8192,"sync":"full","mode":"absolute-paths","format":"qcow2"}}
{"return": {}}

2. qemu aborts:

(gdb) bt
#0  0x00007ffff071a5d7 in raise () from /lib64/libc.so.6
#1  0x00007ffff071bcc8 in abort () from /lib64/libc.so.6
#2  0x0000555555815aa6 in ioq_submit ()
#3  0x0000555555815e8c in laio_submit ()
#4  0x0000555555813b66 in raw_aio_submit ()
#5  0x0000555555813b90 in raw_aio_writev ()
#6  0x00005555557d7e4a in bdrv_co_io_em ()
#7  0x00005555557dc98e in bdrv_aligned_pwritev ()
#8  0x00005555557dd4a7 in bdrv_co_do_pwritev ()
#9  0x00005555557f901a in qcow2_co_writev ()
#10 0x00005555557dc98e in bdrv_aligned_pwritev ()
#11 0x00005555557dd4a7 in bdrv_co_do_pwritev ()
#12 0x00005555557de1f7 in bdrv_co_do_rw ()
#13 0x00005555557e816a in coroutine_trampoline ()
#14 0x00007ffff072c0f0 in ?? () from /lib64/libc.so.6
#15 0x00007fffffffcf60 in ?? ()
#16 0x0000000000000000 in ?? ()

Same problem with qemu-kvm-rhev-2.3.0-13.el7.x86_64:

(gdb)
#0  0x00007ffff071a5d7 in raise () from /lib64/libc.so.6
#1  0x00007ffff071bcc8 in abort () from /lib64/libc.so.6
#2  0x0000555555816296 in ioq_submit ()
#3  0x000055555581667c in laio_submit ()
#4  0x0000555555814356 in raw_aio_submit ()
#5  0x0000555555814380 in raw_aio_writev ()
#6  0x00005555557d85ea in bdrv_co_io_em ()
#7  0x00005555557dd1ce in bdrv_aligned_pwritev ()
#8  0x00005555557ddcd7 in bdrv_co_do_pwritev ()
#9  0x00005555557f980a in qcow2_co_writev ()
#10 0x00005555557dd1ce in bdrv_aligned_pwritev ()
#11 0x00005555557ddcd7 in bdrv_co_do_pwritev ()
#12 0x00005555557dea17 in bdrv_co_do_rw ()
#13 0x00005555557e895a in coroutine_trampoline ()
#14 0x00007ffff072c0f0 in ?? () from /lib64/libc.so.6
#15 0x00007fffffffce70 in ?? ()
#16 0x0000000000000000 in ?? ()

If '"granularity":8192' is removed, there is no abort.

Fam, could you have a look?
That is because of the second issue mentioned in comment 8, which will be fixed in bug 1238585.
As noted in comment 14, I have set the dependency; after bug 1238585 is fixed, I will verify this bug.
Verification of this bug is blocked by Bug 1251487 - qemu core dump when do drive mirror.
Reproduced:

Host:
kernel: 3.10.0-314.el7.x86_64
qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-5.el7.x86_64

1. Start a mirror job with granularity=8192:

{"execute":"drive-mirror","arguments":{"device":"drive-scsi-disk0","target":"new_mirror1.qcow2","granularity":8192,"sync":"full","mode":"absolute-paths","format":"qcow2"}}
{"return": {}}
{"timestamp": {"seconds": 1441530068, "microseconds": 997512}, "event": "BLOCK_JOB_ERROR", "data": {"device": "drive-scsi-disk0", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1441530069, "microseconds": 48618}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-scsi-disk0", "len": 1137246208, "offset": 65536, "speed": 0, "type": "mirror", "error": "Invalid argument"}}

2. Check the mirror job:

{"execute":"query-block-jobs"}
{"return": []}

Verified:

Host:
kernel: 3.10.0-314.el7.x86_64
qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-22.el7.x86_64

1. Start a mirror job with granularity=8192; the mirror job works well:

{"execute":"drive-mirror","arguments":{"device":"drive-scsi-disk0","target":"new_mirror1.qcow2","granularity":8192,"sync":"full","mode":"absolute-paths","format":"qcow2"}}
{"return": {}}
{"timestamp": {"seconds": 1441516408, "microseconds": 427394}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-scsi-disk0", "len": 1117069312, "offset": 1117069312, "speed": 0, "type": "mirror"}}

2. Check the mirror job:

{"execute":"query-block-jobs"}
{"return": [{"io-status": "ok", "device": "drive-scsi-disk0", "busy": false, "len": 1117495296, "offset": 1117495296, "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}

So this bug has been fixed.
Hi Stefan, KVM QE verified this bug as described in comment 17. Is this right? If I am wrong, could you please tell KVM QE how to verify this bug? Thanks.
(In reply to zhangpei from comment #18)
> KVM QE verified this bug as Comment 17. Is this right? If I am wrong, Could
> you please tell KVM QE how to verify this bug? Thanks.

Thanks, you have verified the bug correctly.
According to comments 17-19, setting this issue as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2546.html