Description of problem:
------------------------
qemu-kvm core dump when mirroring + streaming + guest s4

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
qemu-kvm 267rhev
guest kernel 258

How reproducible:
-------------------
1/1

Steps to Reproduce:
--------------------
1. boot guest:
/usr/libexec/qemu-kvm -enable-kvm -M rhel6.3.0 -m 4G -name rhel6.3-64 \
    -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection \
    -uuid 3f2ea5cd-3d29-48ff-aab2-23df1b6ae213 \
    -drive file=/root/RHEL-Server-6.3-64-virtio.qcow2,cache=none,if=none,rerror=stop,werror=stop,id=drive-virtio-disk0,format=qcow2 \
    -device virtio-blk-pci,drive=drive-virtio-disk0,id=device-virtio-disk0 \
    -netdev tap,script=/etc/qemu-ifup,id=netdev0 \
    -device virtio-net-pci,netdev=netdev0,id=device-net0 \
    -boot order=cd -monitor stdio -usb -device usb-tablet,id=input0 \
    -chardev socket,id=s1,path=/tmp/s1,server,nowait -device isa-serial,chardev=s1 \
    -vnc :10 -monitor tcp::1234,server,nowait -smp 4 \
    -qmp tcp:0:5555,server,nowait \
    -chardev socket,id=qmp_monitor_id_qmpmonitor1,path=/tmp/qmp,server,nowait \
    -mon chardev=qmp_monitor_id_qmpmonitor1,mode=control
2. in qemu monitor:
(qemu) __com.redhat_drive-mirror drive-virtio-disk0 /root/sn1
3. in qemu monitor:
(qemu) block_stream drive-virtio-disk0
4. in guest:
echo disk > /sys/power/state

Actual results:
----------------
qemu-kvm core dump:
(gdb) bt
#0  0x00007f31e4bbd63d in bdrv_co_io_em (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, iov=0x7f30bcfe1e90, is_write=true) at block.c:3147
#1  0x00007f31e4bbe222 in bdrv_co_do_copy_on_readv (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, qiov=0x7f30bcfe1f50, flags=<value optimized out>) at block.c:1550
#2  bdrv_co_do_readv (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, qiov=0x7f30bcfe1f50, flags=<value optimized out>) at block.c:1611
#3  0x00007f31e4bdd577 in stream_populate (opaque=0x7f31e6bf7a20) at block/stream.c:76
#4  stream_run (opaque=0x7f31e6bf7a20) at block/stream.c:198
#5  0x00007f31e4bc39bb in coroutine_trampoline (i0=<value optimized out>, i1=<value optimized out>) at coroutine-ucontext.c:129
#6  0x00007f31e2517610 in ?? () from /lib64/libc-2.12.so
#7  0x00007fff86aa6130 in ?? ()
#8  0x0000000000000000 in ?? ()

I installed the glibc debug package, but there are still unknown symbols. Please let me know which package is missing, if necessary.
Instead of s4, can you try doing lots of disk I/O while streaming?
*** Bug 807898 has been marked as a duplicate of this bug. ***
(In reply to comment #2)
> Instead s4, can you try to do lots of disk IO while streaming?

Add stress in guest:
stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
dd if=/dev/zero of=/root/tmp bs=1G count=10

Do mirroring + streaming:
(qemu) __com.redhat_drive-mirror drive-virtio-disk0 /root/sn1
(qemu) block_stream drive-virtio-disk0

Streaming finishes correctly and the guest works correctly. After exiting qemu-kvm and booting the guest with /root/sn1, the guest kernel panics (screenshot in attachment).

[root@shu ~]# qemu-img info sn1
image: sn1
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 20G
cluster_size: 65536
backing file: /root/RHEL-Server-6.3-64-virtio.qcow2 (actual path: /root/RHEL-Server-6.3-64-virtio.qcow2)

The backing file is still /root/RHEL-Server-6.3-64-virtio.qcow2. Is this the problem? Is there a way to modify a qcow2 file's backing file manually? I want to confirm whether setting sn1's backing file to null will solve the problem.

FYI, I also tried quitting qemu-kvm during block streaming (before it finished). Booting with sn1 then always fails with a crashed file system. Is this a problem? In the mirroring design, when this situation happens, which side do we plan to pick?
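For cross-checking what qemu-img info reports above, here is a minimal sketch (not part of the original report) that reads the backing file name directly from a qcow2 header. It assumes the standard qcow2 header layout: big-endian fields, magic "QFI\xfb", a 4-byte version, an 8-byte backing file offset and a 4-byte backing file name length. The names read_backing_file and read_backing_file_from_path are hypothetical, and reading only the first 64 KiB assumes the name sits inside the first cluster, which matches the cluster_size of 65536 shown here.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"


def read_backing_file(header: bytes):
    """Return the backing file name stored in a qcow2 header, or None.

    `header` must hold the start of the image, long enough to include
    the backing file name itself.
    """
    if len(header) < 20 or header[:4] != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    # Layout after the magic: version (u32), backing_file_offset (u64),
    # backing_file_size (u32), all big-endian.
    _version, offset, size = struct.unpack(">IQI", header[4:20])
    if offset == 0 or size == 0:
        return None  # no backing file recorded
    name = header[offset:offset + size]
    if len(name) != size:
        raise ValueError("header buffer too short for backing file name")
    return name.decode("utf-8")


def read_backing_file_from_path(path: str):
    """Hypothetical convenience wrapper: inspect the first 64 KiB of `path`."""
    with open(path, "rb") as f:
        return read_backing_file(f.read(65536))
```

Calling read_backing_file_from_path("/root/sn1") on the image from this comment would be expected to return the same path that qemu-img info prints. This only inspects the header; to change the recorded backing file, qemu-img rebase is the usual tool, and whether doing so is safe for an image that was being mirrored is a separate question.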
Created attachment 573566 guest kernel panic
Just ran into another situation. With the same command line as in comment 0, simply:

(qemu) __com.redhat_drive-mirror drive-virtio-disk0 /root/sn1
Formatting '/root/sn1', fmt=qcow2 size=21474836480 backing_file='/root/RHEL-Server-6.3-64-virtio.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536
(qemu) block_stream drive-virtio-disk0
(qemu) quit

Program terminated with signal 11, Segmentation fault.
#0  0x00007f889bcacc75 in bswap16 (bs=0x7f889cdb2480, cluster_index=0) at ./bswap.h:54
54          return bswap_16(x);
(gdb) bt
#0  0x00007f889bcacc75 in bswap16 (bs=0x7f889cdb2480, cluster_index=0) at ./bswap.h:54
#1  be16_to_cpu (bs=0x7f889cdb2480, cluster_index=0) at ./bswap.h:127
#2  get_refcount (bs=0x7f889cdb2480, cluster_index=0) at block/qcow2-refcount.c:109
#3  0x00007f889bcae855 in alloc_clusters_noref (bs=0x7f889cdb2480, size=65536) at block/qcow2-refcount.c:549
#4  qcow2_alloc_clusters (bs=0x7f889cdb2480, size=65536) at block/qcow2-refcount.c:571
#5  0x00007f889bcaf0a6 in l2_allocate (bs=0x7f889cdb2480, offset=307757056, new_l2_table=0x7f889e6db998, new_l2_offset=0x7f889e6db9a0, new_l2_index=0x7f889e6db9ac) at block/qcow2-cluster.c:168
#6  get_cluster_table (bs=0x7f889cdb2480, offset=307757056, new_l2_table=0x7f889e6db998, new_l2_offset=0x7f889e6db9a0, new_l2_index=0x7f889e6db9ac) at block/qcow2-cluster.c:512
#7  0x00007f889bcaf536 in qcow2_alloc_cluster_offset (bs=0x7f889cdb2480, offset=307757056, n_start=0, n_end=1024, num=0x7f889e6dbabc, m=0x7f889e6dba50) at block/qcow2-cluster.c:714
#8  0x00007f889bcab20f in qcow2_co_writev (bs=0x7f889cdb2480, sector_num=<value optimized out>, remaining_sectors=1024, qiov=0x7f889bbb0e90) at block/qcow2.c:555
#9  0x00007f889bc964df in bdrv_co_do_writev (bs=0x7f889cdb2480, sector_num=601088, nb_sectors=1024, qiov=0x7f889bbb0e90, flags=<value optimized out>) at block.c:1700
#10 0x00007f889bc96581 in bdrv_co_do_rw (opaque=<value optimized out>) at block.c:3000
#11 0x00007f889bc9b9bb in coroutine_trampoline (i0=<value optimized out>, i1=<value optimized out>) at coroutine-ucontext.c:129
#12 0x00007f88995ef610 in ?? () from /lib64/libc-2.12.so
#13 0x00007f889bbb0a30 in ?? ()
#14 0x0000000000000000 in ?? ()
Complementary to comment 0:

Program terminated with signal 11, Segmentation fault.
#0  0x00007f31e4bbd63d in bdrv_co_io_em (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, iov=0x7f30bcfe1e90, is_write=true) at block.c:3147
3147            acb = bs->drv->bdrv_aio_writev(bs, sector_num, iov, nb_sectors,
(gdb) bt
#0  0x00007f31e4bbd63d in bdrv_co_io_em (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, iov=0x7f30bcfe1e90, is_write=true) at block.c:3147
#1  0x00007f31e4bbe222 in bdrv_co_do_copy_on_readv (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, qiov=0x7f30bcfe1f50, flags=<value optimized out>) at block.c:1550
#2  bdrv_co_do_readv (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, qiov=0x7f30bcfe1f50, flags=<value optimized out>) at block.c:1611
#3  0x00007f31e4bdd577 in stream_populate (opaque=0x7f31e6bf7a20) at block/stream.c:76
#4  stream_run (opaque=0x7f31e6bf7a20) at block/stream.c:198
#5  0x00007f31e4bc39bb in coroutine_trampoline (i0=<value optimized out>, i1=<value optimized out>) at coroutine-ucontext.c:129
#6  0x00007f31e2517610 in ?? () from /lib64/libc-2.12.so
#7  0x00007fff86aa6130 in ?? ()
#8  0x0000000000000000 in ?? ()
Note: core dump in comment 6 does not happen every time.
OK, without mirroring:

(qemu) snapshot_blkdev drive-virtio-disk0 /root/sn1 qcow2
(qemu) block_stream drive-virtio-disk0
(qemu) quit

qemu-kvm core dump:

Program terminated with signal 6, Aborted.
#0  0x00007f5400ae1885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f5400ae1885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f5400ae3065 in abort () at abort.c:92
#2  0x00007f5400ada9fe in __assert_fail_base (fmt=<value optimized out>, assertion=0x7f540330b286 "c->entries[i].ref == 0", file=0x7f540330b25b "block/qcow2-cache.c", line=<value optimized out>, function=<value optimized out>) at assert.c:96
#3  0x00007f5400adaac0 in __assert_fail (assertion=0x7f540330b286 "c->entries[i].ref == 0", file=0x7f540330b25b "block/qcow2-cache.c", line=69, function=0x7f540330b2b0 "qcow2_cache_destroy") at assert.c:105
#4  0x00007f54031b4324 in qcow2_cache_destroy (bs=<value optimized out>, c=0x7f54049dcd70) at block/qcow2-cache.c:69
#5  0x00007f54031ae34a in qcow2_close (bs=0x7f54047e4010) at block/qcow2.c:628
#6  0x00007f5403197f21 in bdrv_close (bs=0x7f54047e4010) at block.c:693
#7  0x00007f5403198068 in bdrv_close_all () at block.c:717
#8  0x00007f5403184985 in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2270
#9  0x00007f5403165cec in main_loop (argc=20, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4202
#10 main (argc=20, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6427

This one is similar to Bug 798499 - Guest aborted sometimes when quit it after a savevm. The ones in comment 0 and comment 6 seem to be different. However, I don't know whether these three are related, so I am keeping them here for now.

This is a test blocker; raising priority.
*Note: comment 6 and comment 9 are using "ide-drive", not "virtio-blk-pci"
When using QMP, after block_stream finishes:

{"timestamp": {"seconds": 1333017106, "microseconds": 862664}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "stream", "error": "Operation not supported"}}

Both ide-drive and virtio-blk-pci hit this. Booting the guest with sn1 then gives a guest kernel panic like the one in the attachment.
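As an illustration of why this event matters, here is a minimal sketch (not from the original report) of how a test script reading the QMP socket line by line might flag a failed job. The function name block_job_error is hypothetical. Note that in the event above "offset" equals "len", so the presence of the "error" field is what the sketch checks, not the progress counters.

```python
import json


def block_job_error(line: str):
    """Return the error string of a BLOCK_JOB_COMPLETED event, else None.

    `line` is one JSON message as read from the QMP channel. Messages
    that are not BLOCK_JOB_COMPLETED events, or that carry no "error"
    field, yield None.
    """
    msg = json.loads(line)
    if msg.get("event") != "BLOCK_JOB_COMPLETED":
        return None
    return msg.get("data", {}).get("error")


# The event quoted in this comment, unchanged.
EVENT = (
    '{"timestamp": {"seconds": 1333017106, "microseconds": 862664}, '
    '"event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", '
    '"len": 21474836480, "offset": 21474836480, "speed": 0, "type": "stream", '
    '"error": "Operation not supported"}}'
)

if __name__ == "__main__":
    err = block_job_error(EVENT)
    if err is not None:
        print("block job failed:", err)
```

A test harness could treat any non-None result as a failure of the streaming step, even when the job appears to have copied the full length.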
Comment 6 and comment 9 are fixed by attachment 572353. It worked in my tests, but I'll be brewing and posting this today. If streaming does not finish and qemu-kvm exits, and the destination fails to boot, it should be considered a minor bug.
Sorry, comment 6 and comment 12. I reopened bug 807898 for comment 9. Finally, comment 0 seems to be similar to comment 9 but specific to mirroring.
Created attachment 573718 patch to fix the bug, RHEL version
Created attachment 573719 patch to fix the bug
Closing as WONTFIX. The current blkmirror cannot be repaired for the mirror+stream case. It is not yet clear what solution we will implement, but it will not have this problem because it will not use block_stream.