Bug 807894

Summary: mirroring leaves bad backing file
Product: Red Hat Enterprise Linux 6
Reporter: Shaolong Hu <shu>
Component: qemu-kvm
Assignee: Paolo Bonzini <pbonzini>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.3
CC: acathrow, areis, bsarathy, dyasny, juzhang, michen, mkenneth, pbonzini, tburke, virt-maint
Target Milestone: rc
Keywords: TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-04-04 08:23:42 UTC
Bug Depends On: 801449
Bug Blocks: 806280, 806432
Attachments (description, flags):
guest kernel panic (flags: none)
patch to fix the bug, RHEL version (flags: none)
patch to fix the bug (flags: none)

Description Shaolong Hu 2012-03-29 05:08:18 UTC
Description of problem:
------------------------
qemu-kvm dumps core when mirroring and streaming are combined with guest S4 (suspend to disk)


Version-Release number of selected component (if applicable):
--------------------------------------------------------------
qemu-kvm 267rhev
guest kernel 258


How reproducible:
-------------------
1/1


Steps to Reproduce:
--------------------
1. Boot the guest:
/usr/libexec/qemu-kvm -enable-kvm -M rhel6.3.0 -m 4G -name rhel6.3-64 -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -uuid 3f2ea5cd-3d29-48ff-aab2-23df1b6ae213 -drive file=/root/RHEL-Server-6.3-64-virtio.qcow2,cache=none,if=none,rerror=stop,werror=stop,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,drive=drive-virtio-disk0,id=device-virtio-disk0 -netdev tap,script=/etc/qemu-ifup,id=netdev0 -device virtio-net-pci,netdev=netdev0,id=device-net0 -boot order=cd -monitor stdio -usb -device usb-tablet,id=input0 -chardev socket,id=s1,path=/tmp/s1,server,nowait -device isa-serial,chardev=s1 -vnc :10 -monitor tcp::1234,server,nowait -smp 4 -qmp tcp:0:5555,server,nowait -chardev socket,id=qmp_monitor_id_qmpmonitor1,path=/tmp/qmp,server,nowait -mon chardev=qmp_monitor_id_qmpmonitor1,mode=control

2. In the qemu monitor:
(qemu) __com.redhat_drive-mirror drive-virtio-disk0 /root/sn1

3. In the qemu monitor:
(qemu) block_stream drive-virtio-disk0

4. In the guest:
echo disk > /sys/power/state

  
Actual results:
----------------
qemu-kvm core dump:
(gdb) bt
#0  0x00007f31e4bbd63d in bdrv_co_io_em (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, iov=0x7f30bcfe1e90, is_write=true) at block.c:3147
#1  0x00007f31e4bbe222 in bdrv_co_do_copy_on_readv (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, qiov=0x7f30bcfe1f50, flags=<value optimized out>) at block.c:1550
#2  bdrv_co_do_readv (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, qiov=0x7f30bcfe1f50, flags=<value optimized out>) at block.c:1611
#3  0x00007f31e4bdd577 in stream_populate (opaque=0x7f31e6bf7a20) at block/stream.c:76
#4  stream_run (opaque=0x7f31e6bf7a20) at block/stream.c:198
#5  0x00007f31e4bc39bb in coroutine_trampoline (i0=<value optimized out>, i1=<value optimized out>) at coroutine-ucontext.c:129
#6  0x00007f31e2517610 in ?? () from /lib64/libc-2.12.so
#7  0x00007fff86aa6130 in ?? ()
#8  0x0000000000000000 in ?? ()


I installed the glibc debug package, but some symbols are still unresolved. Please let me know which package is missing if more detail is needed.

Comment 2 Dor Laor 2012-03-29 07:15:45 UTC
Instead of S4, can you try doing lots of disk I/O while streaming?

Comment 3 Dor Laor 2012-03-29 07:17:53 UTC
*** Bug 807898 has been marked as a duplicate of this bug. ***

Comment 4 Shaolong Hu 2012-03-29 08:12:58 UTC
(In reply to comment #2)
> Instead of S4, can you try doing lots of disk I/O while streaming?

Added stress in the guest:

stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
dd if=/dev/zero of=/root/tmp bs=1G count=10

Then ran mirroring + streaming:
(qemu) __com.redhat_drive-mirror drive-virtio-disk0 /root/sn1      
(qemu) block_stream drive-virtio-disk0 

Streaming finishes correctly and the guest keeps working correctly.

After exiting qemu-kvm and booting the guest from /root/sn1, the guest kernel panics; a screenshot is attached.

[root@shu ~]# qemu-img info sn1
image: sn1
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 20G
cluster_size: 65536
backing file: /root/RHEL-Server-6.3-64-virtio.qcow2 (actual path: /root/RHEL-Server-6.3-64-virtio.qcow2)

The backing file is still /root/RHEL-Server-6.3-64-virtio.qcow2. Is this the problem?
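
A check like the one above can be scripted when the test is repeated. The following is only a minimal sketch: it parses the human-readable `qemu-img info` text quoted in this comment, and the helper names (`parse_qemu_img_info`, `backing_file`) are hypothetical, not part of any qemu tooling.

```python
def parse_qemu_img_info(text):
    """Turn human-readable `qemu-img info` output into a dict of fields."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def backing_file(info):
    """Return the recorded backing file path, or None if the image has none."""
    value = info.get("backing file")
    if value is None:
        return None
    # Drop the trailing "(actual path: ...)" annotation if present.
    return value.split(" (actual path:")[0]


# Output copied from the comment above.
SAMPLE = """\
image: sn1
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 20G
cluster_size: 65536
backing file: /root/RHEL-Server-6.3-64-virtio.qcow2 (actual path: /root/RHEL-Server-6.3-64-virtio.qcow2)
"""

if __name__ == "__main__":
    print(backing_file(parse_qemu_img_info(SAMPLE)))
```

After a stream job that really completed, the expectation would be that `backing_file` returns None for sn1; here it still returns the original image path.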

Is there a way to modify a qcow2 file's backing file manually? I want to confirm whether clearing sn1's backing file will solve the problem.
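
One possible way to try this is `qemu-img rebase` in unsafe mode (`-u`) with an empty backing file name, which rewrites only the backing-file field in the image header. The sketch below just builds that command line; whether the qemu-img shipped with this qemu-kvm build supports `rebase` is an assumption, and the helper name `rebase_command` is hypothetical. Unsafe mode copies no data, so the result is only usable if every cluster was already written to sn1, and the image must not be open in a running guest.

```python
import shlex


def rebase_command(image, backing=""):
    """Build a `qemu-img rebase` argv in unsafe (-u) mode.

    An empty `backing` string asks qemu-img to record no backing file.
    Assumption: the installed qemu-img provides the rebase subcommand.
    """
    return ["qemu-img", "rebase", "-u", "-b", backing, image]


if __name__ == "__main__":
    cmd = rebase_command("/root/sn1")
    # Print a shell-safe rendering; pass `cmd` to subprocess.run() to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Running `qemu-img info` on sn1 afterwards would show whether the backing file entry is gone.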


FYI, I also tried quitting qemu-kvm while block streaming was still in progress. Booting from sn1 afterwards always fails with file system errors. Is this a problem? In the mirroring design, which side do we plan to treat as valid when this situation happens?

Comment 5 Shaolong Hu 2012-03-29 08:15:20 UTC
Created attachment 573566
guest kernel panic

Comment 6 Shaolong Hu 2012-03-29 08:50:22 UTC
Just ran into another situation.

With the same command line as in comment 0, simply run:

(qemu) __com.redhat_drive-mirror drive-virtio-disk0 /root/sn1
Formatting '/root/sn1', fmt=qcow2 size=21474836480 backing_file='/root/RHEL-Server-6.3-64-virtio.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536 
(qemu) block_stream drive-virtio-disk0
(qemu) quit


Program terminated with signal 11, Segmentation fault.
#0  0x00007f889bcacc75 in bswap16 (bs=0x7f889cdb2480, cluster_index=0) at ./bswap.h:54
54	    return bswap_16(x);

(gdb) bt
#0  0x00007f889bcacc75 in bswap16 (bs=0x7f889cdb2480, cluster_index=0) at ./bswap.h:54
#1  be16_to_cpu (bs=0x7f889cdb2480, cluster_index=0) at ./bswap.h:127
#2  get_refcount (bs=0x7f889cdb2480, cluster_index=0) at block/qcow2-refcount.c:109
#3  0x00007f889bcae855 in alloc_clusters_noref (bs=0x7f889cdb2480, size=65536) at block/qcow2-refcount.c:549
#4  qcow2_alloc_clusters (bs=0x7f889cdb2480, size=65536) at block/qcow2-refcount.c:571
#5  0x00007f889bcaf0a6 in l2_allocate (bs=0x7f889cdb2480, offset=307757056, new_l2_table=0x7f889e6db998, new_l2_offset=0x7f889e6db9a0, new_l2_index=0x7f889e6db9ac) at block/qcow2-cluster.c:168
#6  get_cluster_table (bs=0x7f889cdb2480, offset=307757056, new_l2_table=0x7f889e6db998, new_l2_offset=0x7f889e6db9a0, new_l2_index=0x7f889e6db9ac) at block/qcow2-cluster.c:512
#7  0x00007f889bcaf536 in qcow2_alloc_cluster_offset (bs=0x7f889cdb2480, offset=307757056, n_start=0, n_end=1024, num=0x7f889e6dbabc, m=0x7f889e6dba50) at block/qcow2-cluster.c:714
#8  0x00007f889bcab20f in qcow2_co_writev (bs=0x7f889cdb2480, sector_num=<value optimized out>, remaining_sectors=1024, qiov=0x7f889bbb0e90) at block/qcow2.c:555
#9  0x00007f889bc964df in bdrv_co_do_writev (bs=0x7f889cdb2480, sector_num=601088, nb_sectors=1024, qiov=0x7f889bbb0e90, flags=<value optimized out>) at block.c:1700
#10 0x00007f889bc96581 in bdrv_co_do_rw (opaque=<value optimized out>) at block.c:3000
#11 0x00007f889bc9b9bb in coroutine_trampoline (i0=<value optimized out>, i1=<value optimized out>) at coroutine-ucontext.c:129
#12 0x00007f88995ef610 in ?? () from /lib64/libc-2.12.so
#13 0x00007f889bbb0a30 in ?? ()
#14 0x0000000000000000 in ?? ()

Comment 7 Shaolong Hu 2012-03-29 08:52:53 UTC
Additional information for comment 0 (signal and faulting source line):

Program terminated with signal 11, Segmentation fault.
#0  0x00007f31e4bbd63d in bdrv_co_io_em (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, iov=0x7f30bcfe1e90, is_write=true) at block.c:3147
3147	        acb = bs->drv->bdrv_aio_writev(bs, sector_num, iov, nb_sectors,

(gdb) bt
#0  0x00007f31e4bbd63d in bdrv_co_io_em (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, iov=0x7f30bcfe1e90, is_write=true) at block.c:3147
#1  0x00007f31e4bbe222 in bdrv_co_do_copy_on_readv (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, qiov=0x7f30bcfe1f50, flags=<value optimized out>) at block.c:1550
#2  bdrv_co_do_readv (bs=0x7f31e55cd010, sector_num=2575872, nb_sectors=1024, qiov=0x7f30bcfe1f50, flags=<value optimized out>) at block.c:1611
#3  0x00007f31e4bdd577 in stream_populate (opaque=0x7f31e6bf7a20) at block/stream.c:76
#4  stream_run (opaque=0x7f31e6bf7a20) at block/stream.c:198
#5  0x00007f31e4bc39bb in coroutine_trampoline (i0=<value optimized out>, i1=<value optimized out>) at coroutine-ucontext.c:129
#6  0x00007f31e2517610 in ?? () from /lib64/libc-2.12.so
#7  0x00007fff86aa6130 in ?? ()
#8  0x0000000000000000 in ?? ()

Comment 8 Shaolong Hu 2012-03-29 09:00:03 UTC
Note: core dump in comment 6 does not happen every time.

Comment 9 Shaolong Hu 2012-03-29 09:09:37 UTC
OK, without mirroring:

(qemu) snapshot_blkdev drive-virtio-disk0 /root/sn1 qcow2  
(qemu) block_stream drive-virtio-disk0 
(qemu) quit

qemu-kvm core dump:

Program terminated with signal 6, Aborted.
#0  0x00007f5400ae1885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);

(gdb) bt
#0  0x00007f5400ae1885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f5400ae3065 in abort () at abort.c:92
#2  0x00007f5400ada9fe in __assert_fail_base (fmt=<value optimized out>, assertion=0x7f540330b286 "c->entries[i].ref == 0", file=0x7f540330b25b "block/qcow2-cache.c", line=<value optimized out>, 
    function=<value optimized out>) at assert.c:96
#3  0x00007f5400adaac0 in __assert_fail (assertion=0x7f540330b286 "c->entries[i].ref == 0", file=0x7f540330b25b "block/qcow2-cache.c", line=69, function=0x7f540330b2b0 "qcow2_cache_destroy") at assert.c:105
#4  0x00007f54031b4324 in qcow2_cache_destroy (bs=<value optimized out>, c=0x7f54049dcd70) at block/qcow2-cache.c:69
#5  0x00007f54031ae34a in qcow2_close (bs=0x7f54047e4010) at block/qcow2.c:628
#6  0x00007f5403197f21 in bdrv_close (bs=0x7f54047e4010) at block.c:693
#7  0x00007f5403198068 in bdrv_close_all () at block.c:717
#8  0x00007f5403184985 in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2270
#9  0x00007f5403165cec in main_loop (argc=20, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4202
#10 main (argc=20, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6427



This one is similar to Bug 798499 - Guest aborted sometimes when quit it after a savevm.


The crashes in comment 0 and comment 6 seem to be different. However, I don't know whether these three are related, so I am keeping them in this bug for now.

This is a test blocker, so I am raising the priority.

Comment 11 Shaolong Hu 2012-03-29 10:08:34 UTC
Note: the runs in comment 6 and comment 9 used "ide-drive", not "virtio-blk-pci".

Comment 12 Shaolong Hu 2012-03-29 10:39:37 UTC
When using QMP, the following event is reported after block_stream finishes:

{"timestamp": {"seconds": 1333017106, "microseconds": 862664}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "stream", "error": "Operation not supported"}}

Both ide-drive and virtio-blk-pci hit this.

Booting the guest from sn1 afterwards causes a guest kernel panic like the one in the attachment.
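
For automated runs, the BLOCK_JOB_COMPLETED event quoted in this comment can be checked mechanically. This is a minimal sketch that parses the event line exactly as shown; the helper name `job_result` and the shape of its return value are hypothetical. It relies on the "error" field being present only when the job failed, which matches the event seen here but should be treated as an assumption for this build.

```python
import json

# Event line copied from this comment.
EVENT = (
    '{"timestamp": {"seconds": 1333017106, "microseconds": 862664}, '
    '"event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", '
    '"len": 21474836480, "offset": 21474836480, "speed": 0, "type": "stream", '
    '"error": "Operation not supported"}}'
)


def job_result(line):
    """Summarise a BLOCK_JOB_COMPLETED event; return None for other messages."""
    msg = json.loads(line)
    if msg.get("event") != "BLOCK_JOB_COMPLETED":
        return None
    data = msg["data"]
    return {
        "device": data["device"],
        "type": data["type"],
        # True when the job reports having processed the whole device.
        "copied_all": data["offset"] == data["len"],
        # None when no error was reported.
        "error": data.get("error"),
    }


if __name__ == "__main__":
    print(job_result(EVENT))
```

In this case the job reports offset equal to len and still carries an error, so a test harness should check the error field rather than rely on the offset alone.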

Comment 13 Paolo Bonzini 2012-03-29 13:57:27 UTC
Comment 6 and comment 9 are fixed by attachment 572353.  It worked in my tests, but I'll be brewing and posting this today.

If streaming does not finish and qemu-kvm exits, and the destination fails to boot, it should be considered a minor bug.

Comment 14 Paolo Bonzini 2012-03-29 14:18:50 UTC
Sorry, I meant comment 6 and comment 12 (not comment 9).

I reopened bug 807898 for comment 9.

Finally, comment 0 seems to be similar to comment 9 but specific to mirroring.

Comment 15 Paolo Bonzini 2012-03-29 15:56:26 UTC
Created attachment 573718
patch to fix the bug, RHEL version

Comment 16 Paolo Bonzini 2012-03-29 15:57:24 UTC
Created attachment 573719
patch to fix the bug

Comment 17 Paolo Bonzini 2012-04-04 08:23:42 UTC
Closing as WONTFIX.  The current blkmirror cannot be repaired for the mirror+stream case.  It is not yet certain what solution we will implement, but it will not have this problem because it will not use block_stream.