Bug 1258737

Summary: [PowerKVM] qemu-kvm core dump when running dd in the guest and block mirror at the same time
Product: Red Hat Enterprise Linux 7
Reporter: Shuang Yu <shuyu>
Component: qemu-kvm-rhev
Assignee: David Gibson <dgibson>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: high
Version: 7.2
CC: knoel, kwolf, michen, ngu, qzhang, shuyu, thuth, virt-maint, xuhan, zhengtli
Target Milestone: rc
Target Release: ---
Hardware: ppc64le
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-06 10:37:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1251487
Bug Blocks:

Description Shuang Yu 2015-09-01 07:22:07 UTC
Description of problem:
In the guest, create a large file with dd. If a block mirror is started while the file is still being written, qemu-kvm aborts and dumps core.

Version-Release number of selected component (if applicable):

Host:
kernel-3.10.0-306.0.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-21.el7.ppc64le
SLOF-20150313-3.gitc89b0df.el7.noarch
Guest:
3.10.0-306.0.1.el7.ppc64

How reproducible:
3/3

Steps to Reproduce:

1. Boot the guest under gdb with the following command:
# gdb /usr/libexec/qemu-kvm
(gdb) r -name live-snapshot-test -machine pseries,accel=kvm,usb=off -m 8G -smp 4 -nodefaults -monitor stdio -rtc base=utc -drive file=RHEL-7.2-20150820.0-Server-ppc64.qcow2,format=qcow2,if=none,id=drive-virtio0,werror=stop,cache=none -device virtio-blk-pci,id=virtio0,drive=drive-virtio0,disable-legacy=off,disable-modern=on -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=RHEL-7.2-20150820.0-Server-ppc64-dvd1.iso,if=none,id=drive-scsi0,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,drive=drive-scsi0,id=scsi0-0 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device spapr-vlan,netdev=hostnet0,id=net0,mac=00:52:5f:5d:5c:5d -device usb-kbd,id=input0 -device usb-mouse,id=input1 -msg timestamp=on  -usb -device usb-tablet,id=tablet1 -vga std -vnc :19 -uuid bf91fbbf-33a8-4bef-b69a-e591bb233795 -qmp tcp:0:4444,server,nowait

2. In the guest, write a large file with dd (count was either 10240 or 5000):
dd if=/dev/zero of=file bs=1M count=10240

3. While step 2 is still running, issue this QMP command:
{"execute":"drive-mirror","arguments":{"device":"drive-virtio0","target":"/root/test_home/shuyu/8-26-livesnapshot/sn1","format":"qcow2","mode":"absolute-paths","sync":"full","speed":1000000000, "on-source-error": "stop", "on-target-error": "stop"}}
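For scripted reproduction, the command in step 3 could be sent over the socket opened by `-qmp tcp:0:4444,server,nowait` in step 1. The following is only a minimal Python sketch and not part of the original report: `build_drive_mirror` and `qmp_send` are hypothetical helper names, and the sketch assumes that QEMU sends one JSON message per line, starts with a greeting, and accepts commands only after `qmp_capabilities` has been acknowledged.

```python
import json
import socket


def build_drive_mirror(device, target, speed=1000000000):
    """Return the drive-mirror command from step 3 as a dict."""
    return {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            "target": target,
            "format": "qcow2",
            "mode": "absolute-paths",
            "sync": "full",
            "speed": speed,
            "on-source-error": "stop",
            "on-target-error": "stop",
        },
    }


def qmp_send(host, port, command, timeout=10.0):
    """Negotiate capabilities, send one command, and return its reply.

    Asynchronous events that arrive in between are skipped.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(stream.readline())  # greeting banner from QEMU
        reply = None
        for msg in ({"execute": "qmp_capabilities"}, command):
            stream.write(json.dumps(msg) + "\n")
            stream.flush()
            while True:
                reply = json.loads(stream.readline())
                if "return" in reply or "error" in reply:
                    break  # this is the answer to msg, not an event
        return reply
```

With the guest from step 1 running and dd in progress, a call such as `qmp_send("127.0.0.1", 4444, build_drive_mirror("drive-virtio0", "/root/test_home/shuyu/8-26-livesnapshot/sn1"))` would issue the same request. The host address is an assumption; only port 4444 comes from the command line above.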


Actual results:
Program received signal SIGABRT, Aborted.

Expected results:
The file should be created successfully, the mirror should finish successfully, and the guest should keep working normally.

Additional info:
(gdb) bt full
#0  0x00003fffb6c6e578 in raise () from /lib64/power8/libc.so.6
No symbol table info available.
#1  0x00003fffb6c706fc in abort () from /lib64/power8/libc.so.6
No symbol table info available.
#2  0x0000000047f70ff8 in qemu_coroutine_enter (co=0x4a33d500, 
    opaque=0x0) at qemu-coroutine.c:111
        self = <optimized out>
        ret = <optimized out>
#3  0x0000000047f712ec in qemu_co_queue_run_restart (
    co=0x4a33a200) at qemu-coroutine-lock.c:59
        next = <optimized out>
#4  0x0000000047f70ec0 in qemu_coroutine_enter (co=0x4a33a200, 
    opaque=<optimized out>) at qemu-coroutine.c:118
        self = <optimized out>
        ret = COROUTINE_YIELD
#5  0x0000000047f5a8a4 in bdrv_co_aio_rw_vector (bs=0x491e0000, 
    sector_num=63200512, qiov=0x49116f38, 
    nb_sectors=<optimized out>, flags=<optimized out>, 
    cb=0x47fae200 <mirror_read_complete>, opaque=0x49116f30, 
    is_write=<optimized out>) at block.c:5025
        co = <optimized out>
        acb = 0x4901fc40
#6  0x0000000047faef4c in mirror_iteration (s=0x490b7220)
    at block/mirror.c:310
        source = 0x491e0000
        nb_chunks = <optimized out>
        end = <optimized out>
        hbitmap_next_sector = 63207808
        pnum = 7424
        nb_sectors = 7424
        sectors_per_chunk = 128
        sector_num = 63200512
        next_chunk = <optimized out>
        next_sector = 63207936
        op = 0x49116f30
        ret = <optimized out>
        delay_ns = 0
#7  mirror_run (opaque=0x490b7220) at block/mirror.c:522
        delay_ns = 0
        cnt = 16148864
        should_complete = <optimized out>
        s = 0x490b7220
        data = <optimized out>
        bs = 0x491e0000
        sector_num = <optimized out>
        end = <optimized out>
        sectors_per_chunk = <optimized out>
        length = <optimized out>
        last_pause_ns = 353621900477369
        bdi = {cluster_size = 1491533424, 
          vm_state_offset = 1491533472, is_dirty = false, 
          unallocated_blocks_are_zero = 111, 
          can_write_zeroes_with_unmap = 60, 
          needs_compressed_writes = 13}
        backing_filename = <incomplete sequence \343>
        ret = <optimized out>
        n = 128
        __PRETTY_FUNCTION__ = "mirror_run"
#8  0x0000000047f71fc8 in coroutine_trampoline (
    i0=<optimized out>, i1=<optimized out>)
    at coroutine-ucontext.c:80
        arg = {p = 0x4a33d500, i = {1244910848, 0}}
        self = 0x4a33d500
        co = 0x4a33d500
#9  0x00003fffb6c81d7c in makecontext ()
   from /lib64/power8/libc.so.6
No symbol table info available.
#10 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)

Comment 2 David Gibson 2015-09-02 05:38:58 UTC
Setting blocker? flag, since this could be a problem for expected RHEV features.

The crash seems to be caused by a re-entered co-routine deep in the block code.  I think I'll need assistance from someone familiar with the qemu block layer.
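The re-entry is visible in the backtrace: frame #2 aborts while entering co=0x4a33d500, and frame #8 shows that same coroutine (self = 0x4a33d500) is the one currently running mirror_run. As a reading aid only, here is a toy Python model of such a re-entry guard. It is not QEMU's implementation: `ToyCoroutine`, `ReentryError` and `demo` are hypothetical names, the toy bodies run to completion instead of yielding, and the model raises an exception where qemu aborts. The message text is the one quoted in comment 5.

```python
class ReentryError(RuntimeError):
    """Raised where the real program would print the message and abort."""


class ToyCoroutine:
    def __init__(self, name, body):
        self.name = name
        self.body = body      # callable taking this coroutine as argument
        self.active = False   # True while the coroutine is being run

    def enter(self):
        # Guard: entering a coroutine that is already running is a bug.
        if self.active:
            raise ReentryError(
                "Co-routine re-entered recursively: " + self.name
            )
        self.active = True
        try:
            self.body(self)
        finally:
            self.active = False


def demo():
    """Reproduce the shape of the trace: mirror -> io -> mirror again."""

    def io_body(co):
        # The nested coroutine tries to wake the coroutine that started it,
        # which is still active further up the call stack.
        mirror.enter()

    def mirror_body(co):
        ToyCoroutine("io", io_body).enter()

    mirror = ToyCoroutine("mirror", mirror_body)
    mirror.enter()
```

Calling `demo()` raises `ReentryError` for the "mirror" coroutine. The model only shows why the guard fires; it says nothing about why the wakeup targets the running coroutine in the real block layer, which this report leaves to bug 1251487.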

There's no immediate indication why this would trigger on Power, but not x86 (has the same test been tried on x86?).

Kevin, can you assist, or suggest someone who could?

Comment 3 Qunfang Zhang 2015-09-02 05:52:48 UTC
(In reply to David Gibson from comment #2)
> Setting blocker? flag, since this could be a problem for expected RHEV
> features.
> 
> The crash seems to be caused by a re-entered co-routine deep in the block
> code.  I think I'll need assistance from someone familiar with the qemu
> block layer.
> 
> There's no immediate indication why this would trigger on Power, but not
> x86 (has the same test been tried on x86?).
> 
> Kevin, can you assist, or suggest someone who could?

Hi, David

It seems there is a similar bug filed by x86 kvm QE:

Bug 1251487 - qemu core dump when do drive mirror

That bug is already in POST status. If you confirm this is a duplicate, feel free to close it. Otherwise we could re-test after bug 1251487 is MODIFIED.

Thanks,
Qunfang

Comment 4 David Gibson 2015-09-02 06:51:39 UTC
I can't be 100% certain, but it looks very likely that this is a duplicate of bug 1251487.  Let's retest once that patch goes in to be sure.

Comment 5 Kevin Wolf 2015-09-02 08:21:07 UTC
Yes, looks much like the same thing.

On another note: In your next bug report, please include the error messages
that qemu prints before crashing ("Co-routine re-entered recursively" in this
case). As long as it doesn't print any additional information, we can get the
same information from the stack trace by looking up the code lines, but
that takes much longer than reading an error message.
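For non-interactive test runs, one way to make sure such a message ends up in the report is to keep the tail of the process's stderr next to its exit status. The following Python sketch is an illustration only; `run_and_keep_stderr_tail` is a hypothetical helper and not part of any QE tooling mentioned in this bug.

```python
import collections
import subprocess
import sys


def run_and_keep_stderr_tail(argv, keep=20):
    """Run argv and return (returncode, last `keep` lines of its stderr).

    stderr is still echoed live so nothing is hidden from the terminal.
    stdin and stdout are inherited unchanged.
    """
    tail = collections.deque(maxlen=keep)
    with subprocess.Popen(argv, stderr=subprocess.PIPE, text=True) as proc:
        for line in proc.stderr:
            sys.stderr.write(line)
            tail.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set.
    return proc.returncode, list(tail)
```

On POSIX systems a process killed by SIGABRT is reported by `subprocess` with a negative return code, so the pair returned here records both the abort and the last lines printed before it.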

Comment 9 Shuang Yu 2015-09-06 10:37:41 UTC

*** This bug has been marked as a duplicate of bug 1251487 ***