1258737 – [PowerKVM]qemu-kvm core dump when do dd file in the guest and block mirror at the same time


Keywords:
Status:	CLOSED DUPLICATE of bug 1251487
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm-rhev
Sub Component:
Version:	7.2
Hardware:	ppc64le
OS:	Unspecified
Priority:	high
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	David Gibson
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:	1251487
Blocks:

Reported:	2015-09-01 07:22 UTC by Shuang Yu
Modified:	2016-01-15 03:00 UTC
CC List:	10 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-09-06 10:37:41 UTC
Target Upstream Version:
Embargoed:


Description Shuang Yu 2015-09-01 07:22:07 UTC

Description of problem:
When dd is writing a large file in the guest and a block mirror (drive-mirror) is started while the file is still being created, qemu-kvm aborts and dumps core.

Version-Release number of selected component (if applicable):

Host:
kernel-3.10.0-306.0.1.el7.ppc64le
qemu-kvm-rhev-2.3.0-21.el7.ppc64le
SLOF-20150313-3.gitc89b0df.el7.noarch
Guest:
3.10.0-306.0.1.el7.ppc64

How reproducible:
3/3

Steps to Reproduce:

1. Boot the guest under gdb with the following command line:
# gdb /usr/libexec/qemu-kvm
(gdb) r -name live-snapshot-test -machine pseries,accel=kvm,usb=off -m 8G -smp 4 -nodefaults -monitor stdio -rtc base=utc -drive file=RHEL-7.2-20150820.0-Server-ppc64.qcow2,format=qcow2,if=none,id=drive-virtio0,werror=stop,cache=none -device virtio-blk-pci,id=virtio0,drive=drive-virtio0,disable-legacy=off,disable-modern=on -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=RHEL-7.2-20150820.0-Server-ppc64-dvd1.iso,if=none,id=drive-scsi0,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,drive=drive-scsi0,id=scsi0-0 -netdev tap,id=hostnet0,script=/etc/qemu-ifup -device spapr-vlan,netdev=hostnet0,id=net0,mac=00:52:5f:5d:5c:5d -device usb-kbd,id=input0 -device usb-mouse,id=input1 -msg timestamp=on  -usb -device usb-tablet,id=tablet1 -vga std -vnc :19 -uuid bf91fbbf-33a8-4bef-b69a-e591bb233795 -qmp tcp:0:4444,server,nowait

2. In the guest, write a large file with dd (count was either 10240 or 5000):
dd if=/dev/zero of=file bs=1M count=10240

3. While the dd from step 2 is still running, issue drive-mirror over QMP:
{"execute":"drive-mirror","arguments":{"device":"drive-virtio0","target":"/root/test_home/shuyu/8-26-livesnapshot/sn1","format":"qcow2","mode":"absolute-paths","sync":"full","speed":1000000000, "on-source-error": "stop", "on-target-error": "stop"}}
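For anyone scripting step 3, here is a minimal Python sketch of sending the same command to the QMP socket opened by `-qmp tcp:0:4444,server,nowait`. The helper names `build_drive_mirror` and `send_qmp` are hypothetical and not part of any QEMU tooling. The sketch relies only on the standard QMP handshake: the server sends a greeting, the client answers with `qmp_capabilities`, and only then are commands accepted.

```python
import json
import socket


def build_drive_mirror(device, target, fmt="qcow2", speed=1000000000):
    """Return the drive-mirror command from step 3 as a dict (hypothetical helper)."""
    return {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            "target": target,
            "format": fmt,
            "mode": "absolute-paths",
            "sync": "full",
            "speed": speed,
            "on-source-error": "stop",
            "on-target-error": "stop",
        },
    }


def _read_reply(stream):
    """Read JSON lines until one is a command reply, skipping async events."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed before a reply arrived")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg


def send_qmp(host, port, command, timeout=10.0):
    """Connect, negotiate capabilities, send one command, return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(stream.readline())  # greeting banner sent by the server
        reply = None
        for msg in ({"execute": "qmp_capabilities"}, command):
            stream.write(json.dumps(msg) + "\n")
            stream.flush()
            reply = _read_reply(stream)
        return reply


# Example (needs the guest from step 1 running on the same host):
# send_qmp("127.0.0.1", 4444, build_drive_mirror(
#     "drive-virtio0", "/root/test_home/shuyu/8-26-livesnapshot/sn1"))
```

With the bug present, the reply may never arrive because qemu-kvm aborts; in that case the sketch raises `ConnectionError` or a socket timeout.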


Actual results:
Program received signal SIGABRT, Aborted.

Expected results:
The file should be created successfully, the mirror should complete successfully, and the guest should keep working normally.

Additional info:
(gdb) bt full
#0  0x00003fffb6c6e578 in raise () from /lib64/power8/libc.so.6
No symbol table info available.
#1  0x00003fffb6c706fc in abort () from /lib64/power8/libc.so.6
No symbol table info available.
#2  0x0000000047f70ff8 in qemu_coroutine_enter (co=0x4a33d500, 
    opaque=0x0) at qemu-coroutine.c:111
        self = <optimized out>
        ret = <optimized out>
#3  0x0000000047f712ec in qemu_co_queue_run_restart (
    co=0x4a33a200) at qemu-coroutine-lock.c:59
        next = <optimized out>
#4  0x0000000047f70ec0 in qemu_coroutine_enter (co=0x4a33a200, 
    opaque=<optimized out>) at qemu-coroutine.c:118
        self = <optimized out>
        ret = COROUTINE_YIELD
#5  0x0000000047f5a8a4 in bdrv_co_aio_rw_vector (bs=0x491e0000, 
    sector_num=63200512, qiov=0x49116f38, 
    nb_sectors=<optimized out>, flags=<optimized out>, 
    cb=0x47fae200 <mirror_read_complete>, opaque=0x49116f30, 
    is_write=<optimized out>) at block.c:5025
        co = <optimized out>
        acb = 0x4901fc40
#6  0x0000000047faef4c in mirror_iteration (s=0x490b7220)
    at block/mirror.c:310
        source = 0x491e0000
        nb_chunks = <optimized out>
        end = <optimized out>
        hbitmap_next_sector = 63207808
        pnum = 7424
        nb_sectors = 7424
        sectors_per_chunk = 128
        sector_num = 63200512
        next_chunk = <optimized out>
        next_sector = 63207936
        op = 0x49116f30
        ret = <optimized out>
        delay_ns = 0
#7  mirror_run (opaque=0x490b7220) at block/mirror.c:522
        delay_ns = 0
        cnt = 16148864
        should_complete = <optimized out>
        s = 0x490b7220
        data = <optimized out>
        bs = 0x491e0000
        sector_num = <optimized out>
        end = <optimized out>
        sectors_per_chunk = <optimized out>
        length = <optimized out>
        last_pause_ns = 353621900477369
        bdi = {cluster_size = 1491533424, 
          vm_state_offset = 1491533472, is_dirty = false, 
          unallocated_blocks_are_zero = 111, 
          can_write_zeroes_with_unmap = 60, 
          needs_compressed_writes = 13}
        backing_filename = <incomplete sequence \343>
        ret = <optimized out>
        n = 128
        __PRETTY_FUNCTION__ = "mirror_run"
#8  0x0000000047f71fc8 in coroutine_trampoline (
    i0=<optimized out>, i1=<optimized out>)
    at coroutine-ucontext.c:80
        arg = {p = 0x4a33d500, i = {1244910848, 0}}
        self = 0x4a33d500
        co = 0x4a33d500
#9  0x00003fffb6c81d7c in makecontext ()
   from /lib64/power8/libc.so.6
No symbol table info available.
#10 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)

Comment 2 David Gibson 2015-09-02 05:38:58 UTC

Setting blocker? flag, since this could be a problem for expected RHEV features.

The crash seems to be caused by a re-entered co-routine deep in the block code.  I think I'll need assistance from someone familiar with the qemu block layer.

There's no immediate indication why this would trigger on Power, but not x86 (has the same test been tried on x86?).

Kevin, can you assist, or suggest someone who could?

Comment 3 Qunfang Zhang 2015-09-02 05:52:48 UTC

(In reply to David Gibson from comment #2)
> Setting blocker? flag, since this could be a problem for expected RHEV
> features.
> 
> The crash seems to be caused by a re-entered co-routine deep in the block
> code.  I think I'll need assistance from someone familiar with the qemu
> block layer.
> 
> There's no immediate indication why this would trigger on Power, but not
> x86 (has the same test been tried on x86?).
> 
> Kevin, can you assist, or suggest someone who could?

Hi, David

It seems there is a similar bug filed by x86 kvm QE:

Bug 1251487 - qemu core dump when do drive mirror

That bug is already in POST status. If you confirm this one is a duplicate, feel free to close it; otherwise we can retest once bug 1251487 reaches MODIFIED.

Thanks,
Qunfang

Comment 4 David Gibson 2015-09-02 06:51:39 UTC

I can't be 100% certain, but it looks very likely that this is a duplicate of bug 1251487.  Let's retest once that patch goes in to be sure.

Comment 5 Kevin Wolf 2015-09-02 08:21:07 UTC

Yes, looks much like the same thing.

On another note: In your next bug report, please include the error messages
that qemu prints before crashing ("Co-routine re-entered recursively" in this
case). As long as it doesn't print any additional information, we can get the
same information from the stack trace and then looking up the code lines, but
it takes much longer than reading an error message.
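
As an illustration of the failure mode behind that message, here is a toy Python model. It is not QEMU's code; `ToyCoroutine`, `ReentryError` and `reproduce` are hypothetical names. It assumes only what the backtrace suggests: the mirror coroutine (0x4a33d500 in frames #2 and #8) enters an I/O coroutine (0x4a33a200), and when control comes back, the restart queue tries to enter the mirror coroutine again while it is still running.

```python
class ReentryError(RuntimeError):
    """Raised where the real program would print the message and abort."""


class ToyCoroutine:
    def __init__(self, name, body):
        self.name = name
        self.body = body        # callable taking this coroutine
        self.caller = None      # set while the coroutine is running
        self.wake_queue = []    # coroutines to restart once control returns

    def enter(self, caller):
        # Guard: a coroutine that still has a caller is already running.
        if self.caller is not None:
            raise ReentryError("Co-routine re-entered recursively")
        self.caller = caller
        try:
            self.body(self)
        finally:
            self.caller = None
        # After control returns, restart whatever was queued in the meantime.
        for co in self.wake_queue:
            co.enter(self)


MAIN = object()  # stands in for the main loop that starts the mirror job


def reproduce():
    def io_body(co):
        # The I/O request queues the still-running mirror coroutine for restart.
        co.wake_queue.append(mirror)

    def mirror_body(co):
        io = ToyCoroutine("io", io_body)
        io.enter(caller=co)

    mirror = ToyCoroutine("mirror", mirror_body)
    mirror.enter(caller=MAIN)
```

Calling `reproduce()` raises `ReentryError` from the nested `enter`, which mirrors the frame order #4, #3, #2 in the backtrace. The real root cause is tracked in bug 1251487.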

Comment 9 Shuang Yu 2015-09-06 10:37:41 UTC


*** This bug has been marked as a duplicate of bug 1251487 ***
