Bug 1171552 - Storage vm migration failed when running BurnInTest
Summary: Storage vm migration failed when running BurnInTest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Stefan Hajnoczi
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-12-08 05:07 UTC by juzhang
Modified: 2015-03-05 09:59 UTC

Fixed In Version: qemu-kvm-rhev-2.1.2-17.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-05 09:59:29 UTC
shu: needinfo-


Links
System: Red Hat Product Errata
ID: RHSA-2015:0624
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: qemu-kvm-rhev security, bug fix, and enhancement update
Last Updated: 2015-03-05 14:37:36 UTC

Description juzhang 2014-12-08 05:07:51 UTC
Description of problem:
https://lists.gnu.org/archive/html/qemu-devel/2014-11/msg03599.html 

Version-Release number of selected component (if applicable):


How reproducible:
N/A

Steps to Reproduce:
1. virsh migrate --live --copy-storage-all rhel7 qemu+ssh://10.66.32.106/system

2. The win7-64 guest is running BurnInTest (local disk scan test) during the migration.

3. Repeat the migration a few times.

Actual results:
Program received signal SIGSEGV, Segmentation fault.
0x00007f90d250db24 in get_cluster_table (bs=0x7f90d493f500, offset=1832189952, new_l2_table=0x7f8fbd6faa88, new_l2_index=0x7f8fbd6faaa0) at block/qcow2-cluster.c:573
573         l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
(gdb) bt
#0  0x00007f90d250db24 in get_cluster_table (bs=0x7f90d493f500, offset=1832189952, new_l2_table=0x7f8fbd6faa88, new_l2_index=0x7f8fbd6faaa0) at block/qcow2-cluster.c:573
#1  0x00007f90d250e577 in handle_copied (bs=0x7f90d493f500, guest_offset=1832189952, host_offset=0x7f8fbd6fab18, bytes=0x7f8fbd6fab20, m=0x7f8fbd6fabc8) at block/qcow2-cluster.c:927
#2  0x00007f90d250ef45 in qcow2_alloc_cluster_offset (bs=0x7f90d493f500, offset=1832189952, num=0x7f8fbd6fabfc, host_offset=0x7f8fbd6fabc0, m=0x7f8fbd6fabc8) at block/qcow2-cluster.c:1269
#3  0x00007f90d250445f in qcow2_co_writev (bs=0x7f90d493f500, sector_num=3578496, remaining_sectors=2040, qiov=0x7f8fbd6fae90) at block/qcow2.c:1171
#4  0x00007f90d24d4764 in bdrv_aligned_pwritev (bs=0x7f90d493f500, req=0x7f8fbd6facd0, offset=1832189952, bytes=1044480, qiov=0x7f8fbd6fae90, flags=0) at block.c:3321
#5  0x00007f90d24d4d21 in bdrv_co_do_pwritev (bs=0x7f90d493f500, offset=1832189952, bytes=1044480, qiov=0x7f8fbd6fae90, flags=0) at block.c:3447
#6  0x00007f90d24d3115 in bdrv_rw_co_entry (opaque=0x7f8fbd6fae10) at block.c:2710
#7  0x00007f90d24d31e7 in bdrv_prwv_co (bs=0x7f90d493f500, offset=1832189952, qiov=0x7f8fbd6fae90, is_write=true, flags=0) at block.c:2746
#8  0x00007f90d24d32eb in bdrv_rw_co (bs=0x7f90d493f500, sector_num=3578496, 

Expected results:
Storage migration works well.
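
For readers of the backtrace above, here is a minimal sketch of how the faulting write maps onto the qcow2 L1/L2 tables. It is not QEMU code: the toy_* names are hypothetical, and the 64 KiB cluster size (cluster_bits = 16, the qemu-img default) is an assumption, since the report does not state the image's cluster size. The sector number and offset come from frames #3 and #4.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; they are not part of QEMU. */

#define TOY_SECTOR_SIZE 512ULL  /* sector size used by the block layer */

/* Convert a sector number (frame #3) to a byte offset (frame #4). */
static uint64_t toy_sector_to_offset(uint64_t sector_num)
{
    return sector_num * TOY_SECTOR_SIZE;
}

/* Each L2 entry is 8 bytes, so one cluster holds 2^(cluster_bits - 3)
 * entries. The L1 index is the guest offset with both the in-cluster
 * bits and the L2 index bits shifted out. */
static uint64_t toy_l1_index(uint64_t offset, unsigned cluster_bits)
{
    unsigned l2_bits = cluster_bits - 3;
    return offset >> (cluster_bits + l2_bits);
}

/* The L2 index is the cluster number masked to the L2 table size. */
static uint64_t toy_l2_index(uint64_t offset, unsigned cluster_bits)
{
    unsigned l2_bits = cluster_bits - 3;
    return (offset >> cluster_bits) & ((1ULL << l2_bits) - 1);
}
```

Under that assumption, sector 3578496 is byte offset 1832189952, which falls in L1 entry 3 and L2 entry 3381. Line 573 only reads the in-memory L1 table, so a fault on such a small index suggests that s->l1_table itself was not valid at that moment. The report does not confirm this.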

Comment 2 Amit Shah 2014-12-09 05:30:01 UTC
Stefan pointed to a commit that fixed this upstream:

commit 7ea2d269cb84ca7a2f4b7c3735634176f7c1dc35
Author: Alexey Kardashevskiy <aik@ozlabs.ru>
Date:   Thu Oct 9 13:50:46 2014 +1100

    block/migration: Disable cache invalidate for incoming migration
    
    When migrated using libvirt with "--copy-storage-all", at the end of
    migration there is race between NBD mirroring task trying to do flush
    and migration completion, both end up invalidating cache. Since qcow2
    driver does not handle this situation very well, random crashes happen.
    
    This disables the BDRV_O_INCOMING flag for the block device being migrated
    once the cache has been invalidated.

Does using --copy-storage-all with libvirt reproduce this bug?
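
The guard described in the commit message can be sketched as a toy model. This is not the upstream patch: ToyBlockDev, TOY_O_INCOMING and toy_invalidate_cache are hypothetical names standing in for the block device state, BDRV_O_INCOMING and the cache invalidation entry point. The sketch only shows the idea that once the "incoming" flag is cleared, a second invalidation request becomes a no-op.

```c
#include <stdbool.h>

/* Toy stand-in for BDRV_O_INCOMING; the value is arbitrary. */
#define TOY_O_INCOMING 0x0800

/* Hypothetical, heavily simplified block device state. */
typedef struct ToyBlockDev {
    int open_flags;    /* TOY_O_INCOMING is set while migration is incoming */
    int reload_count;  /* how many times metadata was actually reloaded */
} ToyBlockDev;

/* Returns true if this call performed the reload, false if it was skipped. */
static bool toy_invalidate_cache(ToyBlockDev *bs)
{
    if (!(bs->open_flags & TOY_O_INCOMING)) {
        /* Already invalidated once: nothing to do for a second caller. */
        return false;
    }

    /* Clear the flag so that any later caller takes the early return.
     * Clearing it before the reload is this sketch's choice. */
    bs->open_flags &= ~TOY_O_INCOMING;

    /* Stand-in for dropping and re-reading the image metadata. */
    bs->reload_count++;
    return true;
}
```

In this model, if the NBD flush path and the migration completion path both call toy_invalidate_cache() on the same device, only the first call reloads the metadata and reload_count ends at 1.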

Comment 3 Stefan Hajnoczi 2014-12-10 13:58:23 UTC
It is not 100% clear whether this can be reproduced or whether the backport fixes the exact crash.  But we need the backport I have posted, so please see if it reproduces and then try the backport:

https://brewweb.devel.redhat.com/taskinfo?taskID=8358266

Comment 6 Jeff Nelson 2014-12-17 04:05:17 UTC
Fix included in qemu-kvm-rhev-2.1.2-17.el7

Comment 9 Shaolong Hu 2015-01-07 03:09:42 UTC
(In reply to Amit Shah from comment #2)
> Does using the --copy-storage-all with libvirt reproduce this bug?

Tried many times with BurnInTest (disk 100%, CPU 100%, RAM 100%, network 100%) and ping-pong migration, but could not reproduce it. The chance of hitting it may be very small.

Comment 11 errata-xmlrpc 2015-03-05 09:59:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html

