1171552 – Storage vm migration failed when running BurnInTes



Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm-rhev
Sub Component:
Version:	7.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Stefan Hajnoczi
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-12-08 05:07 UTC by juzhang
Modified:	2015-03-05 09:59 UTC
CC List:	9 users
Fixed In Version:	qemu-kvm-rhev-2.1.2-17.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-03-05 09:59:29 UTC
Target Upstream Version:
Embargoed:
Flags:	shu: needinfo-


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:0624	0	normal	SHIPPED_LIVE	Important: qemu-kvm-rhev security, bug fix, and enhancement update	2015-03-05 14:37:36 UTC

Description juzhang 2014-12-08 05:07:51 UTC

Description of problem:
https://lists.gnu.org/archive/html/qemu-devel/2014-11/msg03599.html 

Version-Release number of selected component (if applicable):


How reproducible:
N/A

Steps to Reproduce:
1. virsh migrate --live --copy-storage-all rhel7 qemu+ssh://10.66.32.106/system

2. win7-64 guest running BurnInTest (local disk scan test)

3. Tried a few times

Actual results:
Program received signal SIGSEGV, Segmentation fault.
0x00007f90d250db24 in get_cluster_table (bs=0x7f90d493f500, offset=1832189952, new_l2_table=0x7f8fbd6faa88,
    new_l2_index=0x7f8fbd6faaa0) at block/qcow2-cluster.c:573
573         l2_offset = s->l1_table[l1_index] & L1E_OFFSET_MASK;
(gdb) bt
#0  0x00007f90d250db24 in get_cluster_table (bs=0x7f90d493f500, offset=1832189952, new_l2_table=0x7f8fbd6faa88,
    new_l2_index=0x7f8fbd6faaa0) at block/qcow2-cluster.c:573
#1  0x00007f90d250e577 in handle_copied (bs=0x7f90d493f500, guest_offset=1832189952, host_offset=0x7f8fbd6fab18,
    bytes=0x7f8fbd6fab20, m=0x7f8fbd6fabc8) at block/qcow2-cluster.c:927
#2  0x00007f90d250ef45 in qcow2_alloc_cluster_offset (bs=0x7f90d493f500, offset=1832189952, num=0x7f8fbd6fabfc,
    host_offset=0x7f8fbd6fabc0, m=0x7f8fbd6fabc8) at block/qcow2-cluster.c:1269
#3  0x00007f90d250445f in qcow2_co_writev (bs=0x7f90d493f500, sector_num=3578496, remaining_sectors=2040,
    qiov=0x7f8fbd6fae90) at block/qcow2.c:1171
#4  0x00007f90d24d4764 in bdrv_aligned_pwritev (bs=0x7f90d493f500, req=0x7f8fbd6facd0, offset=1832189952, bytes=1044480,
    qiov=0x7f8fbd6fae90, flags=0) at block.c:3321
#5  0x00007f90d24d4d21 in bdrv_co_do_pwritev (bs=0x7f90d493f500, offset=1832189952, bytes=1044480, qiov=0x7f8fbd6fae90,
    flags=0) at block.c:3447
#6  0x00007f90d24d3115 in bdrv_rw_co_entry (opaque=0x7f8fbd6fae10) at block.c:2710
#7  0x00007f90d24d31e7 in bdrv_prwv_co (bs=0x7f90d493f500, offset=1832189952, qiov=0x7f8fbd6fae90, is_write=true, flags=0)
    at block.c:2746
#8  0x00007f90d24d32eb in bdrv_rw_co (bs=0x7f90d493f500, sector_num=3578496, 

Expected results:

Additional info:
Storage Migration works well.

Comment 2 Amit Shah 2014-12-09 05:30:01 UTC

Stefan pointed to a commit that fixed this upstream:

commit 7ea2d269cb84ca7a2f4b7c3735634176f7c1dc35
Author: Alexey Kardashevskiy <aik>
Date:   Thu Oct 9 13:50:46 2014 +1100

    block/migration: Disable cache invalidate for incoming migration
    
    When migrated using libvirt with "--copy-storage-all", at the end of
    migration there is race between NBD mirroring task trying to do flush
    and migration completion, both end up invalidating cache. Since qcow2
    driver does not handle this situation very well, random crashes happen.
    
    This disables the BDRV_O_INCOMING flag for the block device being migrated
    once the cache has been invalidated.

Does using the --copy-storage-all with libvirt reproduce this bug?
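
The guard described in the commit message above can be sketched roughly as follows. This is a simplified, single-threaded model and not the actual QEMU code: the struct, the function names, and the flag value (SKETCH_O_INCOMING) are hypothetical stand-ins, and the sketch does not model the real concurrency between the NBD mirroring flush and migration completion. It only shows the idea that once the "incoming" flag is cleared after the first invalidation, a second invalidation request becomes a no-op.

```c
#include <stdbool.h>

/* Illustrative flag value only; the real BDRV_O_INCOMING is defined in QEMU. */
#define SKETCH_O_INCOMING 0x0800

/* Hypothetical stand-in for a block device state (not QEMU's BlockDriverState). */
struct sketch_bds {
    int open_flags;
    int refresh_count;   /* how many times cached metadata was reloaded */
};

/* Reload cached metadata only while the device is still marked as incoming. */
bool sketch_invalidate_cache(struct sketch_bds *bs)
{
    if (!(bs->open_flags & SKETCH_O_INCOMING)) {
        return false;    /* cache was already invalidated once: do nothing */
    }
    bs->open_flags &= ~SKETCH_O_INCOMING;
    bs->refresh_count++; /* stands in for re-reading the qcow2 metadata */
    return true;
}

/* Model the two callers named in the commit message; return the reload count. */
int sketch_double_invalidate(void)
{
    struct sketch_bds bs = { .open_flags = SKETCH_O_INCOMING, .refresh_count = 0 };
    sketch_invalidate_cache(&bs);   /* e.g. the NBD mirroring flush path */
    sketch_invalidate_cache(&bs);   /* e.g. migration completion */
    return bs.refresh_count;
}
```

In this model the second call returns without touching the cached metadata, so the metadata is reloaded exactly once no matter how many paths request invalidation.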

Comment 3 Stefan Hajnoczi 2014-12-10 13:58:23 UTC

It is not 100% clear whether this can be reproduced or whether the backport fixes the exact crash.  But we need the backport I have posted, so please see if it reproduces and then try the backport:

https://brewweb.devel.redhat.com/taskinfo?taskID=8358266

Comment 6 Jeff Nelson 2014-12-17 04:05:17 UTC

Fix included in qemu-kvm-rhev-2.1.2-17.el7

Comment 9 Shaolong Hu 2015-01-07 03:09:42 UTC

(In reply to Amit Shah from comment #2)
> Does using the --copy-storage-all with libvirt reproduce this bug?

Tried many times with BurnInTest (disk 100%, CPU 100%, RAM 100%, network 100%) and ping-pong migration, but could not reproduce. The chance of hitting this may be very small.

Comment 11 errata-xmlrpc 2015-03-05 09:59:29 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html
