Bug 867412 - libvirt fails to clear async job when p2p migration fails early
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jiri Denemark
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2012-10-17 09:16 EDT by Jiri Denemark
Modified: 2013-02-21 02:10 EST
CC List: 7 users

See Also:
Fixed In Version: libvirt-0.10.2-5.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 02:10:28 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Jiri Denemark 2012-10-17 09:16:23 EDT
Description of problem:

When p2p migration fails early because qemuMigrationIsAllowed or
qemuMigrationIsSafe decide that the migration should be cancelled, we fail
to clear the migration-out async job. As a result, further APIs called
for the same domain may fail with "Timed out during operation: cannot
acquire state change lock".

Version-Release number of selected component (if applicable):

libvirt-0.10.2-4.el6, introduced upstream in 0.9.5

How reproducible:

100%

Steps to Reproduce:
1. Create and start a domain with a disk on NFS and cache != none
2. virsh migrate --p2p $URI $DOM
3. virsh migrate --p2p $URI $DOM
  
Actual results:

Step 2 correctly results in:

error: Unsafe migration: Migration may lead to data corruption if disks use cache != none

Step 3 times out after 30 seconds and reports:

error: Timed out during operation: cannot acquire state change lock


Expected results:

No matter how many times we try to migrate the domain, it should still report:

error: Unsafe migration: Migration may lead to data corruption if disks use cache != none

Additional info:
Comment 1 Jiri Denemark 2012-10-17 09:17:47 EDT
Patch sent upstream: https://www.redhat.com/archives/libvir-list/2012-October/msg00891.html
Comment 2 zhpeng 2012-10-17 22:48:50 EDT
I can reproduce this with: libvirt-0.10.2-4.el6.x86_64

virsh # migrate aaa --p2p qemu+ssh://10.66.7.161/system --unsafe 
error: Timed out during operation: cannot acquire state change lock
Comment 3 Jiri Denemark 2012-10-18 05:17:58 EDT
Fixed upstream by v0.10.2-191-g837993d:

commit 837993d845a32bb222959a84d1c03a0c47f785be
Author: Jiri Denemark <jdenemar@redhat.com>
Date:   Wed Oct 17 14:08:17 2012 +0200

    qemu: Clear async job when p2p migration fails early
    
    When p2p migration fails early because qemuMigrationIsAllowed or
    qemuMigrationIsSafe say migration should be cancelled, we fail to clear
    the migration-out async job. As a result of that, further APIs called
    for the same domain may fail with Timed out during operation: cannot
    acquire state change lock.
    
    Reported by Guido Winkelmann.
Comment 6 zhpeng 2012-10-24 02:23:37 EDT
Tested with the following domain configuration:

# virsh dumpxml rhel63q
...
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback'/>
      <source file='/virt/rhel63q.img'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
...

# virsh migrate rhel63q --p2p qemu+ssh://10.66.7.161/system --unsafe

The results with cache modes "default", "writethrough", and "directsync" are the same.


So it's verified.
Comment 7 zhpeng 2012-11-21 22:41:26 EST
Correction:

Cache modes "default", "none", "writethrough", "writeback", and "unsafe" work well;
our qemu does not support "directsync" yet.
Comment 8 errata-xmlrpc 2013-02-21 02:10:28 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html
