Red Hat Bugzilla – Bug 867412
libvirt fails to clear async job when p2p migration fails early
Last modified: 2013-02-21 02:10:28 EST
Description of problem:
When p2p migration fails early because qemuMigrationIsAllowed or qemuMigrationIsSafe say migration should be cancelled, we fail to clear the migration-out async job. As a result, further APIs called for the same domain may fail with "Timed out during operation: cannot acquire state change lock".

Version-Release number of selected component (if applicable):
libvirt-0.10.2-4.el6, introduced upstream in 0.9.5

How reproducible:
100%

Steps to Reproduce:
1. create and start a domain with disk on NFS and cache != none
2. virsh migrate --p2p $DOM $URI
3. virsh migrate --p2p $DOM $URI

Actual results:
Step 2 correctly results in:
error: Unsafe migration: Migration may lead to data corruption if disks use cache != none

Step 3 will time out after 30 seconds and report:
error: Timed out during operation: cannot acquire state change lock

Expected results:
No matter how many times we try to migrate the domain, it should still report:
error: Unsafe migration: Migration may lead to data corruption if disks use cache != none

Additional info:
Patch sent upstream: https://www.redhat.com/archives/libvir-list/2012-October/msg00891.html
I can reproduce this with:
libvirt-0.10.2-4.el6.x86_64

virsh # migrate aaa --p2p qemu+ssh://10.66.7.161/system --unsafe
error: Timed out during operation: cannot acquire state change lock
Fixed upstream by v0.10.2-191-g837993d:

commit 837993d845a32bb222959a84d1c03a0c47f785be
Author: Jiri Denemark <jdenemar@redhat.com>
Date:   Wed Oct 17 14:08:17 2012 +0200

    qemu: Clear async job when p2p migration fails early

    When p2p migration fails early because qemuMigrationIsAllowed or
    qemuMigrationIsSafe say migration should be cancelled, we fail to
    clear the migration-out async job. As a result of that, further APIs
    called for the same domain may fail with Timed out during operation:
    cannot acquire state change lock.

    Reported by Guido Winkelmann.
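For readers unfamiliar with the failure pattern, here is a minimal Python model of what the commit message describes. It is not libvirt code: the Domain class, begin_async_job, end_async_job, and both migrate functions are hypothetical names invented for illustration, and a plain lock stands in for the per-domain async job. The buggy variant returns early from the safety check without ending the job, so the next call times out; the fixed variant ends the job on every exit path.

```python
import threading

UNSAFE = "Unsafe migration"
TIMEOUT = "Timed out during operation: cannot acquire state change lock"


class Domain:
    """Toy stand-in for a domain with a single async-job slot (hypothetical)."""

    def __init__(self, safe_to_migrate):
        self.safe_to_migrate = safe_to_migrate
        self._job = threading.Lock()

    def begin_async_job(self, timeout):
        # Returns False if the slot is still held when the timeout expires.
        return self._job.acquire(timeout=timeout)

    def end_async_job(self):
        self._job.release()


def migrate_p2p_buggy(dom, timeout=0.1):
    if not dom.begin_async_job(timeout):
        return TIMEOUT
    if not dom.safe_to_migrate:
        # Bug being modelled: early return leaves the async job set.
        return UNSAFE
    dom.end_async_job()
    return "ok"


def migrate_p2p_fixed(dom, timeout=0.1):
    if not dom.begin_async_job(timeout):
        return TIMEOUT
    try:
        if not dom.safe_to_migrate:
            return UNSAFE
        return "ok"
    finally:
        # The job is ended on every exit path, including early failures.
        dom.end_async_job()


if __name__ == "__main__":
    dom = Domain(safe_to_migrate=False)
    print(migrate_p2p_buggy(dom))  # first attempt: safety check fails
    print(migrate_p2p_buggy(dom))  # second attempt: job was never cleared
```

The 0.1 second timeout in the model is arbitrary; the report above observed a 30 second wait before the state change lock error.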
In POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-October/msg00920.html
Test with:

# virsh dumpxml rhel63q
...
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='writeback'/>
  <source file='/virt/rhel63q.img'/>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
...

# virsh migrate rhel63q --p2p qemu+ssh://10.66.7.161/system --unsafe

The results with cache set to "default", "writethrough", and "directsync" are the same, so it's verified.
Correction: "default", "none", "writethrough", "writeback", and "unsafe" work well; our qemu does not support "directsync" yet.
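When repeating this verification across cache modes, it can help to confirm which mode each disk actually has before migrating. The sketch below reads the cache attribute from domain XML such as virsh dumpxml prints. The function names are hypothetical, and the sample XML is an assumption: it wraps the disk element from the test above in a minimal domain/devices skeleton, since the original output was elided. Treating a missing cache attribute as "default" is also an assumption of this sketch, and the "cache != none" filter only mirrors the wording of the error message, not libvirt's full safety check.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the <disk> element comes from the test above; the
# surrounding <domain>/<devices> skeleton is filled in for illustration.
SAMPLE_XML = """
<domain type='kvm'>
  <name>rhel63q</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback'/>
      <source file='/virt/rhel63q.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def disk_cache_modes(domain_xml):
    """Map each disk target (e.g. 'vda') to its cache mode."""
    root = ET.fromstring(domain_xml.strip())
    modes = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else "?"
        # A missing cache attribute is reported as "default" here.
        cache = driver.get("cache", "default") if driver is not None else "default"
        modes[dev] = cache
    return modes


def disks_with_cache_enabled(domain_xml):
    """List disk targets whose cache mode is anything other than 'none'."""
    return [dev for dev, cache in disk_cache_modes(domain_xml).items()
            if cache != "none"]


if __name__ == "__main__":
    print(disk_cache_modes(SAMPLE_XML))
    print(disks_with_cache_enabled(SAMPLE_XML))
```

In practice the XML string would come from the output of virsh dumpxml for the domain under test rather than from the embedded sample.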
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0276.html