Bug 916067
| Summary: | cancelling migration with Ctrl+C during block migration (full or incremental disk copy), then migrating again, destroys the domain | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | hongming <honzhang> |
| Component: | qemu-kvm | Assignee: | Paolo Bonzini <pbonzini> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.4 | CC: | areis, chayang, cwei, dgilbert, dyuan, juzhang, knoel, mkenneth, mzhan, neil, pbonzini, qzhang, rbalakri, rpacheco, shu, virt-maint, weizhan, xwei, ydu, zpeng |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-0.12.1.2-2.467.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-07-22 06:02:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
We don't support migration with block migration, but I'll try to reproduce upstream and come back.

Reproduced with QEMU:

```
$ /usr/libexec/qemu-kvm -drive file=/vm/test.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -S -vnc :0 -monitor stdio
$ /usr/libexec/qemu-kvm-rhel -drive file=/vm/test2.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -S -vnc :1 -incoming tcp:localhost:12345 -monitor stdio
```

On the first instance:

```
(qemu) migrate -d -b tcp:localhost:12345
(qemu) migrate_cancel
(qemu) migrate -d -b tcp:localhost:12345
(qemu) qemu-system-x86_64: /home/pbonzini/work/redhat-git/qemu-kvm-rhel6/block.c:3915: bdrv_set_dirty_tracking: Assertion `!bs->dirty_bitmap' failed.
Aborted
```

Works upstream. I'm leaving it open in case OpenStack starts using migration with non-shared storage.

It would be nice to see this working. We're using block migration to create VM backups with zero downtime (simply "cont" the original VM after the migration completes), but this bug throws a wrench into those plans. I can confirm that it works with upstream qemu as well.

I was able to narrow down the issue:

- bad: qemu-kvm-0.12.1.2-2.295.el6_3.8.x86_64.rpm
- good: qemu-kvm-0.12.1.2-2.295.el6_3.2.x86_64.rpm

So it broke somewhere between those two versions.
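The failed assertion can be modeled in a few lines. This is a sketch of the suspected lifecycle bug, not qemu's actual code: `migrate -b` enables per-device dirty tracking, and if `migrate_cancel` never frees the dirty bitmap, the next `migrate -b` trips the enable-while-already-enabled assertion and aborts the process. The `BlockDriverState` class and helper names below are illustrative only.

```python
# Minimal model of the failure mode -- NOT qemu's actual code.
# bdrv_set_dirty_tracking() asserts the bitmap does not already
# exist; if a cancelled block migration never frees it, the next
# "migrate -b" fails that assertion and the process aborts.

class BlockDriverState:
    def __init__(self):
        self.dirty_bitmap = None  # None while dirty tracking is off

def set_dirty_tracking(bs, enable):
    if enable:
        # models the C assertion seen in the backtrace above
        assert bs.dirty_bitmap is None, "bdrv_set_dirty_tracking: !bs->dirty_bitmap"
        bs.dirty_bitmap = bytearray(4096)
    else:
        bs.dirty_bitmap = None

def block_migration(bs, cancel_cleans_up):
    """Start a block migration, then cancel it."""
    set_dirty_tracking(bs, True)       # migration start enables tracking
    if cancel_cleans_up:
        set_dirty_tracking(bs, False)  # fixed behaviour: cancel frees the bitmap

bs = BlockDriverState()
block_migration(bs, cancel_cleans_up=False)      # first migrate + buggy cancel
try:
    block_migration(bs, cancel_cleans_up=False)  # second migrate
except AssertionError as e:
    print("aborted:", e)                         # mirrors the Abort on RHEL 6
```

When the cancel path frees the bitmap (as upstream does), the second migration enables tracking cleanly and no assertion fires.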
Looking at the changelog, I suspect the issue is in here somewhere:

```
* Mon Oct 15 2012 Michal Novotny <minovotn> - qemu-kvm-0.12.1.2-2.295.el6_3.4
- kvm-bitmap-add-a-generic-bitmap-and-bitops-library.patch [bz#852458]
- kvm-bitops-fix-test_and_change_bit.patch [bz#852458]
- kvm-add-hierarchical-bitmap-data-type-and-test-cases.patch [bz#852458]
- kvm-block-implement-dirty-bitmap-using-HBitmap.patch [bz#852458]
- kvm-block-return-count-of-dirty-sectors-not-chunks.patch [bz#852458]
- kvm-block-allow-customizing-the-granularity-of-the-dirty.patch [bz#852458]
- kvm-mirror-use-target-cluster-size-as-granularity.patch [bz#852458]
- kvm-virtio-console-Fix-failure-on-unconnected-pty.patch [bz#861049]
- Resolves: bz#852458 (copy cluster-sized blocks to the target of live storage migration)
- Resolves: bz#861049 (Fedora 16 and 17 guests hang during boot)
```

@minovotn, any ideas?

*** Bug 983415 has been marked as a duplicate of this bug. ***

Based on https://rhn.redhat.com/rhn/errata/details/Details.do?eid=25114 I tested this out with qemu-kvm-0.12.1.2-2.415.el6_5.3. While the behaviour has changed, it has unfortunately regressed further: in the newer version, the first migration completes, but the source VM is frozen. By frozen I mean the source VM uses 100% CPU time and does not respond over the monitor interface (not even a prompt is given); it must be killed with `kill`. Since the first migration rendered the source VM inoperable, I was unable to test a second migration.

*** Bug 1102566 has been marked as a duplicate of this bug. ***
*** Fix included in qemu-kvm-0.12.1.2-2.467.el6 ***

Reproduced on qemu-kvm-rhev-0.12.1.2-2.460.el6.x86_64:

```
[root@dhcp-9-242 staf-kvm-devel]# virsh migrate --verbose --live --persistent 2k12r2 qemu+ssh://10.66.8.191/system --copy-storage-inc
root@10.66.8.191's password:
Migration: [  6 %]^Cerror: operation aborted: migration job: canceled by client
[root@dhcp-9-242 staf-kvm-devel]# virsh migrate --verbose --live --persistent 2k12r2 qemu+ssh://10.66.8.191/system --copy-storage-inc
root@10.66.8.191's password:
error: Unable to read from monitor: Connection reset by peer
[root@dhcp-9-242 staf-kvm-devel]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     2k12r2                         shut off
```

Verified on qemu-kvm-rhev-0.12.1.2-2.469.el6.x86_64:

```
[root@dhcp-9-242 staf-kvm-devel]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     2k12r2                         shut off
[root@dhcp-9-242 staf-kvm-devel]# virsh start 2k12r2
Domain 2k12r2 started
[root@dhcp-9-242 staf-kvm-devel]# virsh migrate --verbose --live --persistent 2k12r2 qemu+ssh://10.66.8.191/system --copy-storage-inc
root@10.66.8.191's password:
Migration: [  6 %]^Cerror: operation aborted: migration job: canceled by client
[root@dhcp-9-242 staf-kvm-devel]# virsh migrate --verbose --live --persistent 2k12r2 qemu+ssh://10.66.8.191/system --copy-storage-inc
root@10.66.8.191's password:
Migration: [100 %]
[root@dhcp-9-242 staf-kvm-devel]#
```

Moving to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1275.html
Created attachment 703301 [details]: libvirtd_debug.log

Description of problem:
Cancelling the migration with Ctrl+C during block migration (full disk copy or incremental disk copy) and then migrating again destroys the domain.

Version-Release number of selected component (if applicable):
- libvirt-0.10.2-18.el6.x86_64
- qemu-kvm-rhev-0.12.1.2-2.355.el6.x86_64

How reproducible:
100%

Steps to Reproduce:

1. Start the domain:

```
# virsh start rhel6.3
Domain rhel6.3 started
```

2. Start an incremental block migration and cancel it with Ctrl+C:

```
# virsh migrate --verbose --live --persistent rhel6.3 qemu+ssh://10.66.6.76/system --copy-storage-inc
root@10.66.6.76's password:
Migration: [  9 %]^Cerror: operation aborted: migration job: canceled by client
```

3. Check that the domain is still running:

```
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 28    rhel6.3                        running
```

4. Migrate again:

```
# virsh migrate --verbose --live --persistent rhel6.3 qemu+ssh://10.66.6.76/system --copy-storage-inc
root@10.66.6.76's password:
error: Unable to read from monitor: Connection reset by peer
```

5. The domain has been destroyed:

```
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel6.3                        shut off
```

The error log is as follows:

```
2013-02-21 08:13:24.962+0000: 9552: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer
2013-02-21 08:13:24.962+0000: 9552: debug : qemuMonitorIO:646 : Error on monitor Unable to read from monitor: Connection reset by peer
2013-02-21 08:13:24.962+0000: 9552: debug : virEventPollUpdateHandle:146 : EVENT_POLL_UPDATE_HANDLE: watch=40 events=12
2013-02-21 08:13:24.962+0000: 9552: debug : virEventPollInterruptLocked:697 : Skip interrupt, 1 -747411360
2013-02-21 08:13:24.962+0000: 9552: debug : virObjectUnref:135 : OBJECT_UNREF: obj=0x7f4ab801a640
2013-02-21 08:13:24.962+0000: 9552: debug : qemuMonitorIO:680 : Triggering error callback
2013-02-21 08:13:24.962+0000: 9552: debug : qemuProcessHandleMonitorError:342 : Received error on 0x7f4ab810bae0 'rhel6.3'
```

Actual results:
As above.

Expected results:
The domain works fine after the migration is cancelled.

Additional info: