| Summary: | migrate --copy-storage-all ignores disk full condition | ||
|---|---|---|---|
| Product: | [Community] Virtualization Tools | Reporter: | Kevin Hildebrand <ke3vin> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | unspecified | CC: | crobinso, fielious, mprivozn, xen-maint |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-03-19 08:53:05 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Kevin Hildebrand
2011-02-17 19:22:34 UTC
Actually, on further investigation, the problem doesn't appear to be related to the disk-full condition after all. I moved /var to a large enough partition so that it was nowhere near full. Instead, it appears that migrate is fully transferring the data, but isn't writing it to disk at all. strace on qemu shows a continuous stream of reads, but no writes. According to the domain XML the image should be written to /export/vm, but the image file there is zero length. "df" doesn't show any growth in any local partitions either, so I don't know where the data is actually going.

I just ran into this issue.

Description of problem:
virsh migrate --live --persistent --undefinesource --copy-storage-all --verbose --desturi qemu+ssh://DESTSYSTEM/system VMNAME

On the destination I used qemu-img to create a raw file that had virtual sizes larger than the amount of free space. Migration will not show any errors and will succeed, but will leave a corrupted disk on the destination server.

Version-Release number of selected component (if applicable):
libvirtd (libvirt) 1.2.9
QEMU emulator version 2.1.2 (Debian 1:2.1+dfsg-11), Copyright (c) 2003-2008 Fabrice Bellard

How reproducible:
Every time

Steps to Reproduce:
1. Create disk image with a virtual size greater than the amount of free storage on the destination server
2. Attempt to migrate VM with disk larger than the destination's free space
3. virsh migrate --live --persistent --undefinesource --copy-storage-all --verbose --desturi qemu+ssh://DESTSYSTEM/system VMNAME

Actual results:
Migration will complete successfully. The disk image will grow to fill all the free space, but since the destination is full the disk is left corrupted.

Expected results:
Migration should abort.

Fixed upstream as:
commit 80c5f10e865cda0302519492f197cb020bd14a07
Author: Michal Privoznik <mprivozn>
AuthorDate: Tue Feb 10 16:25:27 2015 +0100
Commit: Michal Privoznik <mprivozn>
CommitDate: Thu Feb 19 14:12:38 2015 +0100
qemuMigrationDriveMirror: Listen to events
https://bugzilla.redhat.com/show_bug.cgi?id=1179678
When migrating with storage, libvirt iterates over domain disks and
instructs qemu to migrate the ones we are interested in (shared, RO and
source-less disks are skipped). The disks are migrated in series. No
new disk is transferred until the previous one has been quiesced.
This is checked on the qemu monitor via 'query-jobs' command. If the
disk has been quiesced, it practically went from copying its content
to mirroring state, where all disk writes are mirrored to the other
side of migration too. Having said that, there's one inherent error in
the design. The monitor command we use reports only active jobs. So if
the job fails for whatever reason, we will not see it anymore in the
command output. And this can happen fairly simply: just try to migrate
a domain with storage. If the storage migration fails (e.g. due to
ENOSPC on the destination) we resume the guest on the destination and
let it run on partly copied disk.
The proper fix is what even the comment in the code says: listen for
qemu events instead of polling. If storage migration changes state an
event is emitted and we can act accordingly: either consider disk
copied and continue the process, or consider disk mangled and abort
the migration.
Signed-off-by: Michal Privoznik <mprivozn>
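
The difference between polling and listening for events, as described in the commit message above, can be sketched roughly as follows. This is only an illustration, not libvirt code: the helper names (`poll_active_jobs`, `wait_for_mirror`, `MirrorFailed`) are hypothetical, and the event dictionaries are flattened for brevity (real QMP events nest these fields under "data"). It assumes qemu emits BLOCK_JOB_READY when a mirror job reaches the mirroring state.

```python
from typing import Iterable, List, Optional


def poll_active_jobs(active_jobs: List[dict], device: str) -> Optional[dict]:
    """Polling view: only active jobs are listed, so a job that failed
    and a job that was never started both yield None."""
    for job in active_jobs:
        if job["device"] == device:
            return job
    return None


class MirrorFailed(Exception):
    """Raised when the storage copy for a disk did not reach mirroring."""


def wait_for_mirror(events: Iterable[dict], device: str) -> None:
    """Event view: consume events until the disk is mirroring or has failed."""
    for ev in events:
        if ev["device"] != device:
            continue  # event belongs to another disk
        if ev["event"] == "BLOCK_JOB_READY":
            return  # disk copied, writes are now mirrored
        if ev["event"] == "BLOCK_JOB_COMPLETED" and ev.get("error"):
            raise MirrorFailed(ev["error"])  # abort the migration
    raise MirrorFailed("event stream ended before the mirror became ready")
```

With polling, an empty job list is ambiguous; with events, the failure arrives together with its reason, so the caller can abort instead of resuming on a partly copied disk.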
commit 76c61cdca20c106960af033e5d0f5da70177af0f
Author: Michal Privoznik <mprivozn>
AuthorDate: Tue Feb 10 16:24:45 2015 +0100
Commit: Michal Privoznik <mprivozn>
CommitDate: Thu Feb 19 14:12:38 2015 +0100
qemuProcessHandleBlockJob: Take status into account
Upon BLOCK_JOB_COMPLETED event delivery, we check if the job has
completed (in qemuMonitorJSONHandleBlockJobImpl()). For a better picture,
the event looks something like this:
{"timestamp": {"seconds": 1423582694, "microseconds": 372666}, "event":
"BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len":
8412790784, "offset": 409993216, "speed": 8796093022207, "type":
"mirror", "error": "No space left on device"}}
If "len" does not equal "offset" it's considered an error, and we can
clearly see "error" field filled in. However, later in the event
processing this case was handled no differently from the case of a job
being aborted via a separate API. It's time that we start
differentiating these two because of future work.
Signed-off-by: Michal Privoznik <mprivozn>
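
As a rough illustration of the rule stated in the commit message above, the following sketch classifies the sample event. It is not the libvirt implementation; `classify_block_job_completed` is a hypothetical helper, and the event text is the one quoted above with its opening brace in place.

```python
import json

# Sample event from the commit message, as a complete JSON object.
RAW_EVENT = """
{"timestamp": {"seconds": 1423582694, "microseconds": 372666},
 "event": "BLOCK_JOB_COMPLETED",
 "data": {"device": "drive-virtio-disk0", "len": 8412790784,
          "offset": 409993216, "speed": 8796093022207,
          "type": "mirror", "error": "No space left on device"}}
"""


def classify_block_job_completed(event: dict) -> str:
    """Return "completed" or "failed" for a BLOCK_JOB_COMPLETED event.

    Hypothetical helper: a job counts as failed when "len" differs from
    "offset" or when an "error" field is present.
    """
    if event.get("event") != "BLOCK_JOB_COMPLETED":
        raise ValueError("not a BLOCK_JOB_COMPLETED event")
    data = event["data"]
    if "error" in data or data["len"] != data["offset"]:
        return "failed"
    return "completed"


status = classify_block_job_completed(json.loads(RAW_EVENT))
print(status)  # prints "failed"
```

Here "offset" (409993216) is far short of "len" (8412790784) and "error" is set, so the job is reported as failed rather than being treated like a user-requested abort.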
commit c37943a0687a8fdb08e6eda8ae4b9f4f43f4f2ed
Author: Michal Privoznik <mprivozn>
AuthorDate: Tue Feb 10 15:32:59 2015 +0100
Commit: Michal Privoznik <mprivozn>
CommitDate: Thu Feb 19 14:12:38 2015 +0100
qemuProcessHandleBlockJob: Set disk->mirrorState more often
Currently, upon BLOCK_JOB_* event, disk->mirrorState is not updated
each time. The callback code handling the events checks if a blockjob
was started via our public APIs prior to setting the mirrorState.
However, some block jobs may be started internally (e.g. during
storage migration), in which case we don't bother with setting
disk->mirror (there's nothing we can set it to anyway), or other
fields. But it will come in handy if we update the mirrorState in these
cases too. The event wasn't delivered just for fun - we've started the
job after all.
So, in this commit, the mirrorState is set to whatever job status
we've obtained. Of course, there are some actions on some statuses
that we want to perform. But instead of if {} else if {} else {} ...
enumeration, let's move to switch().
Signed-off-by: Michal Privoznik <mprivozn>
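
The idea in the commit message above (always record the job status, then branch per status) might look roughly like this. All names here (`JobStatus`, `Disk`, `handle_block_job_event`, the returned action strings) are hypothetical stand-ins and do not correspond to libvirt's actual types or actions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobStatus(Enum):  # hypothetical status set, not libvirt's enum
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"
    READY = "ready"


@dataclass
class Disk:
    name: str
    mirror_state: Optional[JobStatus] = None


def handle_block_job_event(disk: Disk, status: JobStatus) -> str:
    """Record the status unconditionally, then pick a per-status action."""
    # Set the state even for jobs started internally (e.g. storage migration).
    disk.mirror_state = status
    # One branch per status, in the spirit of a switch(); the returned
    # strings are placeholders for whatever follow-up the caller performs.
    if status is JobStatus.COMPLETED:
        return "finalize"
    if status is JobStatus.READY:
        return "mirroring"
    if status is JobStatus.FAILED:
        return "report-error"
    return "cleanup"  # CANCELED
```

Because the state is assigned before any branching, code that waits on the disk (such as the storage migration loop) can read it regardless of how the job was started.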
v1.2.12-155-g80c5f10
It's fixed in v1.2.13.