Description  Daniel Berrangé  2017-09-22 10:02:46 UTC
Description of problem:
When triggering a migration across hosts libvirt does not do any significant
validation of the storage hosting the VM's disk images. As long as the same
disk paths are visible on both hosts, it'll let the migration run.
This can lead to trouble. For example, if someone with non-shared storage
pre-creates the disk image on the target host with the right size but then
triggers a migration *without* block copy enabled, libvirt will happily
let the migration run. When the CPUs start on the target host and the
guest does I/O, it'll find its disk image is all zeros, which usually
leads to unhappiness! What's particularly bad is that the user who
triggered the migration won't realize the mistake until migration
has already finished and the source guest has been destroyed. At that
point the only option for recovery is to rebuild the guest from scratch.
Validating a correct storage setup in a way that covers all scenarios
is quite hard. For example, with non-shared storage it is possible the user
has manually copied the disk image across ahead of time - this is fine
if the disk image is read-only, since there's no I/O to synchronize.
Identifying whether block-device-backed storage is shared or not is
non-trivial.
We could, however, do some simple checks to identify common mistakes
with filesystem-based disk backends. I.e., if the virtual disk is
writable and is stored on an ext[2,3,4] / xfs / fat filesystem,
then there's likely no way migration can do the right thing if
block copy is not requested. We should report an error here rather
than let users shoot themselves in the foot. This would be
particularly useful for virt-manager users, since virt-manager
always migrates without block copy, no matter what.
Looking at the code, we already report an error:
static bool
qemuMigrationSrcIsSafe(virDomainDefPtr def,
                       size_t nmigrate_disks,
                       const char **migrate_disks,
                       unsigned int flags)
{
    bool storagemigration = flags & (VIR_MIGRATE_NON_SHARED_DISK |
                                     VIR_MIGRATE_NON_SHARED_INC);
    size_t i;
    int rc;

    for (i = 0; i < def->ndisks; i++) {
        virDomainDiskDefPtr disk = def->disks[i];
        const char *src = virDomainDiskGetSource(disk);

        /* Our code elsewhere guarantees shared disks are either readonly (in
         * which case cache mode doesn't matter) or used with cache=none or
         * used with cache=directsync */
        if (virStorageSourceIsEmpty(disk->src) ||
            disk->src->readonly ||
            disk->src->shared ||
            disk->cachemode == VIR_DOMAIN_DISK_CACHE_DISABLE ||
            disk->cachemode == VIR_DOMAIN_DISK_CACHE_DIRECTSYNC)
            continue;

        /* disks which are migrated by qemu are safe too */
        if (storagemigration &&
            qemuMigrationAnyCopyDisk(disk, nmigrate_disks, migrate_disks))
            continue;

        if (virDomainDiskGetType(disk) == VIR_STORAGE_TYPE_FILE) {
            if ((rc = virFileIsSharedFS(src)) < 0)
                return false;
            else if (rc == 0)
                continue;
            if ((rc = virStorageFileIsClusterFS(src)) < 0)
                return false;
            else if (rc == 1)
                continue;
        } else if (disk->src->type == VIR_STORAGE_TYPE_NETWORK &&
                   disk->src->protocol == VIR_STORAGE_NET_PROTOCOL_RBD) {
            continue;
        }

        virReportError(VIR_ERR_MIGRATE_UNSAFE, "%s",
                       _("Migration may lead to data corruption if disks"
                         " use cache != none or cache != directsync"));
        return false;
    }

    return true;
}
Of course, this can be overridden with the --unsafe flag. Isn't this enough?
Ah, that's more than I thought we did, but there are a couple of issues there:

- If virFileIsSharedFS() returns 0 (i.e. a local ext3 FS), then we jump to the next loop iteration. We should check whether block migration is requested here, and raise an error if it is not.
- If the disk is marked <shareable/>, then we don't even bother with the IsSharedFS() check, which is again a problem if that detects a local FS with no block migration enabled.
I've just pushed the patch upstream:

commit ed11e9cd95bd9cae6cdfab14ae0936930bbb63e6
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Feb 26 09:35:25 2018 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Mon Feb 26 11:32:05 2018 +0100

    qemuMigrationSrcIsSafe: Check local storage more thoroughly

    https://bugzilla.redhat.com/show_bug.cgi?id=1494454

    If a domain disk is stored on local filesystem (e.g. ext4) but is
    not being migrated it is very likely that domain is not able to
    run on destination. Regardless of share/cache mode.

    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Daniel P. Berrangé <berrange>

v4.1.0-rc1-1-ged11e9cd9
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2018:3113