Bug 751631
| Summary: | Default block cache mode for migration | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dor Laor <dlaor> |
| Component: | libvirt | Assignee: | Jiri Denemark <jdenemar> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.2 | CC: | acathrow, ajia, dallan, dyuan, juzhang, michen, mzhan, peterx, quintela, rwu, tburke, veillard, wdai, weizhan |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-0.9.10-4.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-06-20 06:36:09 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Dor Laor
2011-11-06 15:28:26 UTC
What would the matrix look like? I presume that for block devices we only allow migration if cache=none. For file systems, we can guess at a few that should work (gfs2, gpfs, gluster), but can we define a definite list now, or should this be a config option for a whitelist of filesystems? Also, this is flagged for 6.2, but it seems too big a change for that release.

I guess we will need to be careful and provide a way to bypass the check, because existing users for whom this "worked fine until now" will get rightfully annoyed if their guests are stuck the day they need to migrate. But we need a way to raise awareness of the risk. Maybe provide a force flag to bypass the check, possibly set cache='none' on the migrated guest automatically if they force the migration to avoid the issue, and of course notify the user. A priori, if a guest has been migrated once, it is likely to be migrated again in the future. Ideally, though, we should not destroy performance for a relatively rare event; the best approach would be to switch off caching dynamically, flush on the host, and then proceed with the migration. I assume that is not possible now, but it would be the right way to go.

Daniel

I do think we need to protect against it, no matter how rare the issue might be; that is exactly the nature of data integrity issues. cache=none is actually the preferred way for performance too, so it won't restrict users. It is possible to make sure we flush all of the host page cache, but it would need qemu involvement to make sure it gets triggered exactly in the downtime period during migration, and it might cause long I/O stalls if the cache is big. So at the end of the day, my recommendation is not to allow it at all and potentially add some override flag for dummies.

Verify pass on a non-cluster filesystem with:
libvirt-0.9.10-3.el6.x86_64
qemu-kvm-0.12.1.2-2.232.el6.x86_64
kernel-2.6.32-225.el6.x86_64

Start a guest with cache=writeback:

```xml
...
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <source file='/mnt/kvm-rhel6u2-x86_64-new.img'>
    <seclabel relabel='no'/>
  </source>
  <target dev='hda' bus='ide'/>
</disk>
...
```

Then attempt migration; libvirt reports an error:

```
# virsh migrate --live mig qemu+ssh://{target ip}/system
error: Unsafe migration: Migration may lead to data corruption if disks use cache != none
```

Migration succeeds with the --unsafe option even with cache=writeback:

```
# virsh migrate --live mig qemu+ssh://{target ip}/system --unsafe
```

This succeeds without error.

Migration succeeds with cache=none without --unsafe:

```xml
...
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/mnt/kvm-rhel6u2-x86_64-new.img'>
    <seclabel relabel='no'/>
  </source>
  <target dev='hda' bus='ide'/>
</disk>
...
```

```
# virsh migrate --live mig qemu+ssh://{target ip}/system
```

This also succeeds without error.

For a cluster filesystem, how can I build an environment and test?
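Not part of the original report, but as an illustration of the workflow tested above, here is a minimal shell sketch that checks a domain's explicitly configured disk cache modes before migrating and leaves the decision about --unsafe to the operator. The domain name `mig` and the destination URI placeholder are taken from the examples above and are assumptions; note that it only inspects explicit `cache=` attributes, so disks that omit the attribute are not flagged.

```sh
#!/bin/sh
# Illustrative sketch only: decide whether a plain live migration is likely to
# be accepted, based on the explicit cache= settings in the domain XML.
# Assumptions: the domain name "mig" and the destination URI are placeholders.
DOM=mig
DEST="qemu+ssh://{target ip}/system"

# Extract every explicit cache='...' attribute from the disk <driver> elements.
# Disks that omit the attribute are not inspected by this sketch.
if virsh dumpxml "$DOM" | grep -o "cache='[^']*'" | grep -qv "cache='none'"; then
    echo "Some disks are configured with cache != none; libvirt will refuse the migration." >&2
    echo "Either switch those disks to cache='none' (virsh edit $DOM) or bypass the check" >&2
    echo "deliberately with: virsh migrate --live $DOM \"$DEST\" --unsafe" >&2
    exit 1
fi

virsh migrate --live "$DOM" "$DEST"
```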
Retest on:
qemu-kvm-0.12.1.2-2.232.el6.x86_64
kernel-2.6.32-225.el6.x86_64
libvirt-0.9.10-3.el6.x86_64

with two disks, one of them a cdrom in readonly mode:

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/mnt/qcow2.img'>
    <seclabel relabel='no'/>
  </source>
  <target dev='hda' bus='ide'/>
  <alias name='ide0-0-0'/>
  <address type='drive' controller='0' bus='0' unit='0'/>
</disk>
<disk type='file' device='cdrom'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/cdrom.img' startupPolicy='optional'/>
  <target dev='hdc' bus='ide'/>
  <readonly/>
  <alias name='ide0-1-0'/>
  <address type='drive' controller='0' bus='1' unit='0'/>
</disk>
```

Migration reports an error:

```
error: Unsafe migration: Migration may lead to data corruption if disks use cache != none
```

So I think it still has a bug.

The following additional patch fixes the above issue:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-March/msg00261.html

Verify pass on the comment 11 scenario; it can now migrate successfully without error. The test on a cluster filesystem is still pending; will wait for Dor's reply on how to test it.

Verify pass according to Dor's suggestion; only the non-cluster filesystem needs to be tested.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html
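For reference (not part of the bug's comments), a small sketch of how one might list the disks the unsafe-migration check cares about after the fix verified above, where a readonly cdrom like the one in this scenario no longer triggers the error. The domain name `mig` is an assumption, and xmllint (from libxml2) is used for the XPath query.

```sh
#!/bin/sh
# Illustrative sketch: count writable disks whose <driver> is not cache='none'.
# Readonly devices (e.g. the cdrom in the scenario above) are excluded, to
# mirror the scenario verified on the fixed libvirt build.
# Assumptions: domain name "mig"; xmllint from libxml2 is available.
DOM=mig

UNSAFE=$(virsh dumpxml "$DOM" |
    xmllint --xpath "count(//devices/disk[not(readonly)]/driver[not(@cache='none')])" -)

if [ "$UNSAFE" = "0" ]; then
    echo "All writable disks use cache=none; 'virsh migrate --live' should be accepted."
else
    echo "$UNSAFE writable disk(s) lack cache=none; use cache='none' or migrate with --unsafe."
fi
```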