Changing the subject as this isn't an RFE from libvirt's POV. Our current migration safety check is simply buggy wrt newer QEMU.
*** Bug 1660576 has been marked as a duplicate of this bug. ***
According to the (private) description this was implemented by qemu commit dd577a26ff03b6. Stefan, could you please elaborate how we can detect that given qemu does support dropping of the cache? From the other linked BZs I've noticed only 'file.x-check-cache-dropped=on' which is experimental and thus we can't rely that it will not vanish sooner or later.
(In reply to Peter Krempa from comment #3) > According to the (private) description this was implemented by qemu commit > dd577a26ff03b6. Stefan, could you please elaborate how we can detect that > given qemu does support dropping of the cache? > > From the other linked BZs I've noticed only 'file.x-check-cache-dropped=on' > which is experimental and thus we can't rely that it will not vanish sooner > or later. There is no externally visible property, so: if (qemuCaps->version >= 3000000) { virQEMUCapsSet(qemuCaps, QEMU_CAPS_MIGRATE_DISK_WITH_PAGE_CACHE); } If this is unacceptable then we may need to add an auxiliary feature set query to QEMU so that features like this one can be reported. They are rare but sometimes there just isn't an externally-visible way of detecting a feature.
We were discussing something similar with Markus some time ago. The idea was to add some fields into the schema which can then be queried using query-qmp-schema as both qemu and libvirt already have infrastructure for that. Such a machinery could be also useful for detecting fixes to code which are even harder to detect in most cases. The case at that time was a fix required for istantiating a qcow2 format node when the format was also specified in the image itself which produced an error. Generally we prefer detectable fixes as it's less likely to break with downstreams where the version numbers don't matter much in relation with the actual code.
(In reply to Peter Krempa from comment #5) > We were discussing something similar with Markus some time ago. The idea was > to add some fields into the schema which can then be queried using > query-qmp-schema as both qemu and libvirt already have infrastructure for > that. > > Such a machinery could be also useful for detecting fixes to code which are > even harder to detect in most cases. The case at that time was a fix > required for istantiating a qcow2 format node when the format was also > specified in the image itself which produced an error. > > Generally we prefer detectable fixes as it's less likely to break with > downstreams where the version numbers don't matter much in relation with the > actual code. Absolutely, we need to add this ability to QEMU. I guess we're in an okay position: users can already take advantage of migration by specifying the --unsafe option. That's a workaround, but eager users can make use of it. I will try adding proper schema support to QEMU and you can wait with the libvirt implementation of this BZ until that becomes available.
I have reduced the priority from "high" to "medium" because this will require a QEMU change but a workaround is available to libvirt users in the meantime.
(In reply to Stefan Hajnoczi from comment #6) > I guess we're in an okay position: users can already take advantage of > migration by specifying the --unsafe option. That's a workaround, but eager > users can make use of it. That's not a great workaround for users using a mgmt app - only for virsh. The mgmt apps won't usually let users specify arbitrary extra API flags to pass to virDomainMigrate, and an mgmt app is unlikely to want to add UNSAFE flag itself, as that disables other potentially still relevant safety checks.
(In reply to Daniel Berrange from comment #8) > (In reply to Stefan Hajnoczi from comment #6) > > I guess we're in an okay position: users can already take advantage of > > migration by specifying the --unsafe option. That's a workaround, but eager > > users can make use of it. > > That's not a great workaround for users using a mgmt app - only for virsh. > The mgmt apps won't usually let users specify arbitrary extra API flags to > pass to virDomainMigrate, and an mgmt app is unlikely to want to add UNSAFE > flag itself, as that disables other potentially still relevant safety checks. If it's urgent then libvirt could implement the version number check I posted above. Either way, I'll try to extend QEMU's schema so we can test for features like this properly in the future.
This was implemented in QEMU via commit f357fcd890a8d6ced6d261338b859a41414561e9 and made detectable via commit dd577a26ff03b6829721b1ffbbf9e7c411b72378. We would like to have this in RHEL-AV-8.1. Is that feasible?
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2019-August/msg00472.html
Fixed upstream by: commit 4748f8df29e3ac4fe321ee60c1738cbfe378295e Refs: v5.6.0-172-g4748f8df29 Author: Jiri Denemark <jdenemar> AuthorDate: Tue Aug 13 13:16:20 2019 +0200 Commit: Jiri Denemark <jdenemar> CommitDate: Wed Aug 14 09:36:43 2019 +0200 qemu: Clarify error message in qemuMigrationSrcIsSafe The original message was logically incorrect: cache != none or cache != directsync is always true. But even replacing "or" with "and" doesn't make it more readable for humans. Signed-off-by: Jiri Denemark <jdenemar> Acked-By: Peter Krempa <pkrempa> commit 598ec0db685eb63c99fcb7905ffd73bb714f1cf4 Refs: v5.6.0-173-g598ec0db68 Author: Jiri Denemark <jdenemar> AuthorDate: Tue Aug 13 15:17:36 2019 +0200 Commit: Jiri Denemark <jdenemar> CommitDate: Wed Aug 14 09:36:43 2019 +0200 qemu: Check for drop-cache capability QEMU 4.0.0 and newer automatically drops caches at the end of migration. Let's check for this capability so that we can allow migration when disk cache is turned on. Signed-off-by: Jiri Denemark <jdenemar> Acked-By: Peter Krempa <pkrempa> commit 4514abbd41571b4d4f4e3b6f4cb950978433dd10 Refs: v5.6.0-174-g4514abbd41 Author: Jiri Denemark <jdenemar> AuthorDate: Tue Aug 13 15:17:53 2019 +0200 Commit: Jiri Denemark <jdenemar> CommitDate: Wed Aug 14 09:36:43 2019 +0200 qemu: Allow migration with disk cache on When QEMU supports flushing caches at the end of migration, we can safely allow migration even if disk/driver/@cache is not none nor directsync. Signed-off-by: Jiri Denemark <jdenemar> Acked-By: Peter Krempa <pkrempa>
Test with qemu-kvm-4.1.0-4 and libvirt-5.6.0-2.virtcov.el8.x86_64. Covered disk type=file|block, migration can succeed with any disk cache mode Test with qemu-kvm-3.1.0-30 and libvirt-5.6.0-2.virtcov.el8.x86_64. Covered disk type=file|block, migration fails as expected when disk cache is not none|directsync
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723