Bug 1660575 - Don't block migration with writeback cache mode on new enough QEMU
Summary: Don't block migration with writeback cache mode on new enough QEMU
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: 8.0
Assignee: Jiri Denemark
QA Contact: Han Han
URL:
Whiteboard:
: 1660576 (view as bug list)
Depends On: 1568285 1653542
Blocks: 1623566 1672519
TreeView+ depends on / blocked
 
Reported: 2018-12-18 17:00 UTC by Ademar Reis
Modified: 2019-11-06 07:12 UTC (History)
15 users (show)

Fixed In Version: libvirt-5.6.0-2.el8
Doc Type: Enhancement
Doc Text:
Clone Of: 1653542
Environment:
Last Closed: 2019-11-06 07:12:13 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:3723 0 None None None 2019-11-06 07:12:37 UTC

Comment 1 Daniel Berrangé 2019-01-11 10:25:44 UTC
Changing the subject as this isn't an RFE from libvirt's POV. Our current migration safety check is simply buggy wrt newer QEMU.

Comment 2 Martin Tessun 2019-02-05 08:30:57 UTC
*** Bug 1660576 has been marked as a duplicate of this bug. ***

Comment 3 Peter Krempa 2019-02-11 11:57:21 UTC
According to the (private) description this was implemented by qemu commit dd577a26ff03b6. Stefan, could you please elaborate how we can detect that given qemu does support dropping of the cache?

From the other linked BZs I've noticed only 'file.x-check-cache-dropped=on' which is experimental and thus we can't rely that it will not vanish sooner or later.

Comment 4 Stefan Hajnoczi 2019-02-12 06:09:10 UTC
(In reply to Peter Krempa from comment #3)
> According to the (private) description this was implemented by qemu commit
> dd577a26ff03b6. Stefan, could you please elaborate how we can detect that
> given qemu does support dropping of the cache?
> 
> From the other linked BZs I've noticed only 'file.x-check-cache-dropped=on'
> which is experimental and thus we can't rely that it will not vanish sooner
> or later.

There is no externally visible property, so:

  if (qemuCaps->version >= 3000000) {
      virQEMUCapsSet(qemuCaps, QEMU_CAPS_MIGRATE_DISK_WITH_PAGE_CACHE);
  }

If this is unacceptable then we may need to add an auxiliary feature set query to QEMU so that features like this one can be reported.  They are rare but sometimes there just isn't an externally-visible way of detecting a feature.

Comment 5 Peter Krempa 2019-02-12 09:02:49 UTC
We were discussing something similar with Markus some time ago. The idea was to add some fields into the schema which can then be queried using query-qmp-schema as both qemu and libvirt already have infrastructure for that.

Such a machinery could be also useful for detecting fixes to code which are even harder to detect in most cases. The case at that time was a fix required for istantiating a qcow2 format node when the format was also specified in the image itself which produced an error.

Generally we prefer detectable fixes as it's less likely to break with downstreams where the version numbers don't matter much in relation with the actual code.

Comment 6 Stefan Hajnoczi 2019-02-12 10:47:06 UTC
(In reply to Peter Krempa from comment #5)
> We were discussing something similar with Markus some time ago. The idea was
> to add some fields into the schema which can then be queried using
> query-qmp-schema as both qemu and libvirt already have infrastructure for
> that.
> 
> Such a machinery could be also useful for detecting fixes to code which are
> even harder to detect in most cases. The case at that time was a fix
> required for istantiating a qcow2 format node when the format was also
> specified in the image itself which produced an error.
> 
> Generally we prefer detectable fixes as it's less likely to break with
> downstreams where the version numbers don't matter much in relation with the
> actual code.

Absolutely, we need to add this ability to QEMU.

I guess we're in an okay position: users can already take advantage of migration by specifying the --unsafe option.  That's a workaround, but eager users can make use of it.

I will try adding proper schema support to QEMU and you can wait with the libvirt implementation of this BZ until that becomes available.

Comment 7 Stefan Hajnoczi 2019-02-12 10:47:47 UTC
I have reduced the priority from "high" to "medium" because this will require a QEMU change but a workaround is available to libvirt users in the meantime.

Comment 8 Daniel Berrangé 2019-02-12 10:57:02 UTC
(In reply to Stefan Hajnoczi from comment #6)
> I guess we're in an okay position: users can already take advantage of
> migration by specifying the --unsafe option.  That's a workaround, but eager
> users can make use of it.

That's not a great workaround for users using a mgmt app - only for virsh.  The mgmt apps won't usually let users specify arbitrary extra API flags to pass to virDomainMigrate, and an mgmt app is unlikely to want to add UNSAFE flag itself, as that disables other potentially still relevant safety checks.

Comment 9 Stefan Hajnoczi 2019-02-13 08:47:56 UTC
(In reply to Daniel Berrange from comment #8)
> (In reply to Stefan Hajnoczi from comment #6)
> > I guess we're in an okay position: users can already take advantage of
> > migration by specifying the --unsafe option.  That's a workaround, but eager
> > users can make use of it.
> 
> That's not a great workaround for users using a mgmt app - only for virsh. 
> The mgmt apps won't usually let users specify arbitrary extra API flags to
> pass to virDomainMigrate, and an mgmt app is unlikely to want to add UNSAFE
> flag itself, as that disables other potentially still relevant safety checks.

If it's urgent then libvirt could implement the version number check I posted above.

Either way, I'll try to extend QEMU's schema so we can test for features like this properly in the future.

Comment 10 Ademar Reis 2019-07-30 18:32:17 UTC
This was implemented in QEMU via commit f357fcd890a8d6ced6d261338b859a41414561e9 and made detectable via commit dd577a26ff03b6829721b1ffbbf9e7c411b72378.

We would like to have this in RHEL-AV-8.1. Is that feasible?

Comment 11 Jiri Denemark 2019-08-13 16:09:43 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2019-August/msg00472.html

Comment 12 Jiri Denemark 2019-08-14 07:49:02 UTC
Fixed upstream by:

commit 4748f8df29e3ac4fe321ee60c1738cbfe378295e
Refs: v5.6.0-172-g4748f8df29
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Aug 13 13:16:20 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Aug 14 09:36:43 2019 +0200

    qemu: Clarify error message in qemuMigrationSrcIsSafe

    The original message was logically incorrect: cache != none or cache !=
    directsync is always true. But even replacing "or" with "and" doesn't
    make it more readable for humans.

    Signed-off-by: Jiri Denemark <jdenemar>
    Acked-By: Peter Krempa <pkrempa>

commit 598ec0db685eb63c99fcb7905ffd73bb714f1cf4
Refs: v5.6.0-173-g598ec0db68
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Aug 13 15:17:36 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Aug 14 09:36:43 2019 +0200

    qemu: Check for drop-cache capability

    QEMU 4.0.0 and newer automatically drops caches at the end of migration.
    Let's check for this capability so that we can allow migration when disk
    cache is turned on.

    Signed-off-by: Jiri Denemark <jdenemar>
    Acked-By: Peter Krempa <pkrempa>

commit 4514abbd41571b4d4f4e3b6f4cb950978433dd10
Refs: v5.6.0-174-g4514abbd41
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Aug 13 15:17:53 2019 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Aug 14 09:36:43 2019 +0200

    qemu: Allow migration with disk cache on

    When QEMU supports flushing caches at the end of migration, we can
    safely allow migration even if disk/driver/@cache is not none nor
    directsync.

    Signed-off-by: Jiri Denemark <jdenemar>
    Acked-By: Peter Krempa <pkrempa>

Comment 15 Fangge Jin 2019-08-22 09:31:46 UTC
Test with qemu-kvm-4.1.0-4 and libvirt-5.6.0-2.virtcov.el8.x86_64.
Covered disk type=file|block, migration can succeed with any disk cache mode

Test with qemu-kvm-3.1.0-30  and libvirt-5.6.0-2.virtcov.el8.x86_64.
Covered disk type=file|block, migration fails as expected when disk cache is not none|directsync

Comment 17 errata-xmlrpc 2019-11-06 07:12:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723


Note You need to log in before you can comment on or make changes to this bug.