Bug 2089431

Summary: [RFE] RFE to allow enabling ZEROCOPY live migration through libvirt
Product: Red Hat Enterprise Linux 9
Reporter: Nils Koenig <nkoenig>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
libvirt sub component: Live Migration
QA Contact: Fangge Jin <fjin>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: chhu, dzheng, fjin, jdenemar, lcheng, lmen, virt-maint, xuzhang
Version: 9.1
Keywords: FutureFeature, Triaged
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: libvirt-8.5.0-1.el9
Last Closed: 2022-11-15 10:04:39 UTC
Type: Feature Request
Target Upstream Version: 8.5.0
Bug Depends On: 1968509
Bug Blocks: 2089433, 2092752

Comment 2 Jiri Denemark 2022-06-24 12:33:22 UTC
Pushed upstream as

commit 8744beecb36600e773c8a8c4823db2bf4b3e262d
Refs: v8.4.0-289-g8744beecb3
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 22 16:35:50 2022 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jun 23 16:45:39 2022 +0200

    Add VIR_MIGRATE_ZEROCOPY flag

    The flag can be used to enable zero-copy mechanism for migrating memory
    pages.

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>

commit efa3baeae70fbdf4ab032ca485cb9272ee96bd50
Refs: v8.4.0-290-gefa3baeae7
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 22 16:36:53 2022 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jun 23 16:45:39 2022 +0200

    virsh: Add support for VIR_MIGRATE_ZEROCOPY flag

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>

commit d375993ab314a41bca7ef6c846e07afc18c37774
Refs: v8.4.0-291-gd375993ab3
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 22 16:37:31 2022 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Jun 23 16:45:39 2022 +0200

    qemu_migration: Implement VIR_MIGRATE_ZEROCOPY flag

    Resolves: https://gitlab.com/libvirt/libvirt/-/issues/306

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>

Comment 6 Fangge Jin 2022-07-12 10:09:56 UTC
Scenario 1(negative): parallel + zerocopy + native_tls + non-p2p
# virsh migrate uefi qemu+tcp://***/system --live --postcopy --bandwidth 10 --auto-converge  --zerocopy --parallel --tls
error: operation failed: job 'migration out' failed: Requested Zero Copy feature is not available: Invalid argument

Comment 7 Fangge Jin 2022-07-12 10:20:49 UTC
Scenario 2(negative): non_parallel + zerocopy 
# virsh migrate uefi qemu+tcp://***/system --live --postcopy --bandwidth 10 --auto-converge  --zerocopy --parallel --tls [--p2p]
error: operation failed: job 'migration out' failed: Requested Zero Copy feature is not available: Invalid argument

Scenario 3: parallel + zerocopy
# virsh migrate uefi qemu+tcp://dell-per640-09.lab.eng.pek2.redhat.com/system --live --postcopy --bandwidth 10 --auto-converge --zerocopy --parallel [--p2p]

Scenario 4: parallel + zerocopy, abort migration, then migrate again.

Comment 8 Fangge Jin 2022-07-14 03:16:17 UTC
(In reply to Fangge Jin from comment #7)
> Scenario 2(negative): non_parallel + zerocopy 
> # virsh migrate uefi qemu+tcp://***/system --live --postcopy --bandwidth 10
> --auto-converge  --zerocopy --parallel --tls [--p2p]
> error: operation failed: job 'migration out' failed: Requested Zero Copy
> feature is not available: Invalid argument
Correct a typo here:
It should be:
# virsh migrate uefi qemu+tcp://***/system --live --postcopy --bandwidth 10 --auto-converge  --zerocopy --tls [--p2p]
error: Requested operation is not valid: zero-copy is only available for parallel migration
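
The negative scenarios in comments 6 to 8 suggest two preconditions for --zerocopy in this test run: it was rejected by libvirt without --parallel, and it failed in QEMU when combined with --tls. A minimal sketch of a pre-flight check that a wrapper script could run before calling virsh is shown below. The helper name check_zerocopy_flags is hypothetical and not part of virsh, and the rules encode only what was observed here; whether the TLS failure applies to other QEMU or libvirt versions is not established by this report.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a migration wrapper script.
# The rules reflect only the outcomes observed in comments 6-8:
#   --zerocopy without --parallel -> rejected by libvirt
#   --zerocopy together with --tls -> "Zero Copy feature is not available"
check_zerocopy_flags() {
    zerocopy=0
    parallel=0
    tls=0
    for arg in "$@"; do
        case "$arg" in
            --zerocopy) zerocopy=1 ;;
            --parallel) parallel=1 ;;
            --tls)      tls=1 ;;
        esac
    done

    # Nothing to check when zero-copy was not requested.
    [ "$zerocopy" -eq 1 ] || return 0

    if [ "$parallel" -eq 0 ]; then
        echo "zero-copy is only available for parallel migration" >&2
        return 1
    fi
    if [ "$tls" -eq 1 ]; then
        echo "zero-copy failed together with --tls in this test run" >&2
        return 1
    fi
    return 0
}
```

A wrapper could call `check_zerocopy_flags "$@" && virsh migrate "$@"` so that the two known-bad combinations fail before a migration job is started.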

Comment 9 Fangge Jin 2022-07-14 10:48:30 UTC
Hi Jirka

I did more testing today and found an issue with the memlock limit. Could you please confirm whether this is a bug?
Steps:
1. Start vm, and check prlimit:
   # prlimit -p 38921 -l
   RESOURCE DESCRIPTION                             SOFT      HARD UNITS
   MEMLOCK  max locked-in-memory address space 134217728 134217728 bytes

2. Migrate vm, and check prlimit before migration completes:
   # virsh migrate uefi qemu+tcp://***/system --live --postcopy --bandwidth 10 --auto-converge  --zerocopy --p2p --parallel
   # prlimit -p 38921 -l
   RESOURCE DESCRIPTION                              SOFT       HARD UNITS
   MEMLOCK  max locked-in-memory address space 2147483648 2147483648 bytes

3. Kill source virtqemud before migration completes, migration will fail. But prlimit is not restored:
   # prlimit -p 38921 -l
   RESOURCE DESCRIPTION                              SOFT       HARD UNITS
   MEMLOCK  max locked-in-memory address space 2147483648 2147483648 bytes
   
Additional info:
1. If I abort migration by "virsh domjobabort", prlimit can be restored.
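
For reference, the two MEMLOCK values above are 128 MiB (134217728 bytes) before migration and 2 GiB (2147483648 bytes) during it. The check in the steps above can be sketched as a small script that records the soft limit so the values before and after can be compared. This assumes a Linux host where /proc/<pid>/limits is readable; memlock_soft and bytes_to_mib are hypothetical helper names, not part of libvirt or util-linux.

```shell
#!/bin/sh
# Print the soft "Max locked memory" limit of a process, read from
# /proc/<pid>/limits (Linux only). Prints a byte count or "unlimited".
memlock_soft() {
    awk '/^Max locked memory/ { print $4 }' "/proc/$1/limits"
}

# Convert a byte count to whole MiB (integer division).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}
```

For example, running `before=$(memlock_soft 38921)` before the migration and `after=$(memlock_soft 38921)` once the daemon is back up allows a direct comparison with `[ "$before" = "$after" ]`; the PID is the one from the transcript above and will differ on other hosts.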

Comment 10 Jiri Denemark 2022-07-14 13:41:20 UTC
(In reply to Fangge Jin from comment #9)
> 3. Kill source virtqemud before migration completes, migration will fail.
> But prlimit is not restored:
>    # prlimit -p 38921 -l
>    RESOURCE DESCRIPTION                              SOFT       HARD UNITS
>    MEMLOCK  max locked-in-memory address space 2147483648 2147483648 bytes

Do you just kill the daemon, or do you also start it again? If you only kill it
and keep it stopped, there's nothing that could restore the limit. But if
you start the daemon again and the limit still stays the same, we have a bug
somewhere. The "qemu_migration: Restore original memory locking limit" commit
should be handling this, but I might have missed something there...

Comment 11 Fangge Jin 2022-07-14 14:12:40 UTC
I just kill the daemon, but systemd will restart the daemon immediately and automatically.

Comment 12 Fangge Jin 2022-07-14 14:13:14 UTC
(In reply to Fangge Jin from comment #11)
I just kill the daemon, but systemd will restart the daemon immediately and automatically when the daemon is killed.

Comment 13 Jiri Denemark 2022-07-14 14:30:08 UTC
OK, thanks for confirming. Could you please file a separate BZ for this issue?

Comment 14 Fangge Jin 2022-07-15 02:21:08 UTC
Bug filed for issue in Comment 9
Bug 2107424 - "mem lock limit" of qemu process is not restored when kill src virtqemud during zerocopy migration.

Comment 16 errata-xmlrpc 2022-11-15 10:04:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003