Bug 2107892

Summary: Migrate parameters are not restored if kill virtproxyd/virtqemud during migration
Product: Red Hat Enterprise Linux 9 Reporter: Fangge Jin <fjin>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
libvirt sub component: General QA Contact: Fangge Jin <fjin>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: dzheng, jdenemar, lcheng, lmen, virt-maint
Version: 9.1 Keywords: Triaged
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-8.5.0-4.el9 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-11-15 10:04:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version: 8.6.0
Embargoed:
Attachments:
Description Flags
libvirt and qemu log none

Description Fangge Jin 2022-07-17 09:19:50 UTC
Created attachment 1897802
libvirt and qemu log

Description of problem:
Run a migration with --parallel-connections and, while it is in progress, kill the dest virtproxyd (for p2p migration) or the src virtqemud (for p2p or non-p2p migration). A subsequent migration without --parallel-connections then fails.

Version-Release number of selected component (if applicable):
libvirt-8.5.0-1.el9.x86_64
qemu-kvm-7.0.0-8.el9.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Start a vm

2. Do migration with --parallel-connections:
   # virsh migrate vm1 qemu+tcp://******/system --live --postcopy --parallel --compressed --comp-methods xbzrle --bandwidth 4 --parallel-connections 4 [--p2p]

3. Kill dest virtproxyd if it is p2p migration, or kill src virtqemud if it is p2p or non-p2p migration

4. Check the migration status: the migration has failed. Then check the migration parameters: they have not been restored (for example, multifd-channels is still 4):
   # virsh qemu-monitor-command vm1 '{"execute":"query-migrate-parameters"}'
{"return":{"cpu-throttle-tailslow":false,"xbzrle-cache-size":67108864,"cpu-throttle-initial":20,"announce-max":550,"decompress-threads":2,"compress-threads":8,"compress-level":1,"multifd-channels":4,"multifd-zstd-level":1,"announce-initial":50,"block-incremental":false,"compress-wait-thread":true,"downtime-limit":300,"tls-authz":"","multifd-compression":"none","announce-rounds":5,"announce-step":100,"tls-creds":"","multifd-zlib-level":1,"max-cpu-throttle":99,"max-postcopy-bandwidth":0,"tls-hostname":"","throttle-trigger-threshold":50,"max-bandwidth":4194304,"x-checkpoint-delay":20000,"cpu-throttle-increment":10},"id":"libvirt-14"}



5. Migrate again without --parallel-connections; the migration fails:
   # virsh migrate vm1 qemu+tcp://******/system --live --postcopy --parallel --compressed --comp-methods xbzrle --bandwidth 4 [--p2p]
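
The check in step 4 can be scripted by comparing a query-migrate-parameters reply captured before step 2 with the one captured after the failed migration. The following is only a sketch: the helper name unrestored_params is hypothetical, and the "before" values are placeholders, not values taken from this report. The "after" values are a subset of the step 4 output above.

```python
import json


def unrestored_params(before_reply: str, after_reply: str) -> dict:
    """Return {name: (before, after)} for every parameter whose value differs."""
    before = json.loads(before_reply)["return"]
    after = json.loads(after_reply)["return"]
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(before.keys() | after.keys())
        if before.get(name) != after.get(name)
    }


# Hypothetical baseline captured before step 2 (placeholder values, not from the report).
before = ('{"return": {"multifd-channels": 2, "max-bandwidth": 0, '
          '"downtime-limit": 300}, "id": "libvirt-1"}')
# Subset of the step 4 reply quoted in this report.
after = ('{"return": {"multifd-channels": 4, "max-bandwidth": 4194304, '
         '"downtime-limit": 300}, "id": "libvirt-14"}')

leftover = unrestored_params(before, after)
for name, (old, new) in leftover.items():
    print(f"{name}: {old} -> {new}")
```

An empty result would mean that every parameter went back to its pre-migration value.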


Actual results:
As in step 5, the migration fails.

Expected results:
Step 5 succeeds.

Additional info:
Dest qemu log:
2022-07-17T08:02:17.058157Z qemu-kvm: socket_accept_incoming_migration: Extra incoming migration connection; ignoring
2022-07-17T08:02:17.058189Z qemu-kvm: socket_accept_incoming_migration: Extra incoming migration connection; ignoring

Src qemu log:
2022-07-17 08:02:17.068+0000: initiating migration
2022-07-17T08:02:17.081109Z qemu-kvm: Unable to write to socket: Connection reset by peer
2022-07-17T08:02:17.118964Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-07-17T08:02:17.118977Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-07-17T08:02:17.118982Z qemu-kvm: Unable to read from socket: Bad file descriptor

Comment 1 Jiri Denemark 2022-07-18 08:48:45 UTC
*** Bug 2107893 has been marked as a duplicate of this bug. ***

Comment 2 Jiri Denemark 2022-07-22 15:14:33 UTC
Patches sent upstream for review: https://listman.redhat.com/archives/libvir-list/2022-July/233114.html

Comment 3 Jiri Denemark 2022-07-25 13:06:02 UTC
BTW, migration capabilities are not reset either, but that's not such a big
issue as the unused ones are disabled when a new migration starts.

Comment 4 Jiri Denemark 2022-07-26 09:38:22 UTC
Fixed upstream by

commit c7238941357f0d2e94524cf8c5ad7d9c82dcf2f9
Refs: v8.5.0-186-gc723894135
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jul 19 13:48:44 2022 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Jul 26 10:09:00 2022 +0200

    qemu_migration: Store original migration params in status XML

    We keep original values of migration parameters so that we can restore
    them at the end of migration to make sure later migration does not use
    some random values. However, this does not really work when libvirt
    daemon is restarted on the source host because we failed to explicitly
    save the status XML after getting the migration parameters from QEMU.
    Actually it might work if the status XML is written later for some other
    reason such as domain state change, but that's not how it should work.

    https://bugzilla.redhat.com/show_bug.cgi?id=2107892

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

commit c0824fd03802085db698c10fe62c98cc95a57941
Refs: v8.5.0-187-gc0824fd038
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Jul 21 15:59:51 2022 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Jul 26 10:09:00 2022 +0200

    qemu_migration_params: Refactor qemuMigrationParamsApply

    qemuMigrationParamsApply restricts when capabilities can be set, but
    this is not useful in all cases. Let's create new helpers for setting
    migration capabilities and parameters which can be reused in more places
    without the restriction.

    https://bugzilla.redhat.com/show_bug.cgi?id=2107892

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>

commit c47f1abb81194461377a0c608a7ecd87f9ce9146
Refs: [fixes], v8.5.0-188-gc47f1abb81
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Jul 21 16:49:09 2022 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Tue Jul 26 10:09:01 2022 +0200

    qemu_migration_params: Refactor qemuMigrationParamsReset

    Because qemuMigrationParamsReset used to call qemuMigrationParamsApply
    for resetting migration capabilities and parameters, it did not work
    well since commit v5.1.0-83-ga1dec315c9 which only allowed capabilities
    to be set from an async job. However, when reconnecting to running
    domains after daemon restart we do not have an async job. Thus the
    capabilities were not properly reset in case the daemon was restarted
    during an ongoing migration. We need to avoid calling
    qemuMigrationParamsApply to make sure both parameters and capabilities
    can be reset by a normal job.

    https://bugzilla.redhat.com/show_bug.cgi?id=2107892

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Michal Privoznik <mprivozn>
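
The idea behind these commits is to write the original parameters to persistent state as soon as they are read from QEMU, and to restore them on reconnect after a daemon restart. The toy model below illustrates it in Python. It is not libvirt's code, which is C and uses the domain status XML. FakeQemu, begin_migration, reconnect_after_restart, the JSON file, and the value 2 for multifd-channels are all hypothetical stand-ins.

```python
import json
import os
import tempfile


class FakeQemu:
    """Stand-in for a running QEMU process; it only holds a parameter dict."""

    def __init__(self, params):
        self.params = dict(params)


def begin_migration(qemu, status_path, new_params):
    # Read the originals and persist them *before* changing anything,
    # so that a daemon restart cannot lose them.
    original = dict(qemu.params)
    with open(status_path, "w") as f:
        json.dump({"orig_migration_params": original}, f)
        f.flush()
        os.fsync(f.fileno())
    qemu.params.update(new_params)


def reconnect_after_restart(qemu, status_path):
    # On reconnect, restore the saved originals if a migration was interrupted.
    if not os.path.exists(status_path):
        return False
    with open(status_path) as f:
        saved = json.load(f)["orig_migration_params"]
    qemu.params.update(saved)
    os.remove(status_path)
    return True


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "vm1.status.json")
    qemu = FakeQemu({"multifd-channels": 2})  # placeholder original value
    begin_migration(qemu, path, {"multifd-channels": 4})
    # The daemon is "killed" here: only the status file and QEMU survive.
    restored = reconnect_after_restart(qemu, path)
    final_params = dict(qemu.params)
```

In this model, skipping the write in begin_migration reproduces the symptom from the report: after the restart there is nothing to restore from, and the parameters set for the interrupted migration stay in place.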

Comment 5 Jiri Denemark 2022-07-26 13:17:24 UTC
Backported: https://gitlab.com/redhat/rhel/src/libvirt/-/merge_requests/40

Comment 6 Fangge Jin 2022-08-01 08:33:23 UTC
Versions:
libvirt-client-8.5.0-4.el9.x86_64
qemu-kvm-7.0.0-9.el9.x86_64

Test matrix:
p2p, kill src virtqemud, pass
p2p, kill dest virtqemud, pass
p2p, kill dest virtproxyd, fail
non-p2p, kill src virtqemud, pass
non-p2p, kill dest virtqemud, pass
non-p2p, kill dest virtproxyd, pass

Comment 12 Fangge Jin 2022-08-11 05:59:44 UTC
Verified with
libvirt-8.5.0-5.el9.x86_64
qemu-kvm-7.0.0-10.el9.x86_64


Test matrix:
p2p, kill src virtqemud, pass
p2p, kill dest virtqemud, pass
p2p, kill dest virtproxyd, fail (Bug 2114866)
non-p2p, kill src virtqemud, pass
non-p2p, kill dest virtqemud, pass
non-p2p, kill dest virtproxyd, pass

Comment 14 errata-xmlrpc 2022-11-15 10:04:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: libvirt security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8003