Bug 2169732
Summary: | Multifd migration fails under a weak network/socket ordering race | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | Li Xiaohui <xiaohli> |
Component: | qemu-kvm | Assignee: | Peter Xu <peterx> |
qemu-kvm sub component: | Live Migration | QA Contact: | Li Xiaohui <xiaohli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | high | CC: | berrange, chayang, coli, fjin, iholder, jinzhao, juzhang, leobras, mdean, nilal, peterx, quintela, sgott, virt-maint |
Version: | 9.2 | Keywords: | Triaged |
Target Milestone: | rc | ||
Target Release: | 9.2 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-7.2.0-9.el9 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | 2137740 | Environment: | |
Last Closed: | 2023-05-09 07:23:46 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 2137740 | ||
Bug Blocks: |
Description
Li Xiaohui
2023-02-14 14:07:30 UTC
QE bot (pre-verify): Set 'Verified:Tested,SanityOnly' as the gating/tier1 tests pass.

Verified this bug on kernel-5.14.0-270.el9.x86_64 && qemu-kvm-7.2.0-10.el9.x86_64.

Background:
1. While the guest is running on the source host, enable the multifd capability on both source and destination.
2. Before migration, create network packet loss on the source host:
   # tc qdisc add dev switch root netem loss 40%
3. Start migrating the guest from the source to the destination host.

After step 3, migration is active, but it progresses very slowly and cannot converge. The guest still works well, qemu reports no errors, and qemu does not hang.

Then test the scenarios below:

1) Cancel migration while multifd migration cannot converge during network packet loss.
Result: Migration is cancelled successfully and the VM works well on the src host.
src hmp:
(qemu) migrate_cancel
(qemu) 2023-02-23T06:37:36.993227Z qemu-kvm: multifd_send_pages: channel 1 has already quit!
2023-02-23T06:37:36.993316Z qemu-kvm: multifd_send_sync_main: multifd_send_pages fail
2023-02-23T06:37:36.993326Z qemu-kvm: failed to save SaveStateEntry with id(name): 1(ram): -1
2023-02-23T06:37:36.995036Z qemu-kvm: Unable to write to socket: Broken pipe
dst hmp:
(qemu) 2023-02-23T06:37:36.757048Z qemu-kvm: check_section_footer: Read section footer failed: -5
2023-02-23T06:37:36.758076Z qemu-kvm: load of migration failed: Invalid argument

2) Remove the network packet loss, then let the multifd migration continue.
# tc qdisc delete dev switch root netem loss 40%
Result: Migration succeeds and the VM works well after migration.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162
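
The verification steps above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the harness QE used: it builds the `tc` command lines quoted in the report and the QMP equivalents of the HMP commands (the report itself used HMP), and it prints the plan without executing anything. The function names, the `tcp:dst-host.example:4444` URI, and the step labels are hypothetical; the device name `switch` and the 40% loss come from the report.

```python
import json
import shlex


def tc_netem_loss(action, dev, loss_percent):
    """Build the tc argv that adds or deletes a netem packet-loss qdisc."""
    if action not in ("add", "delete"):
        raise ValueError("action must be 'add' or 'delete'")
    return ["tc", "qdisc", action, "dev", dev, "root", "netem",
            "loss", f"{loss_percent}%"]


def qmp_enable_multifd():
    """QMP message enabling the multifd capability (send to src and dst)."""
    return {
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [{"capability": "multifd", "state": True}],
        },
    }


def qmp_migrate(uri):
    """QMP message that starts migration to the given URI."""
    return {"execute": "migrate", "arguments": {"uri": uri}}


def scenario_plan(dev, loss_percent, uri, cancel):
    """Return ordered (target, step) pairs for one of the two scenarios.

    cancel=True  -> scenario 1: cancel while the network is still lossy.
    cancel=False -> scenario 2: remove the loss and let migration finish.
    """
    steps = [
        ("qmp:src+dst", qmp_enable_multifd()),
        ("shell:src", tc_netem_loss("add", dev, loss_percent)),
        ("qmp:src", qmp_migrate(uri)),
    ]
    if cancel:
        steps.append(("qmp:src", {"execute": "migrate_cancel"}))
    else:
        steps.append(("shell:src", tc_netem_loss("delete", dev, loss_percent)))
    # Either way, finish by asking the source for the migration status.
    steps.append(("qmp:src", {"execute": "query-migrate"}))
    return steps


def render(step):
    """Render a step as a shell command line or a JSON QMP message."""
    if isinstance(step, list):
        return shlex.join(step)
    return json.dumps(step)


if __name__ == "__main__":
    # Hypothetical destination URI; replace with the real destination.
    for target, step in scenario_plan("switch", 40,
                                      "tcp:dst-host.example:4444",
                                      cancel=True):
        print(f"[{target}] {render(step)}")
```

Keeping the plan as data rather than running it directly makes it easy to review the exact commands first; the `tc` steps need root on the source host, and the QMP messages would have to be sent over each QEMU instance's QMP socket by whatever client is available.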