Bug 1890373
Summary: | kernel version update cause qemu live migration failed | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | 张东旭 <xu910121> |
Component: | kernel | Assignee: | Andrew Jones <drjones> |
kernel sub component: | KVM | QA Contact: | Zhijian Li (Fujitsu) <zhijli> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | bstinson, carl, drjones, eric.auger, hyasuhar, jinzhao, juzhang, jwboyer, knoel, lcapitulino, mmizuma, qzhang, virt-maint, yidliu, yihyu, zhenyzha, zhijli |
Version: | 8.4 | Keywords: | TestOnly, Triaged |
Target Milestone: | rc | ||
Target Release: | 8.4 | ||
Hardware: | aarch64 | ||
OS: | Linux | ||
Fixed In Version: | kernel-4.18.0-283.el8 | Doc Type: | If docs needed, set a value |
Last Closed: | 2021-05-18 14:16:08 UTC | Type: | Bug |
Bug Depends On: | 1875540, 1907826 | ||
Bug Blocks: | 1885655, 1897024 |
Description
张东旭
2020-10-22 02:50:18 UTC
The bug also reproduces upstream and I'm currently testing a fix for it.

Posted fix upstream
https://lists.cs.columbia.edu/pipermail/kvmarm/2020-October/042955.html

It worked for me. Live migration cases:

    old kernel version to new kernel version
    new kernel version to old kernel version

Both succeeded.

Patches now upstream

    f81cb2c3ad41 KVM: arm64: Don't hide ID registers from userspace
    01fe5ace92dd KVM: arm64: Consolidate REG_HIDDEN_GUEST/USER
    912dee572691 KVM: arm64: Check RAZ visibility in ID register accessors
    c512298eed03 KVM: arm64: Remove AA64ZFR0_EL1 accessors

A KVM selftest is also now upstream

    fd02029a9e01 KVM: selftests: Add aarch64 get-reg-list test
    31d212959179 KVM: selftests: Add blessed SVE registers to get-reg-list

To test, build the KVM selftests on AArch64 and run the aarch64/get-reg-list test. On a kernel without f81cb2c3ad41 ("KVM: arm64: Don't hide ID registers from userspace") the test will fail, complaining about a missing register. On a kernel with the patch the test will exit silently with success (exit code 0). An additional test, aarch64/get-reg-list-sve, can be run to confirm that no regressions to the visibility of the register occur when SVE is enabled. That test must be run on a machine that supports SVE.

Since these patches are now all upstream, they should get picked up by the AArch64 KVM rebase, so I'm making this bug depend on the rebase bug. I'm also marking it as TestOnly and removing the OtherQA flag, since we have some Virt QE resources that can run KVM selftests.

Reproduced this bug with the following versions by kselftests:

    kernel-core-4.18.0-240.el8.aarch64
    qemu-kvm-core-5.1.0-13.module+el8.3.0+8382+afc3bbea.aarch64
    testsuite commit: 585e5b17b92dead8a3aca4e3c9876fbca5f7e0ba

Test steps:

    $ cd linux/tools/testing/selftests/kvm
    $ make && ./aarch64/get-reg-list
    make --no-builtin-rules ARCH=arm64 -C ../../../.. headers_install
    make[1]: Entering directory '/home/lizhijian/workspace/linux'
    INSTALL ./usr/include
    make[1]: Leaving directory '/home/lizhijian/workspace/linux'
    Number blessed registers: 311
    Number registers: 310
    There are 1 missing registers.
    The following lines are missing registers:
    ARM64_SYS_REG(3, 0, 0, 4, 4),
    ==== Test Assertion Failure ====
    aarch64/get-reg-list.c:453: !missing_regs && !failed_get && !failed_set && !failed_reject
    pid=819317 tid=819317 - Argument list too long
    1 0x0000000000401623: main at get-reg-list.c:450
    2 0x0000ffff90260be3: ?? ??:0
    3 0x00000000004019a3: _start at :?
    There are 1 missing registers; 0 registers failed get; 0 registers failed set; 0 registers failed reject

Test result: NG

Bug 1898489 has been closed wont-fix, but we'll still be backporting more fixes for 8.4, including the patches for this bug. I'll update the bug dependency when the new bug is written. I can also just post the patches for this bug independently if needed.

Bug reproduction:

Host:

    [root@fujitsu-fx700-01-n00 home]# uname -r
    4.18.0-80.11.1.el8_0.aarch64
    qemu-kvm-2.12.0-63.module+el8+2833+c7d6d092.aarch64

    [root@fujitsu-fx700-01-n01 home]# uname -r
    4.18.0-147.5.1.el8_1.aarch64
    qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.aarch64

1. Boot a guest on the source host with the qemu command line:

    /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox on \
    -machine virt-rhel7.6.0,gic-version=host,graphics=on \
    -nodefaults \
    -m 8192 \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -cpu 'host' \
    -vnc :10 \
    -enable-kvm \
    -monitor stdio \

2. Boot an incoming guest on the destination host and launch incoming mode:

    (qemu) migrate_incoming tcp:0:5801

3. Start migration on the source host:

    (qemu) migrate -d tcp:10.16.207.95:5801

4. Result: migration completed on the source end, but qemu crashed on the destination end.

Result:

source:

    (qemu) info status
    VM status: paused (postmigrate)
    (qemu) info migrate
    globals:
    store-global-state: on
    only-migratable: off
    send-configuration: on
    send-section-footer: on
    decompress-error-check: on
    capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off late-block-activate: off
    Migration status: completed
    total time: 4484 milliseconds
    downtime: 13 milliseconds
    setup: 3 milliseconds
    transferred ram: 18745 kbytes
    throughput: 34.45 mbps
    remaining ram: 0 kbytes
    total ram: 8519872 kbytes
    duplicate: 2129962 pages
    skipped: 0 pages
    normal: 6 pages
    normal bytes: 24 kbytes
    dirty sync count: 3
    page size: 4 kbytes

destination:

    (qemu) qemu-kvm: Invalid value 233 expecting positive value <= 232
    qemu-kvm: Failed to load cpu:cpreg_vmstate_array_len
    qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
    qemu-kvm: load of migration failed: Invalid argument

Patches listed in comment 8 were submitted as part of

    [RHEL8.4 virt 1907826 PATCH 00/16] KVM/ARM: v5.10/v5.11 fixes

Patch(es) available on kernel-4.18.0-278.el8.dt3

Verified by kselftests on aarch64:

    kernel: 4.18.0-278.el8.dt3.aarch64
    qemu-kvm: qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.src.rpm
    testsuite commit: 585e5b17b92dead8a3aca4e3c9876fbca5f7e0ba

Test steps:
```
[root@fujitsu-fx700-01-n00 ~]# cd linux/tools/testing/selftests/kvm
[root@fujitsu-fx700-01-n00 kvm]# make && ./aarch64/get-reg-list
make --no-builtin-rules ARCH=arm64 -C ../../../.. headers_install
make[1]: Entering directory '/root/linux'
INSTALL ./usr/include
[snip]
[root@fujitsu-fx700-01-n00 kvm]# echo $?
0
```

Thanks Yiding and Zhijian for the efforts!

Verified by kselftests on aarch64:

    kernel: 4.18.0-283.el8.aarch64
    qemu-kvm: qemu-kvm-5.2.0-5.module+el8.4.0+9775+0937c167.src.rpm
    testsuite commit: f40ddce88593482919761f74910f42f4b84c004b

Test steps:
```
[root@hpe-apollo80-01-n00 ~]# cd linux/tools/testing/selftests/kvm/
[root@hpe-apollo80-01-n00 kvm]# make && ./aarch64/get-reg-list
make --no-builtin-rules ARCH=arm64 -C ../../../.. headers_install
make[1]: Entering directory '/root/linux'
[snip]
[root@hpe-apollo80-01-n00 kvm]# echo $?
0
```

Thanks Yiding for the effort!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1578
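
For readers who want to relate the missing register reported by get-reg-list, ARM64_SYS_REG(3, 0, 0, 4, 4), to a raw KVM register ID, the following is a minimal sketch and not the selftest's actual code. It assumes the constants and field shifts of the arm64 KVM uapi header (arch/arm64/include/uapi/asm/kvm.h), copied here by hand. The helper names `arm64_sys_reg` and `missing_registers`, and the two-entry "blessed" list, are hypothetical and exist only for illustration.

```python
# Assumption: these values mirror arch/arm64/include/uapi/asm/kvm.h.
KVM_REG_ARM64 = 0x6000000000000000
KVM_REG_SIZE_U64 = 0x0030000000000000
KVM_REG_ARM64_SYSREG = 0x0013 << 16


def arm64_sys_reg(op0, op1, crn, crm, op2):
    """Pack a system register encoding into a 64-bit KVM register ID."""
    return (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM64_SYSREG
            | (op0 << 14) | (op1 << 11) | (crn << 7) | (crm << 3) | op2)


def missing_registers(blessed, reported):
    """Return the blessed IDs that the kernel did not report, in order."""
    reported_set = set(reported)
    return [reg for reg in blessed if reg not in reported_set]


# Hypothetical two-entry lists; the real test compares hundreds of IDs.
ID_AA64PFR0_EL1 = arm64_sys_reg(3, 0, 0, 4, 0)
ID_AA64ZFR0_EL1 = arm64_sys_reg(3, 0, 0, 4, 4)

blessed = [ID_AA64PFR0_EL1, ID_AA64ZFR0_EL1]
reported = [ID_AA64PFR0_EL1]  # stands in for a kernel that hides the register

for reg in missing_registers(blessed, reported):
    print(hex(reg))
```

The set difference is the same idea as the "missing registers" count in the test output above: a register that appears in the expected list but not in the list the kernel reports is flagged.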