Bug 1890373
| Summary: | kernel version update cause qemu live migration failed | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | 张东旭 <xu910121> |
| Component: | kernel | Assignee: | Andrew Jones <drjones> |
| kernel sub component: | KVM | QA Contact: | Zhijian Li (Fujitsu) <zhijli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | bstinson, carl, drjones, eric.auger, hyasuhar, jinzhao, juzhang, jwboyer, knoel, lcapitulino, mmizuma, qzhang, virt-maint, yidliu, yihyu, zhenyzha, zhijli |
| Version: | 8.4 | Keywords: | TestOnly, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.4 | ||
| Hardware: | aarch64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-4.18.0-283.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-18 14:16:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1875540, 1907826 | ||
| Bug Blocks: | 1885655, 1897024 | ||
|
Description
张东旭
2020-10-22 02:50:18 UTC
The bug also reproduces upstream and I'm currently testing a fix for it.
Posted fix upstream:
https://lists.cs.columbia.edu/pipermail/kvmarm/2020-October/042955.html
It worked for me. Live migration cases:
old kernel version to new kernel version
new kernel version to old kernel version
all succeed.
Patches now upstream:
f81cb2c3ad41 KVM: arm64: Don't hide ID registers from userspace
01fe5ace92dd KVM: arm64: Consolidate REG_HIDDEN_GUEST/USER
912dee572691 KVM: arm64: Check RAZ visibility in ID register accessors
c512298eed03 KVM: arm64: Remove AA64ZFR0_EL1 accessors
A KVM selftest is also now upstream:
fd02029a9e01 KVM: selftests: Add aarch64 get-reg-list test
31d212959179 KVM: selftests: Add blessed SVE registers to get-reg-list
To test, build the KVM selftests on AArch64 and run the aarch64/get-reg-list test. On a kernel without f81cb2c3ad41 ("KVM: arm64: Don't hide ID registers from userspace") the test will fail, complaining about a missing register. On a kernel with the patch the test will exit silently with success (exit code 0). An additional test, aarch64/get-reg-list-sve, can be run to confirm no regressions to the visibility of the register occur when SVE is enabled. That test must be run on a machine that supports SVE.
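The missing register is reported by the test as an `ARM64_SYS_REG(op0, op1, crn, crm, op2)` tuple. The sketch below shows how such a tuple maps to the 64-bit register ID used by the KVM get/set register interface. It assumes the constant values and field shifts from the arm64 KVM uapi headers. The function name `arm64_sys_reg` is made up for illustration, and the example tuple is the one reported in the test output in this bug (the architectural encoding of ID_AA64ZFR0_EL1).

```python
# Constants assumed to match the arm64 KVM uapi headers
# (include/uapi/linux/kvm.h and arch/arm64/include/uapi/asm/kvm.h).
KVM_REG_ARM64 = 0x6000000000000000
KVM_REG_SIZE_U64 = 0x0030000000000000
KVM_REG_ARM64_SYSREG = 0x0013 << 16  # coprocessor field starts at bit 16

# (shift, mask-after-shift) for each field of a system register encoding.
FIELDS = {
    "op0": (14, 0x3),
    "op1": (11, 0x7),
    "crn": (7, 0xF),
    "crm": (3, 0xF),
    "op2": (0, 0x7),
}


def arm64_sys_reg(op0, op1, crn, crm, op2):
    """Hypothetical helper: build the KVM register ID for a system register."""
    values = {"op0": op0, "op1": op1, "crn": crn, "crm": crm, "op2": op2}
    reg_id = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM64_SYSREG
    for name, value in values.items():
        shift, mask = FIELDS[name]
        if value < 0 or value > mask:
            raise ValueError(f"{name}={value} does not fit in its field")
        reg_id |= value << shift
    return reg_id


# The register reported as missing in this bug.
print(hex(arm64_sys_reg(3, 0, 0, 4, 4)))  # prints 0x603000000013c024
```

An ID in this form is what a register list returned by the kernel would contain, so it can help when comparing lists from two kernels by hand.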
Since these patches are now all upstream, they should get picked up by the AArch64 KVM rebase, so I'm making this bug depend on the rebase bug. I'm also marking it as TestOnly and removing the OtherQA flag, since we have some Virt QE resources that can run KVM selftests.
Reproduced this bug with kselftests on the following versions:
kernel-core-4.18.0-240.el8.aarch64
qemu-kvm-core-5.1.0-13.module+el8.3.0+8382+afc3bbea.aarch64
testsuite commit: 585e5b17b92dead8a3aca4e3c9876fbca5f7e0ba
Test steps:
$ cd linux/tools/testing/selftests/kvm
$ make && ./aarch64/get-reg-list
make --no-builtin-rules ARCH=arm64 -C ../../../.. headers_install
make[1]: Entering directory '/home/lizhijian/workspace/linux'
INSTALL ./usr/include
make[1]: Leaving directory '/home/lizhijian/workspace/linux'
Number blessed registers: 311
Number registers: 310
There are 1 missing registers.
The following lines are missing registers:
ARM64_SYS_REG(3, 0, 0, 4, 4),
==== Test Assertion Failure ====
aarch64/get-reg-list.c:453: !missing_regs && !failed_get && !failed_set && !failed_reject
pid=819317 tid=819317 - Argument list too long
1 0x0000000000401623: main at get-reg-list.c:450
2 0x0000ffff90260be3: ?? ??:0
3 0x00000000004019a3: _start at :?
There are 1 missing registers; 0 registers failed get; 0 registers failed set; 0 registers failed reject
Test result: NG
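When collecting results from several hosts, the failure output above can be summarized mechanically. The following is a rough sketch only: it assumes the output format shown in this comment (other versions of the selftest may print differently), and the names `summarize` and `SAMPLE` are hypothetical.

```python
import re

# Excerpt of the get-reg-list output pasted above.
SAMPLE = """\
Number blessed registers: 311
Number registers: 310
There are 1 missing registers.
The following lines are missing registers:
ARM64_SYS_REG(3, 0, 0, 4, 4),
==== Test Assertion Failure ====
"""

REG_LINE = re.compile(r"ARM64_SYS_REG\((\d+), (\d+), (\d+), (\d+), (\d+)\)")


def _count(label, output):
    """Return the integer that follows 'label:' on its own line."""
    match = re.search(rf"^{re.escape(label)}:\s*(\d+)\s*$", output, re.MULTILINE)
    if match is None:
        raise ValueError(f"no '{label}' line in output")
    return int(match.group(1))


def summarize(output):
    """Return (blessed count, present count, list of missing register tuples)."""
    blessed = _count("Number blessed registers", output)
    present = _count("Number registers", output)
    missing = []
    in_missing = False
    for line in output.splitlines():
        if line.strip() == "The following lines are missing registers:":
            in_missing = True
            continue
        if in_missing:
            match = REG_LINE.search(line)
            if match is None:
                break  # first non-register line ends the section
            missing.append(tuple(int(g) for g in match.groups()))
    return blessed, present, missing


print(summarize(SAMPLE))  # prints (311, 310, [(3, 0, 0, 4, 4)])
```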
Bug 1898489 has been closed wont-fix, but we'll still be backporting more fixes for 8.4, including the patches for this bug. I'll update the bug dependency when the new bug is written. I can also just post the patches for this bug independently if needed.
Bug reproduction:
Host:
[root@fujitsu-fx700-01-n00 home]# uname -r
4.18.0-80.11.1.el8_0.aarch64
qemu-kvm-2.12.0-63.module+el8+2833+c7d6d092.aarch64
[root@fujitsu-fx700-01-n01 home]# uname -r
4.18.0-147.5.1.el8_1.aarch64
qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.aarch64
1. Boot a guest on the source host with the following qemu command line:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox on \
-machine virt-rhel7.6.0,gic-version=host,graphics=on \
-nodefaults \
-m 8192 \
-smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
-cpu 'host' \
-vnc :10 \
-enable-kvm \
-monitor stdio \
2. Boot an incoming guest on the destination host and launch incoming mode:
(qemu) migrate_incoming tcp:0:5801
3. Start migration on the source host:
(qemu) migrate -d tcp:10.16.207.95:5801
4. Result:
Migration completed on the source end, but qemu crashed on the destination end.
Result:
source:
(qemu) info status
VM status: paused (postmigrate)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off late-block-activate: off
Migration status: completed
total time: 4484 milliseconds
downtime: 13 milliseconds
setup: 3 milliseconds
transferred ram: 18745 kbytes
throughput: 34.45 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2129962 pages
skipped: 0 pages
normal: 6 pages
normal bytes: 24 kbytes
dirty sync count: 3
page size: 4 kbytes
destination:
(qemu) qemu-kvm: Invalid value 233 expecting positive value <= 232
qemu-kvm: Failed to load cpu:cpreg_vmstate_array_len
qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
qemu-kvm: load of migration failed: Invalid argument
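The destination error reads as a length check: the incoming register array has 233 entries while the destination expects at most 232. Together with the missing-register report from the selftest, that is consistent with the two kernels exposing different register lists to userspace. Below is a simplified model of such a check. It is not QEMU's actual code; the function name `load_cpreg_array_len` is hypothetical and the error text merely mirrors the log above.

```python
def load_cpreg_array_len(incoming_len, local_len):
    """Accept an incoming array length only if it is positive and <= local_len.

    Simplified model of a 'positive, less-or-equal' field check on migration
    load; the real loader performs further per-register checks afterwards.
    """
    if incoming_len <= 0 or incoming_len > local_len:
        raise ValueError(
            f"Invalid value {incoming_len} expecting positive value <= {local_len}"
        )
    return incoming_len


# Source exposes 233 registers, destination only 232: the load is rejected.
try:
    load_cpreg_array_len(233, 232)
except ValueError as err:
    print(f"qemu-kvm: {err}")

# With equal register lists on both ends the length check passes.
print(load_cpreg_array_len(232, 232))
```

Passing this length check alone does not imply a successful migration; it only illustrates why one extra register on the source is enough to abort the load.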
Patches listed in comment 8 were submitted as part of
[RHEL8.4 virt 1907826 PATCH 00/16] KVM/ARM: v5.10/v5.11 fixes
Patch(es) available on kernel-4.18.0-278.el8.dt3
Verified by kselftests on aarch64:
kernel: 4.18.0-278.el8.dt3.aarch64
qemu-kvm: qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.src.rpm
testsuite commit: 585e5b17b92dead8a3aca4e3c9876fbca5f7e0ba
Test steps:
```
[root@fujitsu-fx700-01-n00 ~]# cd linux/tools/testing/selftests/kvm
[root@fujitsu-fx700-01-n00 kvm]# make && ./aarch64/get-reg-list
make --no-builtin-rules ARCH=arm64 -C ../../../.. headers_install
make[1]: Entering directory '/root/linux'
INSTALL ./usr/include
[snip]
[root@fujitsu-fx700-01-n00 kvm]# echo $?
0
```
Thanks Yiding and Zhijian for the efforts!
Verified by kselftests on aarch64:
kernel: 4.18.0-283.el8.aarch64
qemu-kvm: qemu-kvm-5.2.0-5.module+el8.4.0+9775+0937c167.src.rpm
testsuite commit: f40ddce88593482919761f74910f42f4b84c004b
Test steps:
```
[root@hpe-apollo80-01-n00 ~]# cd linux/tools/testing/selftests/kvm/
[root@hpe-apollo80-01-n00 kvm]# make && ./aarch64/get-reg-list
make --no-builtin-rules ARCH=arm64 -C ../../../.. headers_install
make[1]: Entering directory '/root/linux'
[snip]
[root@hpe-apollo80-01-n00 kvm]# echo $?
0
```
Thanks Yiding for the effort!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:1578 |