Bug 1890373 - kernel version update cause qemu live migration failed
Summary: kernel version update cause qemu live migration failed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: 8.4
Hardware: aarch64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Andrew Jones
QA Contact: Zhijian Li (Fujitsu)
URL:
Whiteboard:
Depends On: 1875540 1907826
Blocks: 1885655 1897024
 
Reported: 2020-10-22 02:50 UTC by 张东旭
Modified: 2021-12-07 22:35 UTC (History)

Fixed In Version: kernel-4.18.0-283.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 14:16:08 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description 张东旭 2020-10-22 02:50:18 UTC
On AArch64, QEMU live migration fails between hosts running different kernel versions:
old kernel version: 4.18.0-80.11.el8 (migration source)
new kernel version: 4.18.0-147.5.el8 (migration destination)

When I live-migrate a VM from the source host (kernel 4.18.0-80.11.el8) to the destination host (kernel 4.18.0-147.5.el8), the migration fails with these messages:
qemu-kvm: Invalid value 233 expecting positive value <= 232
qemu-kvm: Failed to load cpu:cpreg_vmstate_array_len

The migration source and destination hosts have the same hardware and the same QEMU version. Only the kernel version differs, and the hardware on neither side supports SVE.

I found that the new kernel includes this patch:
KVM: arm64/sve: System register context switch and access support
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64/kvm/sys_regs.c?id=73433762fcaeb9d59e84d299021c6b15466c96dd

Maybe this patch causes the live migration failure.

Are there any suggestions for making live migration from the old kernel to the new kernel work with QEMU?
Thanks a lot.
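
For illustration only, the error message suggests that the destination rejects an incoming CPU register array that is longer than its own. The Python sketch below models that kind of check. The function name `load_cpreg_array_len` is hypothetical and this is not QEMU's actual code. The values 233 and 232 are taken from the error message above.

```python
def load_cpreg_array_len(incoming_len: int, local_len: int) -> int:
    """Model of a 'positive and <= local value' check on an incoming length.

    Assumption: the destination accepts the incoming length only if it is
    positive and no larger than the length it computed locally.
    """
    if incoming_len <= 0 or incoming_len > local_len:
        raise ValueError(
            f"Invalid value {incoming_len} expecting positive value <= {local_len}"
        )
    return incoming_len


# The source (old kernel) sends 233 entries; the destination (new kernel)
# only knows about 232, so the load is rejected.
try:
    load_cpreg_array_len(incoming_len=233, local_len=232)
except ValueError as err:
    print(err)
```

Under this model, a one-register difference in what the two host kernels expose to userspace is enough to make the load fail.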

Comment 1 Andrew Jones 2020-10-22 06:56:07 UTC
The bug also reproduces upstream and I'm currently testing a fix for it.

Comment 4 Andrew Jones 2020-10-30 13:09:52 UTC
Posted fix upstream https://lists.cs.columbia.edu/pipermail/kvmarm/2020-October/042955.html

Comment 5 张东旭 2020-11-04 02:43:31 UTC
It worked for me.
Live migration cases:
old kernel version to new kernel version
new kernel version to old kernel version
Both succeeded.

Comment 8 Andrew Jones 2020-11-10 08:50:03 UTC
Patches now upstream

f81cb2c3ad41 KVM: arm64: Don't hide ID registers from userspace
01fe5ace92dd KVM: arm64: Consolidate REG_HIDDEN_GUEST/USER
912dee572691 KVM: arm64: Check RAZ visibility in ID register accessors
c512298eed03 KVM: arm64: Remove AA64ZFR0_EL1 accessors

A KVM selftest is also now upstream

fd02029a9e01 KVM: selftests: Add aarch64 get-reg-list test
31d212959179 KVM: selftests: Add blessed SVE registers to get-reg-list

To test, build the KVM selftests on AArch64 and run the aarch64/get-reg-list test. On a kernel without f81cb2c3ad41 ("KVM: arm64: Don't hide ID registers from userspace") the test will fail, complaining about a missing register. On a kernel with the patch the test will exit silently with success (exit code 0). An additional test, aarch64/get-reg-list-sve, can be run to confirm no regressions to the visibility of the register occur when SVE is enabled. That test must be run on a machine that supports SVE.
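
The pass/fail rule above (exit code 0 means success, anything else means failure) can be sketched as a small Python wrapper. This is a minimal sketch: the helper name `selftest_passes` is hypothetical, and the binary path in the usage note assumes the selftests were already built in tools/testing/selftests/kvm.

```python
import subprocess
from typing import Sequence


def selftest_passes(cmd: Sequence[str]) -> bool:
    """Run a selftest command and report whether it exited with code 0.

    On failure the captured output (stdout and stderr combined) is printed,
    since the test is expected to be silent on success.
    """
    result = subprocess.run(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if result.returncode != 0:
        print(result.stdout, end="")
    return result.returncode == 0
```

Example use from the selftests directory: `selftest_passes(["./aarch64/get-reg-list"])`.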

Since these patches are now all upstream, they should get picked up by the AArch64 KVM rebase, so I'm making this bug depend on the rebase bug. I'm also marking it as TestOnly and removing the OtherQA flag, since we have some Virt QE resources that can run KVM selftests.

Comment 11 Zhijian Li (Fujitsu) 2020-11-13 07:42:06 UTC
Reproduced this bug with kselftests on the following versions:

kernel-core-4.18.0-240.el8.aarch64
qemu-kvm-core-5.1.0-13.module+el8.3.0+8382+afc3bbea.aarch64
testsuite commit: 585e5b17b92dead8a3aca4e3c9876fbca5f7e0ba

Test steps:
$ cd linux/tools/testing/selftests/kvm
$ make && ./aarch64/get-reg-list
make --no-builtin-rules ARCH=arm64 -C ../../../.. headers_install
make[1]: Entering directory '/home/lizhijian/workspace/linux'
  INSTALL ./usr/include
make[1]: Leaving directory '/home/lizhijian/workspace/linux'
Number blessed registers:   311
Number registers:           310

There are 1 missing registers.
The following lines are missing registers:

	ARM64_SYS_REG(3, 0, 0, 4, 4),

==== Test Assertion Failure ====
  aarch64/get-reg-list.c:453: !missing_regs && !failed_get && !failed_set && !failed_reject
  pid=819317 tid=819317 - Argument list too long
     1	0x0000000000401623: main at get-reg-list.c:450
     2	0x0000ffff90260be3: ?? ??:0
     3	0x00000000004019a3: _start at :?
  There are 1 missing registers; 0 registers failed get; 0 registers failed set; 0 registers failed reject


Test result:  NG
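
For reference, the missing entry ARM64_SYS_REG(3, 0, 0, 4, 4) can be turned into the 64-bit KVM register ID with a few lines of Python. This is a sketch that assumes the constants and shifts from the arm64 KVM uapi header as I understand them. The function name `arm64_sys_reg` is illustrative. To my understanding, the encoding (3, 0, 0, 4, 4) corresponds to ID_AA64ZFR0_EL1, which matches the "Remove AA64ZFR0_EL1 accessors" patch in comment 8.

```python
# Constants assumed to match arch/arm64/include/uapi/asm/kvm.h.
KVM_REG_ARM64 = 0x6000000000000000
KVM_REG_SIZE_U64 = 0x0030000000000000
KVM_REG_ARM64_SYSREG = 0x0013 << 16


def arm64_sys_reg(op0: int, op1: int, crn: int, crm: int, op2: int) -> int:
    """Build a KVM register ID from a system register encoding."""
    # Field widths: op0 is 2 bits, op1 3 bits, CRn 4 bits, CRm 4 bits, op2 3 bits.
    for name, value, bits in (
        ("op0", op0, 2), ("op1", op1, 3), ("crn", crn, 4),
        ("crm", crm, 4), ("op2", op2, 3),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (
        KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM64_SYSREG
        | (op0 << 14) | (op1 << 11) | (crn << 7) | (crm << 3) | op2
    )


print(hex(arm64_sys_reg(3, 0, 0, 4, 4)))  # 0x603000000013c024
```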

Comment 21 Andrew Jones 2020-12-14 19:29:36 UTC
Bug 1898489 has been closed wont-fix, but we'll still be backporting more fixes for 8.4, including the patches for this bug. I'll update the bug dependency when the new bug is written. I can also just post the patches for this bug independently if needed.

Comment 23 xianwang 2021-01-12 02:02:32 UTC
Bug reproduction:
Host:
[root@fujitsu-fx700-01-n00 home]# uname -r
4.18.0-80.11.1.el8_0.aarch64
qemu-kvm-2.12.0-63.module+el8+2833+c7d6d092.aarch64

[root@fujitsu-fx700-01-n01 home]# uname -r
4.18.0-147.5.1.el8_1.aarch64
qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.aarch64


1. Boot a guest on the source host with this QEMU command line:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine virt-rhel7.6.0,gic-version=host,graphics=on \
    -nodefaults \
    -m 8192  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'host' \
    -vnc :10  \
    -enable-kvm \
    -monitor stdio \
2. Boot an incoming guest on the destination host and start incoming mode:
(qemu) migrate_incoming tcp:0:5801
3. Start the migration on the source host:
(qemu) migrate -d tcp:10.16.207.95:5801
4. Result:
The migration completed on the source end, but QEMU crashed on the destination end.
source:
(qemu) info status 
VM status: paused (postmigrate)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off late-block-activate: off 
Migration status: completed
total time: 4484 milliseconds
downtime: 13 milliseconds
setup: 3 milliseconds
transferred ram: 18745 kbytes
throughput: 34.45 mbps
remaining ram: 0 kbytes
total ram: 8519872 kbytes
duplicate: 2129962 pages
skipped: 0 pages
normal: 6 pages
normal bytes: 24 kbytes
dirty sync count: 3
page size: 4 kbytes

destination:
(qemu) qemu-kvm: Invalid value 233 expecting positive value <= 232
qemu-kvm: Failed to load cpu:cpreg_vmstate_array_len
qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
qemu-kvm: load of migration failed: Invalid argument
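
When triaging logs like the destination output above, the two lengths can be pulled out of the error line with a short Python helper. This is only a sketch: the name `parse_len_mismatch` is hypothetical, and the pattern assumes the exact message wording shown in this report.

```python
import re
from typing import Optional, Tuple

# Matches the message wording seen in this report.
LEN_ERROR = re.compile(r"Invalid value (\d+) expecting positive value <= (\d+)")


def parse_len_mismatch(log: str) -> Optional[Tuple[int, int, int]]:
    """Return (incoming, local, extra) from a destination log, or None."""
    match = LEN_ERROR.search(log)
    if match is None:
        return None
    incoming, local = int(match.group(1)), int(match.group(2))
    return incoming, local, incoming - local


log = """qemu-kvm: Invalid value 233 expecting positive value <= 232
qemu-kvm: Failed to load cpu:cpreg_vmstate_array_len"""
print(parse_len_mismatch(log))  # (233, 232, 1)
```

The difference of 1 is consistent with the single missing register reported by the get-reg-list selftest in comment 11.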

Comment 24 Eric Auger 2021-01-12 10:52:14 UTC
Patches listed in comment 8 were submitted as part of
[RHEL8.4 virt 1907826 PATCH 00/16] KVM/ARM: v5.10/v5.11 fixes

Comment 25 Jan Stancek 2021-01-28 07:23:38 UTC
Patch(es) available on kernel-4.18.0-278.el8.dt3

Comment 26 Yiding Liu (Fujitsu) 2021-02-01 08:37:21 UTC
Verified by kselftests on aarch64:

kernel: 4.18.0-278.el8.dt3.aarch64
qemu-kvm: qemu-kvm-5.2.0-4.module+el8.4.0+9676+589043b9.src.rpm
testsuite commit: 585e5b17b92dead8a3aca4e3c9876fbca5f7e0ba

Test steps:
```
[root@fujitsu-fx700-01-n00 ~]# cd linux/tools/testing/selftests/kvm
[root@fujitsu-fx700-01-n00 kvm]# make && ./aarch64/get-reg-list
make --no-builtin-rules ARCH=arm64 -C ../../../.. headers_install
make[1]: Entering directory '/root/linux'
  INSTALL ./usr/include
[snip]
[root@fujitsu-fx700-01-n00 kvm]# echo $?
0
```

Comment 27 Qunfang Zhang 2021-02-02 02:04:10 UTC
Thanks Yiding and Zhijian for the efforts!

Comment 30 Yiding Liu (Fujitsu) 2021-02-18 03:26:24 UTC
Verified by kselftests on aarch64:

kernel: 4.18.0-283.el8.aarch64
qemu-kvm: qemu-kvm-5.2.0-5.module+el8.4.0+9775+0937c167.src.rpm
testsuite commit: f40ddce88593482919761f74910f42f4b84c004b

Test steps:
```
[root@hpe-apollo80-01-n00 ~]# cd linux/tools/testing/selftests/kvm/
[root@hpe-apollo80-01-n00 kvm]# make && ./aarch64/get-reg-list
make --no-builtin-rules ARCH=arm64 -C ../../../.. headers_install
make[1]: Entering directory '/root/linux'
[snip]
[root@hpe-apollo80-01-n00 kvm]# echo $?
0
```

Comment 31 Qunfang Zhang 2021-02-18 05:20:25 UTC
Thanks Yiding for the effort!

Comment 34 errata-xmlrpc 2021-05-18 14:16:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1578

