Bug 1961519 - qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
Summary: qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.5
Hardware: aarch64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Eric Auger
QA Contact: Xinjian Ma(Fujitsu)
URL:
Whiteboard:
Depends On:
Blocks: 1875540 1885765
Reported: 2021-05-18 07:27 UTC by Zhijian Li (Fujitsu)
Modified: 2023-03-14 14:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 07:28:52 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+



Description Zhijian Li (Fujitsu) 2021-05-18 07:27:34 UTC
Description of problem:
qemu-kvm: error while loading state for instance 0x0 of device 'cpu' when migrating from RHEL 8.5 to RHEL 8.4

Version-Release number of selected component (if applicable):
source host:
kernel: kernel-4.18.0-304.7.el8.kpq1.aarch64
qemu-kvm: qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.aarch64
[root@fujitsu-fx700-01-n01 ~]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  1
Core(s) per cluster: 12
Socket(s):           1
Cluster(s):          4
NUMA node(s):        4
Vendor ID:           FUJITSU
BIOS Vendor ID:      FUJITSU
Model:               0
Model name:          A64FX
BIOS Model name:     461F0010
Stepping:            0x1
BogoMIPS:            200.00
NUMA node0 CPU(s):   0-11
NUMA node1 CPU(s):   12-23
NUMA node2 CPU(s):   24-35
NUMA node3 CPU(s):   36-47
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve


destination host:
kernel: 4.18.0-305.6.el8.aarch64
qemu-kvm: qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207.aarch64
[root@hpe-apollo80-01-n01 ~]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              48
On-line CPU(s) list: 0-47
Thread(s) per core:  1
Core(s) per cluster: 12
Socket(s):           1
Cluster(s):          4
NUMA node(s):        4
Vendor ID:           FUJITSU
BIOS Vendor ID:      FUJITSU
Model:               0
Model name:          A64FX
BIOS Model name:     461F0010
Stepping:            0x1
BogoMIPS:            200.00
NUMA node0 CPU(s):   0-11
NUMA node1 CPU(s):   12-23
NUMA node2 CPU(s):   24-35
NUMA node3 CPU(s):   36-47
Flags:               fp asimd evtstrm sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve
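
Editorial note: both hosts report the same model name (A64FX), but the two "Flags" lines above are not identical. The following is a minimal Python sketch for diffing them, assuming the lscpu output has been captured as text. The function names parse_flags and flag_diff are hypothetical helpers for illustration and are not part of any existing tool.

```python
from __future__ import annotations


def parse_flags(lscpu_output: str) -> set[str]:
    """Return the CPU feature flags found on the 'Flags:' line of lscpu output."""
    for line in lscpu_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Flags":
            return set(value.split())
    return set()  # no 'Flags:' line present


def flag_diff(src_lscpu: str, dst_lscpu: str) -> tuple[set[str], set[str]]:
    """Return (flags only on the source, flags only on the destination)."""
    src, dst = parse_flags(src_lscpu), parse_flags(dst_lscpu)
    return src - dst, dst - src


# Flags lines copied from the two lscpu outputs in this report.
SRC = "Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve"
DST = "Flags:               fp asimd evtstrm sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm fcma dcpop sve"

only_src, only_dst = flag_diff(SRC, DST)
print(sorted(only_src), sorted(only_dst))  # ['aes', 'pmull'] []
```

With the values from this report, the source host advertises aes and pmull while the destination host does not.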


How reproducible:
1. Start an incoming VM on the destination side
[root@hpe-apollo80-01-n00 ~]# /usr/libexec/qemu-kvm     -name 'avocado-vt-vm1'      -sandbox on      -machine virt-rhel8.2.0,gic-version=host,graphics=on     -nodefaults     -m 1024      -smp 2      -cpu 'host'     -vnc :10      -enable-kvm     -monitor stdio -incoming tcp:0:8888

2. Start a VM on the source side and start the migration
 [root@fujitsu-fx700-01-n00 auto_test_tool]#  /usr/libexec/qemu-kvm     -name 'avocado-vt-vm1'      -sandbox on      -machine virt-rhel8.2.0,gic-version=host,graphics=on     -nodefaults     -m 1024      -smp 2      -cpu 'host'     -vnc :10      -enable-kvm     -monitor stdio
QEMU 6.0.0 monitor - type 'help' for more information
(qemu) migrate -d tcp:10.19.241.163:8888
(qemu) info status
VM status: paused (postmigrate)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 6248 ms
downtime: 6 ms
setup: 2 ms
transferred ram: 2616 kbytes
throughput: 3.50 mbps
remaining ram: 0 kbytes
total ram: 1179904 kbytes
duplicate: 294970 pages
skipped: 0 pages
normal: 6 pages
normal bytes: 24 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 1047930


Actual results:
destination host:
QEMU 5.2.0 monitor - type 'help' for more information
(qemu) qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
qemu-kvm: load of migration failed: Operation not permitted

Expected results:
at least one VM is running

Comment 1 Qunfang Zhang 2021-05-18 08:30:00 UTC
@Zhijian, feel free to change qa contact if needed, thanks.

Comment 2 Li Xiaohui 2021-05-19 07:00:44 UTC
Hit the same issue when migrating between hosts that have different CPU models but the same kernel and qemu versions:
(qemu) qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
qemu-kvm: load of migration failed: Operation not permitted

hosts info: kernel-5.12.0-1.el9.aarch64 & qemu-img-6.0.0-1.el9.aarch64
host cpu info:
src host ->    Model name: X-Gene,         Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
dst host ->    Model name: ThunderX 88XX,  Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
guest info: kernel-5.12.0-1.el9.aarch64

BTW, this issue happens not only on RHEL 9.0 but also on RHEL-AV 8.5.0.



> I think the CPUs on these two machines aren't exactly same. Migration between not exactly same CPUs are expected to fail. 
> By the way, the issue isn't what this bugzilla is tracking. Please create another bugzilla to track the failure from the test 
> case#2 and assign to me, so that I can investigate

Hi Gavin, the above was said by you in https://bugzilla.redhat.com/show_bug.cgi?id=1923881#c14.
I'd like to confirm two questions with you so that I can test better in the upcoming rhel8.5.0 and rhel9.0:
1. Which src and dst host CPUs do you advise QE to use for migration tests? In other words, should QE run migration tests only between identical CPUs?
2. Could QE use different host CPUs to test migration? And if we find product issues on different CPUs, should we file bzs to track them, and will you fix them on arm?

Comment 3 Guowen Shan 2021-05-20 00:26:39 UTC
Hi Xiaohui,

The CPU models can differ even when the information from "/proc/cpuinfo" matches. However,
the CPU models must be different if there is any difference in "/proc/cpuinfo".

Yes, please do migration between two machines that have the same CPU model. The safe way
is to migrate between two identical machines.

As migration between different CPU models is expected to fail, it's not helpful to run such
tests. Migration between different CPU models isn't supported at all.

Thanks,
Gavin

Comment 4 Eric Auger 2021-05-20 08:44:51 UTC
FYI I have just sent the patches downstream for arm-virt 8.5 machine type (Bug 1957667 - [aarch64] Add 8.5 machine type, [RHEL-AV-8.5.0 PATCH 0/4] Add 8.5 arm-virt machine type). With those I have tested migration between qemu 8.5 and 8.4 using  virt-rhel8.4.0 and virt-rhel8.3.0 between 2 Seattle machines and it worked for me.

Comment 5 Andrew Jones 2021-05-20 11:05:48 UTC
(In reply to Li Xiaohui from comment #2)
> 1.What CPUs of src and dst host do you advise QE to do migration tests? In
> other words shall QE do migration test only on same CPUs? 

Please test on as many different AArch64 cpu types as we can find, *BUT* migration must be done between two machines of the *EXACT SAME* type, whatever type that might be.


> 2.Could QE use different host CPUs to test migration?

No

> And if found some
> product issues on different CPUs, shall we file bzs to track them and will
> you fix them on arm?

No, as we expect failures when migrating between different CPU types, we will not fix any bugs filed for that type of migration.

Note, this is not an AArch64 KVM specific requirement, but rather a migration when using CPU passthrough requirement (which, currently, is the only configuration supported by AArch64 KVM). When using CPU passthrough we can't migrate to a different CPU type because it will confuse QEMU and/or the guest. This is true for x86 CPU passthrough migration as well.

Additional note, it's not even supported to migrate between identical CPU types, but with different host kernel versions. When the host kernel version is different, that means KVM is different, and as KVM partially emulates the CPU that the guest sees, there's a chance that the CPU will change for the guest, just like it would if the actual hardware was to change.

We do strive, however, to allow migrations between hosts with identical CPU types where the source host is running an older kernel version than the destination host. We will investigate and possibly fix bugs found when doing migrations like that.

To be clear, please test these types of configurations:


    Source                                           Destination
    ======                                           ===========

 Host CPU type   Host kernel version             Host CPU type   Host kernel version
 --------------  -------------------             -------------   -------------------
 xyz             RHEL-8.4                        xyz             RHEL-8.4


    Source                                           Destination
    ======                                           ===========

 Host CPU type   Host kernel version             Host CPU type   Host kernel version
 --------------  -------------------             -------------   -------------------
 xyz             RHEL-8.4                        xyz             RHEL-8.5



Or, in general,

src_cpu=xyz,src_kernel=X  => dst_cpu=xyz,dst_kernel=X                 (supported and should work)
src_cpu=xyz,src_kernel=X  => dst_cpu=xyz,dst_kernel=Y, where Y > X    (not supported, but should work)
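
Editorial note: the rules above can be sketched as a small Python classifier. This is only an illustration under stated assumptions: classify_migration is a hypothetical helper, kernel versions are modelled as comparable tuples such as (8, 4), and the wording for the two cases not listed in the table (different CPU types, or a destination kernel older than the source) is paraphrased from the surrounding comment.

```python
def classify_migration(src_cpu, src_kernel, dst_cpu, dst_kernel):
    """Classify a migration per the rules in this comment.

    CPU types are compared for exact equality; kernel versions are
    assumed to be comparable values, e.g. tuples like (8, 4).
    """
    if src_cpu != dst_cpu:
        # CPU passthrough cannot migrate to a different CPU type.
        return "not supported, expected to fail"
    if src_kernel == dst_kernel:
        return "supported and should work"
    if dst_kernel > src_kernel:
        # Older source kernel to newer destination kernel.
        return "not supported, but should work"
    # Newer source kernel to older destination kernel.
    return "not supported"


print(classify_migration("xyz", (8, 4), "xyz", (8, 4)))
print(classify_migration("xyz", (8, 4), "xyz", (8, 5)))
print(classify_migration("xyz", (8, 5), "xyz", (8, 4)))
```

The third call corresponds to the direction reported in this bug (RHEL 8.5 source, RHEL 8.4 destination), which falls outside both rows of the table.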


Thanks,
drew

Comment 6 Li Xiaohui 2021-05-20 13:01:37 UTC
Thank you, Gavin and Andrew, for such detailed and clear explanations. I now have a clear understanding of how to do migration testing on arm.

Comment 7 Andrew Jones 2021-05-20 15:43:03 UTC
(In reply to Li Xiaohui from comment #6)
> Thank you Gavin, Andrew, bringing such detailed and clear explanations, I
> have a clear understanding of how to do migration testing on arm.

Also, given a supported configuration (src_cpu==dst_cpu && src_kernel==dst_kernel), I'd prefer we test "ping-pong" migrations as opposed to just migrations. A "ping-pong" migration migrates to another host and then back again (src -> dst -> src).
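
Editorial note: a minimal Python sketch of the "ping-pong" sequence described above. The migrate callable is an assumption standing in for whatever the test harness uses to move the VM between two hosts and report success. It is not a QEMU or libvirt API, and the host names are placeholders.

```python
from typing import Callable, List, Tuple


def ping_pong(migrate: Callable[[str, str], bool], src: str, dst: str, rounds: int = 1) -> bool:
    """Migrate src -> dst -> src `rounds` times; stop at the first failed hop."""
    for _ in range(rounds):
        for origin, target in ((src, dst), (dst, src)):
            if not migrate(origin, target):
                return False
    return True


# Stand-in for a real migration step: records each hop and always succeeds.
hops: List[Tuple[str, str]] = []


def fake_migrate(origin: str, target: str) -> bool:
    hops.append((origin, target))
    return True


ok = ping_pong(fake_migrate, "host-a", "host-b")
print(ok, hops)
```

Stopping at the first failed hop keeps the failing direction visible, which matters because the outbound and return hops exercise different source and destination roles.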

Thanks,
drew

Comment 8 Li Xiaohui 2021-05-21 02:37:54 UTC
(In reply to Andrew Jones from comment #7)
> (In reply to Li Xiaohui from comment #6)
> > Thank you Gavin, Andrew, bringing such detailed and clear explanations, I
> > have a clear understanding of how to do migration testing on arm.
> 
> Also, given a supported configuration (src_cpu==dst_cpu &&
> src_kernel==dst_kernel), I'd prefer we test "ping-pong" migrations as
> opposed to just migrations. A "ping-pong" migration migrates to another host
> and then back again (src -> dst -> src).

Well noted, we will cover it. Thanks again.

> 
> Thanks,
> drew

Comment 10 John Ferlan 2021-09-09 15:13:56 UTC
Bulk update: Move RHEL-AV bugs to RHEL8.

Leaving in RHEL8 since this is tagged as a Fujitsu bug; however, if resolved and this needs to be included/tested for RHEL9, then a clone will be necessary.

Comment 11 Guowen Shan 2021-09-20 23:27:01 UTC
This bug should be considered invalid, as we have never supported migration between
different CPU models. The hostnames indicate that these two machines have
different CPUs:

  source:      fujitsu-fx700-01-n00
  destination: hpe-apollo80-01-n01

So I guess we can close this one as "invalid".

Comment 12 Xinjian Ma(Fujitsu) 2021-09-27 02:38:24 UTC
Synced with Zhijian.

A migration failure between different CPU models is OK in itself, but we expect "at least one VM is running" without any additional action.
Here, when the migration failed, the VM was running on neither the source host nor the destination host.
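
Editorial note: the expectation can be expressed as a check over the "info status" output of both monitors. The following Python sketch assumes the monitor text has been captured as strings. The helper names vm_state and at_least_one_running are hypothetical, and an empty string stands for a monitor that is no longer available because QEMU exited.

```python
import re

# Matches the first word after "VM status:", e.g. "paused" or "running".
STATUS_RE = re.compile(r"^VM status:\s*(\w+)", re.MULTILINE)


def vm_state(info_status_output):
    """Return the state word from 'info status' output, or None if absent."""
    match = STATUS_RE.search(info_status_output)
    return match.group(1) if match else None


def at_least_one_running(*monitor_outputs):
    """True if any of the captured monitor outputs reports a running VM."""
    return any(vm_state(out) == "running" for out in monitor_outputs)


# Source output as shown in this report; the destination QEMU failed to load
# the migration stream, so no status is available there.
source_output = "(qemu) info status\nVM status: paused (postmigrate)"
destination_output = ""

print(vm_state(source_output), at_least_one_running(source_output, destination_output))
```

For the situation in this report the check fails, because the source is left paused in the postmigrate state and the destination never reports a status.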

Comment 13 Eric Auger 2021-09-27 08:11:29 UTC
(In reply to Xinjian Ma(Fujitsu) from comment #12)
> Sync with zhijian, 
> 
> Migration failed between different CPU models itself is ok, but we expect
> "at least one VM is running" without any additional action.
> Here when migration failed, Neither on source host nor destination host, the
> VM is not running.

This is a fair assumption indeed. I will look at improving the code then. I suggest we move it to 8.6 then, as it may require some upstream changes, agreed?

Comment 15 Xinjian Ma(Fujitsu) 2021-09-27 08:47:40 UTC
> This is a fair assumption indeed. I will look at improving the code then. I suggest we move it to 8.6 then, as it may require some upstream changes, agreed?
Thanks and I totally agree

Comment 21 Eric Auger 2021-10-18 07:28:52 UTC
Temporary loans expired. Closing the BZ as NOTABUG following comment #17. If you think this should not be closed, please reopen it and provide a test framework.

