Bug 1961519
| Summary: | qemu-kvm: error while loading state for instance 0x0 of device 'cpu' | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Zhijian Li (Fujitsu) <zhijli> |
| Component: | qemu-kvm | Assignee: | Eric Auger <eric.auger> |
| qemu-kvm sub component: | Live Migration | QA Contact: | Xinjian Ma(Fujitsu) <xinma> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | low | ||
| Priority: | low | CC: | dgilbert, drjones, eric.auger, gshan, lcapitulino, mmizuma, qzhang, virt-maint, xiaohli, xinma, yidliu, zhenyzha |
| Version: | 8.5 | Keywords: | OtherQA, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | aarch64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-10-18 07:28:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1875540, 1885765 | ||
Description
Zhijian Li (Fujitsu)
2021-05-18 07:27:34 UTC
@Zhijian, feel free to change the QA contact if needed, thanks.

Hit the same issue when migrating between hosts that have different CPU models but the same kernel and qemu versions:

(qemu) qemu-kvm: error while loading state for instance 0x0 of device 'cpu'
qemu-kvm: load of migration failed: Operation not permitted

hosts info: kernel-5.12.0-1.el9.aarch64 & qemu-img-6.0.0-1.el9.aarch64
host cpu info:
src host -> Model name: X-Gene, Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
dst host -> Model name: ThunderX 88XX, Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
guest info: kernel-5.12.0-1.el9.aarch64

BTW, this issue happened not only on rhel9.0, but also on rhelav-8.5.0.

> I think the CPUs on these two machines aren't exactly same. Migration between not exactly same CPUs are expected to fail.
> By the way, the issue isn't what this bugzilla is tracking. Please create another bugzilla to track the failure from the test
> case#2 and assign to me, so that I can investigate

Hi Gavin, the above was said by you in https://bugzilla.redhat.com/show_bug.cgi?id=1923881#c14.
Here I want to confirm two questions with you so that I can test better in the upcoming rhel8.5.0 and rhel9.0:
1. Which CPUs on the src and dst hosts do you advise QE to use for migration tests? In other words, shall QE do migration tests only on the same CPUs?
2. Could QE use different host CPUs to test migration? And if we find product issues on different CPUs, shall we file bzs to track them, and will you fix them on arm?

Hi Xiaohui,

The CPU models can be different even if the information from "/proc/cpuinfo" matches. However, the CPU models should be different if there is any difference in "/proc/cpuinfo".

Yes, please do migration between two machines which have the same CPU model. The safe way is to do migration between two identical machines. As migration between different CPU models is expected to fail, it's not helpful to do such tests. Migration between different CPU models isn't supported at all.
Thanks,
Gavin

FYI I have just sent the patches downstream for arm-virt 8.5 machine type (Bug 1957667 - [aarch64] Add 8.5 machine type, [RHEL-AV-8.5.0 PATCH 0/4] Add 8.5 arm-virt machine type). With those I have tested migration between qemu 8.5 and 8.4 using virt-rhel8.4.0 and virt-rhel8.3.0 between 2 Seattle machines and it worked for me.

(In reply to Li Xiaohui from comment #2)
> 1.What CPUs of src and dst host do you advise QE to do migration tests? In
> other words shall QE do migration test only on same CPUs?

Please test on as many different AArch64 cpu types as we can find, *BUT* migration must be done between two machines of the *EXACT SAME* type, whatever type that might be.

> 2.Could QE use different host CPUs to test migration?

No

> And if found some
> product issues on different CPUs, shall we file bzs to track them and will
> you fix them on arm?

No, as we expect failures when migrating between different CPU types, we will not fix any bugs filed for that type of migration.

Note, this is not an AArch64 KVM specific requirement, but rather a migration when using CPU passthrough requirement (which, currently, is the only configuration supported by AArch64 KVM). When using CPU passthrough we can't migrate to a different CPU type because it will confuse QEMU and/or the guest. This is true for x86 CPU passthrough migration as well.

Additional note, it's not even supported to migrate between identical CPU types, but with different host kernel versions. When the host kernel version is different, that means KVM is different, and as KVM partially emulates the CPU that the guest sees, there's a chance that the CPU will change for the guest, just like it would if the actual hardware was to change. We do strive, however, to allow migrations between hosts with identical CPU types where the source host is running an older kernel version than the destination host. We will investigate and possibly fix bugs found when doing migrations like that.
To be clear, please test these types of configurations:

Source                               Destination
======                               ===========
Host CPU type   Host kernel version  Host CPU type  Host kernel version
--------------  -------------------  -------------  -------------------
xyz             RHEL-8.4             xyz            RHEL-8.4

Source                               Destination
======                               ===========
Host CPU type   Host kernel version  Host CPU type  Host kernel version
--------------  -------------------  -------------  -------------------
xyz             RHEL-8.4             xyz            RHEL-8.5

Or, in general,

src_cpu=xyz,src_kernel=X => dst_cpu=xyz,dst_kernel=X (supported and should work)
src_cpu=xyz,src_kernel=X => dst_cpu=xyz,dst_kernel=Y, where Y > X (not supported, but should work)

Thanks,
drew

Thank you Gavin, Andrew, bringing such detailed and clear explanations, I have a clear understanding of how to do migration testing on arm.

(In reply to Li Xiaohui from comment #6)
> Thank you Gavin, Andrew, bringing such detailed and clear explanations, I
> have a clear understanding of how to do migration testing on arm.

Also, given a supported configuration (src_cpu==dst_cpu && src_kernel==dst_kernel), I'd prefer we test "ping-pong" migrations as opposed to just migrations. A "ping-pong" migration migrates to another host and then back again (src -> dst -> src).

Thanks,
drew

(In reply to Andrew Jones from comment #7)
> (In reply to Li Xiaohui from comment #6)
> > Thank you Gavin, Andrew, bringing such detailed and clear explanations, I
> > have a clear understanding of how to do migration testing on arm.
>
> Also, given a supported configuration (src_cpu==dst_cpu &&
> src_kernel==dst_kernel), I'd prefer we test "ping-pong" migrations as
> opposed to just migrations. A "ping-pong" migration migrates to another host
> and then back again (src -> dst -> src).

Well noted, we will cover it. Thanks again.

>
> Thanks,
> drew

Bulk update: Move RHEL-AV bugs to RHEL8.
Leaving in RHEL8 since this is tagged as a Fujitsu bug; however, if resolved and this needs to be included/tested for RHEL9, then a clone will be necessary.

This bug should be an invalid one, as we have never supported migration between different CPU models. The hostnames tell us that these two machines have different CPUs:

source: fujitsu-fx700-01-n00
destination: hpe-apollo80-01-n01

So I guess we can close this one as "invalid".

Synced with zhijian,

A failed migration between different CPU models is itself OK, but we expect "at least one VM is running" without any additional action.
Here, when the migration failed, the VM was running on neither the source host nor the destination host.

(In reply to Xinjian Ma(Fujitsu) from comment #12)
> Sync with zhijian,
>
> Migration failed between different CPU models itself is ok, but we expect
> "at least one VM is running" without any additional action.
> Here when migration failed, Neither on source host nor destination host, the
> VM is not running.

This is a fair assumption indeed. I will look at improving the code then. I suggest we move it to 8.6 then, as it may require some upstream changes, agreed?

> This is a fair assumption indeed. I will look at improving the code then. I suggest we move it to 8.6 then, as it may require some upstream changes, agreed?
Thanks and I totally agree
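
The expectation discussed above (after a failed migration, at least one VM should still be running without any additional action) can be written down as a small check. The Python below is only a minimal sketch under stated assumptions: the function names are hypothetical, and each status dictionary is assumed to have the shape of a QMP `query-status` reply (a boolean `running` plus a `status` string), with `None` standing for a host where no QEMU process could be queried.

```python
from typing import Mapping, Optional


def vm_is_running(status: Optional[Mapping]) -> bool:
    """Return True only if a status reply exists and reports running=True.

    `status` is assumed to look like a QMP `query-status` reply, e.g.
    {"running": True, "status": "running"}. None or an empty mapping
    means no usable reply was obtained from that host.
    """
    return bool(status) and bool(status.get("running", False))


def at_least_one_running(src_status: Optional[Mapping],
                         dst_status: Optional[Mapping]) -> bool:
    """Check the invariant from this thread after a failed migration.

    The invariant holds if the guest is running on the source host or on
    the destination host (hypothetical helper, not part of QEMU).
    """
    return vm_is_running(src_status) or vm_is_running(dst_status)
```

A test could call `at_least_one_running(...)` right after the migration reports failure; the situation reported here would correspond to the check returning `False`.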
Temporary loans expired. Closing the BZ as NOTABUG following comment #17. If you consider this shall not be closed, please reopen and provide a test framework.
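
For anyone building such a test framework, the configuration guidance given earlier in this bug (same host CPU type on both sides; same kernel is supported, newer destination kernel is unsupported but should work; "ping-pong" only for the supported case) could be encoded roughly as follows. This is a sketch, not an official support matrix: the `Host` type, the function names, and the kernel tuples are hypothetical, and the label for a newer-to-older kernel migration is an assumption based on the statement that differing host kernel versions are not supported.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Host:
    name: str                # hypothetical host label
    cpu_type: str            # e.g. "xyz", as in the tables in this bug
    kernel: Tuple[int, int]  # e.g. (8, 4) for a RHEL-8.4 host kernel


def classify_migration(src: Host, dst: Host) -> str:
    """Classify a src -> dst migration using the rules stated in this bug."""
    if src.cpu_type != dst.cpu_type:
        # Different CPU types: failures are expected and will not be fixed.
        return "not supported, expected to fail"
    if src.kernel == dst.kernel:
        return "supported and should work"
    if dst.kernel > src.kernel:
        # Older source kernel, newer destination kernel.
        return "not supported, but should work"
    # Assumption: newer -> older falls under "different kernels, not supported".
    return "not supported"


def ping_pong_hops(src: Host, dst: Host) -> List[Tuple[Host, Host]]:
    """Return the two hops of a ping-pong migration (src -> dst -> src).

    Only generated for the supported configuration, as requested in the thread.
    """
    if classify_migration(src, dst) != "supported and should work":
        raise ValueError("ping-pong is only planned for supported configurations")
    return [(src, dst), (dst, src)]
```

Restricting `ping_pong_hops` to the fully supported case follows from the rules themselves: with different kernels, the return hop would go from a newer kernel to an older one.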