Bug 2304078 - F40 cpu hotplug crashes the guest
Summary: F40 cpu hotplug crashes the guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 40
Hardware: ppc64le
OS: All
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-08-12 08:51 UTC by IBM Bug Proxy
Modified: 2024-09-11 08:11 UTC
CC List: 11 users

Fixed In Version: qemu-8.2.6-3.fc40
Clone Of:
Environment:
Last Closed: 2024-08-28 02:36:34 UTC
Type: ---
Embargoed:


Attachments
XML file for defining the guest (3.76 KB, application/octet-stream), 2024-08-12 08:51 UTC, IBM Bug Proxy
Sosreport for the Fedora39 host (14.81 MB, application/octet-stream), 2024-08-12 08:51 UTC, IBM Bug Proxy


Links
IBM Linux Technology Center 205620 (last updated 2024-08-12 08:51:43 UTC)

Description IBM Bug Proxy 2024-08-12 08:51:18 UTC

Comment 1 IBM Bug Proxy 2024-08-12 08:51:31 UTC
== Comment: #0 - Anushree Mathur <Anushree.Mathur2> - 2024-02-20 06:06:13 ==
HOST ENV:
Kernel - 6.8.0-rc5
OS : Fedora39
qemu : # qemu-system-ppc64 --version
QEMU emulator version 8.1.3 (qemu-8.1.3-3.fc39)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
libvirt : libvirtd (libvirt) 9.7.0


GUEST ENV :
OS : Fedora39
Kernel : 6.7.4-200.fc39.ppc64le


I have been trying CPU hotplug on a guest with maxvcpus set to 128 and a current value of 4, but when I try to hotplug 68 vcpus to the guest, it crashes and we get the error message:
[  303.808494] KVM: Create Guest vcpu hcall failed, rc=-44
error: Unable to read from monitor: Connection reset by peer
 

Steps to reproduce:

1) virsh define bug.xml

2) virsh start Fedora39 --console

3) virsh setvcpus Fedora39 68

Output : 
[  662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
error: Unable to read from monitor: Connection reset by peer


If the resources are insufficient, in my view it should fail gracefully!
I am attaching the XML file that I have used, and I will also post the observations from the MDC system, where I saw the same failure at a higher vcpu count.
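
For reference, the attached bug.xml is not reproduced here; purely as an illustration (not the actual attachment), a minimal libvirt vCPU stanza matching the configuration described above (maxvcpus 128 with 4 active) would look like:

    <vcpu placement='static' current='4'>128</vcpu>

With such a definition, "virsh setvcpus" raises the number of active vcpus at runtime, up to the 128 maximum.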

== Comment: #1 - Anushree Mathur <Anushree.Mathur2> - 2024-02-20 10:09:53 ==


== Comment: #2 - Kautuk Consul <Kautuk.Consul> - 2024-02-21 23:06:32 ==
Hi Anushree.Mathur2,

The return code -44 means "not enough resources": the hypercall that creates an L2 vcpu fails because of a lack of resources such as memory. This is a phyp issue.

For powerpc systems I do not think the maxvcpus value in the XML is actually communicated to phyp, so it cannot pre-allocate resources for 128 vcpus.

Can you try the following steps:
i)  Please increase the amount of RAM allocated to the LPAR and then test.
ii) Try hotplugging 10 vcpus first and then gradually increase towards 68. A smaller number of vcpu hotplugs may work, which would also show that phyp has enough resources for fewer vcpus and that 68 may simply be too high for it.

== Comment: #3 - Kautuk Consul <Kautuk.Consul> - 2024-02-21 23:09:53 ==
Also, if you have more than one L2 guest, can you try these steps with only one of them?

== Comment: #4 - Anushree Mathur <Anushree.Mathur2> - 2024-02-21 23:13:48 ==
Thanks for the reply, Kautuk! I have tried with smaller values and it worked fine. I agree that the resources may not be enough, but shouldn't it fail gracefully?
In my view the guest should not crash.

== Comment: #5 - Kautuk Consul <Kautuk.Consul> - 2024-02-22 01:36:06 ==
Hi Anushree.Mathur2,

Your point is valid. In an ideal scenario the guest should keep running with the successfully allocated resources and should perhaps only print a single error.

In the qemu source code any vcpu is created (before or after initially running the guest) by creating a vcpu thread which calls
kvm_vcpu_thread_fn() -> kvm_init_vcpu(cpu, &error_fatal).

Inside the kvm_init_vcpu() function, we call kvm_get_vcpu() -> kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id).

If the KVM_CREATE_VCPU ioctl fails we then will call the following function:
error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)", kvm_arch_vcpu_id(cpu));

In error_setg_errno -> error_setg_errno_internal -> error_setv -> error_handle we have the following code, which exits the QEMU process and thus kills the guest:
    if (errp == &error_fatal) {
        error_report_err(err);
        exit(1);
    }

Since we called kvm_init_vcpu with &error_fatal as an argument, the guest then exits with exit code 1.
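
To make the control flow above concrete, here is a small self-contained toy program (not QEMU source; the names are simplified analogues of the functions mentioned above) that mirrors the &error_fatal pattern: an error reported through the fatal sentinel terminates the whole process instead of being returned to the caller.

    /* toy_fatal.c: toy analogue (NOT QEMU source) of the &error_fatal path. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct Error { char msg[256]; } Error;

    /* Only the *address* of this pointer is used, as a sentinel value. */
    Error *error_fatal;

    /* Rough analogue of error_setg_errno() followed by error_handle(). */
    static void set_error(Error **errp, int err, const char *what)
    {
        if (errp == &error_fatal) {
            /* fatal sentinel: report the error and exit(1), taking the guest down */
            fprintf(stderr, "%s: %s\n", what, strerror(err));
            exit(1);
        }
        if (errp) {
            /* non-fatal case: hand the error back to the caller */
            *errp = malloc(sizeof(Error));
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s: %s", what, strerror(err));
        }
    }

    /* Stand-in for the KVM_CREATE_VCPU ioctl failing with "out of resources". */
    static int create_vcpu_ioctl(void)
    {
        return -ENOMEM;
    }

    /* Analogue of kvm_init_vcpu(cpu, &error_fatal) running in the vcpu thread. */
    static void init_vcpu(Error **errp)
    {
        int ret = create_vcpu_ioctl();
        if (ret < 0) {
            set_error(errp, -ret, "init_vcpu: create_vcpu failed");
        }
    }

    int main(void)
    {
        /* Passing &error_fatal means any failure ends the whole process. */
        init_vcpu(&error_fatal);
        printf("never reached when vcpu creation fails\n");
        return 0;
    }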

From the source code it is evident that:
i)   If a vcpu allocation fails, the qemu application regards it as a critical error and quits the entire qemu process along with its guest.
ii)  Creating a vcpu dynamically with "virsh setvcpus" is not handled differently from normal vcpu creation. Even if it were handled differently, we need to understand that failing to allocate a vcpu is a serious resource error: if we let the guest continue to execute it could still quit later due to other insufficient resources. We don't want that, and we need to make sure the platform has enough resources before running workloads on our L2 guests.
iii) With the current qemu architecture it would in any case be somewhat difficult to differentiate between static and dynamic vcpu creation, and doing so could lead to instability due to regressions in various other error scenarios.

Since phyp is not able to allocate the required number of vcpus, I recommend that:
i)  We increase the RAM allocated to the LPAR.
ii) We contact the phyp team and show them that, while maxvcpus is 128, "virsh setvcpus Fedora39 68" is failing because phyp is not allocating the requested resources, which makes the L2 guest exit.

In a nutshell, phyp is not allocating critical resources even though we are well within the 128-vcpu range, and since qemu treats these resources as critical, the qemu hypervisor process kills our L2 guest.

== Comment: #6 - Kautuk Consul <Kautuk.Consul> - 2024-02-22 03:06:32 ==
On second thought, since I don't see maxvcpus being sent to the L0 phyp in any manner according to the formal V2 API specification, I think the phyp people will also ask us to increase the RAM of the LPAR to make the 128 vcpus succeed.

So my suggestion for now would be to simply increase the RAM of the LPAR and try.

Maybe in later versions of the V2 API specification we can add support for sending maxvcpus to L0 so that phyp can pre-allocate the correct resources and thus act reasonably with respect to the guest XML. But that takes time.

In the meantime, just increase the RAM of the LPAR and try. You could also come up with a rough calculation of the RAM required in an LPAR based on the number of vcpus (and other resources) needed by the L2s that you intend to run on that LPAR.

== Comment: #7 - Kautuk Consul <Kautuk.Consul> - 2024-02-22 22:24:30 ==
Hi @sthoufee.com, can you please mirror this bug to the pHYP team?

After talking to the team, we decided we will indeed need to implement communicating maxvcpus to the L0 pHYP, with pHYP handling this element by pre-allocating vcpus for the LPAR. After they implement this we can go ahead and put in our kernel changes for it.

== Comment: #8 - Anushree Mathur <Anushree.Mathur2> - 2024-02-26 23:04:50 ==
(In reply to comment #6)
> On second thoughts, since I don't see the maxvcpus being sent to the L0 phyp
> in any 
> manner according to the formal V2 API specification I think maybe phyp
> people will 
> also ask us to increase the RAM of the LPAR to make the 128 vcpus succeed.
> 
> So then my suggestion would be to simply increase the RAM of the LPAR to try.
> 
> Maybe in later versions of the v2 API specification we can add support for
> sending the maxvcpus to L0 so that phyp can pre-allocate the correct
> resources
> and thus act reasonable with respect to the guest xml. But that takes time.
> 
> So for now just increase the RAM of the LPAR and try. You can also maybe
> come up 
> with some sort of rough calculation for ascertaining the RAM required in an
> LPAR
> based on the number of VCPUs (and other resources) needed by the L2s that you
> intend to run on that LPAR ?

On MDC, when I tried with more RAM it worked fine for 128 vcpus! I will also try once on my system with increased RAM, but I fully agree that this is a resource depletion problem.

== Comment: #10 - Application Cdeadmin <cdeadmin.com> - 2024-02-27 08:23:12 ==
Active defects need to be in an open state.

Hypervisor said they would comment

== Comment: #11 - Application Cdeadmin <cdeadmin.com> - 2024-02-27 08:23:14 ==
<===This is bridged from RTC description===>
This is the description of the defect added by LTC - EWM bridge automatically, bridged from LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=205620.
</===End of RTC description===>

== Comment: #12 - Application Cdeadmin <cdeadmin.com> - 2024-02-27 08:33:08 ==
This is the description of the defect added by LTC - EWM bridge automatically, bridged from LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=205620.

== Comment: #15 - Application Cdeadmin <cdeadmin.com> - 2024-02-29 09:53:09 ==
Per discussion between PHYP and the Linux team on 2/27 (US) / 2/28 (India), we are currently waiting for a response from the Linux team on how they would like to proceed here. I believe @svaidyan.com and @npiggin.com are leading this effort.

== Comment: #25 - Anushree Mathur <Anushree.Mathur2> - 2024-04-19 01:27:38 ==
Hi Harsh,
I have tried the following combinations for CPU hotplug after applying this patch (https://lists.linux.ibm.com/mailinglists/pipermail/ltc-kvm-dev/2024-April/000173.html):

On host
root@ltcden6-lp4:~# virsh setvcpus hotplug 66

root@ltcden6-lp4:~# virsh setvcpus hotplug 67

root@ltcden6-lp4:~# virsh setvcpus hotplug 68
error: internal error: unable to execute QEMU command 'device_add': kvmppc_cpu_realize: vcpu hotplug failed with -12

On guest:
The guest is not crashing now.
[root@localhost ~]# lscpu
Architecture:            ppc64le
  Byte Order:            Little Endian
CPU(s):                  67
  On-line CPU(s) list:   0-66
Model name:              POWER10 (architected), altivec supported
  Model:                 2.0 (pvr 0080 0200)
  Thread(s) per core:    1
  Core(s) per socket:    22
  Socket(s):             3
Virtualization features: 
  Hypervisor vendor:     KVM
  Virtualization type:   para
Caches (sum of all):     
  L1d:                   2.1 MiB (67 instances)
  L1i:                   3.1 MiB (67 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-31,64-66
  NUMA node1 CPU(s):     32-63
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Mitigation; RFI Flush, L1D private per thread
  Mds:                   Not affected
  Meltdown:              Mitigation; RFI Flush, L1D private per thread
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Kernel entry/exit barrier (eieio)
  Spectre v1:            Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
  Spectre v2:            Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
  Srbds:                 Not affected
  Tsx async abort:       Not affected
[root@localhost ~]# 

HOST ENV
kernel version: 6.8.5-301.fc40.ppc64le 
qemu : QEMU emulator version 8.2.2 (qemu-8.2.2-1.fc40)
libvirt : libvirtd (libvirt) 10.1.0

Thanks,
Anushree-Mathur

== Comment: #26 - Harsh Prateek Bora <Harsh.Prateek.Bora> - 2024-05-06 00:37:29 ==
Patch has been posted upstream:

https://lists.nongnu.org/archive/html/qemu-ppc/2024-04/msg00264.html

== Comment: #27 - Anushree Mathur <Anushree.Mathur2> - 2024-05-16 00:48:38 ==
Hi Harsh,
Could you please share the upstream link for the patch that we verified yesterday once you post it?

== Comment: #28 - Harsh Prateek Bora <Harsh.Prateek.Bora> - 2024-05-16 01:00:14 ==
Hi Anushree,
Thanks for helping with patch v2 validation, it has been posted upstream now:

https://lore.kernel.org/qemu-devel/20240516053211.145504-1-harshpb@linux.ibm.com/T/#t

I have included your "Tested-by" in patch 4/4 of the series which is the actual fix for ppc.

Thanks
Harsh

== Comment: #33 - Harsh Prateek Bora <Harsh.Prateek.Bora> - 2024-08-04 23:22:29 ==
The fix has been merged upstream; please validate and close. Thanks.

https://github.com/qemu/qemu/commit/cfb52d07f53aa916003d43f69c945c2b42bc6374

Comment 2 IBM Bug Proxy 2024-08-12 08:51:48 UTC
Created attachment 2043961 [details]
XML file for defining the guest

Comment 3 IBM Bug Proxy 2024-08-12 08:51:54 UTC
Created attachment 2043962 [details]
Sosreport for the Fedora39 host

Comment 4 Richard W.M. Jones 2024-08-12 09:59:13 UTC
This bug is filed against Rawhide, but the Summary suggests that you desire a backport
to F39.  Can you be clear on what version(s) should be fixed?

Note that Rawhide will get 9.1.0 soon after it is released upstream, so if you just
want to fix this in Rawhide then there's nothing to be done.

Comment 5 IBM Bug Proxy 2024-08-13 05:30:41 UTC
------- Comment From Harsh.Prateek.Bora 2024-08-13 01:23 EDT-------
I think the fixes would need to be backported to F40.
FWIW, the fix is a 3 patch series and has a pre-req patch from Salil as well:

Pre-req patch:
08c3286822 accel/kvm: Extract common KVM vCPU {creation,parking} code

Fix patches:
c6a3d7bc9e accel/kvm: Introduce kvm_create_and_park_vcpu() helper
18530e7c57 cpu-common.c: export cpu_get_free_index to be reused later
cfb52d07f5 target/ppc: handle vcpu hotplug failure gracefully
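
As a conceptual, self-contained sketch of the resulting behaviour, based only on the commit subject above ("handle vcpu hotplug failure gracefully") and on the error message seen in comment #25, and not on the actual patch, the hotplug path now hands the failure back to the 'device_add' caller instead of exiting QEMU:

    /* toy_graceful.c: conceptual sketch only, NOT the upstream patch. */
    #include <errno.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Stand-in for kvm_create_and_park_vcpu() failing with -ENOMEM (-12). */
    static int create_and_park_vcpu(void)
    {
        return -ENOMEM;
    }

    /* Analogue of the realize/hotplug path: the failure is reported back
     * to the caller as an error string instead of through error_fatal. */
    static int hotplug_vcpu(char *errbuf, size_t errlen)
    {
        int ret = create_and_park_vcpu();
        if (ret < 0) {
            snprintf(errbuf, errlen,
                     "kvmppc_cpu_realize: vcpu hotplug failed with %d", ret);
            return ret;
        }
        return 0;
    }

    int main(void)
    {
        char err[128];

        /* Analogue of the monitor handling 'device_add': the command fails,
         * the error is shown to the user, and the VM itself keeps running. */
        if (hotplug_vcpu(err, sizeof(err)) < 0) {
            fprintf(stderr, "device_add failed: %s\n", err);
        }
        printf("guest (L2) continues to run\n");
        return 0;
    }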

Comment 6 IBM Bug Proxy 2024-08-22 06:20:43 UTC
------- Comment From sthoufee.com 2024-08-22 02:18 EDT-------
(In reply to comment #37)
> This bug is filed against Rawhide, but the Summary suggests that you desire
> a backport
> to F39.  Can you be clear on what version(s) should be fixed?
> Note that Rawhide will get 9.1.0 soon after it is released upstream, so if
> you just
> want to fix this in Rawhide then there's nothing to be done.

We want to fix this issue in F40.

Comment 8 Fedora Update System 2024-08-22 09:46:56 UTC
FEDORA-2024-dd1467eb6f (qemu-8.2.6-2.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-dd1467eb6f

Comment 9 Fedora Update System 2024-08-23 02:57:34 UTC
FEDORA-2024-dd1467eb6f has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-dd1467eb6f`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-dd1467eb6f

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 10 Fedora Update System 2024-08-25 01:27:43 UTC
FEDORA-2024-d18acd2287 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-d18acd2287`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-d18acd2287

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 11 Fedora Update System 2024-08-28 02:36:34 UTC
FEDORA-2024-d18acd2287 (qemu-8.2.6-3.fc40) has been pushed to the Fedora 40 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 12 IBM Bug Proxy 2024-09-03 08:40:38 UTC
------- Comment From Anushree.Mathur2 2024-09-03 04:32 EDT-------
I have validated the patch by upgrading the system using the following link and it is working fine:

On host:
# virsh setvcpus local 800

# qemu-system-ppc64 --version
QEMU emulator version 8.2.6 (qemu-8.2.6-3.fc40)

Message on the host console:
KVM: Create Guest vcpu hcall failed, rc=-44

L2 continues to run! Closing this bug now.

Anushree Mathur

Comment 13 IBM Bug Proxy 2024-09-09 06:31:02 UTC
------- Comment From Anushree.Mathur2 2024-09-09 02:23 EDT-------
I have validated the same scenario on Fedora 41 and it is working fine!
Analysis:

On host:
:~# virsh setvcpus check 800
error: internal error: unable to execute QEMU command 'device_add': kvmppc_cpu_realize: vcpu hotplug failed with -12

:~# cat /etc/os-release |grep  Fedora
NAME="Fedora Linux"
PRETTY_NAME="Fedora Linux 41 (Server Edition Prerelease)"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT="Fedora"

:~# qemu-system-ppc64 --version
QEMU emulator version 9.0.93 (qemu-9.1.0-0.2.rc3.fc41)
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers

:~# uname -a
Linux ltcden6-lp8.aus.stglabs.ibm.com 6.11.0-0.rc5.43.fc41.ppc64le #1 SMP Sun Aug 25 20:26:26 UTC 2024 ppc64le GNU/Linux

Message on the host console:
KVM: Create Guest vcpu hcall failed, rc=-44

uname -a
Linux localhost.localdomain 6.11.0-0.rc5.43.fc41.ppc64le #1 SMP Sun Aug 25 20:26:26 UTC 2024 ppc64le GNU/Linux

L2 continues to run! Closing this bug now.

Thanks
Anushree Mathur

Comment 14 IBM Bug Proxy 2024-09-11 08:11:14 UTC
------- Comment From Anushree.Mathur2 2024-09-11 04:02 EDT-------
Fedora has taken our fixes as mentioned here: https://src.fedoraproject.org/rpms/qemu.

Thanks
Anushree Mathur

