== Comment: #0 - Anushree Mathur <Anushree.Mathur2> - 2024-02-20 06:06:13 ==

HOST ENV:
Kernel  : 6.8.0-rc5
OS      : Fedora39
qemu    : # qemu-system-ppc64 --version
          QEMU emulator version 8.1.3 (qemu-8.1.3-3.fc39)
          Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
libvirt : libvirtd (libvirt) 9.7.0

GUEST ENV:
OS      : Fedora39
Kernel  : 6.7.4-200.fc39.ppc64le

I have been trying CPU hotplug on a guest defined with maxvcpus=128 and a current vcpu count of 4. When I try to hotplug 68 vcpus into the guest, it crashes with the following error:

[  303.808494] KVM: Create Guest vcpu hcall failed, rc=-44
error: Unable to read from monitor: Connection reset by peer

Steps to reproduce:
1) virsh define bug.xml
2) virsh start Fedora39 --console
3) virsh setvcpus Fedora39 68

Output:
[  662.102542] KVM: Create Guest vcpu hcall failed, rc=-44
error: Unable to read from monitor: Connection reset by peer

If resources are insufficient, in my opinion this should fail gracefully. Attaching the XML file that I used; I will also post my observations from the MDC system, where I saw the same failure at a higher vcpu count.

== Comment: #1 - Anushree Mathur <Anushree.Mathur2> - 2024-02-20 10:09:53 ==

== Comment: #2 - Kautuk Consul <Kautuk.Consul> - 2024-02-21 23:06:32 ==

Hi Anushree.Mathur2,

-44 means "not enough resources": the hypercall that creates an L2 vcpu fails because of a lack of resources such as memory. This is a phyp issue. For powerpc systems I do not think the maxvcpus value in the XML is actually communicated to phyp, so it cannot pre-allocate resources for 128 vcpus.

Can you try the following:
i) Increase the amount of RAM allocated to the LPAR and then test.
ii) First hotplug 10 vcpus and then gradually increase towards 68. A smaller number of vcpu hotplugs may work, which would also show that phyp has enough resources for fewer vcpus and that 68 is simply too many for it.

== Comment: #3 - Kautuk Consul <Kautuk.Consul> - 2024-02-21 23:09:53 ==

Also, if you have more than one L2 guest, can you try these steps with only one of them?

== Comment: #4 - Anushree Mathur <Anushree.Mathur2> - 2024-02-21 23:13:48 ==

Thanks for the reply, Kautuk! I have tried with smaller values and it worked fine. I agree that the resources may not be enough, but shouldn't it fail gracefully? In my view the guest should not crash.

== Comment: #5 - Kautuk Consul <Kautuk.Consul> - 2024-02-22 01:36:06 ==

Hi Anushree.Mathur2,

Your point is valid. In an ideal scenario the guest should keep running with the successfully allocated resources and should perhaps only print one error.

In the qemu source code, any vcpu (created before or after the guest initially runs) is created on a vcpu thread which calls kvm_vcpu_thread_fn() -> kvm_init_vcpu(cpu, &error_fatal). Inside kvm_init_vcpu() we call kvm_get_vcpu() -> kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id). If the KVM_CREATE_VCPU ioctl fails, we then call:

    error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
                     kvm_arch_vcpu_id(cpu));

In error_setg_errno -> error_setg_errno_internal -> error_setv -> error_handle we have the following code that exits the guest:

    if (errp == &error_fatal) {
        error_report_err(err);
        exit(1);
    }

Since kvm_init_vcpu was called with &error_fatal as an argument, the guest then exits with exit code 1 (a standalone sketch of this failure path follows below).
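The following is a minimal standalone C sketch (not code from the qemu tree) of the same fatal shape: it drives the raw KVM_CREATE_VCPU ioctl that kvm_get_vcpu() wraps and, on the first failure, reports once and exits the whole process, just like the &error_fatal path quoted above. On an ordinary host it will typically fail with EMFILE or EINVAL at the fd/vcpu limit rather than with the nested-HV rc=-44, but the error-handling shape, not the specific errno, is the point. It needs a KVM-capable Linux host to run.

    /* fatal_vcpu_create.c -- illustrative only, not from the qemu tree.
     * Mirrors the pre-fix behaviour: the first KVM_CREATE_VCPU failure
     * kills the whole process (and with it, any guest it would be running).
     * Build: gcc -o fatal_vcpu_create fatal_vcpu_create.c
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        if (kvm < 0) { perror("open /dev/kvm"); return 1; }

        int vm = ioctl(kvm, KVM_CREATE_VM, 0);   /* 0 = default machine type */
        if (vm < 0) { perror("KVM_CREATE_VM"); return 1; }

        /* Keep creating vcpus until the kernel refuses. */
        for (unsigned long id = 0; id < 2048; id++) {
            int vcpu = ioctl(vm, KVM_CREATE_VCPU, id);
            if (vcpu < 0) {
                /* The moral equivalent of error_setg_errno(&error_fatal, ...)
                 * followed by error_handle(): report once, then exit(1). */
                fprintf(stderr, "KVM_CREATE_VCPU id=%lu failed: %s\n",
                        id, strerror(errno));
                exit(1);
            }
            printf("created vcpu %lu (fd %d)\n", id, vcpu);
        }
        printf("no failure within 2048 vcpus\n");
        return 0;
    }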
From the source code it is evident that:
i) If a vcpu allocation fails, the qemu application regards it as a critical error and quits the entire qemu process along with its guest.
ii) Creating a vcpu dynamically via "virsh setvcpus" is not handled any differently from normal vcpu creation. Even if we handled it differently, we need to understand that failure to allocate a vcpu is a serious resource-allocation error. If we let the guest continue to execute, it could quit later due to various other insufficient resources. We don't want that; we need to make sure the platform has enough resources before running our workloads on our L2 guests.
iii) With the current qemu architecture it would in any case be somewhat difficult to differentiate between static and dynamic vcpu creation, and doing so could lead to instability through regressions in various other error scenarios.

Since phyp is not able to allocate the required number of vcpus, I recommend that:
i) We increase the RAM allocated to the LPAR.
ii) We contact the phyp team and show them that, although maxvcpus is 128, "virsh setvcpus Fedora39 68" is failing and the L2 guest is exiting because phyp is not allocating the requested resources.

In a nutshell, phyp is not allocating critical resources even though we are well within the 128-vcpu range, and since qemu treats these resources as critical, the qemu hypervisor process is killing our L2 guests.

== Comment: #6 - Kautuk Consul <Kautuk.Consul> - 2024-02-22 03:06:32 ==

On second thought, since I don't see maxvcpus being sent to the L0 phyp in any manner in the formal V2 API specification, I think the phyp people will probably also ask us to increase the RAM of the LPAR to make the 128 vcpus succeed.

So my suggestion would be to simply increase the RAM of the LPAR and try. In later versions of the v2 API specification we could add support for sending maxvcpus to L0 so that phyp can pre-allocate the correct resources and thus behave reasonably with respect to the guest XML, but that takes time. For now, just increase the RAM of the LPAR and try. You could also come up with a rough calculation of the RAM required in an LPAR based on the number of vcpus (and other resources) needed by the L2s you intend to run on it.

== Comment: #7 - Kautuk Consul <Kautuk.Consul> - 2024-02-22 22:24:30 ==

Hi @sthoufee.com, can you please mirror this bug to the pHYP team? After talking to the team, we decided that we will indeed need to implement maxvcpus intimation to L0 pHYP, with pHYP handling this element by pre-allocating vcpus for the LPAR. After they implement this we can go ahead and post our kernel changes.

== Comment: #8 - Anushree Mathur <Anushree.Mathur2> - 2024-02-26 23:04:50 ==

(In reply to comment #6)
> So for now just increase the RAM of the LPAR and try.
On the MDC system, with more RAM, it worked fine for 128 vcpus. I will also try with increased RAM on my system, but I fully agree that this is a resource-depletion problem.

== Comment: #10 - Application Cdeadmin <cdeadmin.com> - 2024-02-27 08:23:12 ==

Active defects need to be in an open state. Hypervisor said they would comment.

== Comment: #11 - Application Cdeadmin <cdeadmin.com> - 2024-02-27 08:23:14 ==

This is the description of the defect added automatically by the LTC - EWM bridge, bridged from LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=205620.

== Comment: #12 - Application Cdeadmin <cdeadmin.com> - 2024-02-27 08:33:08 ==

This is the description of the defect added automatically by the LTC - EWM bridge, bridged from LTC bug https://bugzilla.linux.ibm.com/show_bug.cgi?id=205620.

== Comment: #15 - Application Cdeadmin <cdeadmin.com> - 2024-02-29 09:53:09 ==

Per discussion between PHYP and the Linux team on 2/27 (US) / 2/28 (India), we are currently waiting for a response from the Linux team on how they would like to proceed. I believe @svaidyan.com and @npiggin.com are leading this effort.

== Comment: #25 - Anushree Mathur <Anushree.Mathur2> - 2024-04-19 01:27:38 ==

Hi Harsh,

I have tried the following combinations for CPU hotplug after applying this patch (https://lists.linux.ibm.com/mailinglists/pipermail/ltc-kvm-dev/2024-April/000173.html):

On host:
root@ltcden6-lp4:~# virsh setvcpus hotplug 66
root@ltcden6-lp4:~# virsh setvcpus hotplug 67
root@ltcden6-lp4:~# virsh setvcpus hotplug 68
error: internal error: unable to execute QEMU command 'device_add': kvmppc_cpu_realize: vcpu hotplug failed with -12

On guest: the guest is not crashing now.
[root@localhost ~]# lscpu
Architecture:            ppc64le
  Byte Order:            Little Endian
CPU(s):                  67
  On-line CPU(s) list:   0-66
Model name:              POWER10 (architected), altivec supported
  Model:                 2.0 (pvr 0080 0200)
  Thread(s) per core:    1
  Core(s) per socket:    22
  Socket(s):             3
Virtualization features:
  Hypervisor vendor:     KVM
  Virtualization type:   para
Caches (sum of all):
  L1d:                   2.1 MiB (67 instances)
  L1i:                   3.1 MiB (67 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-31,64-66
  NUMA node1 CPU(s):     32-63
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Mitigation; RFI Flush, L1D private per thread
  Mds:                   Not affected
  Meltdown:              Mitigation; RFI Flush, L1D private per thread
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Kernel entry/exit barrier (eieio)
  Spectre v1:            Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
  Spectre v2:            Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
  Srbds:                 Not affected
  Tsx async abort:       Not affected
[root@localhost ~]#

HOST ENV:
Kernel  : 6.8.5-301.fc40.ppc64le
qemu    : QEMU emulator version 8.2.2 (qemu-8.2.2-1.fc40)
libvirt : libvirtd (libvirt) 10.1.0

Thanks,
Anushree-Mathur

== Comment: #26 - Harsh Prateek Bora <Harsh.Prateek.Bora> - 2024-05-06 00:37:29 ==

The patch has been posted upstream: https://lists.nongnu.org/archive/html/qemu-ppc/2024-04/msg00264.html

== Comment: #27 - Anushree Mathur <Anushree.Mathur2> - 2024-05-16 00:48:38 ==

Hi Harsh,

For the patch we verified yesterday, could you please share the upstream link once you post it?

== Comment: #28 - Harsh Prateek Bora <Harsh.Prateek.Bora> - 2024-05-16 01:00:14 ==

Hi Anushree,

Thanks for helping with the v2 patch validation; it has been posted upstream now: https://lore.kernel.org/qemu-devel/20240516053211.145504-1-harshpb@linux.ibm.com/T/#t

I have included your "Tested-by" in patch 4/4 of the series, which is the actual fix for ppc.

Thanks,
Harsh

== Comment: #33 - Harsh Prateek Bora <Harsh.Prateek.Bora> - 2024-08-04 23:22:29 ==

The fix has been merged upstream; please validate and close. Thanks.
https://github.com/qemu/qemu/commit/cfb52d07f53aa916003d43f69c945c2b42bc6374
Created attachment 2043961 [details] XML file for defining the guest
Created attachment 2043962 [details] Sosreport for the Fedora39 host
This bug is filed against Rawhide, but the Summary suggests that you desire a backport to F39. Can you be clear on what version(s) should be fixed? Note that Rawhide will get 9.1.0 soon after it is released upstream, so if you just want to fix this in Rawhide then there's nothing to be done.
------- Comment From Harsh.Prateek.Bora 2024-08-13 01:23 EDT-------

I think the fixes would need to be backported to F40. FWIW, the fix is a 3-patch series and also has a prerequisite patch from Salil:

Pre-req patch:
08c3286822 accel/kvm: Extract common KVM vCPU {creation,parking} code

Fix patches:
c6a3d7bc9e accel/kvm: Introduce kvm_create_and_park_vcpu() helper
18530e7c57 cpu-common.c: export cpu_get_free_index to be reused later
cfb52d07f5 target/ppc: handle vcpu hotplug failure gracefully

(A rough standalone sketch of the "fail the hotplug, keep the guest" pattern these patches implement is shown below.)
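This is not the qemu code from the series above, just a minimal standalone C illustration of the general pattern the last patch aims for: attempt the vcpu creation and, if it fails, return the error to the caller so the VM keeps running with the vcpus it already has, instead of exiting the process. The helper try_hotplug_vcpu() and struct toy_vm are hypothetical names used only for this sketch.

    /* graceful_vcpu_hotplug.c -- hypothetical, standalone illustration only;
     * not code from the qemu patch series above.
     * Build: gcc -o graceful_vcpu_hotplug graceful_vcpu_hotplug.c
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>

    struct toy_vm {                 /* hypothetical guest state */
        int vm_fd;
        int nr_vcpus;
    };

    /* Try to add one vcpu; on failure, return a negative errno to the caller
     * instead of exiting, so the VM keeps running with the vcpus it has. */
    static int try_hotplug_vcpu(struct toy_vm *vm, unsigned long id)
    {
        int vcpu = ioctl(vm->vm_fd, KVM_CREATE_VCPU, id);
        if (vcpu < 0) {
            return -errno;          /* e.g. -ENOMEM (-12), as in "hotplug failed with -12" */
        }
        vm->nr_vcpus++;
        return vcpu;
    }

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        if (kvm < 0) { perror("open /dev/kvm"); return 1; }

        struct toy_vm vm = { .vm_fd = ioctl(kvm, KVM_CREATE_VM, 0), .nr_vcpus = 0 };
        if (vm.vm_fd < 0) { perror("KVM_CREATE_VM"); return 1; }

        for (unsigned long id = 0; id < 2048; id++) {
            int ret = try_hotplug_vcpu(&vm, id);
            if (ret < 0) {
                /* Report the failure once and carry on -- the analogue of the
                 * guest surviving a failed "virsh setvcpus". */
                fprintf(stderr, "vcpu hotplug failed with %d (%s); continuing with %d vcpus\n",
                        ret, strerror(-ret), vm.nr_vcpus);
                break;
            }
        }
        printf("VM still alive with %d vcpus\n", vm.nr_vcpus);
        return 0;
    }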
------- Comment From sthoufee.com 2024-08-22 02:18 EDT-------

(In reply to comment #37)
> This bug is filed against Rawhide, but the Summary suggests that you desire a backport to F39.
> Can you be clear on what version(s) should be fixed?

We want to fix this issue in F40.
https://koji.fedoraproject.org/koji/taskinfo?taskID=122297312
FEDORA-2024-dd1467eb6f (qemu-8.2.6-2.fc40) has been submitted as an update to Fedora 40. https://bodhi.fedoraproject.org/updates/FEDORA-2024-dd1467eb6f
FEDORA-2024-dd1467eb6f has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-dd1467eb6f`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-dd1467eb6f

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-d18acd2287 has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-d18acd2287`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-d18acd2287

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2024-d18acd2287 (qemu-8.2.6-3.fc40) has been pushed to the Fedora 40 stable repository. If problem still persists, please make note of it in this bug report.
------- Comment From Anushree.Mathur2 2024-09-03 04:32 EDT-------

I have validated the patch by upgrading the system via the update link above, and it is working fine.

On host:
# virsh setvcpus local 800
# qemu-system-ppc64 --version
QEMU emulator version 8.2.6 (qemu-8.2.6-3.fc40)

Message on the host console:
KVM: Create Guest vcpu hcall failed, rc=-44

L2 continues to run!

Closing this bug now.
Anushree Mathur
------- Comment From Anushree.Mathur2 2024-09-09 02:23 EDT-------

I have validated the same scenario on Fedora 41 and it is working fine.

Analysis:

On host:
:~# virsh setvcpus check 800
error: internal error: unable to execute QEMU command 'device_add': kvmppc_cpu_realize: vcpu hotplug failed with -12

:~# cat /etc/os-release | grep Fedora
NAME="Fedora Linux"
PRETTY_NAME="Fedora Linux 41 (Server Edition Prerelease)"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT="Fedora"

:~# qemu-system-ppc64 --version
QEMU emulator version 9.0.93 (qemu-9.1.0-0.2.rc3.fc41)
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers

:~# uname -a
Linux ltcden6-lp8.aus.stglabs.ibm.com 6.11.0-0.rc5.43.fc41.ppc64le #1 SMP Sun Aug 25 20:26:26 UTC 2024 ppc64le GNU/Linux

Message on the host console:
KVM: Create Guest vcpu hcall failed, rc=-44

On guest:
uname -a
Linux localhost.localdomain 6.11.0-0.rc5.43.fc41.ppc64le #1 SMP Sun Aug 25 20:26:26 UTC 2024 ppc64le GNU/Linux

L2 continues to run!

Closing this bug now.

Thanks,
Anushree Mathur
------- Comment From Anushree.Mathur2 2024-09-11 04:02 EDT------- Fedora has taken our fixes as mentioned here: https://src.fedoraproject.org/rpms/qemu. Thanks Anushree Mathur