Bug 1834200
| Summary: | cpu_x86_cpuid: Assertion `cpu->core_id <= 255' failed | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Yumei Huang <yuhuang> |
| Component: | qemu-kvm | Assignee: | Eduardo Habkost <ehabkost> |
| qemu-kvm sub component: | QMP Monitor and CLI | QA Contact: | Yumei Huang <yuhuang> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | ailan, babu.moger, chayang, dgilbert, ehabkost, juzhang, mrezanin, virt-maint, wehuang, wei.huang2 |
| Version: | 8.2 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-5.2.0-1.module+el8.4.0+9091+650b220a | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-25 06:42:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | commit in BZ |
| Embargoed: | | | |
Description
Yumei Huang
2020-05-11 09:55:09 UTC
Eduardo - setting the ITR == 8.2.1 - feel free to reset to '---'... Although it seems bug 1828750 may be related.

Seems commit ed78467a2 is where the assert was first added (and it's been there a while too, since 3.0).

Dave, do you think we can get an AMD engineer to fix this upstream?

Yeah, let's add a needinfo on Wei.
I wonder what the right fix is here - just limit max_cpus to 255?

(In reply to Dr. David Alan Gilbert from comment #4)
> Yeah, let's add a needinfo on Wei.
> I wonder what the right fix is here - just limit max_cpus to 255?

nr_cores, more specifically. I see two possibilities:
* declaring nr_cores > 256 as never supported (or deprecated); or
* omitting the CPUID[8000001E] node if nr_cores is too large.

I'm sure there are other CPUID leaves that break unpredictably if nr_cores or nr_threads is too large, but we never noticed because they don't have any asserts. It would be nice to fix all of them.

Simple way to reproduce the bug upstream without using the monitor or the EPYC CPU model:

$ qemu-system-x86_64 -machine q35,accel=kvm,kernel-irqchip=split -device intel-iommu,intremap=on,eim=on -smp 1,maxcpus=258,cores=258,threads=1,sockets=1 -cpu qemu64,xlevel=0x8000001e -device qemu64-x86_64-cpu,apic-id=257
qemu-system-x86_64: warning: Number of hotpluggable cpus requested (258) exceeds the recommended cpus supported by KVM (240)
qemu-system-x86_64: /home/ehabkost/rh/proj/virt/qemu/target/i386/cpu.c:5888: cpu_x86_cpuid: Assertion `cpu->core_id <= 255' failed.
Aborted (core dumped)

Hit the same issue on the rhel8.3 slow train.
qemu-kvm-4.2.0-25.module+el8.3.0+6986+29a4dcd7
kernel-4.18.0-215.el8.x86_64

I can reproduce it with RHEL 8.3. Let me ping Babu to see whether he has worked on it. Otherwise I will take a look myself.

Looking at the code again, I think the best way to handle this is to omit the CPUID[8000001E] if nr_cores is too large. We can't build the CPUID "8000001E" with core_id greater than 255. To support more than 255 cores we need x2apic support.
In that case the topology comes from CPUID 0xB, which appears to work fine.

Eduardo, can I add a check in cpu_x86_cpuid under case 0x8000001E: to return all zeros if core_id > 255? Or let me know where to add this check (or checks).

(In reply to Babu Moger from comment #10)
> Looking at the code again, I think the best way to handle this is to omit
> the CPUID[8000001E] if nr_cores is too large. We can't build the CPUID
> "8000001E" with core_id greater than 255. To support more than 255 cores we
> need x2apic support. In that case the topology comes from CPUID 0xB, which
> appears to work fine.
>
> Eduardo, can I add a check in cpu_x86_cpuid under case 0x8000001E: to return
> all zeros if core_id > 255? Or let me know where to add this check (or
> checks).

This sounds like the simplest solution, especially if we want to make a quick and safe bug fix to be backported to downstream releases. Supporting larger core_id sizes and refactoring the CPUID[0x8000001E] code can be implemented later.

Posted the patch:
https://lore.kernel.org/qemu-devel/159257395689.52908.4409314503988289481.stgit@naples-babu.amd.com/
Please review. Thanks.

Hi Eduardo,

What is the plan for this BZ? I did a backport test with Babu's patch from upstream and it did fix the problem (both virt-rhel and virt-av). I can submit the backport patch if needed, possibly for both rhel-av-8.3.1 and rhel-8.4.0? I think it might be too late for rhel-8.3.0?

Thanks,
-Wei

(In reply to WEI HUANG from comment #14)
> Hi Eduardo,
>
> What is the plan for this BZ? I did a backport test with Babu's patch from
> upstream and it did fix the problem (both virt-rhel and virt-av). I can
> submit the backport patch if needed, possibly for both rhel-av-8.3.1 and
> rhel-8.4.0? I think it might be too late for rhel-8.3.0?
>
> Thanks,
> -Wei

Sorry for taking so long to reply. It was already too late for 8.3.0, but we can target this for 8.4.0 because the fix will be included via rebase.
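For readers following the thread, the guard proposed in comment #10 (return all zeros for leaf 0x8000001E when core_id does not fit in 8 bits) might look roughly like the sketch below. This is an illustration only, not QEMU code: `struct vcpu_topo` and `fill_leaf_8000001e` are hypothetical names, and the register layout is reduced to EAX carrying the APIC ID and EBX[7:0] carrying the core ID, which is an assumption made for the example. Note that the commit eventually merged upstream removed the assert rather than adding a guard of this kind.

```c
#include <stdint.h>

/* Hypothetical stand-in for per-vCPU topology state; not QEMU's X86CPU. */
struct vcpu_topo {
    uint32_t apic_id;
    uint32_t core_id;
};

/*
 * Hypothetical helper: fill the registers for CPUID leaf 0x8000001E.
 * Assumption for this sketch: EBX[7:0] holds the core ID, so a core_id
 * above 255 cannot be encoded. In that case every register is left at
 * zero, which is the "omit the leaf" behaviour proposed in the thread,
 * instead of asserting.
 */
void fill_leaf_8000001e(const struct vcpu_topo *cpu,
                        uint32_t *eax, uint32_t *ebx,
                        uint32_t *ecx, uint32_t *edx)
{
    *eax = *ebx = *ecx = *edx = 0;

    if (cpu->core_id > 255) {
        return; /* core ID does not fit in 8 bits: report an empty leaf */
    }

    *eax = cpu->apic_id;        /* simplified: APIC ID in EAX */
    *ebx = cpu->core_id & 0xff; /* simplified: core ID in EBX[7:0] */
}
```

With the reproducer's hot-plugged vCPU (apic-id=257, and assuming its core_id is also above 255), such a helper would return zeros and the guest would fall back to the topology reported by CPUID 0xB.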
Fixed upstream by:

commit 35ac5dfbcaa4b31470b4e201d26143b8b9a0a1e7
Author: Babu Moger <babu.moger>
Date:   Mon Sep 21 17:47:28 2020 -0500

    target/i386: Remove core_id assert check in CPUID 0x8000001E

    With x2apic enabled, configurations can have more that 255 cores.
    Noticed the device add test is hitting an assert when during cpu
    hotplug with core_id > 255. This is due to assert check in the
    CPUID 0x8000001E.

    Remove the assert check and fix the problem.

    Fixes the bug:
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=1834200

    Signed-off-by: Babu Moger <babu.moger>
    Message-Id: <160072824160.9666.8890355282135970684.stgit.com>
    Signed-off-by: Eduardo Habkost <ehabkost>

Verify:
qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f
host kernel: 4.18.0-268.el8.x86_64
guest kernel: 4.18.0-269.el8.x86_64

The issue is gone, guest works well.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098