Bug 1159264

Summary: qemu-kvm process quits when smp > 8 and maxcpus=240 (qemu-kvm: max_cpus is too large. APIC ID of last CPU is 380)
Product: Red Hat Enterprise Linux 7
Reporter: FuXiangChun <xfu>
Component: qemu-kvm-rhev
Assignee: Radim Krčmář <rkrcmar>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 7.1
CC: ehabkost, hhuang, juzhang, linchen, michen, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-31 16:48:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description FuXiangChun 2014-10-31 10:35:37 UTC
Description of problem:
QE tested scenarios:

smp 8 and maxcpus=240 -> works
smp 10 and maxcpus=240 -> fails
smp 12 and maxcpus=240 -> fails
...


Version-Release number of selected component (if applicable):
3.10.0-195.el7.x86_64 (host and guest kernel)
qemu-kvm-rhev-2.1.2-5.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. /usr/libexec/qemu-kvm -m 10G -smp 12,cores=6,threads=1,sockets=2,maxcpus=240

Actual results:
qemu-kvm: max_cpus is too large. APIC ID of last CPU is 317

Expected results:


Additional info:

qemu-1.5.x didn't hit this issue.

Comment 2 Radim Krčmář 2014-10-31 16:48:07 UTC
APIC IDs do not have to be contiguous.

They encode the CPU topology: a fixed-width bit field of each ID is reserved for the core number, so if the number of cores is not a power of two, some values of that field are never used and fewer CPUs can be expressed within the same ID range.
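To make the bit-field layout concrete, here is a minimal Python sketch, assuming the layout of QEMU's x86 topology helpers: thread bits lowest, then core bits, then the package (socket) number, with each field just wide enough for its count, i.e. rounded up to a power of two. The function names are hypothetical and are not QEMU's API. With the reproducer's topology (cores=6, threads=1) it yields 317 for CPU index 239, matching the error message above, and it counts how many CPUs fit when IDs must stay at or below 254, the highest usable 8-bit xAPIC ID.

```python
def field_width(count: int) -> int:
    """Bits needed to store values 0..count-1 (0 bits when count == 1)."""
    return (count - 1).bit_length()


def apic_id_from_index(cpu_index: int, cores: int, threads: int) -> int:
    """Pack a CPU index into package | core | thread bit fields (thread bits lowest)."""
    thread_bits = field_width(threads)
    core_bits = field_width(cores)
    thread_id = cpu_index % threads
    core_id = (cpu_index // threads) % cores
    package_id = cpu_index // (threads * cores)
    return (package_id << (core_bits + thread_bits)) | (core_id << thread_bits) | thread_id


def max_cpus_for_id_limit(cores: int, threads: int, max_apic_id: int) -> int:
    """Count how many CPUs (indexes 0, 1, 2, ...) get an APIC ID <= max_apic_id."""
    count = 0
    # APIC IDs grow strictly with the CPU index, so this loop terminates.
    while apic_id_from_index(count, cores, threads) <= max_apic_id:
        count += 1
    return count


if __name__ == "__main__":
    # Topology from the reproducer: cores=6, threads=1.
    print([apic_id_from_index(i, cores=6, threads=1) for i in range(12)])
    # -> [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13]  (6 and 7 are never used)
    print(apic_id_from_index(240 - 1, cores=6, threads=1))  # -> 317
    print(max_cpus_for_id_limit(cores=6, threads=1, max_apic_id=254))  # -> 192
    print(max_cpus_for_id_limit(cores=8, threads=1, max_apic_id=254))  # -> 255
```

With 6 cores per socket, the 3-bit core field leaves values 6 and 7 unused in every socket, so only 192 CPUs fit below ID 255, whereas a power-of-two core count such as 8 packs the IDs contiguously and fits 255.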

1.5.x does not perform this check (see the patch below), so it should fail only later, when the guest actually reaches max_cpus CPUs. I prefer to keep the early check in RHEV.


---
commit f03bd716a2935532379cff1c71c6f0f399921b70
Author: Eduardo Habkost <ehabkost>
Date:   Fri Mar 14 16:33:54 2014 -0300

    pc: Refuse max_cpus if it results in too large APIC ID
    
    This changes the PC initialization code to reject max_cpus if it results
    in an APIC ID that's too large, instead of aborting or erroring out when
    it is already too late.
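The early check the commit describes can be sketched in the same hypothetical terms: compute the APIC ID that the last possible CPU (index max_cpus - 1) would receive, and refuse to start if it exceeds the limit. This is an illustration, not QEMU's code. The limit is a parameter, and the default of 254 is the xAPIC ceiling mentioned in Comment 5, not necessarily the exact constant QEMU compares against.

```python
def field_width(count: int) -> int:
    """Bits needed to store values 0..count-1 (0 bits when count == 1)."""
    return (count - 1).bit_length()


def apic_id_from_index(cpu_index: int, cores: int, threads: int) -> int:
    """Pack a CPU index into package | core | thread bit fields (thread bits lowest)."""
    thread_bits = field_width(threads)
    core_bits = field_width(cores)
    thread_id = cpu_index % threads
    core_id = (cpu_index // threads) % cores
    package_id = cpu_index // (threads * cores)
    return (package_id << (core_bits + thread_bits)) | (core_id << thread_bits) | thread_id


def check_max_cpus(max_cpus: int, cores: int, threads: int, max_apic_id: int = 254) -> None:
    """Reject max_cpus up front if its last CPU would need an APIC ID above max_apic_id.

    Meant to run once at startup, so the error is reported before any CPU is
    created or hot-plugged, instead of when the last CPU is finally added.
    """
    last_id = apic_id_from_index(max_cpus - 1, cores, threads)
    if last_id > max_apic_id:
        raise ValueError(f"max_cpus is too large. APIC ID of last CPU is {last_id}")


if __name__ == "__main__":
    check_max_cpus(192, cores=6, threads=1)  # accepted: last APIC ID is 253
    try:
        check_max_cpus(240, cores=6, threads=1)
    except ValueError as err:
        print(err)  # max_cpus is too large. APIC ID of last CPU is 317
```

Raising at configuration time mirrors the commit's intent: the user learns immediately that the requested topology cannot address max_cpus CPUs.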

Comment 5 Eduardo Habkost 2014-10-31 17:42:22 UTC
Our VCPU count limit is also an APIC ID limit, unfortunately, for multiple reasons:

1) Some KVM data structures are APIC ID-based, so increasing the APIC ID limit would require careful evaluation just like increasing the global VCPU count limit.

2) APIC IDs larger than 254 require x2apic support, and would use different code paths in the host-side emulation code and in the guest code. I don't even know which guest OSes would support it. It requires careful testing and evaluation, and needs to be treated as a completely new feature.