| Summary: | RFE: Determine the value of nr_cpus/maxcpus for systems with a large number of CPUs | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Xunlei Pang <xlpang> |
| Component: | kexec-tools | Assignee: | Baoquan He <bhe> |
| Status: | CLOSED NOTABUG | QA Contact: | Emma Wu <xiawu> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.4 | CC: | bhe, dyoung, kdump-bugs, lmiksik, piliu, ruyang, xiawu, xlpang |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kexec-tools-2.0.15-10.el7 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2018-12-03 01:43:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 1469549, 1473055, 1477926, 1494043, 1549423, 1653509 | ||
Description
Xunlei Pang
2016-11-29 07:06:11 UTC
Known factors that affect the choice:
*) reserved crash memory
*) number of CPUs
*) architecture
each architecture has a different memory overhead per extra CPU and
a different reserved crash memory policy when using crashkernel=auto.
*) large number of I/O devices
these are assumed to require a large amount of crash memory, so this factor ties back to the reserved crash memory.
e.g. HP x86_64: 24TB RAM, 768 CPUs, crashkernel=512M,high crashkernel=256M,low
this manual configuration works.
I did some investigation and testing.
1. extra memory consumption for extra cpus
a) X86_64
tested on
amd, 2-smt 12-cpu, memory: 10GB, crashkernel=161MB, f25
and 4-core kvm.fedora, f24
result: less than 2MB with one extra cpu.
b) s390x
tested on ibm-z10-47.rhts.eng.bos.redhat.com 1994MB crashkernel=161MB, 3.10.0-514.2.2.el7.s390x
result: less than 2MB with one extra logical cpu.
c) ppc64("maxcpus=X")
tested on a 4-smt 7-core 28-cpu machine.
result: around 5MB with one extra logical cpu.
2. For crashkernel=auto, the reserved memory policy is like:
X86_64:
<2G:0M, 2G-:(161MB + 64MB/1TB)
ppc64:
<2G:0M, 2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G
s390x:
<4G:0M, 4G-:(161MB + 64MB/1TB)
arm64 on rhelsa:
http://git.engineering.redhat.com/git/users/panand/linux.git/commit/?h=pegas-devel&id=124025600b404c0be9c42ee9b598ce10df6ad486
160MB or 2GB(with CMA hugepage)
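The policy in 2) can be sketched as a small function. This is only an illustration: the name `auto_reserved_mb` is hypothetical, the "64MB/1TB" term is assumed to scale linearly with system memory, and the ppc64 ranges are assumed to include their lower bound and exclude their upper bound. arm64 is left out because its policy is given only as "160MB or 2GB".

```python
MB_PER_GB = 1024
MB_PER_TB = 1024 * 1024

# ppc64 ranges from the policy above: (lower bound in MB, reserved MB).
# Assumption: a range includes its lower bound and excludes its upper bound.
PPC64_AUTO = [
    (2 * MB_PER_GB, 384),
    (4 * MB_PER_GB, 512),
    (16 * MB_PER_GB, 1024),
    (64 * MB_PER_GB, 2048),
    (128 * MB_PER_GB, 4096),
]


def auto_reserved_mb(arch, ram_mb):
    """Estimate the memory (in MB) reserved by crashkernel=auto."""
    if arch in ("x86_64", "s390x"):
        threshold = 2 * MB_PER_GB if arch == "x86_64" else 4 * MB_PER_GB
        if ram_mb < threshold:
            return 0
        # Assumption: 64MB per 1TB scales linearly with system memory.
        return 161 + 64 * ram_mb // MB_PER_TB
    if arch == "ppc64":
        reserved = 0
        for lower, size in PPC64_AUTO:
            if ram_mb >= lower:
                reserved = size
        return reserved
    raise ValueError("no policy recorded for " + arch)
```

For example, `auto_reserved_mb("x86_64", MB_PER_TB)` gives 225 under these assumptions, and `auto_reserved_mb("ppc64", 8 * MB_PER_GB)` gives 512.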
3. From the data in 1) and 2), it is clear that more CPUs mean more memory consumption. For example, at up to 2MB per extra CPU a 1024-CPU system will consume around 2GB more memory, and a ppc64 system even more. This is unacceptable for kdump,
especially with our "crashkernel=auto" feature. So we must set an upper bound on the number of CPUs booted by kdump.
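The arithmetic behind the 2GB figure, as a rough sketch. The per-CPU numbers are the measurements from 1), treated here as constants; the function name is hypothetical.

```python
# Measured overhead per extra logical CPU, in MB, from section 1.
# x86_64 and s390x were "less than 2MB", so 2 is an upper bound;
# ppc64 was "around 5MB", so 5 is only approximate.
PER_CPU_OVERHEAD_MB = {"x86_64": 2, "s390x": 2, "ppc64": 5}


def extra_memory_mb(arch, extra_cpus):
    """Estimate the extra kdump memory needed for CPUs beyond the first."""
    if extra_cpus < 0:
        raise ValueError("extra_cpus must not be negative")
    return PER_CPU_OVERHEAD_MB[arch] * extra_cpus
```

With 1023 extra CPUs this gives 2046MB on x86_64 and 5115MB on ppc64, both far above what crashkernel=auto reserves on most systems.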
4. A theoretical formula
We assume that allowing one extra CPU per 256MB step of crash memory is safe enough, given the roughly 2MB-per-CPU variation measured above. That is, on x86/s390 (arm64 not tested yet) we allow one extra CPU for every 256MB of crash memory, so that kdump does not run out of memory, and on ppc64 we allow one extra CPU for every 512MB of crash memory.
We also cap the number of CPUs for kdump at 16.
Formula:
One extra CPU per 256MB step (safe enough given the roughly 2MB-per-CPU variation), rounded down to a power of two.
int __weak arch_calc_cpus_from_crashmemory(): // x86, arm64, s390x
(0M, 256M]: 1
(256M, 1G]: 2
(1G, 2G]: 4
(2G, 4G]: 8
(4G, UMAX]: 16
Systems using "crashkernel=auto" (on x86, 64MB of extra crash memory per 1TB of system memory) with less than about 1.5TB of system memory will normally fall into the first category and end up with "nr_cpus=1".
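The generic table above, written out as a lookup. This is a sketch of the proposal, not the code that was posted; the Python function name is made up.

```python
import bisect

# Inclusive upper bounds of the first four ranges, in MB.
GENERIC_BOUNDS_MB = [256, 1024, 2048, 4096]
# CPU count for each range; the last entry covers (4G, UMAX].
GENERIC_CPUS = [1, 2, 4, 8, 16]


def calc_cpus_from_crashmemory(crash_mb):
    """Map reserved crash memory (MB) to a CPU count for x86, arm64, s390x."""
    if crash_mb <= 0:
        raise ValueError("no crash memory reserved")
    # bisect_left keeps a value equal to a bound in the lower range,
    # which matches the (low, high] intervals in the table.
    return GENERIC_CPUS[bisect.bisect_left(GENERIC_BOUNDS_MB, crash_mb)]
```

A 225MB reservation maps to 1 CPU and a 257MB reservation to 2, so the switch to "nr_cpus=2" happens as soon as the reservation exceeds 256MB.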
For ppc64 the step is 512MB per CPU (allowing for a variation of about 6MB per CPU).
crashkernel=auto: <2G:0M, 2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G
powerpc implementation:
int arch_calc_cpus_from_crashmemory():
(0M, 512M]: 1
(512M, 2G]: 2
(2G, 4G]: 4
(4G, 8G]: 8
(8G, UMAX]: 16
ppc64 systems using "crashkernel=auto" with less than about 16GB of system memory will normally fall into the first category and end up with "nr_cpus=1"; those with less than about 128GB of system memory will fall into the second category and end up with "nr_cpus=2".
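To check the ppc64 claim, the crashkernel=auto ranges can be chained with the ppc64 table. Both function names are hypothetical, and the auto ranges are assumed to include their lower bound and exclude their upper bound.

```python
GB = 1024  # MB per GB


def ppc64_auto_reserved_mb(ram_mb):
    """Crash memory (MB) reserved by crashkernel=auto on ppc64."""
    # (lower bound in MB, reserved MB), taken from the policy string.
    table = [(2 * GB, 384), (4 * GB, 512), (16 * GB, 1024),
             (64 * GB, 2048), (128 * GB, 4096)]
    reserved = 0
    for lower, size in table:
        if ram_mb >= lower:
            reserved = size
    return reserved


def ppc64_calc_cpus(crash_mb):
    """Map reserved crash memory (MB) to a CPU count using the ppc64 table."""
    if crash_mb <= 0:
        raise ValueError("no crash memory reserved")
    # (inclusive upper bound in MB, CPU count).
    for upper, cpus in [(512, 1), (2 * GB, 2), (4 * GB, 4), (8 * GB, 8)]:
        if crash_mb <= upper:
            return cpus
    return 16
```

An 8GB system reserves 512MB and gets 1 CPU, a 32GB or 100GB system gets 2, and a system with 128GB or more reserves 4GB and gets 4.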
nr_cpus{maxcpus} = arch_calc_cpus_from_crashmemory(crashmemory_in_MB);
if (nr_cpus > $(nproc))
nr_cpus = $(nproc)
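The clamp above, as a runnable sketch. The helper names are hypothetical, and `os.cpu_count()` stands in for `$(nproc)`; the two can differ when CPU affinity is restricted.

```python
import os


def clamp_nr_cpus(wanted, online=None):
    """Limit the computed CPU count to the CPUs actually available."""
    if online is None:
        # Assumption: os.cpu_count() is close enough to $(nproc) here.
        online = os.cpu_count() or 1
    return max(1, min(wanted, online))


def kdump_cpu_arg(wanted, online=None, param="nr_cpus"):
    """Build the kernel argument; param may be "maxcpus" where that is used."""
    return "{}={}".format(param, clamp_nr_cpus(wanted, online))
```

On a 4-CPU machine whose crash memory would allow 16 CPUs, `kdump_cpu_arg(16, online=4)` returns "nr_cpus=4".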
This formula is based mainly on theoretical analysis. Any comments?
After some discussion, we decided to deal only with the x86 insufficient-vector issue. In other cases that need more than one CPU, users can change "nr_cpus=X" in /etc/sysconfig/kdump manually as needed.
Patches posted: http://post-office.corp.redhat.com/archives/kexec-kdump-list/2017-February/msg00008.html
This bug was opened by a Red Hat internal engineer who has since left, based on an immature idea. Later, the implementation based on this design caused a generic bug, so we reverted it. I have no time to investigate further, and nobody has reported an issue or complained, so I would like to close it. If anyone has a concern, or if any issue in this area is reported, we can continue.
Thanks
Baoquan