Bug 1399481 - RFE: Determine the value of nr_cpus/maxcpus for systems with a large number of CPUs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kexec-tools
Version: 7.4
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Baoquan He
QA Contact: Emma Wu
URL:
Whiteboard:
Depends On:
Blocks: 1469549 1473055 1477926 1494043 1549423 1653509
Reported: 2016-11-29 07:06 UTC by Xunlei Pang
Modified: 2018-12-03 01:43 UTC (History)

Fixed In Version: kexec-tools-2.0.15-10.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-03 01:43:18 UTC
Target Upstream Version:



Description Xunlei Pang 2016-11-29 07:06:11 UTC
Description of problem:
For some systems with a large number of CPUs and peripherals, the default "nr_cpus=1" hard-coded in "/etc/sysconfig/kdump" probably won't work.
We need an algorithm/formula to choose a proper value in these cases.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
For example, an HP system with 758 CPUs (see https://bugzilla.redhat.com/show_bug.cgi?id=1346327) adopted "nr_cpus=4" instead of "nr_cpus=1".

We also need to consider the extra memory consumed when the kdump kernel uses more cpus, especially in the case of "crashkernel=auto".

Comment 2 Xunlei Pang 2016-12-13 04:13:08 UTC
Known factors that affect the choice:
*) reserved crash memory
*) number of cpus
*) architecture
  different memory overhead for adding one extra cpu,
  different reserved crash memory policy when using crashkernel=auto.
*) large number of io devices
  assumed to require a large crash memory, so relate it to the reserved crash memory.

e.g. HP x86_64: 24TB RAM, 768-CPU, crashkernel=512M,high crashkernel=256M,low
this manual configuration can work.

I did some investigation and tests.
1. Extra memory consumption for extra cpus
a) x86_64
tested on
amd, 2-smt 12-cpu, memory: 10GB, crashkernel=161MB, f25
and 4-core kvm.fedora, f24

result: less than 2MB with one extra cpu.

b) s390x
tested on ibm-z10-47.rhts.eng.bos.redhat.com 1994MB crashkernel=161MB, 3.10.0-514.2.2.el7.s390x

result: less than 2MB with one extra logical cpu.

c) ppc64("maxcpus=X")
tested on a 4-smt 7-core 28-cpu machine.

result: around 5MB with one extra logical cpu.


2. For crashkernel=auto, the reserved memory policy is as follows:
X86_64:
<2G:0M, 2G-:(161MB + 64MB/1TB)

ppc64:
<2G:0M, 2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G

s390x:
<4G:0M, 4G-:(161MB + 64MB/1TB)

arm64 on rhelsa:
http://git.engineering.redhat.com/git/users/panand/linux.git/commit/?h=pegas-devel&id=124025600b404c0be9c42ee9b598ce10df6ad486
160MB or 2GB(with CMA hugepage)
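
The x86_64, ppc64 and s390x policies listed above can be sketched as a small lookup. This is only an illustration, not the kernel's implementation: the function name auto_crashkernel_mb is hypothetical, the "64MB/1TB" term is assumed to scale proportionally with system memory, and range ends are treated as exclusive, as in the kernel's crashkernel=range syntax. arm64 is left out because only a commit link is given for it.

```python
GB_PER_TB = 1024

# ppc64 ranges from the policy above: (start_gb inclusive, end_gb exclusive, reserved MB).
# end_gb of None means "no upper limit".
PPC64_RANGES = [
    (2, 4, 384),
    (4, 16, 512),
    (16, 64, 1024),
    (64, 128, 2048),
    (128, None, 4096),
]

# Minimum system memory (GB) below which nothing is reserved.
LINEAR_POLICY_MIN_GB = {"x86_64": 2, "s390x": 4}


def auto_crashkernel_mb(arch, system_gb):
    """Return the reserved crash memory in MB for crashkernel=auto (sketch)."""
    if arch == "ppc64":
        for start, end, reserved in PPC64_RANGES:
            if system_gb >= start and (end is None or system_gb < end):
                return float(reserved)
        return 0.0
    # x86_64 and s390x: 161MB base plus 64MB per TB of system memory
    # (assumed proportional, not rounded to whole TB).
    if system_gb < LINEAR_POLICY_MIN_GB[arch]:
        return 0.0
    return 161 + 64 * system_gb / GB_PER_TB


if __name__ == "__main__":
    for arch, gb in [("x86_64", 10), ("x86_64", 24 * 1024), ("ppc64", 32), ("s390x", 8)]:
        print(arch, gb, "GB ->", auto_crashkernel_mb(arch, gb), "MB")
```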

3. From the data described in 1) and 2), we can see clearly that more cpus means more memory consumption, e.g. a 1024-cpu system will consume around 2GB more memory, and even more on ppc64 systems. This is obviously unacceptable for kdump, especially when it comes to our "crashkernel=auto" feature. So we must have an upper bound on the number of cpus booted by kdump.
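
As a rough check of the numbers above, the per-cpu figures measured in 1) can be multiplied out. This is a back-of-the-envelope sketch: the names are hypothetical, the x86_64/s390x figure of 2MB is the measured upper bound ("less than 2MB"), and the overhead is assumed to grow linearly with the number of cpus, which was not verified beyond the small test machines.

```python
# Approximate extra memory per additional cpu, in MB, taken from the tests in 1).
PER_CPU_OVERHEAD_MB = {"x86_64": 2, "s390x": 2, "ppc64": 5}


def extra_memory_mb(arch, booted_cpus):
    """Estimate the extra memory (MB) compared with booting a single cpu."""
    if booted_cpus < 1:
        raise ValueError("at least one cpu must be booted")
    return (booted_cpus - 1) * PER_CPU_OVERHEAD_MB[arch]


if __name__ == "__main__":
    for arch in ("x86_64", "ppc64"):
        for cpus in (16, 1024):
            print(f"{arch}: {cpus} cpus -> about {extra_memory_mb(arch, cpus)} MB extra")
```

With these assumptions a 1024-cpu x86_64 system needs about 2046MB extra, while capping at 16 cpus keeps the estimate at about 30MB (75MB on ppc64).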

4. A theoretical formula
We assume that allowing one extra cpu per 256MB step is safe enough to tolerate the 2MB per-cpu variation, i.e. to guard against running out of memory we allow one extra cpu per 256MB of crash memory on x86/s390 (arm64 not tested yet), and one extra cpu per 512MB of crash memory on ppc64.

We also cap the number of cpus for kdump at 16, as the upper bound.

Formula:
One extra cpu per 256MB step (safe enough to tolerate the 2MB variation), rounded down to a power of two.
int __weak arch_calc_cpus_from_crashmemory(): // x86, arm64, s390x
    (0M, 256M]: 1
    (256M, 1G]: 2
    (1G, 2G]: 4
    (2G, 4G]: 8
    (4G, UMAX]: 16
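
The table above could be written as a lookup like the following. It is a minimal sketch: the Python function name and constants are hypothetical stand-ins for the proposed arch_calc_cpus_from_crashmemory(), and rejecting non-positive sizes is an assumption, since the table starts just above 0M.

```python
from bisect import bisect_left

# Inclusive upper bounds (MB) of the first four buckets; anything larger gets 16.
UPPER_BOUNDS_MB = [256, 1024, 2048, 4096]
CPUS_PER_BUCKET = [1, 2, 4, 8, 16]


def calc_cpus_from_crashmemory(crash_mb):
    """Map reserved crash memory (MB) to a cpu count for x86, arm64, s390x."""
    if crash_mb <= 0:
        raise ValueError("no crash memory reserved")
    # bisect_left keeps a value equal to a bound in the lower bucket,
    # which matches the half-open (low, high] intervals of the table.
    return CPUS_PER_BUCKET[bisect_left(UPPER_BOUNDS_MB, crash_mb)]


if __name__ == "__main__":
    for mb in (161, 256, 257, 1024, 1025, 4096, 8192):
        print(f"{mb}MB -> {calc_cpus_from_crashmemory(mb)} cpu(s)")
```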

Normally, systems using "crashkernel=auto" (on x86, 64MB of extra memory per 1TB of system memory) with less than around 1.5TB of system memory will fall into the first category and end up with "nr_cpus=1".
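
The "around 1.5TB" figure follows from solving 161MB + 64MB * T <= 256MB for the system memory T in TB. A small sketch of that arithmetic, assuming the 64MB/1TB term scales proportionally (the function name is hypothetical):

```python
BASE_MB = 161          # fixed part of the x86 crashkernel=auto reservation
EXTRA_MB_PER_TB = 64   # additional reservation per TB of system memory


def max_system_tb_for_bucket(bucket_limit_mb):
    """Largest system memory (TB) whose auto reservation stays within a bucket."""
    return (bucket_limit_mb - BASE_MB) / EXTRA_MB_PER_TB


if __name__ == "__main__":
    # First bucket of the table is (0M, 256M].
    print(f"nr_cpus=1 up to about {max_system_tb_for_bucket(256):.2f}TB")
    # Second bucket is (256M, 1G].
    print(f"nr_cpus=2 up to about {max_system_tb_for_bucket(1024):.2f}TB")
```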

ppc64: the step is 512MB per cpu (tolerating a 6MB variation)
crashkernel=auto: <2G:0M, 2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G
powerpc implementation:
int arch_calc_cpus_from_crashmemory():
    (0M, 512M]: 1
    (512M, 2G]: 2
    (2G, 4G]: 4
    (4G, 8G]: 8
    (8G, UMAX]: 16

Normally, ppc64 systems using "crashkernel=auto" with less than around 16GB of system memory will fall into the first category and end up with "nr_cpus=1". Those with less than around 128GB of system memory will fall into the second category and end up with "nr_cpus=2".
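
To check this against the ppc64 table, each crashkernel=auto reservation can be run through the ppc64 buckets. This is a sketch with hypothetical names; the reservation sizes are copied from the crashkernel=auto line above.

```python
from bisect import bisect_left

# Inclusive upper bounds (MB) of the ppc64 buckets; anything larger gets 16.
PPC64_UPPER_BOUNDS_MB = [512, 2048, 4096, 8192]
CPUS_PER_BUCKET = [1, 2, 4, 8, 16]

# crashkernel=auto reservations on ppc64, keyed by system memory range.
PPC64_AUTO_MB = {
    "2G-4G": 384,
    "4G-16G": 512,
    "16G-64G": 1024,
    "64G-128G": 2048,
    "128G-": 4096,
}


def ppc64_calc_cpus_from_crashmemory(crash_mb):
    """Map reserved crash memory (MB) to a cpu count using the ppc64 table."""
    if crash_mb <= 0:
        raise ValueError("no crash memory reserved")
    return CPUS_PER_BUCKET[bisect_left(PPC64_UPPER_BOUNDS_MB, crash_mb)]


def cpus_per_auto_range():
    """Cpu count for each crashkernel=auto system memory range."""
    return {rng: ppc64_calc_cpus_from_crashmemory(mb) for rng, mb in PPC64_AUTO_MB.items()}


if __name__ == "__main__":
    for rng, cpus in cpus_per_auto_range().items():
        print(f"{rng}: {cpus} cpu(s)")
```

Under these tables, a ppc64 system using "crashkernel=auto" gets at most 4 cpus (128GB of system memory and above); the 8 and 16 buckets are only reachable with a manual crashkernel= setting.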

nr_cpus{maxcpus} = arch_calc_cpus_from_crashmemory(crashmemory_in_MB);
if (nr_cpus > $(nproc))
    nr_cpus = $(nproc)
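
The clamping step above might look like this in runnable form. It is a sketch only: the function names are hypothetical, os.sched_getaffinity(0) is used as an approximation of $(nproc), and the choice of "maxcpus" for ppc64 follows the ppc64("maxcpus=X") note in the test results rather than any confirmed kdump behaviour.

```python
import os


def online_cpu_count():
    """Approximate $(nproc): cpus available to the current process."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is not available on every platform.
        return os.cpu_count() or 1


def clamp_nr_cpus(calculated, online=None):
    """Never ask for more cpus than the machine has, and never fewer than one."""
    if online is None:
        online = online_cpu_count()
    return max(1, min(calculated, online))


def kdump_cpu_param(arch, calculated, online=None):
    """Build the kernel command line fragment for the kdump kernel."""
    name = "maxcpus" if arch == "ppc64" else "nr_cpus"
    return f"{name}={clamp_nr_cpus(calculated, online)}"


if __name__ == "__main__":
    # 16 would come from the memory table; it is clamped to this machine's cpus.
    print(kdump_cpu_param("x86_64", 16))
```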

This formula is mainly based on theoretical analysis. Any comments?

Comment 3 Xunlei Pang 2017-02-23 08:09:47 UTC
After some discussion, we decided to deal only with the x86 insufficient vector issue. For other cases that need more than one cpu, users can change "nr_cpus=X" in /etc/sysconfig/kdump manually as needed.

Comment 21 Baoquan He 2018-12-03 01:43:18 UTC
This bug was opened by a Red Hat internal engineer who has since left, based on an early, untested idea. Later on, the implementation based on this design caused a generic bug, so we reverted it. I have no time to investigate it further, and nobody has reported an issue or complained, so I would like to close it. If anyone has concerns, or any issue in this area is reported, we can continue.

Thanks
Baoquan

