Description of problem:
For some systems with a large number of CPUs and peripherals, the default "nr_cpus=1" hard-coded in "/etc/sysconfig/kdump" probably won't work. We need an algorithm/formula to choose the proper value in these cases.

Additional info:
For example, an HP system with 758 CPUs in https://bugzilla.redhat.com/show_bug.cgi?id=1346327 adopted "nr_cpus=4" instead of "nr_cpus=1". We also need to consider the extra memory consumed when the kdump kernel uses more CPUs, in the case of "crashkernel=auto".
Known affected factors:
*) reserved crash memory
*) number of CPUs
*) architecture: the memory overhead of one extra CPU differs per architecture, and so does the reserved crash memory policy when using crashkernel=auto
*) a large number of I/O devices, which presumably requires a large amount of crash memory; this needs to be related to the reserved crash memory

For example, on an HP x86_64 system with 24TB RAM and 768 CPUs, this manual configuration works:
crashkernel=512M,high crashkernel=256M,low

I did some investigation and tests.

1. Extra memory consumption per extra CPU
a) x86_64: tested on an AMD machine (2-SMT, 12 CPUs, 10GB memory, crashkernel=161MB, f25) and on a 4-core kvm.fedora guest (f24). Result: less than 2MB per extra CPU.
b) s390x: tested on ibm-z10-47.rhts.eng.bos.redhat.com (1994MB memory, crashkernel=161MB, kernel 3.10.0-514.2.2.el7.s390x). Result: less than 2MB per extra logical CPU.
c) ppc64 (using "maxcpus=X"): tested on a 4-SMT, 7-core, 28-CPU machine. Result: around 5MB per extra logical CPU.

2. With crashkernel=auto, the reserved memory policy is:
X86_64: <2G:0M, 2G-:(161MB + 64MB per 1TB of RAM)
ppc64: <2G:0M, 2G-4G:384M, 4G-16G:512M, 16G-64G:1G, 64G-128G:2G, 128G-:4G
s390x: <4G:0M, 4G-:(161MB + 64MB per 1TB of RAM)
arm64 on rhelsa (http://git.engineering.redhat.com/git/users/panand/linux.git/commit/?h=pegas-devel&id=124025600b404c0be9c42ee9b598ce10df6ad486): 160MB, or 2GB with CMA hugepages

3. From the data in 1) and 2), more CPUs clearly mean more memory consumption: e.g. a 1024-CPU system would consume around 2GB more memory, and even more on ppc64. This is unacceptable for kdump, especially with our "crashkernel=auto" feature. So we must have an upper bound on the number of CPUs booted by the kdump kernel.

4. A theoretical formula
We assume that one extra CPU per 256MB step is safe enough, given the up-to-2MB per-CPU variance. That is, for x86/s390 (arm64 not tested yet) we allow one extra CPU per 256MB of crash memory, so that the crash kernel does not run out of memory, and for ppc64 we allow one extra CPU per 512MB of crash memory.
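To make the arithmetic in 2) and 3) concrete, here is a minimal C sketch. All helper names are hypothetical, the per-CPU costs are the upper bounds measured in 1), and reading "64MB per 1TB of RAM" as scaling linearly with RAM (1TB = 1024*1024 MB) is an assumption; the real kernel rounding may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Per-extra-CPU overhead measured in 1), upper bounds in MB. */
#define X86_EXTRA_MB_PER_CPU   2   /* x86_64 and s390x: "less than 2MB" */
#define PPC64_EXTRA_MB_PER_CPU 5   /* ppc64: "around 5MB" */

/*
 * Hypothetical helper: crashkernel=auto reservation (MB) on x86_64/s390x,
 * assumed to be 161MB plus 64MB per TB of RAM, scaled linearly.
 * min_ram_mb is 2048 for x86_64 (<2G:0M) and 4096 for s390x (<4G:0M).
 */
static uint64_t auto_reserve_x86_s390(uint64_t ram_mb, uint64_t min_ram_mb)
{
    if (ram_mb < min_ram_mb)
        return 0;
    return 161 + ram_mb * 64 / (1024 * 1024);
}

/* Hypothetical helper: the ppc64 crashkernel=auto range table from 2). */
static uint64_t auto_reserve_ppc64(uint64_t ram_mb)
{
    const uint64_t gb = 1024;
    if (ram_mb < 2 * gb)   return 0;
    if (ram_mb < 4 * gb)   return 384;
    if (ram_mb < 16 * gb)  return 512;
    if (ram_mb < 64 * gb)  return 1 * gb;
    if (ram_mb < 128 * gb) return 2 * gb;
    return 4 * gb;
}

/* Extra memory (MB) needed to boot 'cpus' CPUs instead of one. */
static uint64_t extra_mb(uint64_t cpus, uint64_t mb_per_cpu)
{
    assert(cpus >= 1);
    return (cpus - 1) * mb_per_cpu;
}
```

For the 10GB x86_64 test machine, auto_reserve_x86_s390(10240, 2048) gives 161MB, matching the crashkernel=161MB used there; a 24TB x86_64 system gets 161 + 24 * 64 = 1697MB. Meanwhile extra_mb(1024, X86_EXTRA_MB_PER_CPU) gives 2046MB, the "around 2GB" from 3), which no crashkernel=auto reservation below 24TB of RAM would cover.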
We also cap the number of CPUs for kdump at 16 as an upper bound.

Formula: one extra CPU per 256MB step (safe enough given the ~2MB per-CPU variance), rounded down to a power of two.

int __weak arch_calc_cpus_from_crashmemory():   // x86, arm64, s390x
(0M, 256M]:   1
(256M, 1G]:   2
(1G, 2G]:     4
(2G, 4G]:     8
(4G, UMAX]:  16

Normally, systems using "crashkernel=auto" (on x86, an extra 64MB is reserved per 1TB of system memory) with less than around 1.5TB of system memory fall into the first category and end up with "nr_cpus=1".

ppc64: the step is 512MB per CPU (given the ~6MB per-CPU variance).
crashkernel=auto: <2G:0M, 2G-4G:384M, 4G-16G:512M, 16G-64G:1G, 64G-128G:2G, 128G-:4G

powerpc implementation:
int arch_calc_cpus_from_crashmemory():
(0M, 512M]:   1
(512M, 2G]:   2
(2G, 4G]:     4
(4G, 8G]:     8
(8G, UMAX]:  16

Normally, ppc64 systems using "crashkernel=auto" with less than around 16GB of system memory fall into the first category and end up with "nr_cpus=1"; those with less than around 128GB fall into the second category and end up with "nr_cpus=2".

nr_cpus{maxcpus} = arch_calc_cpus_from_crashmemory(crashmemory_in_MB);
if (nr_cpus > $(nproc))
        nr_cpus = $(nproc);

This formula is mainly based on theoretical analysis. Any comments?
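A minimal C sketch of the two step tables and the $(nproc) clamp, with crash memory in MB. The names calc_cpus_generic(), calc_cpus_ppc64() and clamp_to_nproc() are hypothetical stand-ins for the proposed __weak arch_calc_cpus_from_crashmemory() and its powerpc override; this is an illustration of the tables, not the posted patch.

```c
#include <assert.h>

/* Step table for x86, arm64, s390x (the __weak default). */
static int calc_cpus_generic(unsigned long crash_mb)
{
    if (crash_mb <= 256)  return 1;
    if (crash_mb <= 1024) return 2;
    if (crash_mb <= 2048) return 4;
    if (crash_mb <= 4096) return 8;
    return 16;                      /* upper bound for kdump */
}

/* Step table for ppc64 (the powerpc override), 512MB per step. */
static int calc_cpus_ppc64(unsigned long crash_mb)
{
    if (crash_mb <= 512)  return 1;
    if (crash_mb <= 2048) return 2;
    if (crash_mb <= 4096) return 4;
    if (crash_mb <= 8192) return 8;
    return 16;                      /* upper bound for kdump */
}

/* nr_cpus never exceeds the number of CPUs the system has ($(nproc)). */
static int clamp_to_nproc(int nr_cpus, int nproc)
{
    assert(nproc > 0);
    return nr_cpus > nproc ? nproc : nr_cpus;
}
```

For example, a 24TB x86_64 machine with crashkernel=auto reserves 161 + 24 * 64 = 1697MB, which lands in (1G, 2G], so calc_cpus_generic() returns 4; on a 758-CPU box clamp_to_nproc() leaves that unchanged. Note that the table is more conservative than a literal "one CPU per 256MB" rule, which would already allow 4 CPUs at 1GB; the sketch follows the table.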
After some discussion, we decided to deal only with the x86 insufficient IRQ vector issue. For other cases that need more than one CPU, users can change "nr_cpus=X" in /etc/sysconfig/kdump manually as needed.
patches posted: http://post-office.corp.redhat.com/archives/kexec-kdump-list/2017-February/msg00008.html
This bug was opened by a Red Hat internal engineer who has since left, based on a preliminary idea. Later, the implementation based on this design caused a generic bug, so we reverted it. I have no time to investigate further, and nobody has reported an issue or complained, so I would like to close it. If anyone has concerns, or any issue in this area is reported, we can continue.

Thanks,
Baoquan