I am opening this bug on behalf of Juan Abia <jabia>. He will supplement more details later.
Description of problem:
Sometimes when booting aarch64 images (in my particular case, while validating CCSP AWS images), kdump is not operational. After investigation by Pingfan Liu, we found that this only happens if the image's kernel version is lower than 4.18.0-449.el8.

Version-Release number of selected component (if applicable):
kexec-tools-2.0.20-69.el8_6.1.aarch64

How reproducible:
There is a low probability of hitting this bug.

Steps to Reproduce:
1. Boot an aarch64 image with a kernel version lower than 4.18.0-449.
2. Run "kdumpctl status".

Actual results:
kdump: Kdump is not operational

Expected results:
kdump: Kdump is operational

Additional info:
journalctl kernel:
-- Logs begin at Tue 2023-05-30 07:44:43 UTC, end at Tue 2023-05-30 08:22:40 UTC. --
May 30 07:44:43 localhost kernel: Booting Linux on physical CPU 0x0000000000 [0x413fd0c1]
May 30 07:44:43 localhost kernel: Linux version 4.18.0-372.57.1.el8_6.aarch64 (mockbuild.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-10) (GCC)) #1 SMP Thu May 11 07:27:41 EDT 2023
May 30 07:44:43 localhost kernel: efi: Getting EFI parameters from FDT:
May 30 07:44:43 localhost kernel: efi: EFI v2.70 by EDK II
May 30 07:44:43 localhost kernel: efi: SMBIOS=0x7bed0000 SMBIOS 3.0=0x7beb0000 ACPI=0x786e0000 ACPI 2.0=0x786e0014 MEMATTR=0x7a75c018 RNG=0x7bfdef98 MEMRESERVE=0x7857c698
May 30 07:44:43 localhost kernel: efi: seeding entropy pool
May 30 07:44:43 localhost kernel: Using crashkernel=auto, the size chosen is a best effort estimation.
May 30 07:44:43 localhost kernel: cannot allocate crashkernel (size:0x1c000000)
May 30 07:44:43 localhost kernel: ACPI: Early table checksum verification disabled
May 30 07:44:43 localhost kernel: ACPI: RSDP 0x00000000786E0014 000024 (v02 AMAZON)
This issue is limited to the aarch64 platform. The root cause is most likely that the firmware allocates and occupies memory at random locations, so no contiguous memory chunk below 4GB is left that is large enough for the crashkernel reservation. Starting with kernel 4.18.0-449.el8, the crashkernel reservation supports a fallback mode and can find a suitable region crossing or above the 4GB boundary. Users who hit this issue are advised to update the kernel to 4.18.0-449.el8 or later.
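To illustrate why the reservation fails on older kernels and succeeds with the fallback, here is a minimal Python sketch that models the reservation as a search over a list of free physical memory ranges. It is a simplified model under stated assumptions, not the kernel's implementation: the function names `find_region` and `reserve_crash_region`, the example free-memory map, the first-fit search order, and the 2 MiB alignment are all hypothetical. Only the 448 MiB size (0x1c000000) comes from the journal output in the report.

```python
from typing import List, Optional, Tuple

# A free physical memory range as (start, end), with `end` exclusive.
Region = Tuple[int, int]

FOUR_GIB = 0x1_0000_0000
CRASH_SIZE = 0x1C000000  # 448 MiB, the size reported in the journal
ALIGN = 0x200000         # assumed 2 MiB alignment (hypothetical parameter)


def align_up(addr: int, align: int) -> int:
    """Round `addr` up to the next multiple of `align` (a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def find_region(free: List[Region], size: int, limit: int,
                align: int = ALIGN) -> Optional[int]:
    """Hypothetical first-fit search for an aligned block of `size` bytes
    that lies entirely inside one free range and ends at or below `limit`."""
    for start, end in sorted(free):
        base = align_up(start, align)
        if base + size <= min(end, limit):
            return base
    return None


def reserve_crash_region(free: List[Region], size: int = CRASH_SIZE,
                         fallback: bool = False) -> Optional[int]:
    """Hypothetical model: try below 4 GiB first; optionally fall back to
    any region, including ones crossing or above the 4 GiB boundary."""
    base = find_region(free, size, limit=FOUR_GIB)
    if base is None and fallback:
        base = find_region(free, size, limit=2**64)
    return base


# Hypothetical free-memory map: firmware has fragmented the space below
# 4 GiB into 256 MiB chunks, each too small for a 448 MiB reservation.
FRAGMENTED_FREE_MAP: List[Region] = [
    (0x40000000, 0x50000000),
    (0x60000000, 0x70000000),
    (0xA0000000, 0xB0000000),
    (0x100000000, 0x200000000),  # 4 GiB of free memory above the boundary
]

print(reserve_crash_region(FRAGMENTED_FREE_MAP))  # None: nothing fits below 4 GiB
print(hex(reserve_crash_region(FRAGMENTED_FREE_MAP, fallback=True)))  # 0x100000000
```

Without the fallback, no single free chunk below 4 GiB can hold 448 MiB, so the search fails, which corresponds to the "cannot allocate crashkernel" message. With the fallback, the search is retried across the whole address space and the region at 4 GiB is chosen. A range that crosses the 4 GiB boundary, such as (0xF0000000, 0x120000000), is likewise rejected by the below-4 GiB pass and accepted by the fallback pass.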
This issue may confuse users. Based on comment #3, nothing can be done at the software level, so I suggest adding a KBase article.