Bug 1439170
| Summary: | crash: "vmlinux and vmcore do not match!" with ELF vmcores generated with makedumpfile -E or scp | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Emma Wu <xiawu> |
| Component: | crash | Assignee: | Dave Anderson <anderson> |
| Status: | CLOSED ERRATA | QA Contact: | Emma Wu <xiawu> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.4 | CC: | bhe, dyoung, panand, qzhao, xiawu |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | crash-7.1.9-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-01 22:04:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 3
Dave Anderson
2017-04-05 13:06:16 UTC
Thanks! In the meantime, I was able to reproduce the issue with kernel-3.10.0-640.el7.x86_64 and kexec-tools-2.0.14-4.el7.x86_64.

As you mentioned, crash works OK with compressed kdumps, but fails with ELF format dumpfiles. I simplified my test to remove any filtering: "core_collector makedumpfile -E".

As it turns out, the problem is due to the crash utility's calculation of the kernel's "phys_base" value. Something must have changed with the kernel's /proc/vmcore output, more specifically the contents of the ELF PT_LOAD segments. I will update this bugzilla when I have more information.

The problem also occurs with "core_collector scp", so we can take makedumpfile out of the picture entirely. Given that it appears to be related to the kernel's creation of /proc/vmcore, do you know the kernel version where this problem started happening?

OK, thanks. I saw that segmentation fault when using the installed version of kexec-tools. When I upgraded to kexec-tools-2.0.14-4.el7, it fixed itself.
Anyway, upon further investigation, it is not an issue with the /proc/vmcore
PT_LOAD segments, but rather with recent KASLR-related kernel changes
involving KERNEL_IMAGE_SIZE, which affects the virtual memory address
space layout.
The problem at hand is that the value of KERNEL_IMAGE_SIZE is not exported
with your 3.10.0-609.el7 kernel (or my 3.10.0-640.el7 kernel).
If you run "crash vmlinux vmcore --machdep kernel_image_size=1g" it
should work OK.
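The address-space arithmetic behind that workaround can be sketched in C. This is a minimal illustration, not crash code: it assumes the upstream x86_64 layout where the module area begins at MODULES_VADDR = __START_KERNEL_map + KERNEL_IMAGE_SIZE (512MB without KASLR, 1GB with it), and it assumes an unrandomized _text at __START_KERNEL_map + 16MB. The helper names are hypothetical. As an example offset it uses the 634MB relocation reported in the 3.10.0-643.el7 crash session later in this bug.

```c
#include <stdint.h>

/* Base of the x86_64 kernel text mapping (__START_KERNEL_map upstream). */
#define START_KERNEL_MAP 0xffffffff80000000ULL
#define MB (1024ULL * 1024ULL)

/* Upstream x86_64 starts the module area right after the kernel image
 * mapping: MODULES_VADDR = __START_KERNEL_map + KERNEL_IMAGE_SIZE. */
static uint64_t modules_vaddr(uint64_t kernel_image_size)
{
    return START_KERNEL_MAP + kernel_image_size;
}

/* Nonzero if vaddr lies in [__START_KERNEL_map, MODULES_VADDR), i.e. it
 * is treated as kernel text rather than module/vmalloc space. */
static int in_kernel_text_map(uint64_t vaddr, uint64_t kernel_image_size)
{
    return vaddr >= START_KERNEL_MAP && vaddr < modules_vaddr(kernel_image_size);
}

/* Hypothetical helper: virtual address of _text after a KASLR shift.
 * Assumption: the unrandomized _text is __START_KERNEL_map + 16MB
 * (CONFIG_PHYSICAL_START=0x1000000). */
static uint64_t relocated_text(uint64_t kaslr_offset)
{
    return START_KERNEL_MAP + 16 * MB + kaslr_offset;
}
```

With these assumptions, relocated_text(634 * MB) is 0xffffffffa8a00000. That address sits below MODULES_VADDR for a 1GB image size (0xffffffffc0000000) but above it for 512MB (0xffffffffa0000000). So a tool that assumes the pre-KASLR 512MB value would misclassify the relocated kernel text as module space, which is why supplying kernel_image_size=1g helps.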
However, I see this recent RHEL7 commit, which exports the values
of phys_base and KERNEL_IMAGE_SIZE:
```
commit 2a74f863738828916976c987e70e7d4c76099394
Author: Baoquan He <bhe>
Date:   Fri Mar 24 13:57:28 2017 -0400

    [kernel] kexec: export the value of phys_base instead of symbol address

... [ cut ] ...

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index e58b0f8..9075516 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -326,7 +326,7 @@ void machine_kexec(struct kimage *image)
 void arch_crash_save_vmcoreinfo(void)
 {
-	VMCOREINFO_SYMBOL(phys_base);
+	VMCOREINFO_NUMBER(phys_base);
 	VMCOREINFO_SYMBOL(init_level4_pgt);
 #ifdef CONFIG_NUMA
@@ -335,6 +335,7 @@ void arch_crash_save_vmcoreinfo(void)
 #endif
 	vmcoreinfo_append_str("KERNELOFFSET=%lx\n",
 			      kaslr_offset());
+	VMCOREINFO_NUMBER(KERNEL_IMAGE_SIZE);
 }
```
The commit first shows up in kernel-3.10.0-641.el7:

```
$ git describe --contains 2a74f863738828916976c987e70e7d4c76099394
kernel-3.10.0-641.el7~10
$
```
So with that kernel patch in place, the problem should go away.
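With the patch above, a dump analyzer can read both values straight from the VMCOREINFO note instead of guessing them. The following is a minimal C sketch, assuming the note text has already been extracted into a NUL-terminated buffer. The struct and function names are hypothetical. The line formats follow the kernel's macros: VMCOREINFO_NUMBER(name) emits "NUMBER(name)=%ld" (decimal), VMCOREINFO_SYMBOL(name) emits "SYMBOL(name)=%lx" (the symbol's address, not its value), and the diff appends "KERNELOFFSET=%lx" (hex).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the values of interest. */
struct vmcoreinfo_vals {
    long phys_base;                 /* from NUMBER(phys_base)= */
    long kernel_image_size;         /* from NUMBER(KERNEL_IMAGE_SIZE)= */
    unsigned long kerneloffset;     /* from KERNELOFFSET= */
    int have_phys_base, have_image_size, have_offset;
};

/* If line starts with key, return a pointer to the text after it. */
static const char *value_after(const char *line, const char *key)
{
    size_t n = strlen(key);
    return strncmp(line, key, n) == 0 ? line + n : NULL;
}

/* Scan newline-separated "KEY=VALUE" lines.  A SYMBOL(phys_base)= line
 * (older kernels) is deliberately ignored: it holds an address. */
static void parse_vmcoreinfo(const char *buf, struct vmcoreinfo_vals *v)
{
    memset(v, 0, sizeof(*v));
    for (const char *line = buf; line != NULL && *line != '\0'; ) {
        const char *val;
        if ((val = value_after(line, "NUMBER(phys_base)=")) != NULL) {
            v->phys_base = strtol(val, NULL, 10);        /* %ld: decimal */
            v->have_phys_base = 1;
        } else if ((val = value_after(line, "NUMBER(KERNEL_IMAGE_SIZE)=")) != NULL) {
            v->kernel_image_size = strtol(val, NULL, 10);
            v->have_image_size = 1;
        } else if ((val = value_after(line, "KERNELOFFSET=")) != NULL) {
            v->kerneloffset = strtoul(val, NULL, 16);    /* %lx: hex */
            v->have_offset = 1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
}

/* Hypothetical fallback: use the exported KERNEL_IMAGE_SIZE if present,
 * otherwise a caller-supplied guess (compare the
 * "--machdep kernel_image_size=1g" workaround). */
static long image_size_or(const struct vmcoreinfo_vals *v, long fallback)
{
    return v->have_image_size ? v->kernel_image_size : fallback;
}
```

On a kernel without the patch, only SYMBOL(phys_base) is present, so have_phys_base and have_image_size stay 0 and the caller has to fall back to another method.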
> So with that kernel patch in place, the problem should go away.

With "core_collector scp", here is a 3.10.0-643.el7 kernel:

```
# crash /var/crash/127.0.0.1-2017-04-05-14:32:04/vmcore /usr/lib/debug/lib/modules/3.10.0-643.el7.x86_64/vmlinux

crash 7.1.8-2.el7
Copyright (C) 2002-2016 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [634MB]: patching 77552 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-643.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2017-04-05-14:32:04/vmcore
        CPUS: 12
        DATE: Wed Apr 5 14:31:55 2017
      UPTIME: 00:04:34
LOAD AVERAGE: 0.29, 0.58, 0.30
       TASKS: 266
    NODENAME: hp-z400-02.ml3.eng.bos.redhat.com
     RELEASE: 3.10.0-643.el7.x86_64
     VERSION: #1 SMP Tue Apr 4 19:00:14 EDT 2017
     MACHINE: x86_64 (3066 Mhz)
      MEMORY: 4 GB
       PANIC: "SysRq : Trigger a crash"
         PID: 14248
     COMMAND: "bash"
        TASK: ffff91df76f78fb0 [THREAD_INFO: ffff91def3f98000]
         CPU: 2
       STATE: TASK_RUNNING (SYSRQ)

crash>
```

Because the RHEL7 kernel code has been "in transition" with respect to KASLR, and crash works with the more recent kernels, I would prefer to close this bugzilla. Do you all agree?

> Do you all agree?
FWIW, I do have a patch that would fix the "interim KASLR kernel" problem,
but again, does it make sense to do it?
Hi Dave,

Closing it as WORKSFORME or CURRENTRELEASE looks good to me. Of course, if you have an easy fix that doesn't add maintenance confusion to the latest code, that's also good.

Thanks,
Baoquan

Oops, I was just checking which close resolution should be used and forgot to cancel it when I submitted my comment. Sorry about that. I'll leave it to Dave to decide whether it should be closed or not.

Although I don't believe that this bug could ever happen with an older upstream kernel, I am going to commit a fix into the upstream GitHub crash repository.

Patch pushed upstream:

https://github.com/crash-utility/crash/commit/eb1057eff00620d4519c60db8a3a88ecc6c92fea

Fix for the determination of the x86_64 "phys_base" value when it is not passed in the VMCOREINFO data of ELF vmcores. Without the patch, it is possible that the base address of the vmalloc region is unknown and initialized to an incorrect default address during the very early stages of initialization, which causes the parsing of the PT_LOAD segments for the START_KERNEL_map region to fail. (anderson)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2019
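The upstream commit message refers to deriving phys_base by parsing the PT_LOAD segments for the START_KERNEL_map region when VMCOREINFO does not supply it. Below is a minimal C sketch of that idea. It is modeled loosely on that description and is not crash's actual implementation. The function name is hypothetical. It uses Elf64_Phdr and PT_LOAD from glibc's <elf.h> and assumes that the kernel text mapping ends where the module area begins, at __START_KERNEL_map + KERNEL_IMAGE_SIZE. For that mapping, x86_64 translates vaddr to vaddr - __START_KERNEL_map + phys_base, so a segment's p_paddr and p_vaddr determine phys_base.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Base of the x86_64 kernel text mapping (__START_KERNEL_map upstream). */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/* Hypothetical sketch: derive phys_base from the PT_LOAD segment that maps
 * kernel text.  Assumption: text is mapped in
 * [__START_KERNEL_map, __START_KERNEL_map + kernel_image_size).
 * Returns 0 and stores *phys_base on success, -1 if no segment qualifies. */
static int phys_base_from_phdrs(const Elf64_Phdr *ph, size_t n,
                                uint64_t kernel_image_size, uint64_t *phys_base)
{
    uint64_t modules_vaddr = START_KERNEL_MAP + kernel_image_size;

    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        /* Skip direct-map segments and anything at or past MODULES_VADDR,
         * which this check would treat as module/vmalloc space. */
        if (ph[i].p_vaddr < START_KERNEL_MAP || ph[i].p_vaddr >= modules_vaddr)
            continue;
        /* paddr == vaddr - __START_KERNEL_map + phys_base; unsigned
         * arithmetic wraps if phys_base is "negative". */
        *phys_base = ph[i].p_paddr - (ph[i].p_vaddr - START_KERNEL_MAP);
        return 0;
    }
    return -1;
}
```

With a hypothetical text segment at virtual 0xffffffffa8a00000 mapped to physical 0x30000000, a 1GB image size yields phys_base 0x7600000. A 512MB image size rejects that segment as module space, so the function returns -1. This matches the kind of failure the commit message describes when the region boundaries are initialized to the wrong default.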