Description of problem:
Trying to investigate a kernel crash dump, where ext3 and jbd are built as modules, I start crash on the kernel dump and try to load the module symbols with mod -s. Frequently, the underlying gdb segfaults. Even when the mod -s apparently succeeds, there seems to be no information about the symbol. The kernel and modules are built with -g and KALLSYMS is configured.

Version-Release number of selected component (if applicable):
crash-3.10-1

How reproducible:
Easily, but not every time.

Steps to Reproduce:
1. Produce a dump with a kernel that has ext3 built as a module.
2. Start crash:
     crash vmlinux vmcore
3. Try to load ext3 symbols and jbd symbols:
     crash> mod -s ext3
     crash> mod -s jbd

Actual results:
See attached file.

Expected results:
No seg faults, and values for the symbols.

Additional info:
Created attachment 111561: Transcript of short interaction with crash
What version of gcc was used to build this kernel?:

    RELEASE: 2.6.9-prep

Also, if you do run gdb alone on the ext3.ko or jbd.ko files, you might see strange behaviour as well. There's a "known problem" with the ia64 and x86_64 kernel modules built with gcc 3.4.2, such that gdb fails the "add-symbol-file" operation. (I'm not sure whether you can access Red Hat bugzilla number 141523).
gcc 3.4.2 - I should point out that we first ran into this running the stock RHEL4 AS kernel. I just happened to produce the errors for the bug report on one of the test kernels we are currently running to try to catch the bug described in RH bugzilla #146037. I also tried to look at #141523, but I do not have access.
Hmmm -- what gcc was used to build the stock RHEL4 kernel?

    strings vmlinux | grep 'Linux ver'
Bzzt - the gcc version we use is 3.4.3, *not* 3.4.2 - sorry about that. That is also the version used to build the stock kernel and is the default gcc on our system. There is also a gcc32 and a gcc4 available. If you think that either one of those will help, let me know and I'll try it out.

Here is what the strings | grep said:

    Linux version 2.6.9-5.EL (bhcompile.redhat.com) (gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)) #1 SMP Wed Jan 5 19:23:24 EST 2005

and here is the version info on the default gcc:

    # gcc --version
    gcc (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)
    Copyright (C) 2004 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
What happens when you do this?:

    $ gdb ext3.o

or

    $ gdb jbd.o
I tried it on both the .o and the .ko. In all cases, gdb starts normally with no errors. It also seems to know at least some things about what the module defines:

    # gdb /lib/modules/2.6.9-prep/kernel/fs/jbd/jbd.ko
    GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "ia64-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

    (gdb) ptype transaction_t
    type = struct transaction_s {
        journal_t *t_journal;
        tid_t t_tid;
        enum {T_RUNNING, T_LOCKED, T_RUNDOWN, T_FLUSH, T_COMMIT, T_FINISHED} t_state;
        long unsigned int t_log_start;
        int t_nr_buffers;
        struct journal_head *t_reserved_list;
        struct journal_head *t_locked_list;
        struct journal_head *t_buffers;
        struct journal_head *t_sync_datalist;
        struct journal_head *t_forget;
        struct journal_head *t_checkpoint_list;
        struct journal_head *t_iobuf_list;
        struct journal_head *t_shadow_list;
        struct journal_head *t_log_list;
        spinlock_t t_handle_lock;
        int t_updates;
        int t_outstanding_credits;
        transaction_t *t_cpnext;
        transaction_t *t_cpprev;
        long unsigned int t_expires;
        int t_handle_count;
        spinlock_t t_jcb_lock;
        struct list_head t_jcb;
    }
    (gdb)
All right, in a few days I'll try restoring gdb-6.1 into a test version of crash to give you, because our gdb team gave me a small gdb-6.1 patch after I had already regressed to gdb-6.0. (By then it was too late to go back to gdb-6.1...) Unfortunately, I'm completely swamped with other work right now, but I'll get to it as soon as possible.
Nick, Please try this test version of crash (crash-3.10-13.4_gdb6.1.tar.gz) located here: http://people.redhat.com/anderson/.crash_test and let me know how it works. I cannot guarantee that it will help, nor whether it may introduce other problems. Sorry for the delay... Dave
Dave, I tried the new version (lightly on x86, slightly more heavily on ia64 - on live kernels so far, I have not tried it on a crash dump): it's behaving much better - no crashes and it's giving me the information that I expect. I'll try to push it some more and also try to do some post-mortem debugging. If I find anything amiss, I'll let you know. Thanks very much! Nick
Thanks, Nick.

That test version will *not* work with 32-bit (x86) vmcore files that are greater than 4GB in length. Unfortunately that test version dragged in some other stuff I'm working on (specifically, support of multiple PT_LOAD sections so that we can avoid the 256GB sparse memory holes on ia64 boxes), and in so doing I broke read_netdump() for big x86 dumpfiles. If you plug this in as a replacement for netdump.c:read_netdump(), you could more legitimately test x86:

    int
    read_netdump(int fd, void *bufptr, int cnt, ulong addr, physaddr_t paddr)
    {
        off_t offset;
        struct pt_load_segment *pls;
        int i;

        switch (nd->flags & (NETDUMP_ELF32|NETDUMP_ELF64))
        {
        case NETDUMP_ELF32:
            offset = (off_t)paddr + (off_t)nd->header_size;
            break;

        case NETDUMP_ELF64:
            for (i = offset = 0; i < nd->num_pt_load_segments; i++) {
                pls = &nd->pt_load_segments[i];
                if ((paddr >= pls->phys_start) &&
                    (paddr < pls->phys_end)) {
                    offset = (off_t)(paddr - pls->phys_start) +
                        pls->file_offset;
                    break;
                }
            }
            if (!offset)
                return READ_ERROR;
            break;
        }

        if (lseek(nd->ndfd, offset, SEEK_SET) == -1)
            return SEEK_ERROR;

        if (read(nd->ndfd, bufptr, cnt) != cnt)
            return READ_ERROR;

        return cnt;
    }

BTW, do you test any x86_64 machines by any chance? As I recall, we only saw the problem with gdb-6.1 (pre-fix) handling modules on ia64 and x86_64 machines.
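For anyone following the ELF64 branch of read_netdump() above, the physical-address-to-file-offset lookup can be sketched in isolation roughly as follows. This is only an illustrative sketch, not code from the crash sources: the struct layout, the function name paddr_to_offset, and the choice of -1 as the "not found" value are assumptions made here (the snippet above treats an offset of 0 as "not found" instead). Only the field names phys_start, phys_end, and file_offset are taken from the snippet.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for crash's pt_load_segment. Only the three field
 * names come from the read_netdump() snippet; the types are assumptions.
 */
struct pt_load_segment {
    uint64_t phys_start;   /* first physical address covered by the segment */
    uint64_t phys_end;     /* one past the last physical address covered */
    off_t    file_offset;  /* where the segment's data begins in the dumpfile */
};

/*
 * Map a physical address to a dumpfile offset by scanning the PT_LOAD
 * segments in order. Returns (off_t)-1 when no segment contains paddr,
 * which is the case for addresses that fall in a memory hole.
 */
static off_t
paddr_to_offset(const struct pt_load_segment *segs, size_t nsegs,
                uint64_t paddr)
{
    size_t i;

    for (i = 0; i < nsegs; i++) {
        if (paddr >= segs[i].phys_start && paddr < segs[i].phys_end)
            return (off_t)(paddr - segs[i].phys_start) + segs[i].file_offset;
    }
    return (off_t)-1;
}
```

With two made-up segments covering 0x1000-0x3000 (file offset 0x200) and 0x100000-0x200000 (file offset 0x2200), an address of 0x100010 would map to file offset 0x2210, while 0x3000 lies in the gap between them and is reported as not found. Because the dumpfile stores only the segments, the hole between them takes no space in the file.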
The fix for this issue is contained within the crash utility versions associated with its respective RHEL3 and RHEL4 errata:

    RHEA-2005:599 crash enhancement update (RHEL3-U6) - version 4.0-1
    RHEA-2005:600 crash enhancement update (RHEL4-U2) - version 4.0-2
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This was fixed in RHEL4-U2...