150070 – when loading module symbols into crash, the underlying gdb frequently segfaults

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	crash
Sub Component:
Version:	4.0
Hardware:	ia64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Dave Anderson
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-03-02 03:25 UTC by Nick Dokos
Modified:	2007-11-30 22:07 UTC
CC List:	6 users
Fixed In Version:	4.0-2 RHEL4-U2
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-11-07 22:10:34 UTC
Target Upstream Version:
Embargoed:

Attachments
Transcript of short interaction with crash (2.84 KB, text/plain) 2005-03-02 03:32 UTC, Nick Dokos

Description Nick Dokos 2005-03-02 03:25:27 UTC

Description of problem: Trying to investigate a kernel crash dump,
where ext3 and jbd are built as modules, I start crash on the kernel dump
and try to load the module symbols with mod -s. Frequently, the underlying
gdb segfaults. Even when the mod -s apparently succeeds, there seems to be
no information about the symbol. The kernel and modules are built with -g
and KALLSYMS is configured.


Version-Release number of selected component (if applicable): crash-3.10-1

How reproducible: Easily, but not every time.



Steps to Reproduce:
1. Produce a dump with a kernel that has ext3 built as a module.
2. start crash: crash vmlinux vmcore
3. Try to load ext3 symbols and jbd symbols:

     crash> mod -s ext3
     crash> mod -s jbd


  
Actual results: see attached file


Expected results: no seg faults, and values for the symbols.


Additional info:

Comment 1 Nick Dokos 2005-03-02 03:32:10 UTC

Created attachment 111561
Transcript of short interaction with crash

Comment 2 Dave Anderson 2005-03-02 14:03:33 UTC

What version of gcc was used to build this kernel?:

     RELEASE: 2.6.9-prep

Also, if you do run gdb alone on the ext3.ko or jbd.ko files, you
might see strange behaviour as well.

There's a "known problem" with the ia64 and x86_64 kernel modules
built with gcc 3.4.2, such that gdb fails the "add-symbol-file"
operation.  (I'm not sure whether you can access Red Hat bugzilla
number 141523).

Comment 3 Nick Dokos 2005-03-02 16:15:09 UTC

gcc 3.4.2 - I should point out that we first ran into this running
the stock RHEL4 AS kernel. I just happened to produce the errors for
the bug report on one of the test kernels we are currently running
to try to catch the bug described in RH bugzilla #146037.

I also tried to look at #141523, but I do not have access.

Comment 4 Dave Anderson 2005-03-02 16:30:16 UTC

Hmmm -- what gcc was used to build the stock RHEL4 kernel?

strings vmlinux | grep 'Linux ver'
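For anyone without `strings` at hand, the same check can be approximated in a few lines of C. This is only a sketch of the idea (scan the image bytes for the "Linux version " banner that carries the gcc version), not what `strings` itself does internally; the function name `find_bytes` is made up for illustration, and the caller is assumed to have already read the vmlinux contents into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the byte offset of the first occurrence of
 * the NUL-terminated string `needle` inside the raw buffer `buf`, or -1
 * if it does not occur.  The buffer may contain embedded NUL bytes, so
 * strstr() cannot be used directly; memcmp() is used instead.
 */
long find_bytes(const unsigned char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);

    assert(buf != NULL);
    if (n == 0 || n > len)
        return -1;

    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0)
            return (long)i;
    }
    return -1;
}
```

The text starting at the returned offset, up to the next NUL or newline, is the banner line that includes the compiler version.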

Comment 5 Nick Dokos 2005-03-02 17:21:01 UTC

Bzzt - the gcc version we use is 3.4.3, *not* 3.4.2 - sorry about that.

That is also the version used to build the stock kernel and is the
default gcc on our system. There is also a gcc32 and a gcc4 available.
If you think that either one of those will help, let me know and
I'll try it out.

Here is what the strings | grep said:

Linux version 2.6.9-5.EL (bhcompile.redhat.com) (gcc
version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)) #1 SMP Wed Jan 5
19:23:24 EST 2005

and here is the version info on the default gcc:

# gcc --version
gcc (GCC) 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Comment 6 Dave Anderson 2005-03-02 18:00:52 UTC

What happens when you do this?:

$ gdb ext3.o

or 

$ gdb jbd.o

Comment 7 Nick Dokos 2005-03-02 21:17:38 UTC

I tried it on both the .o and the .ko. In all cases, gdb starts
normally with no errors. It also seems to know at least some things
about what the module defines:

# gdb /lib/modules/2.6.9-prep/kernel/fs/jbd/jbd.ko 
GNU gdb Red Hat Linux (6.1post-1.20040607.62rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "ia64-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) ptype transaction_t
type = struct transaction_s {
    journal_t *t_journal;
    tid_t t_tid;
    enum {T_RUNNING, T_LOCKED, T_RUNDOWN, T_FLUSH, T_COMMIT, T_FINISHED} t_state;
    long unsigned int t_log_start;
    int t_nr_buffers;
    struct journal_head *t_reserved_list;
    struct journal_head *t_locked_list;
    struct journal_head *t_buffers;
    struct journal_head *t_sync_datalist;
    struct journal_head *t_forget;
    struct journal_head *t_checkpoint_list;
    struct journal_head *t_iobuf_list;
    struct journal_head *t_shadow_list;
    struct journal_head *t_log_list;
    spinlock_t t_handle_lock;
    int t_updates;
    int t_outstanding_credits;
    transaction_t *t_cpnext;
    transaction_t *t_cpprev;
    long unsigned int t_expires;
    int t_handle_count;
    spinlock_t t_jcb_lock;
    struct list_head t_jcb;
}
(gdb)

Comment 8 Dave Anderson 2005-03-02 21:31:07 UTC

All right, in a few days I'll try restoring gdb-6.1 in a test version
of crash to give you, because our gdb team gave me a small gdb-6.1
patch after I had already regressed back to gdb-6.0.
(By then it was too late to go back to gdb-6.1...)

Unfortunately, I'm completely swamped with other work right now, but
I'll get to it as soon as possible.

Comment 11 Dave Anderson 2005-04-15 20:03:08 UTC

Nick,

Please try this test version of crash (crash-3.10-13.4_gdb6.1.tar.gz)
located here:

  http://people.redhat.com/anderson/.crash_test

and let me know how it works.  I cannot guarantee that it will help,
nor whether it may introduce other problems.  

Sorry for the delay...

Dave

Comment 12 Nick Dokos 2005-04-18 17:55:05 UTC

Dave,

I tried the new version (lightly on x86, slightly more heavily on ia64 -
on live kernels so far, I have not tried it on a crash dump):
it's behaving much better - no crashes and it's giving me the
information that I expect. I'll try to push it some more and also try to
do some post-mortem debugging. If I find anything amiss, I'll let you know.

Thanks very much!

Nick

Comment 13 Dave Anderson 2005-04-18 18:29:53 UTC

Thanks, Nick.

That test version will *not* work with 32-bit (x86) vmcore
files that are greater than 4GB in length.  Unfortunately 
that test version dragged in some other stuff I'm working
on (specifically, support of multiple PT_LOAD sections so that we
can avoid the 256GB sparse memory holes on ia64 boxes), and
in so doing I broke read_netdump() for big x86 dumpfiles.

If you plug this in as a replacement for netdump.c:read_netdump(),
you could more legitimately test x86:

int
read_netdump(int fd, void *bufptr, int cnt, ulong addr, physaddr_t paddr)
{
        off_t offset;
        struct pt_load_segment *pls;
        int i;

        switch (nd->flags & (NETDUMP_ELF32|NETDUMP_ELF64))
        {
        case NETDUMP_ELF32:
                offset = (off_t)paddr + (off_t)nd->header_size;
                break;

        case NETDUMP_ELF64:
                for (i = offset = 0; i < nd->num_pt_load_segments; i++) {
                        pls = &nd->pt_load_segments[i];
                        if ((paddr >= pls->phys_start) &&
                            (paddr < pls->phys_end)) {
                                offset = (off_t)(paddr - pls->phys_start) +
                                        pls->file_offset;
                                break;
                        }
                }

                if (!offset)
                        return READ_ERROR;

                break;
        }

        if (lseek(nd->ndfd, offset, SEEK_SET) == -1)
                return SEEK_ERROR;

        if (read(nd->ndfd, bufptr, cnt) != cnt)
                return READ_ERROR;
        return cnt;
}
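The ELF64 branch above is a lookup from a physical address to a dump-file offset across multiple PT_LOAD segments. A minimal standalone sketch of that lookup follows, for illustration only: `struct load_segment` and `paddr_to_offset` are hypothetical names, with field names borrowed from the snippet, and this is not the crash source. One deliberate difference is that the sketch returns -1 for "no segment covers this address" rather than testing for a zero offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PT_LOAD segment descriptor. */
struct load_segment {
    uint64_t phys_start;   /* first physical address covered by the segment */
    uint64_t phys_end;     /* one past the last physical address covered */
    int64_t  file_offset;  /* where the segment's data starts in the dump file */
};

/*
 * Return the dump-file offset that holds physical address `paddr`,
 * or -1 if the address falls in a hole between segments.
 * All arithmetic is done in 64 bits so offsets beyond 4GB are preserved.
 */
int64_t paddr_to_offset(const struct load_segment *segs, size_t nsegs,
                        uint64_t paddr)
{
    assert(segs != NULL || nsegs == 0);

    for (size_t i = 0; i < nsegs; i++) {
        if (paddr >= segs[i].phys_start && paddr < segs[i].phys_end)
            return (int64_t)(paddr - segs[i].phys_start) + segs[i].file_offset;
    }
    return -1;
}
```

With one segment starting at physical address 0 and another starting at 256GB, an address in the second segment maps to an offset just past the first segment's data, so the hole between them takes up no space in the file.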

BTW, do you test any x86_64 machines by any chance?
As I recall, we only saw the problem with gdb-6.1 (pre-fix) handling
modules on ia64 and x86_64 machines.

Comment 18 Dave Anderson 2005-09-22 18:02:13 UTC

The fix for this issue is contained within the crash utility
versions associated with its respective RHEL3 and RHEL4 errata:

 RHEA-2005:599 crash enhancement update (RHEL3-U6) - version 4.0-1
 RHEA-2005:600 crash enhancement update (RHEL4-U2) - version 4.0-2

Comment 19 RHEL Program Management 2006-11-07 22:03:28 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 20 Dave Anderson 2006-11-07 22:10:34 UTC

This was fixed in RHEL4-U2...
