Bug 229853

Summary: crash will not open cores generated on upstream kernels
Product: Red Hat Enterprise Linux 5
Reporter: Josef Bacik <jbacik>
Component: crash
Assignee: Dave Anderson <anderson>
Status: CLOSED NOTABUG
QA Contact: David Lawrence <dkl>
Severity: medium
Priority: medium
Version: 5.0
CC: rpeterso
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-02-23 21:02:05 UTC

Description Josef Bacik 2007-02-23 20:18:56 UTC
Description of problem:
Whenever I generate a core on an upstream (2.6.20) kernel, crash will not open 
it.  Using version 4.0-3.20 produces these errors:

[root@rh5cluster2 127.0.0.1-2007-02-23-15:09:25]# crash /root/linux-2.6/vmlinux vmcore

crash 4.0-3.20
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...


crash: invalid (optional) structure member offsets: zone_struct_free_pages or 
zone_free_pages
       FILE: memory.c  LINE: 11520  FUNCTION: dump_memory_nodes()

[/usr/bin/crash] error trace: 8096dcc => 80baca0 => 80ba076 => 812ee1a
/usr/bin/nm: /usr/bin/crash: no symbols
/usr/bin/nm: /usr/bin/crash: no symbols
/usr/bin/nm: /usr/bin/crash: no symbols
/usr/bin/nm: /usr/bin/crash: no symbols

WARNING: Because this kernel was compiled with gcc version 4.1.1, certain
         commands or command options may fail unless crash is invoked with
         the  "--readnow" command line option.

[root@rh5cluster2 127.0.0.1-2007-02-23-15:09:25]# file vmcore 
vmcore: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style

Using --readnow does not make things any better.  Let me know if you need 
access to the box to look at this; it's not going anywhere.  Oh, and I built 
the kernel with -g to see if that made a difference, and it did not.

Comment 1 Dave Anderson 2007-02-23 20:26:04 UTC
Why is this posted against rhel5-rc1?

> Using --readnow does not make things any better.  Let me know if you need 
> access to the box to look at this; it's not going anywhere.  Oh, and I built 
> the kernel with -g to see if that made a difference, and it did not.

You have no choice -- by definition the kernel *must* be built with -g.



Comment 2 Josef Bacik 2007-02-23 20:39:01 UTC
I posted it against rhel5-rc1 because it's a rhel5 box.  What should I have 
posted it against?

> You have no choice -- by definition the kernel *must* be built with -g.

Yeah, it wasn't built that way by default, though I thought it was.  I rebuilt 
it with -g and I still couldn't read the core.

Comment 3 Dave Anderson 2007-02-23 20:57:38 UTC
> I posted it against rhel5-rc1 because it's a rhel5 box.  What should I have 
> posted it against?

The short answer is, you don't...

The crash utility released by Red Hat is for Red Hat kernels.  If it doesn't
work with Red Hat kernels, then that's bugzilla-worthy.

The upstream kernel is constantly changing, and because of kernel dependencies
in the crash utility, it's only a matter of time until the shifting sands of
the upstream kernel "break" crash.  That kind of bug reporting and fixing is
typically done via the crash utility mailing list, to which I just sent you an
invite.

  https://www.redhat.com/mailman/listinfo/crash-utility 

That all being said, the error that you're seeing is a bit strange:

> crash: invalid (optional) structure member offsets: zone_struct_free_pages or 
> zone_free_pages

AFAIK, the zone.free_pages field still exists -- at least in the 2.6.20-based
"rt" kernel, which is fairly up-to-date:

# crash

crash 4.0-3.14
Copyright (C) 2002, 2003, 2004, 2005, 2006  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005  Fujitsu Limited
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...

      KERNEL: /vmlinux-2.6.20-0119.rt8.2
    DUMPFILE: /dev/mem
        CPUS: 8
        DATE: Fri Feb 23 15:49:32 2007
      UPTIME: 1 days, 01:45:01
LOAD AVERAGE: 0.08, 0.03, 0.02
       TASKS: 276
    NODENAME: xw6400-01.boston.redhat.com
     RELEASE: 2.6.20-0119.rt8.2
     VERSION: #1 SMP PREEMPT Mon Feb 19 15:11:23 EST 2007
     MACHINE: i686  (1596 Mhz)
      MEMORY: 2 GB
         PID: 4582
     COMMAND: "crash"
        TASK: f67fa270  [THREAD_INFO: dd0f9000]
         CPU: 3
       STATE: TASK_RUNNING (ACTIVE)

crash> zone
struct zone {
    long unsigned int free_pages;
    long unsigned int pages_min;
    long unsigned int pages_low;
    long unsigned int pages_high;
    long unsigned int lowmem_reserve[3];
    struct per_cpu_pageset pageset[32];
    spinlock_t lock;
    struct free_area free_area[11];
    struct zone_padding _pad1_;
    spinlock_t lru_lock;
    struct list_head active_list;
    struct list_head inactive_list;
    long unsigned int nr_scan_active;
    long unsigned int nr_scan_inactive;
    long unsigned int nr_active;
    long unsigned int nr_inactive;
    long unsigned int pages_scanned;
    int all_unreclaimable;
    atomic_t reclaim_in_progress;
    atomic_long_t vm_stat[11];
    int prev_priority;
    struct zone_padding _pad2_;
    wait_queue_head_t *wait_table;
    long unsigned int wait_table_hash_nr_entries;
    long unsigned int wait_table_bits;
    struct pglist_data *zone_pgdat;
    long unsigned int zone_start_pfn;
    long unsigned int spanned_pages;
    long unsigned int present_pages;
    const char *name;
}
SIZE: 4736
crash>

Note that zone.free_pages is the first field in the data structure.

The "invalid structure member offsets" message means that crash tried both
options and found neither: first the older zone_struct.free_pages, and then
zone.free_pages, because the kernel changed the name of the structure from
zone_struct to just zone.
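
As a rough illustration of that either/or lookup, here is a minimal Python
sketch.  It is not crash's actual C code: the find_member_offset helper, the
InvalidMemberOffset class, and the offset tables are hypothetical names and
values made up for the example.

```python
# Hypothetical illustration of an "either/or" member lookup; not crash's real code.

class InvalidMemberOffset(Exception):
    """Raised when none of the candidate structure members can be resolved."""


def find_member_offset(debug_info, candidates):
    """Return (struct, member, offset) for the first candidate that exists.

    debug_info maps a struct name to a dict of {member name: byte offset},
    standing in for what a debugger would read from the vmlinux debug data.
    candidates is an ordered list of (struct, member) pairs to try.
    """
    for struct_name, member in candidates:
        members = debug_info.get(struct_name, {})
        if member in members:
            return struct_name, member, members[member]
    tried = " or ".join(f"{s}_{m}" for s, m in candidates)
    raise InvalidMemberOffset(f"invalid structure member offsets: {tried}")


# Older structure name first, then the renamed structure.
CANDIDATES = [("zone_struct", "free_pages"), ("zone", "free_pages")]

# A kernel where struct zone has free_pages as its first field (offsets made up).
with_field = {"zone": {"free_pages": 0, "pages_min": 4}}
print(find_member_offset(with_field, CANDIDATES))

# A kernel where neither candidate exists: the lookup fails.
without_field = {"zone": {"pages_min": 0}}
try:
    find_member_offset(without_field, CANDIDATES)
except InvalidMemberOffset as err:
    print(err)
```

If the second case is what your vmlinux looks like, that would explain the
message, which is why the struct layout is the first thing to check.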

Anyway, if you do this with your kernel:

  # gdb vmlinux
  ...
  (gdb) ptype struct zone

What do you see?

On the rt kernel, for example, it shows:

# gdb /vmlinux-2.6.20-0119.rt8.2
GNU gdb Red Hat Linux (6.5-16.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

(gdb) ptype struct zone
type = struct zone {
    long unsigned int free_pages;
    long unsigned int pages_min;
    long unsigned int pages_low;
    long unsigned int pages_high;
    long unsigned int lowmem_reserve[3];
    struct per_cpu_pageset pageset[32];
    spinlock_t lock;
    struct free_area free_area[11];
    struct zone_padding _pad1_;
    spinlock_t lru_lock;
    struct list_head active_list;
    struct list_head inactive_list;
    long unsigned int nr_scan_active;
    long unsigned int nr_scan_inactive;
    long unsigned int nr_active;
    long unsigned int nr_inactive;
    long unsigned int pages_scanned;
    int all_unreclaimable;
    atomic_t reclaim_in_progress;
    atomic_long_t vm_stat[11];
    int prev_priority;
    struct zone_padding _pad2_;
    wait_queue_head_t *wait_table;
    long unsigned int wait_table_hash_nr_entries;
    long unsigned int wait_table_bits;
    struct pglist_data *zone_pgdat;
    long unsigned int zone_start_pfn;
    long unsigned int spanned_pages;
    long unsigned int present_pages;
    const char *name;
}
(gdb)
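
If you want to check the ptype output mechanically rather than by eye,
something like the following Python sketch could work.  It assumes you have
saved the gdb output as text; the struct_members helper and the shortened
SAMPLE text are made up for illustration, and the parsing only handles simple
one-line members (no function pointers or nested definitions).

```python
import re


def struct_members(ptype_output):
    """Return the member names listed in gdb 'ptype struct ...' output."""
    names = []
    for line in ptype_output.splitlines():
        line = line.strip()
        if not line.endswith(";"):
            continue  # skip the "type = struct zone {" and closing "}" lines
        # The member name is the last identifier before any array suffix.
        match = re.search(r"(\w+)(?:\[\d+\])*;$", line)
        if match:
            names.append(match.group(1))
    return names


# Shortened sample in the same shape as the ptype output.
SAMPLE = """\
type = struct zone {
    long unsigned int free_pages;
    long unsigned int lowmem_reserve[3];
    wait_queue_head_t *wait_table;
    const char *name;
}
"""

members = struct_members(SAMPLE)
print(members)
print("free_pages" in members)
```

Running the same check against the output from your own vmlinux would show
whether free_pages is still a member of struct zone there.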


Comment 4 Josef Bacik 2007-02-23 21:02:05 UTC
OK, I will close this and take the discussion to the mailing list.

Comment 5 Dave Anderson 2007-02-23 21:22:38 UTC
> Let me know if you need access to the box to look at this;
> it's not going anywhere.

That would be good -- let me know the details, including where the
source tree is located, and I'll take a look at fixing it on Monday.