Bug 702427

Summary: gdb cannot "handle" a large data structure
Product: Red Hat Enterprise Linux 6
Reporter: Dave Anderson <anderson>
Component: gdb
Assignee: Jan Kratochvil <jan.kratochvil>
Status: CLOSED ERRATA
QA Contact: qe-baseos-tools-bugs
Severity: high
Priority: unspecified
Version: 6.0
CC: pmuller, tromey
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: gdb-7.2-49.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 17:33:50 UTC
Attachments: the patch (flags: none)

Description Dave Anderson 2011-05-05 16:09:10 UTC
Description of problem:

When running gdb against the attached 2.6.38.2-9.fc15.x86_64 
Fedora 15 vmlinux kernel, it cannot properly "handle" this
particular, fairly large, data structure:

(gdb) ptype struct pglist_data
type = struct pglist_data {
    struct zone node_zones[4];
    struct zonelist node_zonelists[2];
    int nr_zones;
    spinlock_t node_size_lock;
    long unsigned int node_start_pfn;
    long unsigned int node_present_pages;
    long unsigned int node_spanned_pages;
    int node_id;
    wait_queue_head_t kswapd_wait;
    struct task_struct *kswapd;
    int kswapd_max_order;
    enum zone_type classzone_idx;
} 

I noticed this when using an older gdb-7.0 version that is
embedded in the crash utility, which relies upon gdb being
able to determine structure member offsets.  And in the data
structure above, the offsets of all members beyond the 
node_zonelists[2] member return 0.

But taking the crash utility out of the picture, the problem
can also be seen simply by running "gdb vmlinux" using either
gdb-7.2.50.20110328-31.fc15 or gdb-7.2-48.el6.

Here's what happens:

  # gdb vmlinux-2.6.38.2-9.fc15
  GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
  Copyright (C) 2010 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-redhat-linux-gnu".
  For bug reporting instructions, please see:
  <http://www.gnu.org/software/gdb/bugs/>...
  Reading symbols from /root/vmlinux-2.6.38.2-9.fc15...done.
  (gdb) ptype struct pglist_data
  type = struct pglist_data {
      struct zone node_zones[4];
      struct zonelist node_zonelists[2];
      int nr_zones;
      spinlock_t node_size_lock;
      long unsigned int node_start_pfn;
      long unsigned int node_present_pages;
      long unsigned int node_spanned_pages;
      int node_id;
      wait_queue_head_t kswapd_wait;
      struct task_struct *kswapd;
      int kswapd_max_order;
      enum zone_type classzone_idx;
  }
  (gdb) p &((struct pglist_data *)(0x0)).node_zonelists[0]
  $1 = (struct zonelist *) 0x1c00
  (gdb) p &((struct pglist_data *)(0x0)).nr_zones
  $2 = (int *) 0x0
  (gdb) p &((struct pglist_data *)(0x0)).node_size_lock
  $3 = (spinlock_t *) 0x0
  (gdb) p &((struct pglist_data *)(0x0)).node_start_pfn
  $4 = (long unsigned int *) 0x0
  (gdb) p &((struct pglist_data *)(0x0)).node_present_pages
  $5 = (long unsigned int *) 0x0
  (gdb) p &((struct pglist_data *)(0x0)).node_spanned_pages
  $6 = (long unsigned int *) 0x0
  (gdb) p &((struct pglist_data *)(0x0)).node_id
  $7 = (int *) 0x0
  (gdb) p &((struct pglist_data *)(0x0)).kswapd_wait
  $8 = (wait_queue_head_t *) 0x0
  (gdb) p &((struct pglist_data *)(0x0)).kswapd
  $9 = (struct task_struct **) 0x0
  (gdb) p &((struct pglist_data *)(0x0)).kswapd_max_order
  $10 = (int *) 0x0
  (gdb) p &((struct pglist_data *)(0x0)).classzone_idx
  $11 = (enum zone_type *) 0x0
  (gdb) 
  
Interestingly enough, if I run against a slightly earlier 
2.6.38-rc4 kernel, the problem does not happen:
 
  # gdb vmlinux-2.6.38-rc4
  GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
  Copyright (C) 2010 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-redhat-linux-gnu".
  For bug reporting instructions, please see:
  <http://www.gnu.org/software/gdb/bugs/>...
  Reading symbols from /root/vmlinux-2.6.38-rc4...done.
  (gdb) ptype struct pglist_data
  type = struct pglist_data {
      struct zone node_zones[4];
      struct zonelist node_zonelists[2];
      int nr_zones;
      spinlock_t node_size_lock;
      long unsigned int node_start_pfn;
      long unsigned int node_present_pages;
      long unsigned int node_spanned_pages;
      int node_id;
      wait_queue_head_t kswapd_wait;
      struct task_struct *kswapd;
      int kswapd_max_order;
      enum zone_type classzone_idx;
  }
  (gdb) p &((struct pglist_data *)(0x0)).node_zonelists[0]
  $1 = (struct zonelist *) 0x1c00
  (gdb) p &((struct pglist_data *)(0x0)).nr_zones
  $2 = (int *) 0x13e40
  (gdb) p &((struct pglist_data *)(0x0)).node_size_lock
  $3 = (spinlock_t *) 0x13e44
  (gdb) p &((struct pglist_data *)(0x0)).node_start_pfn
  $4 = (long unsigned int *) 0x13e48
  (gdb) p &((struct pglist_data *)(0x0)).node_present_pages
  $5 = (long unsigned int *) 0x13e50
  (gdb) p &((struct pglist_data *)(0x0)).node_spanned_pages
  $6 = (long unsigned int *) 0x13e58
  (gdb) p &((struct pglist_data *)(0x0)).node_id
  $7 = (int *) 0x13e60
  (gdb) p &((struct pglist_data *)(0x0)).kswapd
  $8 = (struct task_struct **) 0x13e80
  (gdb) p &((struct pglist_data *)(0x0)).kswapd_max_order
  $9 = (int *) 0x13e88
  (gdb) p &((struct pglist_data *)(0x0)).classzone_idx
  $10 = (enum zone_type *) 0x13e8c
  (gdb) 

Of interest is that the earlier kernel was compiled with gcc 4.5.1,
whereas the newer one was compiled with gcc 4.6.0-1.

Version-Release number of selected component (if applicable):

gdb-7.2-48.el6

How reproducible:

Always

Steps to Reproduce:
1. gdb vmlinux-2.6.38.2-9.fc15
2. Calculate the offsets of the pglist_data structure members as shown above
  
Actual results:

Member offsets beyond pglist_data.node_zonelists[] are miscalculated
to be 0.
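
For anyone who wants to script this check, here is a minimal Python sketch.
It parses a gdb transcript of the form shown in the description and flags
every member, other than the first one queried, whose reported offset is 0.
The function names and regular expressions are illustrative assumptions;
they are not part of gdb or of the crash utility.

```python
import re

# Matches the query lines used above, e.g.
#   (gdb) p &((struct pglist_data *)(0x0)).nr_zones
PRINT_RE = re.compile(
    r"^\s*\(gdb\) p &\(\(struct (?P<struct>\w+) \*\)\(0x0\)\)\.(?P<member>[\w\[\]]+)\s*$"
)
# Matches the result lines, e.g.
#   $2 = (int *) 0x13e40
RESULT_RE = re.compile(r"^\s*\$\d+ = \((?P<type>.+)\) (?P<addr>0x[0-9a-fA-F]+)\s*$")


def parse_offsets(transcript):
    """Map each queried member name to the offset gdb printed for it."""
    offsets = {}
    member = None
    for line in transcript.splitlines():
        m = PRINT_RE.match(line)
        if m:
            member = m.group("member")
            continue
        m = RESULT_RE.match(line)
        if m and member is not None:
            offsets[member] = int(m.group("addr"), 16)
            member = None
    return offsets


def suspicious_members(offsets):
    """Return members after the first queried one whose offset is 0."""
    names = list(offsets)
    return [name for name in names[1:] if offsets[name] == 0]


# Two query/result pairs copied from the failing session above.
SAMPLE = """
  (gdb) p &((struct pglist_data *)(0x0)).node_zonelists[0]
  $1 = (struct zonelist *) 0x1c00
  (gdb) p &((struct pglist_data *)(0x0)).nr_zones
  $2 = (int *) 0x0
"""

if __name__ == "__main__":
    print(suspicious_members(parse_offsets(SAMPLE)))
```

The check only reports offsets of 0 past the first queried member, which is
the symptom seen here; it does not validate the offsets themselves.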

Expected results:

The offsets should be the same as those seen when running against the
"earlier" vmlinux-2.6.38-rc4 kernel.


Additional info:

vmlinux-2.6.38-rc4:
  gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC)

vmlinux-2.6.38.2-9.fc15:
  gcc version 4.6.0 20110329 (Red Hat 4.6.0-1) (GCC)

Comment 2 Dave Anderson 2011-05-05 16:14:22 UTC
Here are links to the two vmlinux files referenced above:

  http://people.redhat.com/anderson/vmlinux-2.6.38-rc4.gz
  http://people.redhat.com/anderson/vmlinux-2.6.38.2-9.fc15.gz

Comment 3 RHEL Program Management 2011-05-06 06:00:35 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 Tom Tromey 2011-05-06 16:10:01 UTC
The DWARF is definitely correct, but (IMO) odd, e.g.:

 [  4428]      member
               name                 (strp) "nr_zones"
               decl_file            (data1) 47
               decl_line            (data2) 615
               type                 (ref4) [    ed]
               data_member_location (data4) location list [ 13e40]

The earlier version doesn't have a location list here, just a constant:

 [  447e]      member
               name                 (strp) "nr_zones"
               decl_file            (data1) 47
               decl_line            (data2) 615
               type                 (ref4) [    c7]
               data_member_location (sdata) 13e40

The bug is that gdb gives up on this kind of member location.
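
To illustrate the two encodings, here is a rough Python sketch of how a
DWARF consumer might turn a DW_AT_data_member_location attribute into a
byte offset. It is not the attached patch and not gdb's code: the function
names, the use of form names as strings, and the choice to treat every
constant-class form as a plain offset are assumptions made for the example.
The only expression it understands is a single DW_OP_plus_uconst (0x23)
followed by a ULEB128 operand.

```python
# Form names are written as strings purely for readability (an assumption
# of this sketch); a real reader would work with the numeric DW_FORM codes.
CONSTANT_FORMS = {"data1", "data2", "data4", "data8", "sdata", "udata"}
BLOCK_FORMS = {"block", "block1", "block2", "block4", "exprloc"}

DW_OP_PLUS_UCONST = 0x23


def decode_uleb128(data, pos=0):
    """Decode an unsigned LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def member_offset(form, value):
    """Return the member's byte offset, or raise ValueError if unsupported.

    For constant forms, `value` is the integer already read from the DIE.
    For block forms, `value` is the raw expression as a bytes object.
    """
    if form in CONSTANT_FORMS:
        # Assumption: the producer emitted the offset directly as a constant.
        return value
    if form in BLOCK_FORMS:
        if value and value[0] == DW_OP_PLUS_UCONST:
            offset, end = decode_uleb128(value, 1)
            if end == len(value):
                return offset
        raise ValueError("unsupported location expression")
    raise ValueError("unsupported form: %s" % form)


if __name__ == "__main__":
    # The nr_zones offset from the dumps above, in both constant encodings.
    print(hex(member_offset("sdata", 0x13E40)))
    print(hex(member_offset("data4", 0x13E40)))
```

The point of the sketch is only that both DIEs above can yield the same
offset for nr_zones if the consumer does not give up on the data4 form.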

Comment 5 Tom Tromey 2011-05-06 16:54:51 UTC
I'm testing a patch.

Comment 6 Dave Anderson 2011-05-06 17:45:46 UTC
(In reply to comment #5)
> I'm testing a patch.

Awesome Tom -- I appreciate the quick response.

(and I'm praying that your patch will apply to the crash utility's
embedded gdb-7.0...)

Comment 7 Tom Tromey 2011-05-06 18:57:37 UTC
> (and I'm praying that your patch will apply to the crash utility's
> embedded gdb-7.0...)

I took a quick look and I think it will.

It would be better if crash did not embed gdb in this way.
Stuff like this is why Fedora forbids private copies of libraries and the like.

E.g., I believe there have been important debuginfo-reading additions
since 7.0.  That means that there are probably cases where gdb can access
values (or maybe even unwind -- not sure) but where crash cannot.

Comment 8 Dave Anderson 2011-05-06 19:17:53 UTC
Yeah, well that's not going to happen...

The embedded gdb is the perfect one-stop shop for data structure deconstruction,
text disassembly, line numbers, pretty-printing, add-symbol-file capability for
throwing all of the kernel module objects into the session, etc.  Actually,
unwinding is one thing that gdb is *not* used for.

It's just that the pglist_data structure is a key focal point, and the failure
here was obvious.  But gdb failures have been *very* rare over the years, and
incorporating an updated version proved to be fairly easy the last time we went
over this issue.

Comment 9 Jan Kratochvil 2011-05-06 19:24:25 UTC
http://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries
(the argument that GDB is not a library is not accepted)

Comment 10 Dave Anderson 2011-05-06 19:30:49 UTC
Tom, can you attach the patch to this BZ?

Comment 11 Tom Tromey 2011-05-06 19:42:13 UTC
Created attachment 497441 [details]
the patch

Comment 12 Dave Anderson 2011-05-06 20:03:18 UTC
OK, thanks...

It looked good until the handle_data_member_location() call, which doesn't exist.
But I don't see it in gdb-7.2-48.el6 either -- what are you patching against?

Comment 13 Dave Anderson 2011-05-06 20:06:11 UTC
Sorry, it's right there in the patch, but the last stanza doesn't apply
to either gdb-7.0 or gdb-7.2-48.el6.  I should be able to shoehorn
it in and test it out, though.

Thanks!

Comment 18 errata-xmlrpc 2011-12-06 17:33:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1699.html