Bug 609636

Summary: Unwinding through prelinked shared library broken (.debug_frame)
Product: Red Hat Enterprise Linux 6
Component: systemtap
Version: 6.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Reporter: Mark Wielaard <mjw>
Assignee: Frank Ch. Eigler <fche>
QA Contact: qe-baseos-tools-bugs
CC: mjw, pmuller
Target Milestone: rc
Fixed In Version: systemtap-1.4-1.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 13:54:34 UTC
Bug Depends On: 634995

Description Mark Wielaard 2010-06-30 17:46:48 UTC
Description of problem:

Unwinding through a prelinked shared library is broken when the CFI comes from the .debug_frame section (it works fine when the CFI comes from the .eh_frame section).

Version-Release number of selected component (if applicable):

systemtap-1.2-9.el6.i686

How reproducible:

Always on i686 (never on x86_64 - at least not with the default compiler settings).

Steps to Reproduce:
1. make installcheck RUNTESTFLAGS=exelib.exp
  
Actual results:

# of expected passes            156
# of unexpected failures        32

Expected results:

# of expected passes            188

Additional info:

Upstream fix

commit 0aab7115c0099c0b8d7579befdea8557c25078f9
Author: Mark Wielaard <mjw>
Date:   Wed Jun 30 14:27:05 2010 +0200

    Fix .debug_frame dwarf unwinding through prelinked dynamic libraries.
    
    This wasn't immediately visible since often we would pick up the .eh_frame
    CFI. But when we would pick up the CFI from the .debug_frame and the
    shared library was prelinked, we would not correctly adjust some addresses.
    
    * runtime/sym.h (_stp_module): Better explain dwarf_module_base.
    * runtime/unwind.c (adjustStartLoc): Only adjust against dwarf_module_base
      when not eh_frame.
    * translate.cxx (dump_unwindsyms): Adjust dwarf_module_base against dwbias.

Comment 2 RHEL Program Management 2010-06-30 18:03:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Mark Wielaard 2010-07-07 20:41:47 UTC
There is a more generic update to the upstream patch:

commit 4d83bd9b6f5ccc4abd212ca5d6a6477cb52f78cc
Author: Mark Wielaard <mjw>
Date:   Mon Jul 5 21:14:42 2010 +0200

    Put generated debug_hdr in _stp_section, add sec_load_offset for adjustment.
    
    Make sure to adjust .debug_frame addresses to section load address.
    Which means keeping track of the (synthetic) .debug_frame_hdr index
    per section. For now keep track of "magic sections". Will need to
    be extended to track all loadable code sections as we do for symbol
    tables. See http://sourceware.org/ml/systemtap/2010-q3/msg00012.html
    
    * runtime/sym.h (_stp_module): Remove dwarf_module_base. Move debug_hdr
      and debug_hdr_len from here to ...
      (_stp_section): ... here. And add sec_load_offset.
    * runtime/unwind.c (adjustStartLoc): Don't use m->dwarf_module_base,
      use s->sec_load_offset.
      (_stp_search_unwind_hdr): Use s->debug_hdr and s->debug_hdr_len.
    * translate.cxx (create_debug_frame_hdr): Accept and set debug_frame_off.
      (get_unwind_data): Likewise.
      (dump_unwindsyms): Keep track of debug_frame_off. Output debug_frame_hdr
      per _stp_section if section is ".dynamic", ".absolute", ".text", or
      "_stext".

With this change, user space shared libraries are no longer a special case; they are treated like any other section that uses .debug_frame for unwinding. It also fixes a similar issue with unwinding through kernel modules.
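
To illustrate the kind of adjustment involved, here is a minimal sketch in C. It is not the code from runtime/unwind.c: the struct, its runtime_base field and the function name are hypothetical, and only the sec_load_offset name is borrowed from the commit message. The sketch assumes that sec_load_offset is the address the section has in the .debug_frame view, and that .eh_frame start addresses were already resolved to runtime addresses when they were decoded (a simplification).

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for per-section unwind data.
   This is NOT the real struct _stp_section. */
struct section_info {
    uint64_t sec_load_offset; /* assumed: section address as seen by .debug_frame */
    uint64_t runtime_base;    /* assumed: address the section is mapped at */
};

/* Translate an FDE start address into a runtime address.
   .debug_frame values are rebased from the section's address in the
   debug view to the address it is actually mapped at. .eh_frame
   values are returned unchanged in this sketch. */
static uint64_t adjust_start_loc(uint64_t start_loc,
                                 const struct section_info *s,
                                 int is_ehframe)
{
    if (start_loc == 0)
        return 0;              /* nothing to adjust */
    if (is_ehframe)
        return start_loc;      /* assumed already a runtime address */
    return start_loc - s->sec_load_offset + s->runtime_base;
}
```

For example, if a section sits at 0x1000 in the .debug_frame view and is mapped at 0x41001000, an FDE starting at 0x1400 is looked up at 0x41001400. If this rebasing is skipped or done against the wrong base, the unwinder finds no FDE for the current pc.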

The context.exp backtrace.tcl test was also extended to check the kernel unwind case:

commit ae38415f9ff7698a3ee39ef1e50ff0360fb2378a
Author: Mark Wielaard <mjw>
Date:   Tue Jul 6 12:24:19 2010 +0200

    Extend context.exp backtrace.tcl test for "perfect" DWARF backtraces.
    
    * testsuite/systemtap.context/backtrace.stp (yyy_func4): Exit at end to not
      stall expect.
    * testsuite/systemtap.context/backtrace.tcl: Add -d systemtap_test_module1
      and -d kernel for "perfect" backtraces. Keep track of module1 and kernel
      frames. Do not accept (inexact) anymore - the dwarf unwinder is "perfect"
      now. Check stap script did exit (eof).

Comment 6 Frank Ch. Eigler 2010-07-21 11:55:31 UTC
It appears that backporting the fixes into the rhel6 1.2 version is more
difficult than expected.  Let's defer this to a later version.  When/if
we rebase to systemtap-1.3 (due out in days), this will be picked up
automatically.  (Note I'm not requesting a rebase at this point for
RHEL6.0, though we can do so if requested.)

Reassigning to RHEL6.1.

Comment 10 errata-xmlrpc 2011-05-19 13:54:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0651.html