Bug 599498

Summary: kernel crash on backtrace, stack_walk field not set
Product: Red Hat Enterprise Linux 6
Component: systemtap
Version: 6.0
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Target Milestone: rc
Reporter: Mark Wielaard <mjw>
Assignee: Frank Ch. Eigler <fche>
QA Contact: Petr Muller <pmuller>
CC: mjw, ohudlick, pmuller, tao
Fixed In Version: systemtap-1.2-5.el6
Doc Type: Bug Fix
Last Closed: 2010-11-10 21:45:00 UTC

Description Mark Wielaard 2010-06-03 11:25:12 UTC
Description of problem:

The kernel can crash when a stap script creates a backtrace. The cause is a kernel backport of the dump_stack code, which now expects a new stacktrace_ops field that the systemtap runtime code does not fill in.

Version-Release number of selected component (if applicable):

systemtap-1.2-3.el6.x86_64

How reproducible:

50%. The crash depends on the normal systemtap unwinder failing (for example, because it cannot access some memory it needs). The fallback kernel dump_stack code may then be called with a stacktrace_ops struct that is not fully filled in, because the backport introduced a new walk_stack field that we don't expect in kernel versions < 2.6.33. That can lead to the kernel jumping to a random address.

Steps to Reproduce:
1. Run context.exp testcase (make installcheck RUNTESTFLAGS=context.exp)
  
Actual results:

Kernel crash with an Oops.

Expected results:

No kernel crash, and the testcase passes.

Additional info:

Fixed upstream

commit c265cd259a82542abf290a6aeb058056d6c18b73
Author: Mark Wielaard <mjw>
Date:   Thu Jun 3 11:26:17 2010 +0200

    Replace walk_stack field version guard with autoconf test.
    
    The test for whether or not to assign print_context_stack to the
    walk_stack stacktrace_ops field depended on the kernel version.
    Replace with a proper runtime/autoconf test to make sure the field
    always gets assigned when available.
    
    * buildrun.cxx (compile_pass): Add output for STAP_CONF_WALK_STACK.
    * runtime/autoconf-walk-stack.c: New test.
    * runtime/stack.c (print_stack_ops): Assign walk_stack field
      print_context_stack depending on STAP_CONF_WALK_STACK.

Comment 2 RHEL Program Management 2010-06-03 12:23:16 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Frank Ch. Eigler 2010-06-25 16:01:13 UTC
*** Bug 602560 has been marked as a duplicate of this bug. ***

Comment 5 Frank Ch. Eigler 2010-07-07 21:20:07 UTC
*** Bug 612322 has been marked as a duplicate of this bug. ***

Comment 6 Issue Tracker 2010-07-13 16:56:46 UTC
Event posted on 07-13-2010 07:11am EDT by Glen Johnson

------- Comment From  2010-07-13 07:09 EDT-------
Ported and tested the RHEL 6 Beta2 systemtap packages with the commit posted
in https://bugzilla.redhat.com/show_bug.cgi?id=599498#c0; it fixes the
bug.


This event sent from IssueTracker by jkachuck 
 issue 1092833

Comment 9 Petr Muller 2010-09-23 14:31:20 UTC
I've been running the testcase in a loop for about a day, and no box is dead. I suppose that's a PASS. There are still FAILs, but that's a different story.

Moving to VERIFIED.

Comment 10 releng-rhel@redhat.com 2010-11-10 21:45:00 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.