Bug 139437 - [RHEL3-U4][crash] bt -a hangs
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: netdump
Version: 3.0
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Tatsuo Uchida
QA Contact: David Lawrence
Depends On:
Blocks:
Reported: 2004-11-15 17:58 EST by Yuuichi Nagahama
Modified: 2007-11-30 17:07 EST
CC List: 12 users

See Also:
Fixed In Version: RHBA-2005-186
Doc Type: Bug Fix
Last Closed: 2005-05-19 08:47:06 EDT
Attachments: None
Description Yuuichi Nagahama 2004-11-15 17:58:47 EST
Description of problem:
With crash-3.8.3, bt -a hangs. The prompt doesn't come back.


Version-Release number of selected component (if applicable):
RHEL3-U4

How reproducible: 50%


Steps to Reproduce:
1. crash
2. bt -a
Actual results: crash hangs after bt -a. The prompt doesn't come back.


Expected results: bt -a works.


Additional info:
Dave Anderson investigated. His comment follows:
----------
This bt "hang" was the result of the IPI that was sent
from the diskdump process catching PID 3822 just after it had
entered the kernel to do a system_call, but before it had a chance
to call the actual system call handler function.  I have *never*
seen this before, and the backtrace code is not equipped to
even handle such a situation!

I fixed it with a kludge (the /usr/bin/crash on 192.168.78.227 has
been updated to a temporary version 3.8-5.6a).  The trace looks
like this:

crash> bt 3822
PID: 3822   TASK: f050c000  CPU: 7   COMMAND: "dd"
 #0 [f050df84] smp_call_function_interrupt at 211d18f
 #1 [f050df8c] call_call_function_interrupt at 23eee2f
    EAX: 00000004  EBX: 00000001  ECX: 084cb000  EDX: 00101000  EBP: feffa968
    DS:  0068      ESI: 084cb000  ES:  0068      EDI: 00000000
    CS:  0060      EIP: fffd7027  ERR: fffffffb  EFLAGS: 00000286
 #2 [f050dfc0] system_call at 23ee027
    EAX: 00000004  EBX: 00000001  ECX: 084cb000  EDX: 00000200
    DS:  002b      ESI: 084cb000  ES:  002b      EDI: 00000000
    SS:  002b      ESP: feffa948  EBP: feffa968
    CS:  0023      EIP: 001df9fe  ERR: 00000004  EFLAGS: 00000246
crash>

It's interesting -- never have I seen two exceptions happen
so close together without an intervening function call.
The hang was caused by a function that was trying to determine
the stack frame size of "system_call", and complicated
by the fact that the interrupted EIP (0xfffd7027) is
the hugemem trampoline address for the real system_call address
of 0x23ee027.

Anyway, this will be quite difficult to reproduce, so I
won't be updating my people site with this fix
until something else comes along.
-----------------
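
For readers following the analysis quoted above, here is a minimal Python sketch of the address relationship it describes. This is not the crash utility's code. The names TRAMPOLINE_PAGES, SYMBOLS, translate_trampoline, resolve_symbol and find_return_address are hypothetical, the page mapping is inferred only from the two addresses quoted above (assuming the trampoline mapping is page-granular), and the symbol range for system_call is made up for illustration. The bounded scan shows one generic way to avoid looping forever when a lookup cannot succeed.

```python
PAGE_MASK = 0xFFF

# Hypothetical mapping: trampoline page -> real kernel text page.
# Inferred from 0xfffd7027 -> 0x23ee027 in the trace, nothing more.
TRAMPOLINE_PAGES = {0xFFFD7000: 0x023EE000}

# Hypothetical symbol table entry: (start, end, name). The real extent of
# system_call is not given in the report.
SYMBOLS = [(0x023EE000, 0x023EE040, "system_call")]


def translate_trampoline(eip):
    """Map a trampoline EIP to the real text address; pass others through."""
    real_page = TRAMPOLINE_PAGES.get(eip & ~PAGE_MASK)
    if real_page is None:
        return eip
    return real_page | (eip & PAGE_MASK)


def resolve_symbol(eip, symbols):
    """Return the symbol containing eip after translation, or None."""
    addr = translate_trampoline(eip)
    for start, end, name in symbols:
        if start <= addr < end:
            return name
    return None


def find_return_address(stack_words, is_text, max_words=1024):
    """Scan at most max_words stack words for a text address.

    Returns the index of the first match, or None instead of scanning
    without limit when nothing matches.
    """
    for index, word in enumerate(stack_words[:max_words]):
        if is_text(word):
            return index
    return None
```

With these assumptions, resolve_symbol(0xFFFD7027, SYMBOLS) resolves the interrupted EIP to system_call, while an address outside the table yields None rather than a retry loop.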
Comment 1 Dave Anderson 2004-11-16 08:44:16 EST
I'm confused here.  Is this a new report?  I fixed the x86 backtrace
issue in my public crash utility release on people.redhat.com
in version 3.8-5.7, and that will be carried forward into an
update of crash for RHEL3-U5.

But this BZ states that it happens on "All" Platforms, and
that it happens 50% of the time.  Are we talking specifically
about the situation that I investigated? 

  
Comment 2 Yuuichi Nagahama 2004-11-16 14:40:10 EST
Dave,

I filed this bug report from an old status without checking the latest one.
Tatsuo and I will check the latest version 3.8-5.7 tomorrow (11/17),
and I will post the result.
Comment 3 Tatsuo Uchida 2004-11-17 14:50:31 EST
I confirmed it with crash-3.8.5-11.
It works without hanging.
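
Since the hang was reported as intermittent, a check like the one above could be repeated with a timeout. The following is a rough Python sketch under stated assumptions, not the procedure actually used here: command_hangs is a hypothetical helper, and the crash invocation in the comment assumes that crash reads commands from standard input and that vmlinux and vmcore are placeholder paths for the namelist and dumpfile.

```python
import subprocess


def command_hangs(argv, stdin_text, timeout_s):
    """Return True if the command has not exited within timeout_s seconds.

    subprocess.run kills the child process when the timeout expires and
    raises TimeoutExpired, which is treated here as a hang.
    """
    try:
        subprocess.run(
            argv,
            input=stdin_text,
            text=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True
    return False


# Hypothetical usage (placeholder paths, assumes commands are read from stdin):
#   hung = command_hangs(["crash", "vmlinux", "vmcore"], "bt -a\nquit\n", 120)
```

A timeout only shows that the command did not finish in time, so the limit should be generous for large dumpfiles.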
Comment 4 Dave Anderson 2005-02-18 11:28:20 EST
Fix checked into CVS:

RHEL3:

* Fri Feb 04 2005 Dave Anderson <anderson@redhat.com> 3.10-4
- Fixes potential "bt -a" hang on dumpfile where netdump IPI
  interrupted an x86 process while executing the instructions just
  after it had entered the kernel for a syscall, but before calling
  the handler.  BZ #139437

RHEL4:
* Thu Feb 10 2005 Dave Anderson <anderson@redhat.com> 3.10-7
- Fixes potential "bt -a" hang on dumpfile where netdump IPI
  interrupted an x86 process while executing the instructions just
  after it had entered the kernel for a syscall, but before calling
  the handler.  BZ #139437

Comment 5 Tim Powers 2005-05-19 08:47:06 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-184.html
