Bug 139437 - [RHEL3-U4][crash] bt -a hangs
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: netdump
Hardware: i386 Linux
Priority: medium  Severity: medium
Assigned To: Tatsuo Uchida
David Lawrence
Reported: 2004-11-15 17:58 EST by Yuuichi Nagahama
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHBA-2005-186
Doc Type: Bug Fix
Last Closed: 2005-05-19 08:47:06 EDT

Description Yuuichi Nagahama 2004-11-15 17:58:47 EST
Description of problem:
bt -a hangs. The prompt doesn't come back.

Version-Release number of selected component (if applicable):

How reproducible: 50%

Steps to Reproduce:
1. crash
2. bt -a
Actual results: hang after bt -a. Prompt doesn't come back.

Expected results: bt -a works.
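
Because the hang only shows up about half the time, a timeout wrapper makes it easier to tell a hang from a slow backtrace. The following is a minimal sketch, not part of crash itself: `command_hangs` is a hypothetical helper, and the example invocation in the comment assumes that crash accepts its commands on standard input and that `vmlinux` and `vmcore` are placeholder paths for the kernel namelist and the dumpfile.

```python
import subprocess


def command_hangs(argv, commands, timeout_s):
    """Return True if the process does not exit within timeout_s seconds.

    argv      -- command line to run (hypothetical example: ["crash", "vmlinux", "vmcore"])
    commands  -- text fed to the process on stdin (example: "bt -a\nquit\n")
    timeout_s -- seconds to wait before declaring a hang
    """
    try:
        subprocess.run(
            argv,
            input=commands,
            text=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return True
    return False


# Assumed usage (paths are placeholders, not taken from this report):
# hung = command_hangs(["crash", "vmlinux", "vmcore"], "bt -a\nquit\n", 120)
```

Running the helper several times against the same dumpfile gives a rough hang rate to compare with the 50% reported above.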

Additional info:
Dave Anderson investigated. His comment follows:
This bt "hang" was the result of the IPI interrupt that was sent
from the diskdump process catching PID 3822 just after it had
entered the kernel to do a system_call, but before it had a chance
to call the actual system call handler function.  I have *never*
seen this before, and the backtrace code is not equipped to
even handle such a situation!

I fixed it with a kludge (the /usr/bin/crash on has
been updated to a temporary version 3.8-5.6a).  The trace looks
like this:

crash> bt 3822
PID: 3822   TASK: f050c000  CPU: 7   COMMAND: "dd"
 #0 [f050df84] smp_call_function_interrupt at 211d18f
 #1 [f050df8c] call_call_function_interrupt at 23eee2f
    EAX: 00000004  EBX: 00000001  ECX: 084cb000  EDX: 00101000  EBP: 
    DS:  0068      ESI: 084cb000  ES:  0068      EDI: 00000000
    CS:  0060      EIP: fffd7027  ERR: fffffffb  EFLAGS: 00000286
 #2 [f050dfc0] system_call at 23ee027
    EAX: 00000004  EBX: 00000001  ECX: 084cb000  EDX: 00000200
    DS:  002b      ESI: 084cb000  ES:  002b      EDI: 00000000
    SS:  002b      ESP: feffa948  EBP: feffa968
    CS:  0023      EIP: 001df9fe  ERR: 00000004  EFLAGS: 00000246

It's interesting -- never have I seen two exceptions happen
so close together without an intervening function call.
The hang was caused by a function that was trying to determine
the stack frame size of "system_call", and complicated
by the fact that the interrupted EIP (0xfffd7027) is
the hugemem trampoline address for the real system_call address
of 0x23ee027.
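
The relationship between the two addresses can be checked by hand: the interrupted EIP and the real system_call address share the same offset within a 4 KiB page (0x027). The sketch below only illustrates that arithmetic. It is not the fix that went into crash; `translate_trampoline` is a hypothetical helper, and the single-page mapping from 0xfffd7000 to 0x23ee000 is an assumption inferred from the two addresses in the trace.

```python
PAGE_MASK = 0xFFF  # low 12 bits: offset within a 4 KiB i386 page


def page_offset(addr):
    """Return the offset of addr within its 4 KiB page."""
    return addr & PAGE_MASK


def translate_trampoline(eip, trampoline_page, real_page):
    """Map an EIP inside the trampoline page to the matching real address.

    Addresses outside the trampoline page are returned unchanged.
    Both page arguments are assumed to be page-aligned.
    """
    if (eip & ~PAGE_MASK) == trampoline_page:
        return real_page | page_offset(eip)
    return eip


# Values from the backtrace above; the page bases are assumptions.
real = translate_trampoline(0xFFFD7027, 0xFFFD7000, 0x23EE000)
print(hex(real))  # 0x23ee027
```

With the EIP translated to a real text address, a symbol lookup for it would resolve to system_call rather than to an address with no symbol.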

Anyway, this will be quite difficult to reproduce, so I
won't be updating my people site with this fix
until something else comes along.
Comment 1 Dave Anderson 2004-11-16 08:44:16 EST
I'm confused here.  Is this a new report?  I fixed the x86 backtrace
issue in my public crash utility release on people.redhat.com
in version 3.8-5.7, and that will be carried forward into an
update of crash for RHEL3-U5.

But this BZ states that it happens on "All" Platforms, and
that it happens 50% of the time.  Are we talking specifically
about the situation that I investigated? 

Comment 2 Yuuichi Nagahama 2004-11-16 14:40:10 EST

I filed this report from an old status without checking the latest one.
Tatsuo and I will check the latest version, 3.8-5.7, tomorrow (11/17),
and I will post the result.
Comment 3 Tatsuo Uchida 2004-11-17 14:50:31 EST
I confirmed it with crash-3.8.5-11.
It works without hanging.
Comment 4 Dave Anderson 2005-02-18 11:28:20 EST
Fix checked into CVS:


* Fri Feb 04 2005 Dave Anderson <anderson@redhat.com> 3.10-4
- Fixes potential "bt -a" hang on dumpfile where netdump IPI
  interrupted an x86 process while executing the instructions just
  after it had entered the kernel for a syscall, but before calling
  the handler.  BZ #139437

* Thu Feb 10 2005 Dave Anderson <anderson@redhat.com> 3.10-7
- Fixes potential "bt -a" hang on dumpfile where netdump IPI
  interrupted an x86 process while executing the instructions just
  after it had entered the kernel for a syscall, but before calling
  the handler.  BZ #139437

Comment 5 Tim Powers 2005-05-19 08:47:06 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

