Description of problem:
crash-3.8.3 "bt -a" hangs. The prompt doesn't come back.

Version-Release number of selected component (if applicable):
RHEL3-U4

How reproducible:
50%

Steps to Reproduce:
1. crash
2. bt -a

Actual results:
crash hangs after "bt -a". The prompt doesn't come back.

Expected results:
"bt -a" works and the prompt returns.

Additional info:
Dave Anderson investigated. His comment is:
----------
This bt "hang" was the result of the IPI (inter-processor interrupt) that was sent from the diskdump process catching PID 3822 just after it had entered the kernel to do a system_call, but before it had a chance to call the actual system call handler function. I have *never* seen this before, and the backtrace code is not equipped to even handle such a situation! I fixed it with a kludge (the /usr/bin/crash on 192.168.78.227 has been updated to a temporary version 3.8-5.6a).

The trace looks like this:

crash> bt 3822
PID: 3822   TASK: f050c000  CPU: 7   COMMAND: "dd"
 #0 [f050df84] smp_call_function_interrupt at 211d18f
 #1 [f050df8c] call_call_function_interrupt at 23eee2f
    EAX: 00000004  EBX: 00000001  ECX: 084cb000  EDX: 00101000  EBP: feffa968
    DS:  0068      ESI: 084cb000  ES:  0068      EDI: 00000000
    CS:  0060      EIP: fffd7027  ERR: fffffffb  EFLAGS: 00000286
 #2 [f050dfc0] system_call at 23ee027
    EAX: 00000004  EBX: 00000001  ECX: 084cb000  EDX: 00000200
    DS:  002b      ESI: 084cb000  ES:  002b      EDI: 00000000
    SS:  002b      ESP: feffa948  EBP: feffa968
    CS:  0023      EIP: 001df9fe  ERR: 00000004  EFLAGS: 00000246
crash>

It's interesting: never have I seen two exceptions happen so close together without an intervening function call. The hang was caused by a function that was trying to determine the stack frame size of "system_call", and it was complicated by the fact that the interrupted EIP (0xfffd7027) is the hugemem trampoline address for the real system_call address of 0x23ee027.

Anyway, this will be quite difficult to reproduce, so I won't be updating my people site with this fix until something else comes along.
-----------------
I'm confused here. Is this a new report? I fixed the x86 backtrace issue in my public crash utility release on people.redhat.com in version 3.8-5.7, and that fix will be carried forward into an update of crash for RHEL3-U5. But this BZ states that it happens on "All" platforms, and that it happens 50% of the time. Are we talking specifically about the situation that I investigated?
Dave, I filed this report with its old status, without checking the latest version. Tatsuo and I will test the latest version, 3.8-5.7, tomorrow (11/17), and I will post the results.
I confirmed the fix with crash-3.8.5-11. "bt -a" works without hanging.
Fix checked into CVS:

RHEL3:

* Fri Feb 04 2005 Dave Anderson <anderson> 3.10-4
- Fixes potential "bt -a" hang on dumpfile where netdump IPI interrupted
  an x86 process while executing the instructions just after it had
  entered the kernel for a syscall, but before calling the handler.
  BZ #139437

RHEL4:

* Thu Feb 10 2005 Dave Anderson <anderson> 3.10-7
- Fixes potential "bt -a" hang on dumpfile where netdump IPI interrupted
  an x86 process while executing the instructions just after it had
  entered the kernel for a syscall, but before calling the handler.
  BZ #139437
An advisory has been issued which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-184.html