Bug 88821 - lockup in panic()
Summary: lockup in panic()
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
Depends On:
Reported: 2003-04-14 16:37 UTC by Robert Hentosh
Modified: 2007-11-30 22:06 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2005-09-28 13:13:24 UTC
Target Upstream Version:

Description Robert Hentosh 2003-04-14 16:37:24 UTC
Description of problem:
The system locks up in panic() on kernel 2.4.9-e.3. Kernel 2.4.9-e.16 contains
the same code.

Version-Release number of selected component (if applicable):
Problem reproduced in 2.4.9-e.3.  Code is same in 2.4.9-e.16.

How reproducible:

Steps to Reproduce:
1. Heavily stress an SMP system until it panics. (Still diagnosing.)

Actual results:
Endless loop in panic.c when executing CHECK_EMERGENCY_SYNC.

Expected results:
no hang.

Additional info:
Using an ITP we obtained the following code of a locked up processor:

0x0148:0010:00000000c011b989   e872cffeff               call $-0x00013089  ;a=c01
0x0148:0010:00000000c011b98e   a1f4d53dc0               mov eax, dword ptr
0x0148:0010:00000000c011b993   8db600000000             lea esi, dword ptr [esi+ 0x00000000]
0x0148:0010:00000000c011b999   8dbc2700000000           lea edi, dword ptr [edi+ 0x00000000]
0x0148:0010:00000000c011b9a0   85c0                     test eax, eax
0x0148:0010:00000000c011b9a2   74fc                     jz $-0x02  ;a=c011b9a0
0x0148:0010:00000000c011b9a4   e8878d0700               call +0x00078d8c  ;a=c0194730

Please note that the "test eax, eax" followed by the jz never exits: the jz
branches straight back to the test, and eax is never reloaded inside the loop.
eax was loaded once, before the loop, with the value of
emergency_sync_scheduled.  This code is at the end of panic() in
linux/kernel/panic.c, in the macro CHECK_EMERGENCY_SYNC, which is defined in
linux/include/linux/sysrq.h.  At first glance, it looks like
emergency_sync_scheduled should be marked volatile.
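
The behaviour seen in the disassembly can be reproduced in user space. The
following is only a sketch, not the kernel source: the two loop functions,
simulate_other_cpu() and the sync_calls counter are hypothetical names
introduced for illustration, and the loops are bounded so that the program
terminates. The first function writes out by hand what the compiled code
appears to do (load once, test the copy); the second re-reads the flag on
every pass.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel flag; deliberately NOT volatile. */
int emergency_sync_scheduled;
int sync_calls; /* counts calls, for illustration only */

void do_emergency_sync(void)
{
    sync_calls++;
    emergency_sync_scheduled = 0;
}

/* Simulates another CPU or an interrupt setting the flag on the third pass. */
void simulate_other_cpu(int iteration)
{
    if (iteration == 2)
        emergency_sync_scheduled = 1;
}

/* Mirrors the disassembly: the flag is loaded once, then only the copy is tested. */
int spin_on_cached_copy(int max_iterations)
{
    int cached;
    int i;

    assert(max_iterations >= 0);
    cached = emergency_sync_scheduled;      /* the single "mov eax, ..." */
    for (i = 0; i < max_iterations; i++) {
        simulate_other_cpu(i);
        if (cached) {                       /* "test eax, eax" on the stale copy */
            do_emergency_sync();
            return 1;
        }
    }
    return 0; /* the real loop has no bound and would spin forever */
}

/* Re-reads the flag from memory on every pass, so a later store is seen. */
int spin_with_reload(int max_iterations)
{
    int i;

    assert(max_iterations >= 0);
    for (i = 0; i < max_iterations; i++) {
        simulate_other_cpu(i);
        if (emergency_sync_scheduled) {
            do_emergency_sync();
            return 1;
        }
    }
    return 0;
}
```

With the flag initially zero, spin_on_cached_copy() never notices the store
made by simulate_other_cpu(), while spin_with_reload() does and calls
do_emergency_sync() once.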

Comment 1 Robert Hentosh 2003-04-14 16:39:12 UTC
Hmmm... I just checked 2.4.20 from kernel.org and sysrq.h has been patched to 
define emergency_sync_scheduled as "volatile int".
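
For reference, a minimal user-space sketch of what the volatile qualifier
buys. This is not the 2.4.20 source: only the "volatile int" declaration comes
from the observation above, while check_emergency_sync(), poll_flag() and
sync_calls are hypothetical names, and the loop is bounded so the example
terminates.

```c
#include <assert.h>

/* volatile obliges the compiler to load the flag from memory at every read. */
volatile int emergency_sync_scheduled;
int sync_calls; /* counts calls, for illustration only */

void do_emergency_sync(void)
{
    sync_calls++;
    emergency_sync_scheduled = 0;
}

/* One pass of the loop body: re-read the flag and act if it is set. */
int check_emergency_sync(void)
{
    if (emergency_sync_scheduled) {
        do_emergency_sync();
        return 1;
    }
    return 0;
}

/* Bounded stand-in for the endless loop; returns how many syncs were run. */
int poll_flag(int max_iterations)
{
    int handled = 0;
    int i;

    assert(max_iterations >= 0);
    for (i = 0; i < max_iterations; i++)
        handled += check_emergency_sync();
    return handled;
}
```

Because every read of a volatile object is an observable access, the compiler
may not keep the value in a register across loop iterations, which is the
transformation that produced the "test eax, eax / jz" spin.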

Comment 2 Arjan van de Ven 2003-04-14 18:03:56 UTC
panic() isn't actually supposed to return... it's, like, a panic.

Comment 3 Robert Hentosh 2003-04-14 19:41:12 UTC
That is understandable.  But it would be nice if it were coded that way
explicitly instead of depending on a compiler optimization.

This code looks like it is meant to wait for the variable
emergency_sync_scheduled to become non-zero and then call do_emergency_sync.
As compiled, it will never call do_emergency_sync.  Yes?
