Bug 88821

Summary: lockup in panic()
Product: Red Hat Enterprise Linux 2.1 Reporter: Robert Hentosh <robert_hentosh>
Component: kernel Assignee: Larry Woodman <lwoodman>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.1 CC: john_hull, wwlinuxengineering
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-09-28 13:13:24 UTC Type: ---

Description Robert Hentosh 2003-04-14 16:37:24 UTC
Description of problem:
The system locks up in panic() on kernel 2.4.9-e.3. The code in 2.4.9-e.16 is 
the same.


Version-Release number of selected component (if applicable):
Problem reproduced in 2.4.9-e.3.  Code is same in 2.4.9-e.16.


How reproducible:
Difficult.

Steps to Reproduce:
1. Heavily stress an SMP system until a panic occurs. (Still diagnosing.)

Actual results:
Endless loop in panic.c when doing CHECK_EMERGENCY_SYNC


Expected results:
No hang.

Additional info:
Using an ITP (in-target probe), we captured the following code from a locked-up 
processor:

0x0148:0010:00000000c011b989   e872cffeff               call $-0x00013089  ;a=c0108900
0x0148:0010:00000000c011b98e   a1f4d53dc0               mov eax, dword ptr [0xc03dd5f4]
0x0148:0010:00000000c011b993   8db600000000             lea esi, dword ptr [esi+ 0x00000000]
0x0148:0010:00000000c011b999   8dbc2700000000           lea edi, dword ptr [edi+ 0x00000000]
0x0148:0010:00000000c011b9a0   85c0                     test eax, eax
0x0148:0010:00000000c011b9a2   74fc                     jz $-0x02  ;a=c011b9a0
0x0148:0010:00000000c011b9a4   e8878d0700               call +0x00078d8c  ;a=c0194730
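
The branch targets in the listing can be checked by hand from the instruction bytes. The sketch below does that arithmetic in C; the helper names rel8_target and rel32_target are made up for illustration and are not part of any tool. It assumes the usual x86 rule that a relative displacement is added to the address of the next instruction.

```c
#include <stdint.h>

/* Hypothetical helper: target of a 2-byte short jump (opcode + rel8).
 * The 8-bit displacement is sign-extended and added to the address of
 * the following instruction. */
static uint32_t rel8_target(uint32_t insn_addr, uint8_t disp)
{
    return insn_addr + 2u + (uint32_t)(int32_t)(int8_t)disp;
}

/* Hypothetical helper: target of a 5-byte near call (E8 + rel32).
 * Unsigned wrap-around gives the right result for negative displacements. */
static uint32_t rel32_target(uint32_t insn_addr, uint32_t disp)
{
    return insn_addr + 5u + disp;
}
```

With the bytes from the listing, rel8_target(0xc011b9a2, 0xfc) gives 0xc011b9a0, the address of the test instruction itself rather than of the mov that loads eax. Likewise rel32_target(0xc011b989, 0xfffecf72) gives 0xc0108900 and rel32_target(0xc011b9a4, 0x00078d87) gives 0xc0194730, matching the ";a=" annotations.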


Please note that the "test eax, eax" followed by the jz never exits: the jz 
branches back to the test itself, and eax is not reloaded inside the loop.  
eax was loaded once, before the loop, with the value of 
emergency_sync_scheduled.  This code is at the end of panic() in 
linux/kernel/panic.c, in the macro CHECK_EMERGENCY_SYNC, which is defined in 
linux/include/linux/sysrq.h.  At first glance, emergency_sync_scheduled should 
be marked volatile.

Comment 1 Robert Hentosh 2003-04-14 16:39:12 UTC
Hmmm... I just checked 2.4.20 from kernel.org and sysrq.h has been patched to 
define emergency_sync_scheduled as "volatile int".
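
A minimal user-space sketch of what the volatile qualifier buys, not the kernel source: the variable and function names mirror the ones in this report, but the bodies, the sync_runs counter, and the bounded loop (so that the example terminates) are assumptions made for illustration.

```c
/* Stand-in for the kernel flag. volatile tells the compiler that every
 * read in the source must be a real load from memory. */
static volatile int emergency_sync_scheduled;

static int sync_runs;   /* hypothetical counter, only here for checking */

/* Stand-in body: the real function syncs or remounts filesystems. */
static void do_emergency_sync(void)
{
    sync_runs++;
    emergency_sync_scheduled = 0;
}

/* Bounded stand-in for the polling loop at the end of panic().
 * Returns 1 if the flag was seen set and the sync was run, else 0. */
static int poll_emergency_sync(long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (emergency_sync_scheduled) {   /* reloaded on every pass */
            do_emergency_sync();
            return 1;
        }
    }
    return 0;
}
```

Without the qualifier, a compiler that sees nothing in the loop body modifying the flag is allowed to load it once and test the register copy forever, which is the pattern in the ITP listing.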

Comment 2 Arjan van de Ven 2003-04-14 18:03:56 UTC
ehm
panic() isn't actually supposed to return....... it's like a panic 

Comment 3 Robert Hentosh 2003-04-14 19:41:12 UTC
That is understandable.  But it would be nice if it were coded that way instead 
of depending on a compiler optimization.

This code looks like it is waiting for the variable emergency_sync_scheduled to 
become non-zero so that it can call do_emergency_sync.  As coded, it will never 
call do_emergency_sync.  Yes?
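
To make the question concrete, here is a hedged C rendering of the two behaviours. spins_hoisted is a reading of what the ITP listing does (one load, then a test of the stale copy); spins_reloaded is what the source appears to intend. All names, the tick callback that stands in for another CPU or an interrupt, and the spin limit are hypothetical, added only so the comparison is deterministic and terminates.

```c
/* Hypothetical hook: simulates something else changing the flag. */
typedef void (*tick_fn)(int spin, int *flag);

/* As compiled: the flag is read once before the loop.
 * Returns the spin on which the flag was noticed, or -1 if never. */
static int spins_hoisted(int *flag, int max_spins, tick_fn tick)
{
    int cached = *flag;                 /* mov eax, [flag]: the only load */
    for (int i = 0; i < max_spins; i++) {
        tick(i, flag);
        if (cached)                     /* test eax, eax / jz */
            return i;                   /* would call do_emergency_sync() */
    }
    return -1;
}

/* As intended: the flag is re-read from memory on every pass. */
static int spins_reloaded(int *flag, int max_spins, tick_fn tick)
{
    for (int i = 0; i < max_spins; i++) {
        tick(i, flag);
        if (*(volatile int *)flag)      /* fresh load each time */
            return i;
    }
    return -1;
}

/* Example tick: sets the flag on the third pass (spin index 2). */
static void set_on_third_spin(int spin, int *flag)
{
    if (spin == 2)
        *flag = 1;
}
```

If the flag is zero when the loop is entered, spins_hoisted never notices a later change, while spins_reloaded notices it on the pass where it is set. If the flag is already non-zero at the single load, the hoisted version does fall through, so "never" applies only to the zero-at-entry case.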