Bug 152187

Summary: Kernel panic on SMP x86_64 (Sun V20z/V40z)
Product: [Fedora] Fedora
Reporter: Philippe Rigault <prigault>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 3
CC: pfrields, wtogami
Hardware: x86_64   
OS: Linux   
Last Closed: 2005-03-26 21:36:39 EST
Attachments:
Kernel config file (flags: none)

Description Philippe Rigault 2005-03-25 15:40:55 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.4; Linux; en_US, fr) KHTML/3.4.0 (like Gecko)

Description of problem:
I have experienced kernel panics on SMP x86_64 boxes, specifically on:
- Sun V20z (dual Opteron 248)
- Sun V40z (quad Opteron 848)

I have been running the same kernel on these boxes for two months, and the
frequency pattern of these panics deserves mention:
- on the V20z, it happened on only one box in the last two months (I have
several such boxes). But when it happened (about a month ago), it occurred
twice in the same day.
- on the V40z, it did not happen for two months, and then happened twice today.
            
The exact kernel these machines use is kernel-2.6.10-1.742_PRsmp (custom
built), which is built like this:
 - install kernel-2.6.10-1.741_FC3.src.rpm
 - slightly modify the kernel-x86_64-smp.config file (for example
   CONFIG_SCSI_MULTI_LUN=y, ext3 and xfs built into the kernel, and a number of
   modules removed for hardware never attached to the box). The exact config
   will be attached later.
 - bump the vendor tag (741_FC3 -> 742_PR) in the spec file
 - rpmbuild with optflags: x86_64 -O2 -g -march=opteron
   
All I have for the oopses is the console dump, which I copied by hand, so it
might not be very informative.
The oops output for both machines looks very similar, so I think this is the
same bug in both cases.
  
================== Oops on V20z ==================   
# ksymoops -k /proc/kallsyms -m /boot/System.map-2.6.10-1.742_PRsmp -l /proc/modules V20z_kernel_crash
ksymoops 2.4.9 on x86_64 2.6.10-1.742_PRsmp.  Options used 
     -V (default) 
     -k /proc/kallsyms (specified) 
     -l /proc/modules (specified) 
     -o /lib/modules/2.6.10-1.742_PRsmp/ (default) 
     -m /boot/System.map-2.6.10-1.742_PRsmp (specified) 
 
Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a valid ksyms file?
No modules in ksyms, skipping objects 
No ksyms, skipping lsmod 
RBP: ffffffff8054cf68 R08: 0000000000093b4b R09: 0000000000000000 
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 
R13: ffffffff80486ba0 R14: 000371a053e3b11b R15: ffffffff803c826c 
FS:  0000002a95573f60(0000) GS:ffffffff805a5f00(0000) knlGS:00000000f7fe46c0 
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b 
CR2: 00000000000000ff CR3: 0000000000101000 CR4: 00000000000006e0 
Stack: ffffffff8011a6c0 ffffffff803c826c ffffffff8010ecf1 ffffffff8054cf68 <EOI>
       ffffffff803c826c 000371a053e3b11b ffffffff80486ba0 0000000000000000
       00000000ffffffff ffffffff803c826c
Call Trace:<IRQ> <ffffffff8011a6c0>{smp_call_function_interrupt+64}
       <ffffffff8010ecf1>{call_function_interrupt+133}  <EOI> <ffffffff8011a5ff>{smp_stop_cpu+31}
       <ffffffff80134d7b>{panic+203} <ffffffff80115868>{print_mce+136}
       <ffffffff80115956>{mce_panic+166} <ffffffff80115dce>{do_machine_check+1102}
       <ffffffff8010f607>{machine_check+127} <ffffffff8010c760>{default_idle+0}
       <ffffffff8010c780>{default_idle+32}
Code:  Bad RIP value. 
CR2: 00000000000000ff 
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler! 
Warning (Oops_read): Code line not seen, dumping what data is available 
 
 
>>RBP; ffffffff8054cf68 <boot_exception_stacks+4c68/5000> 
>>R13; ffffffff80486ba0 <mcheck_work+0/70> 
>>R15; ffffffff803c826c <__func__.1+16fc/9d970> 
 
Trace; ffffffff8010ecf1 <call_function_interrupt+85/8c> 
Trace; ffffffff80134d7b <panic+cb/230> 
Trace; ffffffff80115956 <mce_panic+a6/b0> 
Trace; ffffffff8010f607 <machine_check+7f/84> 
Trace; ffffffff8010c780 <default_idle+20/30> 
 
 
2 warnings issued.  Results may not be reliable. 
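
As a sanity check on the hand-copied trace: the raw call trace appears to give symbol offsets in decimal, while ksymoops prints them in hex followed by the symbol size. The sketch below is not part of the original report; it is a small Python helper (the names parse_raw, parse_ksymoops and mismatches are made up for illustration) that parses both forms from the V20z dump above and checks that every entry ksymoops resolved agrees with the raw trace.

```python
import re

# Well-formed entries copied from the V20z console dump (offsets in decimal).
RAW_TRACE = """
<ffffffff8011a6c0>{smp_call_function_interrupt+64}
<ffffffff8010ecf1>{call_function_interrupt+133}
<ffffffff80134d7b>{panic+203} <ffffffff80115868>{print_mce+136}
<ffffffff80115956>{mce_panic+166}
<ffffffff80115dce>{do_machine_check+1102}
<ffffffff8010f607>{machine_check+127}
<ffffffff8010c780>{default_idle+32}
"""

# Lines as printed by ksymoops (offset and symbol size in hex).
KSYMOOPS_TRACE = """
Trace; ffffffff8010ecf1 <call_function_interrupt+85/8c>
Trace; ffffffff80134d7b <panic+cb/230>
Trace; ffffffff80115956 <mce_panic+a6/b0>
Trace; ffffffff8010f607 <machine_check+7f/84>
Trace; ffffffff8010c780 <default_idle+20/30>
"""

RAW_RE = re.compile(r"<([0-9a-f]{16})>\{(\w+)\+(\d+)\}")
KSYM_RE = re.compile(r"Trace; ([0-9a-f]{16}) <(\w+)\+([0-9a-f]+)/([0-9a-f]+)>")


def parse_raw(text):
    """Map address -> (symbol, offset) from the raw trace (decimal offsets)."""
    return {addr: (sym, int(off)) for addr, sym, off in RAW_RE.findall(text)}


def parse_ksymoops(text):
    """Map address -> (symbol, offset) from ksymoops output (hex offsets)."""
    return {
        addr: (sym, int(off, 16))
        for addr, sym, off, _size in KSYM_RE.findall(text)
    }


def mismatches(raw, ksym):
    """Return addresses where ksymoops disagrees with the raw trace."""
    return [addr for addr, entry in ksym.items() if raw.get(addr) != entry]


if __name__ == "__main__":
    raw = parse_raw(RAW_TRACE)
    ksym = parse_ksymoops(KSYMOOPS_TRACE)
    for addr, (sym, off) in ksym.items():
        print(f"{addr} {sym}+{off} (0x{off:x})")
    print("mismatches:", mismatches(raw, ksym))
```

An empty mismatch list suggests the manual copy of these entries is at least self-consistent. It says nothing about the entries ksymoops did not resolve.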
 
================== Oops on V40z ==================   
# ksymoops -k /proc/kallsyms -m /boot/System.map-2.6.10-1.742_PRsmp -l /proc/modules V40z_kernel_crash
ksymoops 2.4.9 on x86_64 2.6.10-1.742_PRsmp.  Options used 
     -V (default) 
     -k /proc/kallsyms (specified) 
     -l /proc/modules (specified) 
     -o /lib/modules/2.6.10-1.742_PRsmp/ (default) 
     -m /boot/System.map-2.6.10-1.742_PRsmp (specified) 
 
Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a valid ksyms file?
No modules in ksyms, skipping objects 
No ksyms, skipping lsmod 
RBP: ffffffff8054cf68 R08: 00000000000927c0 R09: 0000000000000000 
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 
R13: ffffffff80486ba0 R14: 000012d1552040b8 R15: ffffffff803c826c 
FS:  0000002a95570ea0(0000) GS:ffffffff805a5f00(0000) knlGS:00000000eb9cfbb0 
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b 
CR2: 00000000000000ff CR3: 0000000000101000 CR4: 00000000000006e0 
Stack: ffffffff8011a6c0 ffffffff803c826c ffffffff8010ecf1 ffffffff8054cf68 <EOI>
       ffffffff803c826c 000012d1552040b8 ffffffff80486ba0 0000000000000000
       00000000ffffffff ffffffff803c826c
Call Trace:<IRQ> <ffffffff8011a6c0>{smp_call_function_interrupt+64}
       <ffffffff8010ecf1>{call_function_interrupt+133}  <EOI> <ffffffff8011a5ff>{smp_stop_cpu+31}
       <ffffffff80134d7b>{panic+203} <ffffffff80115868>{print_mce+136}
       <ffffffff80115956>{mce_panic+166} <ffffffff80115dce>{do_machine_check+1102}
       <ffffffff8010f607>{machine_check+127} <ffffffff8010c760>{default_idle+0}
       <ffffffff8010c780>{default_idle+32}
Code:  Bad RIP value. 
CR2: 00000000000000ff 
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler! 
Warning (Oops_read): Code line not seen, dumping what data is available 
 
 
>>RBP; ffffffff8054cf68 <boot_exception_stacks+4c68/5000> 
>>R13; ffffffff80486ba0 <mcheck_work+0/70> 
>>R15; ffffffff803c826c <__func__.1+16fc/9d970> 
 
Trace; ffffffff8010ecf1 <call_function_interrupt+85/8c> 
Trace; ffffffff80134d7b <panic+cb/230> 
Trace; ffffffff80115956 <mce_panic+a6/b0> 
Trace; ffffffff8010f607 <machine_check+7f/84> 
Trace; ffffffff8010c780 <default_idle+20/30> 
 
 
2 warnings issued.  Results may not be reliable. 
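
To back up the claim that the two dumps look alike, the register lines can be compared mechanically. This is only a rough sketch, not something from the original report: the helper names parse_registers and differing are hypothetical, and the regex assumes every register of interest is printed as a 16-digit hex value (so the short CS/DS/ES selectors are skipped).

```python
import re

# Register lines copied from the V20z dump.
V20Z_REGS = """
RBP: ffffffff8054cf68 R08: 0000000000093b4b R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff80486ba0 R14: 000371a053e3b11b R15: ffffffff803c826c
FS:  0000002a95573f60(0000) GS:ffffffff805a5f00(0000) knlGS:00000000f7fe46c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000ff CR3: 0000000000101000 CR4: 00000000000006e0
"""

# Register lines copied from the V40z dump.
V40Z_REGS = """
RBP: ffffffff8054cf68 R08: 00000000000927c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff80486ba0 R14: 000012d1552040b8 R15: ffffffff803c826c
FS:  0000002a95570ea0(0000) GS:ffffffff805a5f00(0000) knlGS:00000000eb9cfbb0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000ff CR3: 0000000000101000 CR4: 00000000000006e0
"""

# A register name, a colon, optional spaces, then exactly 16 hex digits.
REG_RE = re.compile(r"(\w+):\s*([0-9a-f]{16})")


def parse_registers(text):
    """Map register name -> integer value for all 64-bit registers found."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def differing(a, b):
    """Return the names of registers whose values differ between two dumps."""
    return [name for name in a if a[name] != b.get(name)]


if __name__ == "__main__":
    v20z = parse_registers(V20Z_REGS)
    v40z = parse_registers(V40Z_REGS)
    for name in differing(v20z, v40z):
        print(f"{name}: V20z={v20z[name]:016x} V40z={v40z[name]:016x}")
```

On the values above, only R08, R14, FS and knlGS differ; the remaining registers, including CR2 and the stack and trace addresses, are the same on both machines.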
 
   
    

Version-Release number of selected component (if applicable):
kernel-2.6.10-1.741 

How reproducible:
Sometimes

Comment 1 Philippe Rigault 2005-03-25 15:43:52 EST
Created attachment 112352 [details]
Kernel config file
Comment 2 Dave Jones 2005-03-26 21:36:39 EST

*** This bug has been marked as a duplicate of 126342 ***