From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.4; Linux; en_US, fr) KHTML/3.4.0 (like Gecko)

Description of problem:
I have experienced kernel panics on SMP x86_64 boxes, specifically on:
- Sun V20z (dual Opteron 248)
- Sun V40z (quad Opteron 848)

I have been running the same kernel on these boxes for two months, and the frequency pattern of these panics deserves mention:
- on the V20z, it happened on only one box in the last two months (I have several such boxes). But when it happened (about a month ago), it occurred twice in the same day.
- on the V40z, it did not happen for two months, and then happened twice today.

The exact kernel these machines use is kernel-2.6.10-1.742_PRsmp (custom built), which is built like this:
- install kernel-2.6.10-1.741_FC3.src.rpm
- slightly modify the kernel-x86_64-smp.config file (for example CONFIG_SCSI_MULTI_LUN=y, ext3 and xfs built into the kernel, and a bunch of modules removed for hardware never attached to the box). Exact config to be attached later.
- bump vendor tag (741_FC3 -> 742_PR) in spec file
- rpmbuild with optflags: x86_64 -O2 -g -march=opteron

All I have for the oopses is the console dump, which I copied by hand, so it might not be very informative. The oops files for both machines look very similar, so I think this is the same bug in both cases.

==================
Oops on V20z
==================
# ksymoops -k /proc/kallsyms -m /boot/System.map-2.6.10-1.742_PRsmp -l /proc/modules V20z_kernel_crash
ksymoops 2.4.9 on x86_64 2.6.10-1.742_PRsmp.
Options used
     -V (default)
     -k /proc/kallsyms (specified)
     -l /proc/modules (specified)
     -o /lib/modules/2.6.10-1.742_PRsmp/ (default)
     -m /boot/System.map-2.6.10-1.742_PRsmp (specified)

Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a valid ksyms file?
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
RBP: ffffffff8054cf68 R08: 0000000000093b4b R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff80486ba0 R14: 000371a053e3b11b R15: ffffffff803c826c
FS: 0000002a95573f60(0000) GS:ffffffff805a5f00(0000) knlGS:00000000f7fe46c0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000ff CR3: 0000000000101000 CR4: 00000000000006e0
Stack: ffffffff8011a6c0 ffffffff803c826c ffffffff8010ecf1 ffffffff8054cf68 <EOI>
       ffffffff803c826c 000371a053e3b11b ffffffff80486ba0 0000000000000000
       00000000ffffffff ffffffff803c826c
Call Trace:<IRQ> <ffffffff8011a6c0>{smp_call_function_interrupt+64}
       <ffffffff8010ecf1>{call_function_interrupt+133}
       <EOI> <ffffffff8011a5ff{smp_stop_cpu+31}
       <ffffffff80134d7b>{panic+203}
       <ffffffff80115868>{print_mce+136}
       <ffffffff80115956>{mce_panic+166}
       <ffffffff80115dce>{do_machine_check+1102}
       <ffffffff8010f607>{machine_check+127}
       <ffffffff8010c760>{default_idle+0}
       <ffffffff8010c780>{default_idle+32}
Code: Bad RIP value.
CR2: 00000000000000ff
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
Warning (Oops_read): Code line not seen, dumping what data is available

>>RBP; ffffffff8054cf68 <boot_exception_stacks+4c68/5000>
>>R13; ffffffff80486ba0 <mcheck_work+0/70>
>>R15; ffffffff803c826c <__func__.1+16fc/9d970>

Trace; ffffffff8010ecf1 <call_function_interrupt+85/8c>
Trace; ffffffff80134d7b <panic+cb/230>
Trace; ffffffff80115956 <mce_panic+a6/b0>
Trace; ffffffff8010f607 <machine_check+7f/84>
Trace; ffffffff8010c780 <default_idle+20/30>

2 warnings issued. Results may not be reliable.

==================
Oops on V40z
==================
# ksymoops -k /proc/kallsyms -m /boot/System.map-2.6.10-1.742_PRsmp -l /proc/modules V40z_kernel_crash
ksymoops 2.4.9 on x86_64 2.6.10-1.742_PRsmp.
Options used
     -V (default)
     -k /proc/kallsyms (specified)
     -l /proc/modules (specified)
     -o /lib/modules/2.6.10-1.742_PRsmp/ (default)
     -m /boot/System.map-2.6.10-1.742_PRsmp (specified)

Warning (read_ksyms): no kernel symbols in ksyms, is /proc/kallsyms a valid ksyms file?
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
RBP: ffffffff8054cf68 R08: 00000000000927c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffffff80486ba0 R14: 000012d1552040b8 R15: ffffffff803c826c
FS: 0000002a95570ea0(0000) GS:ffffffff805a5f00(0000) knlGS:00000000eb9cfbb0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000000000ff CR3: 0000000000101000 CR4: 00000000000006e0
Stack: ffffffff8011a6c0 ffffffff803c826c ffffffff8010ecf1 ffffffff8054cf68 <EOI>
       ffffffff803c826c 000012d1552040b8 ffffffff80486ba0 0000000000000000
       00000000ffffffff ffffffff803c826c
Call Trace:<IRQ> <ffffffff8011a6c0>{smp_call_function_interrupt+64}
       <ffffffff8010ecf1>{call_function_interrupt+133}
       <EOI> <ffffffff8011a5ff{smp_stop_cpu+31}
       <ffffffff80134d7b>{panic+203}
       <ffffffff80115868>{print_mce+136}
       <ffffffff80115956>{mce_panic+166}
       <ffffffff80115dce>{do_machine_check+1102}
       <ffffffff8010f607>{machine_check+127}
       <ffffffff8010c760>{default_idle+0}
       <ffffffff8010c780>{default_idle+32}
Code: Bad RIP value.
CR2: 00000000000000ff
<0>Kernel panic - not syncing: Aiee, killing interrupt handler!
Warning (Oops_read): Code line not seen, dumping what data is available

>>RBP; ffffffff8054cf68 <boot_exception_stacks+4c68/5000>
>>R13; ffffffff80486ba0 <mcheck_work+0/70>
>>R15; ffffffff803c826c <__func__.1+16fc/9d970>

Trace; ffffffff8010ecf1 <call_function_interrupt+85/8c>
Trace; ffffffff80134d7b <panic+cb/230>
Trace; ffffffff80115956 <mce_panic+a6/b0>
Trace; ffffffff8010f607 <machine_check+7f/84>
Trace; ffffffff8010c780 <default_idle+20/30>

2 warnings issued. Results may not be reliable.

Version-Release number of selected component (if applicable):
kernel-2.6.10-1.741

How reproducible:
Sometimes

Steps to Reproduce:
1.
2.
3.

Additional info:
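
The claim that both oopses show the same bug can be checked mechanically rather than by eye. The following is a minimal sketch, not a tool used in this report: the names `TRACE_ENTRY`, `call_trace` and `same_trace` are hypothetical, and it assumes the call-trace entries keep the `<address>{symbol+offset}` shape seen in the dumps above. The closing `>` is treated as optional because the hand-copied dump omits it in one entry.

```python
import re

# One call-trace entry looks like "<ffffffff80134d7b>{panic+203}".
# The closing ">" is optional because the hand-copied dump drops it once.
TRACE_ENTRY = re.compile(r"<([0-9a-f]{16})>?\{(\w+)\+(\d+)\}")


def call_trace(oops_text):
    """Return (symbol, decimal offset) pairs in the order they appear."""
    return [(sym, int(off)) for _addr, sym, off in TRACE_ENTRY.findall(oops_text)]


def same_trace(oops_a, oops_b):
    """True when both dumps list the same symbols at the same offsets."""
    return call_trace(oops_a) == call_trace(oops_b)


# Call-trace excerpt copied from the dumps above (identical on both machines).
TRACE_EXCERPT = (
    "Call Trace:<IRQ> <ffffffff8011a6c0>{smp_call_function_interrupt+64} "
    "<ffffffff8010ecf1>{call_function_interrupt+133} "
    "<EOI> <ffffffff8011a5ff{smp_stop_cpu+31} "
    "<ffffffff80134d7b>{panic+203} "
    "<ffffffff80115868>{print_mce+136} "
    "<ffffffff80115956>{mce_panic+166} "
    "<ffffffff80115dce>{do_machine_check+1102} "
    "<ffffffff8010f607>{machine_check+127} "
    "<ffffffff8010c760>{default_idle+0} "
    "<ffffffff8010c780>{default_idle+32}"
)

if __name__ == "__main__":
    # Print one "symbol+offset" per line for a quick visual comparison.
    for symbol, offset in call_trace(TRACE_EXCERPT):
        print(f"{symbol}+{offset}")
```

To compare the two saved dumps, read V20z_kernel_crash and V40z_kernel_crash into strings and pass them to `same_trace`. Comparing symbols and offsets rather than raw text ignores values that legitimately differ between the machines, such as R08, R14 and the FS base.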
Created attachment 112352 Kernel config file
*** This bug has been marked as a duplicate of 126342 ***