Description of problem:
Kernel BUG at panic:74, invalid operand: 0000 [1] SMP

Version-Release number of selected component (if applicable):
System is RHEL AS4/U2, Kernel is custom, built from kernel-2.6.9-22.EL.src.rpm BUT with TWO modifications:

1) Increase CONFIG_NR_CPUS=8 to 16 in kernel-2.6.9-x86_64-smp.config
2) Workaround for reboot problem; hence this patch is applied:
   (http://www.iwill.net/product_imgs/90/RHEL4_Update1_Dual_core.PDF)

diff -Naur linux-2.6.9/arch/x86_64/kernel/reboot.c linux-2.6.9-mjw/arch/x86_64/kernel/reboot.c
--- linux-2.6.9/arch/x86_64/kernel/reboot.c 2005-10-09 23:27:36.000000000 +0100
+++ linux-2.6.9-mjw/arch/x86_64/kernel/reboot.c 2005-10-10 00:03:43.000000000 +0100
@@ -113,7 +113,7 @@
 	smp_stop_cpu();
 
 	/* AP calling this. Just halt */
-	if (cpuid != boot_cpu_id) {
+	if (cpuid != x86_apicid_to_cpu(boot_cpu_id)) {
 		for (;;)
 			asm("hlt");
 	}

Hardware Environment:
8CPU Dual Core Opteron; http://www.iwill.net/product_2.asp?p_id=90&sp=Y

How reproducible:
Very; repeat the steps below

Steps to Reproduce:
1. Run piece of chemistry software
2. Wait a few minutes
3.

Actual results:
This is captured using netdump:

C and OSHP methods do not exist
usbhid: probe of 2-3:1.0 failed with error -5
ip_tables: (C) 2000-2002 Netfilter core team
ip_tables: (C) 2000-2002 Netfilter core team
CPU 30: Machine Check Exception: 4 Bank 4: b200000000070f0f TSC 53843d7e254
CPU 22: Machine Check Exception: 4 Bank 4: b200000000070f0f TSC 53843d821fe
CPU 16: Machine Check Exception: 4 Bank 4: b200000000070f0f TSC 53843d82f41
CPU 24: Machine Check Exception: 4 Bank 4: b200000000070f0f TSC 53843d8482e
CPU 28: Machine Check Exception: 4 Bank 4: b200000000070f0f TSC 53843d84b3e
CPU 18: Machine Check Exception: 4 Bank 4: b200000000070f0f TSC 53843d864b7
CPU 20: Machine Check Exception: 4 Bank 4: b200000000070f0f TSC 53843d871f0
CPU 26: Machine Check Exception: 4 Bank 4: b200000000070f0f TSC 53843d85b5c
Kernel panic - not syncing: Machine check
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at panic:74
invalid operand: 0000 [1] SMP
CPU 14
Modules linked in: md5 ipv6 netconsole netdump autofs4 sunrpc ds yenta_socket pcmcia_core dm_mirror dm_mod button battery ac joydev ohci_hcd hw_random e100 mii e1000 floppy ext3 jbd 3w_xxxx sd_mod scsi_mod
Pid: 3058, comm: l502.exe Tainted: G M 2.6.9-22mjw.EL.rootsmp
RIP: 0010:[<ffffffff8013691a>] <ffffffff8013691a>{panic+211}
RSP: 0000:000001023ff60d18 EFLAGS: 00010086
RAX: 000000000000002d RBX: ffffffff80317ca1 RCX: 0000000000000046
RDX: 0000000000006c99 RSI: 0000000000000046 RDI: ffffffff803d7f20
RBP: 0000000000000900 R08: 000000000000000d R09: ffffffff80317ca1
R10: 0000000002000000 R11: 0000000000000061 R12: 00000000ffffffff
R13: ffffffff803cf1a0 R14: 0000053843d7ceca R15: ffffffff80317ca1
FS: 0000000040812960(005b) GS:ffffffff804d5c80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000002aeff6a0b8 CR3: 000000033ff54000 CR4: 00000000000006e0
Process l502.exe (pid: 3058, threadinfo 000001023bf8e000, task 000001013f1b9030)
Stack: 0000003000000008 000001023ff60df8 000001023ff60d38 0000053843d85b5c
       0000000000006c58 0000000000000046 0000000000006c6c 0000000000000046
       000000000000000d 0000000000000000
Call Trace:<ffffffff801176e4>{print_mce+136} <ffffffff801177bc>{mce_available+0}
       <ffffffff80117b0f>{do_machine_check+825} <ffffffff801111db>{machine_check+127}

Code: 0f 0b 31 72 31 80 ff ff ff ff 4a 00 31 ff e8 c3 c4 fe ff e8
RIP <ffffffff8013691a>{panic+211} RSP <000001023ff60d18>

Expected results:
Program does not crash machine

Additional info:
MCE error translated using mcelog

[root@f01 kernel-2.6.9]# cat /tmp/mce.txt | mcelog --ascii
CPU 28 4 northbridge TSC 10311caf801c4
  Northbridge Watchdog error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  bus error 'generic participation, request timed out
             generic error mem transaction
             generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4

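For anyone wanting to cross-check the mcelog translation by hand, the STATUS value can be split into its flag bits with a few lines of Python. This is only a rough sketch, not how mcelog itself works. The meanings of bit 57 and bit 61 come from the mcelog output above; the other flag names follow the general x86 machine-check status register layout. Treating bits 16-19 as an "extended error code" is an assumption about the Opteron northbridge bank, and `decode_status` is a made-up helper name.

```python
# Sketch: split an x86 machine-check STATUS value into flags and code fields.
# Bits 57 and 61 are confirmed by the mcelog output in this report; the other
# names follow the generic machine-check status layout (assumption).
FLAG_BITS = {
    63: "VAL (register contents valid)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (error uncorrected)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_status(status: int) -> dict:
    """Return the set flags and raw code fields of a 64-bit STATUS value."""
    flags = [
        name
        for bit, name in sorted(FLAG_BITS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        # Low 16 bits: the error code that mcelog prints as "bus error ...".
        "error_code": status & 0xFFFF,
        # Bits 16-19: assumed to be the bank-specific extended error code.
        "extended_code": (status >> 16) & 0xF,
    }


if __name__ == "__main__":
    decoded = decode_status(0xB200000000070F0F)  # STATUS from the log above
    for flag in decoded["flags"]:
        print(flag)
    print("error code:    0x%04x" % decoded["error_code"])
    print("extended code: 0x%x" % decoded["extended_code"])
```

All eight CPUs in the netdump output report the same Bank 4 status, so one decode covers every entry.
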
We do not support custom kernels. As of Update 3, RHEL4 now provides a "largesmp" kernel that supports more than 8 processors. Please try the RHEL4 U3 beta and report whether this addresses your problem. I am closing this issue as CANTFIX because it is reported against an unsupported kernel. If you still have problems with the latest kernel, please file a separate support request.