Description of problem:
I have a small non-critical server that is used as a test host for network bandwidth measurements with iperf. Those tests are run two or three times a week, and now and then the machine crashes hard during them. This time it didn't crash hard; instead the kernel reported an invalid opcode:

invalid opcode: 0000 [#1] SMP
e1000: eth2: e1000_watchdog: NIC Link is Down
e1000: eth2: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
invalid opcode: 0000 [#1] SMP
Modules linked in: nfsd exportfs auth_rpcgss nfs lockd nfs_acl sunrpc ipv6 cpufreq_ondemand dm_mirror dm_mod button serio_raw k8temp hwmon pcspkr i2c_nforce2 sata_nv i2c_core tg3 forcedeth e1000 floppy sg sr_mod cdrom ata_piix pata_amd ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
CPU: 1
EIP: 0060:[<c061eb6e>] Not tainted VLI
EFLAGS: 00210082 (2.6.23.1-49.fc8 #1)
EIP is at _spin_lock+0x0/0xf
eax: c201a180 ebx: c201a180 ecx: 000002a1 edx: 000000d8
esi: 00000000 edi: b759ab90 ebp: f0e29fb0 esp: f0e29fa8
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process iperf (pid: 14159, ti=f0e29000 task=cc905840 task.ti=f0e29000)
Stack: c0429a6f 085b2e38 f0e29000 c040518a 085b2e38 000002a1 000002a0 00000000
       b759ab90 b759a398 0000009e 0000007b 0000007b 00000000 0000009e 0012d402
       00000073 00200246 b759a378 0000007b ffffffff ffffffff
Call Trace:
 [<c0429a6f>] sys_sched_yield+0x23/0x46
 [<c040518a>] syscall_call+0x7/0xb
 =======================
Code: 79 05 f0 ff 00 30 d2 89 d0 c3 89 c2 f0 81 28 00 00 00 01 0f 94 c0 84 c0 b9 01 00 00 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 <f0> fe 08 79 09 f3 90 80 38 00 7e f9 eb f2 c3 f0 81 28 00 00 00
EIP: [<c061eb6e>] _spin_lock+0x0/0xf SS:ESP 0068:f0e29fa8

Hardware:
05:00.0 Ethernet controller [0200]: Intel Corporation 82571EB Gigabit Ethernet Controller [8086:105e] (rev 06)
05:00.1 Ethernet controller [0200]: Intel Corporation 82571EB Gigabit Ethernet Controller [8086:105e] (rev 06)

Version-Release number of selected component (if applicable):
$ rpm -q kernel
kernel-2.6.23.1-49.fc8
(happened with earlier kernels as well)

Additional info:
Is this worth further investigation? Or is faulty hardware the likely root cause?
Created attachment 284201
lspci

lspci info attached. If I should report this upstream or test 2.6.24-rc, please let me know.
This looks like faulty hardware. That 'decb' instruction is legal:

   0: f0 fe 08    lock decb (%eax)
   3: 79 09       jns 0xe
   5: f3 90       pause
   7: 80 38 00    cmpb $0x0,(%eax)
   a: 7e f9       jle 0x5
   c: eb f2       jmp 0x0
   e: c3          ret

Does memtest86 work okay on this machine?
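The bytes disassembled above are the ones that start at the `<f0>` marker in the oops `Code:` line, which flags the byte at EIP. As a minimal sketch, assuming the `Code:` line format shown in this report, they can be pulled out like this before being handed to a disassembler. The helper name `faulting_bytes` is hypothetical and only for illustration.

```python
import re

# Code: line copied from the "invalid opcode" report in this bug.
CODE_LINE = (
    "Code: 79 05 f0 ff 00 30 d2 89 d0 c3 89 c2 f0 81 28 00 00 00 01 0f 94 c0 "
    "84 c0 b9 01 00 00 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 <f0> fe 08 "
    "79 09 f3 90 80 38 00 7e f9 eb f2 c3 f0 81 28 00 00 00"
)


def faulting_bytes(code_line: str) -> bytes:
    """Return the bytes from the <xx> marker (the byte at EIP) to the end of the dump."""
    tokens = code_line.replace("Code:", "").split()
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"<[0-9a-fA-F]{2}>", tok):
            # Strip the angle brackets from the marked byte, keep the rest as-is.
            hex_bytes = [tok[1:-1]] + tokens[i + 1:]
            return bytes(int(h, 16) for h in hex_bytes)
    raise ValueError("no <xx> marker found in Code: line")


if __name__ == "__main__":
    # First instruction at EIP: f0 fe 08, i.e. lock decb (%eax)
    print(faulting_bytes(CODE_LINE)[:3].hex())
```

The first three bytes returned are `f0 fe 08`, matching the `lock decb (%eax)` at offset 0 in the listing.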
(In reply to comment #2)
> This looks like faulty hardware.

:-/

[...]
> Does memtest86 work okay on this machine?

Yes.
This time I got an oops. Note, however, that I had not restarted the machine after the "invalid opcode" in the initial report of this bug (restarting the machine now).

BUG: unable to handle kernel paging request at virtual address 1febffe4
 printing eip:
c05e3bba
*pde = 00000000
Oops: 0002 [#2] SMP
Modules linked in: e1000 e1000e nfsd exportfs auth_rpcgss nfs lockd nfs_acl sunrpc ipv6 cpufreq_ondemand dm_mirror dm_mod button serio_raw k8temp hwmon pcspkr i2c_nforce2 sata_nv i2c_core tg3 forcedeth floppy sg sr_mod cdrom ata_piix pata_amd ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
CPU: 0
EIP: 0060:[<c05e3bba>] Tainted: G D VLI
EFLAGS: 00210256 (2.6.23.1-49.fc8 #1)
EIP is at tcp_recvmsg+0x57d/0x9ef
eax: 00000000 ebx: 00000000 ecx: 00000000 edx: d6090dfc
esi: e6441b40 edi: f6d99304 ebp: 000005b4 esp: d8e5fd40
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process iperf (pid: 21183, ti=d8e5f000 task=d589f230 task.ti=d8e5f000)
Stack: 000005b4 0003ac4a 2cc81e2e 0003ac4a d8e5ff20 00000000 00000000 f6d995f8
       00000001 00000000 00000000 00000000 c0707080 e6d92000 7fffffff c0707080
       00000000 c071d940 d8e5ff20 d8e5ff20 c05b3a2d 00002000 00000000 00000000
Call Trace:
 [<c05b3a2d>] sock_common_recvmsg+0x3e/0x54
 [<c05b2282>] sock_recvmsg+0xec/0x107
 [<c043d499>] autoremove_wake_function+0x0/0x35
 [<c0425d63>] enqueue_entity+0x2dd/0x307
 [<c0425dc8>] task_tick_fair+0x3b/0x60
 [<c0429ef9>] scheduler_tick+0x1e3/0x274
 [<c05b3182>] sys_recvfrom+0xd8/0x12d
 [<c041cb3e>] lapic_next_event+0xc/0x10
 [<c044372e>] clockevents_program_event+0xb5/0xbc
 [<c044448e>] tick_program_event+0x33/0x52
 [<c04404f2>] hrtimer_interrupt+0x192/0x1bc
 [<c0444708>] tick_sched_timer+0x0/0xbb
 [<c0405c2c>] apic_timer_interrupt+0x28/0x30
 [<c05b320e>] sys_recv+0x37/0x3b
 [<c05b36df>] sys_socketcall+0x19c/0x261
 [<c04f6cf8>] copy_to_user+0x34/0x48
 [<c040518a>] syscall_call+0x7/0xb
 =======================
Code: 89 f2 89 0c 24 8b 4c 24 2c 89 6c 24 04 89 44 24 08 89 d8 e8 3d 3b fe ff 85 c0 89 87 40 03 00 00 79 0e c7 04 24 03 d5 6d c0 e8 42 <a3> e4 ff eb 1f 8b 44 24 2c 01 e8 3b 46 50 75 2c eb 22 8b 54 24
EIP: [<c05e3bba>] tcp_recvmsg+0x57d/0x9ef SS:ESP 0068:d8e5fd40
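One reading of the `Code:` bytes in this oops, offered as an observation rather than a confirmed diagnosis: the marked byte `a3` is the i386 opcode for storing %eax to an absolute 32-bit address, and the four bytes after it, read as little-endian, give exactly the faulting address from the first line of the oops. The two bytes before the marker (`e8 42`) together with `a3 e4 ff` would form a single 5-byte `call`, so EIP appears to point into the middle of an instruction. The short sketch below only checks the address arithmetic; the variable names are illustrative.

```python
import struct

# Bytes from the <a3> marker onward, copied from the Code: line of this oops.
code_at_eip = bytes.fromhex("a3 e4 ff eb 1f")

opcode = code_at_eip[0]
# The 4 bytes after the opcode, interpreted as a little-endian 32-bit address.
(address,) = struct.unpack("<I", code_at_eip[1:5])

print(hex(opcode), hex(address))  # 0xa3 0x1febffe4
```

The decoded value matches "unable to handle kernel paging request at virtual address 1febffe4", and error code 0002 (a write to a non-present page) fits a store instruction.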
Some cleanup: the problematic hardware has been thrown away in the meantime, so closing this.