419881 – problems with e1000 when high load

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	8
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2007-12-11 15:26 UTC by Thorsten Leemhuis
Modified:	2008-06-27 17:44 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-06-27 17:44:12 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments:
lspci (3.50 KB, text/plain), 2007-12-11 15:27 UTC, Thorsten Leemhuis

Description Thorsten Leemhuis 2007-12-11 15:26:02 UTC

Description of problem:
I have a small, non-critical server that serves as a test host for network
bandwidth measurements using iperf. Those tests are run two or three times
a week, and now and then the machine crashes hard during them. This time
it didn't crash hard; instead the kernel reported an invalid opcode:

invalid opcode: 0000 [#1] SMP 
e1000: eth2: e1000_watchdog: NIC Link is Down
e1000: eth2: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
invalid opcode: 0000 [#1] SMP 
Modules linked in: nfsd exportfs auth_rpcgss nfs lockd nfs_acl sunrpc ipv6
cpufreq_ondemand dm_mirror dm_mod button serio_raw k8temp hwmon pcspkr
i2c_nforce2 sata_nv i2c_core tg3 forcedeth e1000 floppy sg sr_mod cdrom ata_piix
pata_amd ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd
ehci_hcd
CPU:    1
EIP:    0060:[<c061eb6e>]    Not tainted VLI
EFLAGS: 00210082   (2.6.23.1-49.fc8 #1)
EIP is at _spin_lock+0x0/0xf
eax: c201a180   ebx: c201a180   ecx: 000002a1   edx: 000000d8
esi: 00000000   edi: b759ab90   ebp: f0e29fb0   esp: f0e29fa8
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process iperf (pid: 14159, ti=f0e29000 task=cc905840 task.ti=f0e29000)
Stack: c0429a6f 085b2e38 f0e29000 c040518a 085b2e38 000002a1 000002a0 00000000 
       b759ab90 b759a398 0000009e 0000007b 0000007b 00000000 0000009e 0012d402 
       00000073 00200246 b759a378 0000007b ffffffff ffffffff 
Call Trace:
 [<c0429a6f>] sys_sched_yield+0x23/0x46
 [<c040518a>] syscall_call+0x7/0xb
 =======================
Code: 79 05 f0 ff 00 30 d2 89 d0 c3 89 c2 f0 81 28 00 00 00 01 0f 94 c0 84 c0 b9
01 00 00 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 <f0> fe 08 79 09 f3 90 80
38 00 7e f9 eb f2 c3 f0 81 28 00 00 00 
EIP: [<c061eb6e>] _spin_lock+0x0/0xf SS:ESP 0068:f0e29fa8

Hardware:
05:00.0 Ethernet controller [0200]: Intel Corporation 82571EB Gigabit Ethernet
Controller [8086:105e] (rev 06)
05:00.1 Ethernet controller [0200]: Intel Corporation 82571EB Gigabit Ethernet
Controller [8086:105e] (rev 06)

Version-Release number of selected component (if applicable):
$ rpm -q kernel
kernel-2.6.23.1-49.fc8
(this also happened with earlier kernels)

Additional info:
Is this worth investigating further? Or is faulty hardware the likely cause?

Comment 1 Thorsten Leemhuis 2007-12-11 15:27:09 UTC

Created attachment 284201 [details]
lspci

lspci info attached.

If I should report this upstream or test 2.6.24-rc, please let me know.

Comment 2 Chuck Ebbert 2007-12-11 22:43:40 UTC

This looks like faulty hardware. That 'decb' instruction is legal:

   0:   f0 fe 08                lock decb (%eax)
   3:   79 09                   jns    0xe
   5:   f3 90                   pause
   7:   80 38 00                cmpb   $0x0,(%eax)
   a:   7e f9                   jle    0x5
   c:   eb f2                   jmp    0x0
   e:   c3                      ret

Does memtest86 work okay on this machine?
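
For anyone wanting to reproduce the disassembly above, here is a minimal sketch of how the bytes at EIP can be pulled out of the oops "Code:" line, where the byte wrapped in <..> is the one EIP points at. The helper name bytes_from_eip is made up for this illustration; the kernel tree ships its own scripts/decodecode for the real job, and the resulting bytes still have to be fed to a disassembler such as objdump.

```python
import re

# "Code:" line copied from the oops in the description.
# The token in <..> marks the byte that EIP points at.
CODE_LINE = (
    "79 05 f0 ff 00 30 d2 89 d0 c3 89 c2 f0 81 28 00 00 00 01 0f 94 c0 84 c0 b9 "
    "01 00 00 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 <f0> fe 08 79 09 f3 90 80 "
    "38 00 7e f9 eb f2 c3 f0 81 28 00 00 00"
)


def bytes_from_eip(code_line):
    """Return the bytes starting at the <xx>-marked byte (hypothetical helper)."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if marked:
            hex_bytes = [marked.group(1)] + tokens[i + 1:]
            return bytes(int(b, 16) for b in hex_bytes)
    raise ValueError("no <xx> marker found in Code: line")


faulting = bytes_from_eip(CODE_LINE)
# First instruction at EIP: f0 fe 08, i.e. "lock decb (%eax)" in the listing above.
print(" ".join(f"{b:02x}" for b in faulting[:3]))
```

The first fifteen bytes returned match the listing in this comment byte for byte, which is why the reported "invalid opcode" is suspicious: the instruction at EIP is a perfectly legal one.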

Comment 3 Thorsten Leemhuis 2007-12-12 06:35:07 UTC

(In reply to comment #2)
> This looks like faulty hardware.
:-/

[...]
> Does memtest86 work okay on this machine?
Yes.

Comment 4 Thorsten Leemhuis 2007-12-13 06:50:10 UTC

This time I got an oops. Note that I hadn't restarted the machine after the
"invalid opcode" in the initial report of this bug (restarting the machine now).

BUG: unable to handle kernel paging request at virtual address 1febffe4
printing eip: c05e3bba *pde = 00000000 
Oops: 0002 [#2] SMP 
Modules linked in: e1000 e1000e nfsd exportfs auth_rpcgss nfs lockd nfs_acl
sunrpc ipv6 cpufreq_ondemand dm_mirror dm_mod button serio_raw k8temp hwmon
pcspkr i2c_nforce2 sata_nv i2c_core tg3 forcedeth floppy sg sr_mod cdrom
ata_piix pata_amd ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd
ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<c05e3bba>]    Tainted: G      D VLI
EFLAGS: 00210256   (2.6.23.1-49.fc8 #1)
EIP is at tcp_recvmsg+0x57d/0x9ef
eax: 00000000   ebx: 00000000   ecx: 00000000   edx: d6090dfc
esi: e6441b40   edi: f6d99304   ebp: 000005b4   esp: d8e5fd40
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process iperf (pid: 21183, ti=d8e5f000 task=d589f230 task.ti=d8e5f000)
Stack: 000005b4 0003ac4a 2cc81e2e 0003ac4a d8e5ff20 00000000 00000000 f6d995f8 
       00000001 00000000 00000000 00000000 c0707080 e6d92000 7fffffff c0707080 
       00000000 c071d940 d8e5ff20 d8e5ff20 c05b3a2d 00002000 00000000 00000000 
Call Trace:
 [<c05b3a2d>] sock_common_recvmsg+0x3e/0x54
 [<c05b2282>] sock_recvmsg+0xec/0x107
 [<c043d499>] autoremove_wake_function+0x0/0x35
 [<c0425d63>] enqueue_entity+0x2dd/0x307
 [<c0425dc8>] task_tick_fair+0x3b/0x60
 [<c0429ef9>] scheduler_tick+0x1e3/0x274
 [<c05b3182>] sys_recvfrom+0xd8/0x12d
 [<c041cb3e>] lapic_next_event+0xc/0x10
 [<c044372e>] clockevents_program_event+0xb5/0xbc
 [<c044448e>] tick_program_event+0x33/0x52
 [<c04404f2>] hrtimer_interrupt+0x192/0x1bc
 [<c0444708>] tick_sched_timer+0x0/0xbb
 [<c0405c2c>] apic_timer_interrupt+0x28/0x30
 [<c05b320e>] sys_recv+0x37/0x3b
 [<c05b36df>] sys_socketcall+0x19c/0x261
 [<c04f6cf8>] copy_to_user+0x34/0x48
 [<c040518a>] syscall_call+0x7/0xb
 =======================
Code: 89 f2 89 0c 24 8b 4c 24 2c 89 6c 24 04 89 44 24 08 89 d8 e8 3d 3b fe ff 85
c0 89 87 40 03 00 00 79 0e c7 04 24 03 d5 6d c0 e8 42 <a3> e4 ff eb 1f 8b 44 24
2c 01 e8 3b 46 50 75 2c eb 22 8b 54 24 
EIP: [<c05e3bba>] tcp_recvmsg+0x57d/0x9ef SS:ESP 0068:d8e5fd40
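
As a reading aid for the oops above, the following sketch decodes the page-fault error code from the "Oops: 0002" line using the architectural x86 bit layout (bit 0: protection violation vs. not-present page, bit 1: write vs. read, bit 2: user vs. kernel mode). It also checks one interpretation that is mine and not stated anywhere in this thread: that the bytes at EIP (<a3> e4 ff eb 1f) read as "mov %eax, moffs32", whose operand would then be the faulting address. The function name decode_pf_error is invented for the example.

```python
import struct

# Low three bits of the x86 page-fault error code:
# (mask, meaning when set, meaning when clear)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]


def decode_pf_error(code):
    """Describe the low three bits of a page-fault error code (hypothetical helper)."""
    return [set_msg if code & mask else clear_msg
            for mask, set_msg, clear_msg in PF_BITS]


# Bytes at EIP taken from the Code: line above: <a3> e4 ff eb 1f
at_eip = bytes([0xA3, 0xE4, 0xFF, 0xEB, 0x1F])
# Assumption: opcode a3 is "mov %eax, moffs32", so the next four bytes
# form a little-endian absolute address.
(target,) = struct.unpack("<I", at_eip[1:5])

print(decode_pf_error(0x0002))
print(hex(target))
```

Under that assumption the store target comes out as 0x1febffe4, the same address the kernel reports as the failed paging request, and the error code reads as a kernel-mode write to a page that is not present. That would be consistent with EIP having landed in the middle of an instruction, though this is only a sketch of one possible reading.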

Comment 5 Thorsten Leemhuis 2008-06-27 17:44:12 UTC

Some cleanup:

The problematic hardware was thrown away in the meantime, so closing this.
