Bug 46126 - kernel 2.4.3-12's 8139too driver hangs
Summary: kernel 2.4.3-12's 8139too driver hangs
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i386
OS: Linux
medium
low
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2001-06-27 00:12 UTC by Alexandre Oliva
Modified: 2007-04-18 16:34 UTC
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2002-04-29 18:28:15 UTC
Embargoed:


Attachments
Extracted from /var/log/messages. For the first time, the kernel oops made it there. (72.33 KB, text/plain)
2002-04-10 23:08 UTC, Alexandre Oliva
no flags
2.4.9-31 crashes too, even after replacement of memory and network card (for a similar model) (4.20 KB, text/plain)
2002-04-29 18:28 UTC, Alexandre Oliva
no flags

Description Alexandre Oliva 2001-06-27 00:12:39 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.3-12 i686; en-US; rv:0.9.1) Gecko/20010608

Description of problem:
Some machines freeze when the 8139too.o network card driver is in use.

Problem identified with the following network card (from dmesg):
SMC1211TX EZCard 10/100 (RealTek RTL8139)
Identified 8139 chip type 'RTL-8139B'

Some other machines that also used to use 8139too don't seem to hang.


How reproducible:
Always

Steps to Reproduce:
1. Upgrade a previously working Red Hat Linux 7.1 NFS server to kernel-2.4.3-12.
2. Pull your hair out when you see it hang as soon as fsck completes.

	

Actual Results:  The machine hangs without any kernel oopses, log or
console messages; it just goes dead.

Expected Results:  I'd hope it would just keep going...

Additional info:

I found out that the kernel module that used to be called 8139too in
2.4.2-2 is still available in 2.4.3-12, but renamed to 8139too_old. 
Modifying the alias in /etc/modules.conf so that the old module was used
was enough to get the NFS servers back into good shape.

Comment 1 Jacques Supcik 2001-07-24 11:43:07 UTC
I had the same problem. I was able to solve it by replacing

   alias eth0 8139too

with

   alias eth0 8139too_old

in /etc/modules.conf.

And now it works like before.

Comment 2 Arjan van de Ven 2001-07-30 14:08:04 UTC
It seems the "new and improved" 8139too driver doesn't work on some subset
of 8139 cards. I'm sooo glad I decided to also ship the old one ;)

Comment 3 Alexandre Oliva 2001-11-28 20:14:56 UTC
I have recently experienced kernel OOPSes using the newer 2.4.9-12 (!= 2.4.3-12,
as in the report subject), on a machine with an rtl8139 card and a large number
of ext3 filesystems mounted atop RAID0 partitions.  It's a heavily-used NFS
server.  In one of the crashes, it was kjournald that died; on another, it
was some other kernel thread that died within the 8139too module.

Comment 4 Alexandre Oliva 2001-12-05 19:11:04 UTC
I've finally got the kernel oopses.  The machine is running 2.4.9-13 now, but
still crashing (see the first oops below).  I suggested the upgrade when
2.4.9-12 started crashing, since ext3 is supported on it.  The second oops is
one of the oopses with 2.4.9-12.  I can dig up some more oopses, if you're
interested.  They're not too different from these, though.

Dec  3 12:29:23 saofrancisco kernel: ------------[ cut here ]------------
Dec  3 12:29:23 saofrancisco kernel: kernel BUG at checkpoint.c:597!
Dec  3 12:29:23 saofrancisco kernel: invalid operand: 0000
Dec  3 12:29:23 saofrancisco kernel: CPU:    0
Dec  3 12:29:23 saofrancisco kernel: EIP:    0010:[8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2190325/96]    Not tainted
Dec  3 12:29:23 saofrancisco kernel: EIP:    0010:[<f080640b>]    Not tainted
Dec  3 12:29:23 saofrancisco kernel: EFLAGS: 00010282
Dec  3 12:29:23 saofrancisco kernel: eax: 00000020   ebx: d75bcba0   ecx: 00000001   edx: 000190ff
Dec  3 12:29:23 saofrancisco kernel: esi: ee628400   edi: d1afef40   ebp: df146220   esp: ee611e58
Dec  3 12:29:23 saofrancisco kernel: ds: 0018   es: 0018   ss: 0018
Dec  3 12:29:23 saofrancisco kernel: Process kjournald (pid: 188, stackpage=ee611000)
Dec  3 12:29:23 saofrancisco kernel: Stack: f080b4ba 00000255 d75bcba0 f08060d3 ee628400 d75bcba0 de23a6a0 d1afef40
Dec  3 12:29:23 saofrancisco kernel:        f080503d de23a6a0 ee6284e4 00000000 00000fd4 dd4ac02c 00000000 d1afef40
Dec  3 12:29:23 saofrancisco kernel:        eca270e0 de23aa90 e3335180 ee043800 000000aa 00008174 c210a6c0 cb686420
Dec  3 12:29:23 saofrancisco kernel: Call Trace: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2169670/96] __insmod_jbd_S.rodata_L96 [jbd] 0x28ca
Dec  3 12:29:23 saofrancisco kernel: Call Trace: [<f080b4ba>] __insmod_jbd_S.rodata_L96 [jbd] 0x28ca
Dec  3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2191149/96] journal_recover_R4c414ebd [jbd] 0xc13
Dec  3 12:29:23 saofrancisco kernel: [<f08060d3>] journal_recover_R4c414ebd [jbd] 0xc13
Dec  3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2195395/96] journal_flushpage_R767cb1de [jbd] 0x122d
Dec  3 12:29:23 saofrancisco kernel: [<f080503d>] journal_flushpage_R767cb1de [jbd] 0x122d
Dec  3 12:29:23 saofrancisco kernel: [schedule+612/960] schedule [kernel] 0x264
Dec  3 12:29:23 saofrancisco kernel: [<c0115064>] schedule [kernel] 0x264
Dec  3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2187658/96] journal_revoke_R969aab69 [jbd] 0x606
Dec  3 12:29:23 saofrancisco kernel: [<f0806e76>] journal_revoke_R969aab69 [jbd] 0x606
Dec  3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2187968/96] journal_revoke_R969aab69 [jbd] 0x4d0
Dec  3 12:29:23 saofrancisco kernel: [<f0806d40>] journal_revoke_R969aab69 [jbd] 0x4d0
Dec  3 12:29:23 saofrancisco kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Dec  3 12:29:23 saofrancisco kernel: [<c0105726>] kernel_thread [kernel] 0x26
Dec  3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2187936/96] journal_revoke_R969aab69 [jbd] 0x4f0
Dec  3 12:29:23 saofrancisco kernel: [<f0806d60>] journal_revoke_R969aab69 [jbd] 0x4f0
Dec  3 12:29:23 saofrancisco kernel:
Dec  3 12:29:23 saofrancisco kernel:
Dec  3 12:29:23 saofrancisco kernel: Code: 0f 0b 59 58 8b 53 2c 85 d2 74 34 68 40 a5 80 f0 68 56 02 00


Nov 26 11:55:36 saofrancisco kernel: Unable to handle kernel paging request at virtual address 00008020
Nov 26 11:55:36 saofrancisco kernel:  printing eip:
Nov 26 11:55:36 saofrancisco kernel: f0841113
Nov 26 11:55:36 saofrancisco kernel: *pde = 00000000
Nov 26 11:55:36 saofrancisco kernel: Oops: 0000
Nov 26 11:55:36 saofrancisco kernel: CPU:    0
Nov 26 11:55:36 saofrancisco kernel: EIP:    0010:[8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2195181/96]    Not tainted
Nov 26 11:55:36 saofrancisco kernel: EIP:    0010:[<f0841113>]    Not tainted
Nov 26 11:55:36 saofrancisco kernel: EFLAGS: 00010206
Nov 26 11:55:36 saofrancisco kernel: eax: 00000000   ebx: c423a910   ecx: c423a910   edx: 00008000
Nov 26 11:55:36 saofrancisco kernel: esi: e29d4540   edi: e29d4578   ebp: 00000007   esp: eeee5e50
Nov 26 11:55:37 saofrancisco kernel: ds: 0018   es: 0018   ss: 0018
Nov 26 11:55:38 saofrancisco kernel: Process kjournald (pid: 196, stackpage=eeee5000)
Nov 26 11:55:38 saofrancisco kernel: Stack: ddd19a90 e29d4540 c423a910 d449d00c f0841174 c423a910 e29d4540 00000007
Nov 26 11:55:38 saofrancisco kernel:        f08418cd c423a910 e29d4540 00000007 eef148e4 00000001 00000ff4 d449d00c
Nov 26 11:55:38 saofrancisco kernel:        00000001 e29d4540 cd5b3980 efb979a0 00b1b8d4 c01a1887 00000001 c037179c
Nov 26 11:55:38 saofrancisco kernel: Call Trace: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2195084/96] journal_flushpage_R767cb1de [jbd] 0x354
Nov 26 11:55:38 saofrancisco kernel: Call Trace: [<f0841174>] journal_flushpage_R767cb1de [jbd] 0x354
Nov 26 11:55:38 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2193203/96] journal_flushpage_R767cb1de [jbd] 0xaad
Nov 26 11:55:38 saofrancisco kernel: [<f08418cd>] journal_flushpage_R767cb1de [jbd] 0xaad
Nov 26 11:55:38 saofrancisco kernel: [do_rw_disk+359/944] do_rw_disk [kernel] 0x167
Nov 26 11:55:39 saofrancisco kernel: [<c01a1887>] do_rw_disk [kernel] 0x167
Nov 26 11:55:39 saofrancisco kernel: [start_request+416/528] start_request [kernel] 0x1a0
Nov 26 11:55:39 saofrancisco kernel: [<c018e920>] start_request [kernel] 0x1a0
Nov 26 11:55:39 saofrancisco kernel: [ide_do_request+659/736] ide_do_request [kernel] 0x293
Nov 26 11:55:39 saofrancisco kernel: [<c018ec83>] ide_do_request [kernel] 0x293
Nov 26 11:55:39 saofrancisco kernel: [ide_intr+292/336] ide_intr [kernel] 0x124
Nov 26 11:55:39 saofrancisco kernel: [<c018f144>] ide_intr [kernel] 0x124
Nov 26 11:55:39 saofrancisco kernel: [ide_dma_intr+0/192] ide_dma_intr [kernel] 0x0
Nov 26 11:55:39 saofrancisco kernel: [<c01978d0>] ide_dma_intr [kernel] 0x0
Nov 26 11:55:39 saofrancisco kernel: [handle_IRQ_event+58/112] handle_IRQ_event [kernel] 0x3a
Nov 26 11:55:39 saofrancisco kernel: [<c010825a>] handle_IRQ_event [kernel] 0x3a
Nov 26 11:55:39 saofrancisco kernel: [schedule+612/960] schedule [kernel] 0x264
Nov 26 11:55:39 saofrancisco kernel: [<c01150d4>] schedule [kernel] 0x264
Nov 26 11:55:39 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2183546/96] journal_revoke_R969aab69 [jbd] 0x606
Nov 26 11:55:39 saofrancisco kernel: [<f0843e86>] journal_revoke_R969aab69 [jbd] 0x606
Nov 26 11:55:39 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2183856/96] journal_revoke_R969aab69 [jbd] 0x4d0
Nov 26 11:55:39 saofrancisco kernel: [<f0843d50>] journal_revoke_R969aab69 [jbd] 0x4d0
Nov 26 11:55:39 saofrancisco kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Nov 26 11:55:39 saofrancisco kernel: [<c0105726>] kernel_thread [kernel] 0x26
Nov 26 11:55:39 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2183824/96] journal_revoke_R969aab69 [jbd] 0x4f0
Nov 26 11:55:39 saofrancisco kernel: [<f0843d70>] journal_revoke_R969aab69 [jbd] 0x4f0
Nov 26 11:55:39 saofrancisco kernel:
Nov 26 11:55:39 saofrancisco kernel:
Nov 26 11:55:39 saofrancisco kernel: Code: 8b 42 20 89 53 1c 89 43 20 89 5a 20 89 58 1c 89 6b 08 83 fd


Comment 5 Alexandre Oliva 2001-12-05 19:19:38 UTC
Some more info: dmesg contains a lot of messages like this:

NETDEV WATCHDOG: eth0: transmit timed out
eth0: Tx queue start entry 3668  dirty entry 3664.
eth0:  Tx descriptor 0 is 00002000. (queue head)
eth0:  Tx descriptor 1 is 00002000.
eth0:  Tx descriptor 2 is 00002000.
eth0:  Tx descriptor 3 is 00002000.
eth0: Setting half-duplex based on auto-negotiated partner ability 0000.

lspci -v says:
00:11.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
        Subsystem: Realtek Semiconductor Co., Ltd. RT8139
        Flags: bus master, medium devsel, latency 64, IRQ 9
        I/O ports at cc00 [size=256]
        Memory at dfffff00 (32-bit, non-prefetchable) [size=256]
        Capabilities: <available only to root>


Comment 6 Alexandre Oliva 2002-04-10 23:07:10 UTC
Here's an oops obtained the other day, running 2.4.9-13debug.

Comment 7 Alexandre Oliva 2002-04-10 23:08:08 UTC
Created attachment 53264 [details]
Extracted from /var/log/messages.  For the first time, the kernel oops made it there.

Comment 8 Stephen Tweedie 2002-04-11 11:34:38 UTC
First, please attach the oopses rather than including them inline: it makes it
much easier to parse the content of the bug log.

The first of the two jbd oopses is something I've never seen before, anywhere. 
But the second one:

Nov 26 11:55:36 saofrancisco kernel: Unable to handle kernel paging request at virtual address 00008020
Nov 26 11:55:36 saofrancisco kernel: eax: 00000000   ebx: c423a910   ecx: c423a910   edx: 00008000

shows a typical footprint of random memory corruption.  We've been walking a
page's buffer ring and found a zero value that had a single-bit flip in it:
0x00008000 instead of all zeroes.  So we attempted to treat that value as a
pointer, and oopsed.

It's really not clear from this bug report whether you think this is one bug or
two.  Do the oopses go away with the other 8139 driver?  Have you run an
overnight memtest86 on this box?

Comment 9 Alexandre Oliva 2002-04-11 20:17:38 UTC
It's not clear to me either.  I've always suspected memory corruption caused by
the 8139too driver, but it's hard to be sure given how seldom it happens and
how seldom it leaves stack traces.  I'll try to get the machine upgraded to
2.4.9-31 and hope the problem goes away.  This server will hopefully be replaced
soon, and then we'll be able to run memory tests on it for as long as we want
and verify whether it's a hardware or a software problem.

Comment 10 Alexandre Oliva 2002-04-29 18:28:11 UTC
Created attachment 55804 [details]
2.4.9-31 crashes too, even after replacement of memory and network card (for a similar model)

Comment 11 Alexandre Oliva 2002-10-08 15:39:30 UTC
Machine was retired, we won't get any further info from it.

