Bug 46126
Summary: | kernel 2.4.3-12's 8139too driver hangs | | |
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Alexandre Oliva <aoliva> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WORKSFORME | QA Contact: | Brock Organ <borgan> |
Severity: | low | Docs Contact: | |
Priority: | medium | | |
Version: | 7.1 | CC: | jacques, sct |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | i386 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2002-04-29 18:28:15 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |
Description
Alexandre Oliva
2001-06-27 00:12:39 UTC
I had the same problem. I was able to solve it by replacing alias eth0 8139too with alias eth0 8139too_old in /etc/modules.conf. And now it works like before.

It seems the "new and improved" 8139too driver doesn't work on some subset of 8139 cards. I'm sooo glad I decided to also ship the old one ;)

I have recently experienced kernel OOPSes using the newer 2.4.9-12 (!= 2.4.3-12, as in the report subject), on a machine with an rtl8139 card and a large number of ext3 filesystems mounted atop RAID0 partitions. It's a heavily-used NFS server. In one of the crashes, it was the journald that died; on another, it was some other kernel thread that died within the 8139too module.

I've finally got the kernel oopses. The machine is running 2.4.9-13 now, but still crashing (see the first oops below). I suggested the upgrade when 2.4.9-12 started crashing, since ext3 is supported on it. The second oops is one of the oopses with 2.4.9-12. I can dig up some more oopses, if you're interested. They're not too different from these, though.

Dec 3 12:29:23 saofrancisco kernel: ------------[ cut here ]------------
Dec 3 12:29:23 saofrancisco kernel: kernel BUG at checkpoint.c:597!
Dec 3 12:29:23 saofrancisco kernel: invalid operand: 0000
Dec 3 12:29:23 saofrancisco kernel: CPU: 0
Dec 3 12:29:23 saofrancisco kernel: EIP: 0010:[8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2190325/96] Not tainted
Dec 3 12:29:23 saofrancisco kernel: EIP: 0010:[<f080640b>] Not tainted
Dec 3 12:29:23 saofrancisco kernel: EFLAGS: 00010282
Dec 3 12:29:23 saofrancisco kernel: eax: 00000020 ebx: d75bcba0 ecx: 00000001 edx: 000190ff
Dec 3 12:29:23 saofrancisco kernel: esi: ee628400 edi: d1afef40 ebp: df146220 esp: ee611e58
Dec 3 12:29:23 saofrancisco kernel: ds: 0018 es: 0018 ss: 0018
Dec 3 12:29:23 saofrancisco kernel: Process kjournald (pid: 188, stackpage=ee611000)
Dec 3 12:29:23 saofrancisco kernel: Stack: f080b4ba 00000255 d75bcba0 f08060d3 ee628400 d75bcba0 de23a6a0 d1afef40
Dec 3 12:29:23 saofrancisco kernel: f080503d de23a6a0 ee6284e4 00000000 00000fd4 dd4ac02c 00000000 d1afef40
Dec 3 12:29:23 saofrancisco kernel: eca270e0 de23aa90 e3335180 ee043800 000000aa 00008174 c210a6c0 cb686420
Dec 3 12:29:23 saofrancisco kernel: Call Trace: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2169670/96] __insmod_jbd_S.rodata_L96 [jbd] 0x28ca
Dec 3 12:29:23 saofrancisco kernel: Call Trace: [<f080b4ba>] __insmod_jbd_S.rodata_L96 [jbd] 0x28ca
Dec 3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2191149/96] journal_recover_R4c414ebd [jbd] 0xc13
Dec 3 12:29:23 saofrancisco kernel: [<f08060d3>] journal_recover_R4c414ebd [jbd] 0xc13
Dec 3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2195395/96] journal_flushpage_R767cb1de [jbd] 0x122d
Dec 3 12:29:23 saofrancisco kernel: [<f080503d>] journal_flushpage_R767cb1de [jbd] 0x122d
Dec 3 12:29:23 saofrancisco kernel: [schedule+612/960] schedule [kernel] 0x264
Dec 3 12:29:23 saofrancisco kernel: [<c0115064>] schedule [kernel] 0x264
Dec 3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2187658/96] journal_revoke_R969aab69 [jbd] 0x606
Dec 3 12:29:23 saofrancisco kernel: [<f0806e76>] journal_revoke_R969aab69 [jbd] 0x606
Dec 3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2187968/96] journal_revoke_R969aab69 [jbd] 0x4d0
Dec 3 12:29:23 saofrancisco kernel: [<f0806d40>] journal_revoke_R969aab69 [jbd] 0x4d0
Dec 3 12:29:23 saofrancisco kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Dec 3 12:29:23 saofrancisco kernel: [<c0105726>] kernel_thread [kernel] 0x26
Dec 3 12:29:23 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-13/kernel/drivers/net/+-2187936/96] journal_revoke_R969aab69 [jbd] 0x4f0
Dec 3 12:29:23 saofrancisco kernel: [<f0806d60>] journal_revoke_R969aab69 [jbd] 0x4f0
Dec 3 12:29:23 saofrancisco kernel:
Dec 3 12:29:23 saofrancisco kernel:
Dec 3 12:29:23 saofrancisco kernel: Code: 0f 0b 59 58 8b 53 2c 85 d2 74 34 68 40 a5 80 f0 68 56 02 00

Nov 26 11:55:36 saofrancisco kernel: Unable to handle kernel paging request at virtual address 00008020
Nov 26 11:55:36 saofrancisco kernel: printing eip:
Nov 26 11:55:36 saofrancisco kernel: f0841113
Nov 26 11:55:36 saofrancisco kernel: *pde = 00000000
Nov 26 11:55:36 saofrancisco kernel: Oops: 0000
Nov 26 11:55:36 saofrancisco kernel: CPU: 0
Nov 26 11:55:36 saofrancisco kernel: EIP: 0010:[8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2195181/96] Not tainted
Nov 26 11:55:36 saofrancisco kernel: EIP: 0010:[<f0841113>] Not tainted
Nov 26 11:55:36 saofrancisco kernel: EFLAGS: 00010206
Nov 26 11:55:36 saofrancisco kernel: eax: 00000000 ebx: c423a910 ecx: c423a910 edx: 00008000
Nov 26 11:55:36 saofrancisco kernel: esi: e29d4540 edi: e29d4578 ebp: 00000007 esp: eeee5e50
Nov 26 11:55:37 saofrancisco kernel: ds: 0018 es: 0018 ss: 0018
Nov 26 11:55:38 saofrancisco kernel: Process kjournald (pid: 196, stackpage=eeee5000)
Nov 26 11:55:38 saofrancisco kernel: Stack: ddd19a90 e29d4540 c423a910 d449d00c f0841174 c423a910 e29d4540 00000007
Nov 26 11:55:38 saofrancisco kernel: f08418cd c423a910 e29d4540 00000007 eef148e4 00000001 00000ff4 d449d00c
Nov 26 11:55:38 saofrancisco kernel: 00000001 e29d4540 cd5b3980 efb979a0 00b1b8d4 c01a1887 00000001 c037179c
Nov 26 11:55:38 saofrancisco kernel: Call Trace: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2195084/96] journal_flushpage_R767cb1de [jbd] 0x354
Nov 26 11:55:38 saofrancisco kernel: Call Trace: [<f0841174>] journal_flushpage_R767cb1de [jbd] 0x354
Nov 26 11:55:38 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2193203/96] journal_flushpage_R767cb1de [jbd] 0xaad
Nov 26 11:55:38 saofrancisco kernel: [<f08418cd>] journal_flushpage_R767cb1de [jbd] 0xaad
Nov 26 11:55:38 saofrancisco kernel: [do_rw_disk+359/944] do_rw_disk [kernel] 0x167
Nov 26 11:55:39 saofrancisco kernel: [<c01a1887>] do_rw_disk [kernel] 0x167
Nov 26 11:55:39 saofrancisco kernel: [start_request+416/528] start_request [kernel] 0x1a0
Nov 26 11:55:39 saofrancisco kernel: [<c018e920>] start_request [kernel] 0x1a0
Nov 26 11:55:39 saofrancisco kernel: [ide_do_request+659/736] ide_do_request [kernel] 0x293
Nov 26 11:55:39 saofrancisco kernel: [<c018ec83>] ide_do_request [kernel] 0x293
Nov 26 11:55:39 saofrancisco kernel: [ide_intr+292/336] ide_intr [kernel] 0x124
Nov 26 11:55:39 saofrancisco kernel: [<c018f144>] ide_intr [kernel] 0x124
Nov 26 11:55:39 saofrancisco kernel: [ide_dma_intr+0/192] ide_dma_intr [kernel] 0x0
Nov 26 11:55:39 saofrancisco kernel: [<c01978d0>] ide_dma_intr [kernel] 0x0
Nov 26 11:55:39 saofrancisco kernel: [handle_IRQ_event+58/112] handle_IRQ_event [kernel] 0x3a
Nov 26 11:55:39 saofrancisco kernel: [<c010825a>] handle_IRQ_event [kernel] 0x3a
Nov 26 11:55:39 saofrancisco kernel: [schedule+612/960] schedule [kernel] 0x264
Nov 26 11:55:39 saofrancisco kernel: [<c01150d4>] schedule [kernel] 0x264
Nov 26 11:55:39 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2183546/96] journal_revoke_R969aab69 [jbd] 0x606
Nov 26 11:55:39 saofrancisco kernel: [<f0843e86>] journal_revoke_R969aab69 [jbd] 0x606
Nov 26 11:55:39 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2183856/96] journal_revoke_R969aab69 [jbd] 0x4d0
Nov 26 11:55:39 saofrancisco kernel: [<f0843d50>] journal_revoke_R969aab69 [jbd] 0x4d0
Nov 26 11:55:39 saofrancisco kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Nov 26 11:55:39 saofrancisco kernel: [<c0105726>] kernel_thread [kernel] 0x26
Nov 26 11:55:39 saofrancisco kernel: [8139too:__insmod_8139too_O/lib/modules/2.4.9-12/kernel/drivers/net/+-2183824/96] journal_revoke_R969aab69 [jbd] 0x4f0
Nov 26 11:55:39 saofrancisco kernel: [<f0843d70>] journal_revoke_R969aab69 [jbd] 0x4f0
Nov 26 11:55:39 saofrancisco kernel:
Nov 26 11:55:39 saofrancisco kernel:
Nov 26 11:55:39 saofrancisco kernel: Code: 8b 42 20 89 53 1c 89 43 20 89 5a 20 89 58 1c 89 6b 08 83 fd

Some more info: dmesg contains a lot of messages like this:

NETDEV WATCHDOG: eth0: transmit timed out
eth0: Tx queue start entry 3668 dirty entry 3664.
eth0: Tx descriptor 0 is 00002000. (queue head)
eth0: Tx descriptor 1 is 00002000.
eth0: Tx descriptor 2 is 00002000.
eth0: Tx descriptor 3 is 00002000.
eth0: Setting half-duplex based on auto-negotiated partner ability 0000.

lspci -v says:

00:11.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RT8139
Flags: bus master, medium devsel, latency 64, IRQ 9
I/O ports at cc00 [size=256]
Memory at dfffff00 (32-bit, non-prefetchable) [size=256]
Capabilities: <available only to root>

Here's an oops obtained the other day, running 2.4.9-13debug.

Created attachment 53264 [details]
Extracted from /var/log/messages. For the first time, the kernel oops made it there.
First, please attach the oopses rather than including them inline: it makes it much easier to parse the content of the bug log.

The first of the two jbd oopses is something I've never seen before, anywhere. But the second one:

Nov 26 11:55:36 saofrancisco kernel: Unable to handle kernel paging request at virtual address 00008020
Nov 26 11:55:36 saofrancisco kernel: eax: 00000000 ebx: c423a910 ecx: c423a910 edx: 00008000

shows a typical footprint of random memory corruption. We've been walking a page's buffer ring and found a zero value that had a single-bit flip in it: 0x00008000 instead of all zeroes. So we attempted to treat that value as a pointer, and oopsed.

It's really not clear from this bug report whether you think this is one bug or two. Do the oopses go away with the other 8139 driver? Have you run an overnight memtest86 on this box?

It's not clear to me either. I've always suspected memory corruption caused by the 8139too driver, but it's hard to make sure given how seldom it happens, and how seldom it leaves stack traces. I'll try to get the machine upgraded to 2.4.9-31 and hope the problem goes away. This server will hopefully be replaced soon, and then we'll be able to run memory tests on it for as long as we want and verify whether it's a hardware or a software problem.

Created attachment 55804 [details]
2.4.9-31 crashes too, even after replacement of memory and network card (for a similar model)
Machine was retired, we won't get any further info from it.
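
The single-bit-flip reading of the 2.4.9-12 oops can be checked by hand. The sketch below is illustrative only and is not part of the original thread: the function names are hypothetical, and the 0x20 offset is an assumption inferred from the first bytes of that oops's Code: line (8b 42 20, which on i386 decodes to a load from edx+0x20). It tests whether the edx value differs from the expected zero in exactly one bit, and whether edx plus the assumed offset reproduces the reported fault address.

```python
def is_single_bit_flip(value: int, expected: int = 0) -> bool:
    """Return True if value differs from expected in exactly one bit."""
    diff = value ^ expected
    # A nonzero number with exactly one bit set has no bits in common
    # with itself minus one.
    return diff != 0 and (diff & (diff - 1)) == 0


def flipped_bit(value: int, expected: int = 0) -> int:
    """Return the index of the single differing bit (0 = least significant)."""
    if not is_single_bit_flip(value, expected):
        raise ValueError("values do not differ in exactly one bit")
    return (value ^ expected).bit_length() - 1


# Values copied from the 2.4.9-12 oops quoted in this report.
edx = 0x00008000            # should have been zero (end of the buffer ring)
fault_address = 0x00008020  # "virtual address 00008020"
offset = 0x20               # ASSUMPTION: displacement taken from "8b 42 20"

print(is_single_bit_flip(edx))
print(flipped_bit(edx))
print(hex(edx + offset), edx + offset == fault_address)
```

A check like this only shows that the register dump is consistent with a one-bit corruption; it cannot tell a failing memory module from a driver writing to the wrong address, which is why the thread goes on to ask for memtest86 and a driver swap.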