Bug 433806 - kernel 2.6.24.2-7.fc8 crashes after a while
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 8
Hardware: All Linux
Priority: low, Severity: low
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-02-21 10:39 EST by Adam Goode
Modified: 2009-01-09 04:22 EST (History)
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2009-01-09 02:40:28 EST

Attachments
Serial console crash log (19.60 KB, text/plain)
2008-02-21 10:39 EST, Adam Goode
dmesg (26.61 KB, text/plain)
2008-02-21 10:45 EST, Adam Goode
Calltrace, lspci, lsmod (22.24 KB, text/plain)
2008-03-03 14:54 EST, Vaclav "sHINOBI" Misek
KDB backtrace on hang (1.71 KB, text/plain)
2008-12-04 09:57 EST, Chase Douglas

Description Adam Goode 2008-02-21 10:39:17 EST
Description of problem:
kernel 2.6.24.2-7.fc8 crashes after a while when X is running.


Version-Release number of selected component (if applicable):
kernel-2.6.24.2-7.fc8.i686


How reproducible:
Always, after minutes or hours.


Steps to Reproduce:
1. Boot into X.
2. Wait.
  
Actual results:
Seems to crash with list_del on a bad pointer. Then all hell breaks loose.
(Softlockup every 11 seconds.)

See attachment.
Comment 1 Adam Goode 2008-02-21 10:39:17 EST
Created attachment 295520
Serial console crash log
Comment 2 Adam Goode 2008-02-21 10:45:40 EST
Created attachment 295521
dmesg
Comment 3 Chuck Ebbert 2008-02-21 19:49:43 EST
  24:   53                      push   %ebx
  25:   83 ec 0c                sub    $0xc,%esp
  28:   8b 48 04                mov    0x4(%eax),%ecx
   0:   8b 11                   mov    (%ecx),%edx
   2:   39 c2                   cmp    %eax,%edx
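
For readers following the disassembly: these instructions are consistent with the first sanity check in the kernel's debug version of list_del(). On 32-bit x86, prev sits at offset 4 of struct list_head, so the code loads entry->prev, dereferences it to get prev->next, and compares the result with entry. If entry->prev is itself a garbage pointer, the second load is where a fault would occur. The following is a minimal userspace sketch of that check, not the kernel source. The name checked_list_del and its return codes are hypothetical; the real kernel reports the corruption and calls BUG() instead of returning.

```c
#include <assert.h>
#include <stddef.h>

/* Same field order as the kernel's struct list_head:
 * next at offset 0, prev at offset 4 on 32-bit x86. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Make an empty circular list: the head points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Hypothetical stand-in for the debug list_del():
 * returns 0 on success, -1 if prev->next != entry,
 * -2 if next->prev != entry. */
static int checked_list_del(struct list_head *entry)
{
    assert(entry != NULL);

    struct list_head *prev = entry->prev;   /* mov 0x4(%eax),%ecx */
    if (prev->next != entry)                /* mov (%ecx),%edx ; cmp %eax,%edx */
        return -1;
    if (entry->next->prev != entry)
        return -2;

    /* Both neighbours agree, so unlinking is safe. */
    prev->next = entry->next;
    entry->next->prev = prev;
    entry->next = NULL;
    entry->prev = NULL;
    return 0;
}
```

An entry whose neighbours no longer point back at it, for example because something overwrote the list pointers, makes checked_list_del return -1 or -2 instead of silently corrupting the list further.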
Comment 4 Adam Goode 2008-02-27 10:44:06 EST
Something similar happens with kernel-2.6.24.3-12.fc8.

The crash seems to be triggered by heavy ath5k usage. Also, when I am not using
the wireless, I don't seem to be able to trigger it.

Something in ath5k is stomping over memory?
Comment 5 Dave Jones 2008-02-27 12:49:36 EST
Ah, that's an important clue. John, any reports of memory corruption in ath5k?
Comment 6 John W. Linville 2008-02-27 15:47:24 EST
Oh, yuck...no, haven't heard anything before now.
Comment 7 John W. Linville 2008-02-27 15:50:09 EST
What sort of wireless configuration do you have?  What encryption (WEP, WPA, 
etc)?  Perhaps Nick or Luis have other ideas about how to find this potential 
memory corruption?
Comment 8 Luis R. Rodriguez 2008-02-27 17:01:08 EST
Hm, haven't hit an issue like this yet.. Can you install the latest drivers from
compat-wireless package (this reflects what's on wireless-testing) and see if it
still happens there?

Also, please provide the output of ath_info. You can get ath_info from madwifi;
it's under madwifi/tools/.

Get your device's memory base address with lspci -vvv and then use it as follows:

./ath_info 0xeb010000

Note the 0x before the address; lspci prints the address without it. For example,
my lspci -vvv looks like this:

00:09.0 Ethernet controller: Atheros Communications, Inc. AR5212/AR5213 Multiprotocol MAC/baseband processor (rev 01)
        Subsystem: Z-Com, Inc. Unknown device 0027
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 7
        Region 0: Memory at eb010000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-
        Kernel driver in use: ath_pci
        Kernel modules: ath_pci
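
The lookup described above (find the Region 0 memory address in the lspci -vvv output, then prefix it with 0x) can be sketched in C as follows. This is only an illustration under stated assumptions: the helper name region_base_arg is hypothetical, and it assumes the first "Memory at" string in the text belongs to the wireless device, so the input should be the lspci block for that one device.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the first "Memory at <hex>" in lspci -vvv text
 * and write the address as "0x<hex>" into out.
 * Returns 0 on success, -1 if no address is found or out is too small. */
static int region_base_arg(const char *lspci_text, char *out, size_t out_len)
{
    static const char marker[] = "Memory at ";
    const char *p;
    unsigned long base;
    int n;

    assert(lspci_text != NULL && out != NULL);

    p = strstr(lspci_text, marker);
    if (p == NULL)
        return -1;

    /* lspci prints the address as bare hex, without a 0x prefix. */
    if (sscanf(p + strlen(marker), "%lx", &base) != 1)
        return -1;

    n = snprintf(out, out_len, "0x%08lx", base);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

Given the Region 0 line from the output above, this yields 0xeb010000, which is the form passed to ./ath_info.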
Comment 9 John W. Linville 2008-02-27 20:19:00 EST
Installing a current rawhide kernel should get you the same version of the 
driver that Luis is suggesting:

   http://koji.fedoraproject.org/koji/buildinfo?buildID=39187
Comment 10 Adam Goode 2008-02-29 17:14:49 EST
Seems to be working solidly with kernel 2.6.25-0.73.rc3.git1.fc9. I will leave
it over the weekend on serial console.

Sometimes I am getting data corruption (detected by rsync), but I don't think
this is new.
Comment 11 Adam Goode 2008-02-29 17:21:14 EST
02:02.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
        Subsystem: Phillips Components Unknown device 8331
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at c0210000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-
        Kernel driver in use: ath5k_pci
        Kernel modules: ath5k



[root@localhost tools]# ./ath_info 0xc0210000
 -==Device Information==-
MAC Version:  5212  (0x50)
MAC Revision: 5213  (0x56)
5Ghz PHY Revision: 5111  (0x17)
2Ghz PHY Revision: 2111  (0x23)
 -==EEPROM Information==-
EEPROM Version:     3.4
EEPROM Size:        16K
Regulatory Domain:  0x61
 -==== Capabilities ====-
|  802.11a Support: yes  |
|  802.11b Support: yes  |
|  802.11g Support: yes  |
|  RFKill  Support: no   |
 ========================
GPIO registers: CR 00000003 DO 00000021 DI 00000005
Comment 12 Vaclav "sHINOBI" Misek 2008-03-03 14:43:45 EST
On my machine, kernel-2.6.24.3-12.fc8 also crashes. Unfortunately there is no
ath5k here, but it looks like it could be related to heavy network traffic (or
higher system load).
Comment 13 Vaclav "sHINOBI" Misek 2008-03-03 14:54:36 EST
Created attachment 296670
Calltrace, lspci, lsmod
Comment 14 Steve Bergman 2008-03-07 17:16:25 EST
I just had a crash on a 70-user XDMCP server, with nothing in the log and a black
console screen.  It's a dual-processor 8GB Dell sc1420.  X is used on the console and it
also serves 70 XDMCP and NX sessions.  It's running fully updated x86_64 and
had performed flawlessly until last night's kernel upgrade.  I moved it back to
2.6.23.

Unfortunately, since there was no console or log output whatsoever, and I
absolutely cannot allow it to crash again, I'm afraid I'm not much help for
investigating the problem.  I've been using my laptop in X all day, and it is
running the upgraded kernel with no problems.  It is a single-processor 768MB
x86_32 machine.

lspci on the server shows:

00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 09)
00:00.1 Class ff00: Intel Corporation E7525/E7520 Error Reporting Registers (rev 09)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 09)
00:03.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A1 (rev 09)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 09)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B
02:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
03:0d.0 RAID bus controller: Adaptec ASC-39320(B) U320 w/HostRAID (rev 10)
03:0d.1 RAID bus controller: Adaptec ASC-39320(B) U320 w/HostRAID (rev 10)
03:0e.0 Ethernet controller: Intel Corporation 82545GM Gigabit Ethernet Controller (rev 04)
Comment 15 John W. Linville 2008-03-07 17:24:24 EST
Steve, I don't see any evidence of a wireless device in your lspci output.  
Perhaps this is unrelated to ath5k?
Comment 16 Steve Bergman 2008-03-07 17:26:55 EST
It just occurred to me that when I tried to ssh into the box from my office, I
got "ssh_exchange_identification: Connection closed by remote host".  So the
network connection was still alive, but SSH ran into some sort of problem. 
Nothing in the log though, so it was not able to write to disk at the time.

Also, should this bug be upped to greater than low priority and severity?
Comment 17 Steve Bergman 2008-03-07 17:31:53 EST
Correct.  No wireless.  Which does make me think it must be something else.

In the interest of completeness, there are also 2 USB devices attached: a USB
hard drive, which was not mounted at the time, and an HP USB laser printer.
Comment 18 Chuck Ebbert 2008-03-18 19:54:14 EDT
(In reply to comment #12)
> On my machine the kernel kernel-2.6.24.3-12.fc8 also crashes, unfortunately no 
> ath5, but it looks it can be related to the heavy network traffic (or higher
> system load). 
> 

Your crash is completely different from the initial reported one.
Comment 19 Bug Zapper 2008-11-26 04:54:11 EST
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 20 Chase Douglas 2008-12-04 09:54:13 EST
I can confirm a seemingly identical problem on Fedora 9. After heavy network traffic, in my case trying to download a full DVD ISO, the computer hangs. The first symptom is wireless networking going dead. After that, the keyboard becomes unresponsive. Finally, the mouse locks up and there's nothing left one can do.

I built the fedora 9 2.6.27.5-41 kernel with KDB. After triggering the hang, the serial console showed:

kernel BUG at drivers/net/wireless/ath5k/base.c:1708!

I will add an attachment with the backtrace at the time of the hang. Since I can reproduce with kdb and can easily view memory and register contents, please let me know if there is any further information that could be useful.
Comment 21 Chase Douglas 2008-12-04 09:57:59 EST
Created attachment 325692
KDB backtrace on hang
Comment 22 Bug Zapper 2009-01-09 02:40:28 EST
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
