Description of problem:
Kernel 2.6.24.2-7.fc8 crashes after a while when X is running.

Version-Release number of selected component (if applicable):
kernel-2.6.24.2-7.fc8.i686

How reproducible:
Always, after minutes or hours.

Steps to Reproduce:
1. Boot into X.
2. Wait.

Actual results:
It seems to crash in list_del on a bad pointer. Then all hell breaks loose (a soft lockup every 11 seconds). See attachment.
Created attachment 295520: Serial console crash log
Created attachment 295521: dmesg
  24:   53                      push   %ebx
  25:   83 ec 0c                sub    $0xc,%esp
  28:   8b 48 04                mov    0x4(%eax),%ecx
   0:   8b 11                   mov    (%ecx),%edx
   2:   39 c2                   cmp    %eax,%edx
Something similar happens with kernel-2.6.24.3-12.fc8. The crash seems to be triggered by heavy ath5k usage. Also, when I am not using the wireless, I don't seem to be able to trigger it. Something in ath5k is stomping over memory?
Ah, that's an important clue. John, any reports of memory corruption in ath5k?
Oh, yuck...no, haven't heard anything before now.
What sort of wireless configuration do you have? What encryption (WEP, WPA, etc)? Perhaps Nick or Luis have other ideas about how to find this potential memory corruption?
Hm, I haven't hit an issue like this yet. Can you install the latest drivers from the compat-wireless package (this reflects what's on wireless-testing) and see if it still happens there? Also, please provide the output of ath_info. You can get ath_info from madwifi; it's under madwifi/tools/. Get your device's memory base address with lspci -vvv and then use it as follows:

./ath_info 0xeb010000

Note the 0x before the address: lspci prints it without the prefix. For example, my lspci -vvv looks like this:

00:09.0 Ethernet controller: Atheros Communications, Inc. AR5212/AR5213 Multiprotocol MAC/baseband processor (rev 01)
    Subsystem: Z-Com, Inc. Unknown device 0027
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 7
    Region 0: Memory at eb010000 (32-bit, non-prefetchable) [size=64K]
    Capabilities: [44] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=2 PME-
    Kernel driver in use: ath_pci
    Kernel modules: ath_pci
Installing a current rawhide kernel should get you the same version of the driver that Luis is suggesting: http://koji.fedoraproject.org/koji/buildinfo?buildID=39187
It seems to be working solidly with kernel 2.6.25-0.73.rc3.git1.fc9. I will leave it running over the weekend with a serial console attached. I sometimes see data corruption (detected by rsync), but I don't think this is new.
02:02.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
    Subsystem: Phillips Components Unknown device 8331
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at c0210000 (32-bit, non-prefetchable) [size=64K]
    Capabilities: [44] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=2 PME-
    Kernel driver in use: ath5k_pci
    Kernel modules: ath5k

[root@localhost tools]# ./ath_info 0xc0210000
 -==Device Information==-
MAC Version: 5212 (0x50)
MAC Revision: 5213 (0x56)
5Ghz PHY Revision: 5111 (0x17)
2Ghz PHY Revision: 2111 (0x23)
 -==EEPROM Information==-
EEPROM Version: 3.4
EEPROM Size: 16K
Regulatory Domain: 0x61
 -==== Capabilities ====-
| 802.11a Support: yes |
| 802.11b Support: yes |
| 802.11g Support: yes |
| RFKill Support: no |
 ========================
GPIO registers: CR 00000003 DO 00000021 DI 00000005
On my machine, kernel-2.6.24.3-12.fc8 also crashes. Unfortunately I have no ath5k device, but it looks like the crash may be related to heavy network traffic (or higher system load).
Created attachment 296670: Calltrace, lspci, lsmod
I just had a 70-user XDMCP crash with nothing in the log and a black console screen. It's a dual-processor 8GB Dell SC1420. X is used on the console, and it also serves 70 XDMCP and NX sessions. It's running fully updated x86_64 and had performed flawlessly until last night's kernel upgrade. I moved it back to 2.6.23. Unfortunately, since there was no console or log output whatsoever, and I absolutely cannot allow it to crash again, I'm afraid I'm not much help in investigating the problem. I've been using my laptop in X all day, and it is running the upgraded kernel with no problems. It is a single-processor 768MB x86_32 machine. lspci on the server shows:

00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 09)
00:00.1 Class ff00: Intel Corporation E7525/E7520 Error Reporting Registers (rev 09)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 09)
00:03.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A1 (rev 09)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 09)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B
02:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
03:0d.0 RAID bus controller: Adaptec ASC-39320(B) U320 w/HostRAID (rev 10)
03:0d.1 RAID bus controller: Adaptec ASC-39320(B) U320 w/HostRAID (rev 10)
03:0e.0 Ethernet controller: Intel Corporation 82545GM Gigabit Ethernet Controller (rev 04)
Steve, I don't see any evidence of a wireless device in your lspci output. Perhaps this is unrelated to ath5k?
It just occurred to me that when I tried to ssh into the box from my office, I got "ssh_exchange_identification: Connection closed by remote host". So the network connection was still alive, but SSH ran into some sort of problem. There was nothing in the log, though, so presumably the system was unable to write to disk at the time. Also, should this bug's priority and severity be raised above low?
Correct, no wireless, which does make me think it must be something else. In the interest of completeness, there are also two USB devices attached: a USB hard drive, which was not mounted at the time, and an HP USB laser printer.
(In reply to comment #12)
> On my machine the kernel kernel-2.6.24.3-12.fc8 also crashes, unfortunately no
> ath5, but it looks it can be related to the heavy network traffic (or higher
> system load).

Your crash is completely different from the originally reported one.
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I can confirm a seemingly identical problem on Fedora 9. After heavy traffic, in my case trying to download a full DVD ISO, the computer hangs. The first symptom is that wireless networking goes dead. After that, the keyboard becomes unresponsive. Finally, the mouse locks up and there's nothing left one can do. I built the Fedora 9 2.6.27.5-41 kernel with KDB. After triggering the hang, the serial console showed:

kernel BUG at drivers/net/wireless/ath5k/base.c:1708!

I will add an attachment with the backtrace at the time of the hang. Since I can reproduce this with KDB and can easily view memory and register contents, please let me know if there is any further information that could be useful.
Created attachment 325692: KDB backtrace on hang
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.