Description of problem:
The ath5k driver included in the test kernels available from people.redhat.com works fine, except that the system freezes when the interface is brought down (shutdown or hibernate). I will provide more hardware information tomorrow, and will ask others to report their problems and hardware here as well.
Version-Release number of selected component (if applicable):
kernel-2.6.18-141.el5
kernel-2.6.18-144.el5
I can confirm this on a Compaq CQ60-206US. The wireless card is an AR5007EG (listed in Linux as AR242x):
07:00.0 Ethernet controller: Atheros Communications Inc. AR242x 802.11abg Wireless PCI Express Adapter (rev 01)
    Subsystem: Hewlett-Packard Company Unknown device 137a
    Flags: bus master, fast devsel, latency 0, IRQ 169
    Memory at c2000000 (64-bit, non-prefetchable) [size=64K]
    Capabilities: [40] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit- Queue=0/0 Enable-
    Capabilities: [60] Express Legacy Endpoint IRQ 0
    Capabilities: [90] MSI-X: Enable- Mask- TabSize=1
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Virtual Channel
Steps to Reproduce:
1. Shut down or reboot the computer, or bring the device down (ifconfig ath0 down / service network stop).
Actual results:
The computer becomes unresponsive and hangs, to the point that you must hold the power button until it shuts off.
Expected results:
1. The network device shuts down cleanly and the computer does not hang.
Additional info:
None
In my case the hardware in the system is:
01:0e.0 Ethernet controller: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
    Subsystem: Netgear Unknown device 5a00
    Flags: bus master, medium devsel, latency 168, IRQ 217
    Memory at fddf0000 (32-bit, non-prefetchable) [size=64K]
    Capabilities: [44] Power Management version 2
The hardware fails with the ath5k module that ships with 2.6.18-128.el5. It works fine with ath5k (both as-is in 2.6.18-141.el5 and 2.6.18-143.el5.jwltest, and as a 'backported' kmod-ath5k with 2.6.18-128.el5), but the system freezes when the interface is brought down (shutdown or hibernate). It worked fine before with the madwifi (dkms) package from RPMforge on 2.6.18-92.el5.
I have the same problem on an IBM Thinkpad T60.
03:00.0 Ethernet controller: Atheros Communications Inc. AR5212 802.11abg NIC (rev 01)
    Subsystem: IBM ThinkPad 11a/b/g Wireless LAN Mini Express Adapter (AR5BXB6)
    Flags: bus master, fast devsel, latency 0, IRQ 66
    Memory at edf00000 (64-bit, non-prefetchable) [size=64K]
    Capabilities: [40] Power Management version 2
    Capabilities: [50] Message Signalled Interrupts: 64bit- Queue=0/0 Enable-
    Capabilities: [60] Express Legacy Endpoint IRQ 0
    Capabilities: [90] MSI-X: Enable- Mask- TabSize=1
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Virtual Channel
The hardware was OK before kernel 2.6.18-128.el5 with madwifi from RPMforge.
With kernel 2.6.18-128.el5, the hardware no longer works with the madwifi package, nor with the ath5k module.
The hardware works with the kmod-ath5k package on kernel 2.6.18-128.el5, but the system hangs on shutdown, hibernate and suspend.
The hardware works with kernel 2.6.18-141.el5, with the same problem on shutdown, hibernate and suspend.
Created attachment 344005: SysRq T in the hung state
I can reproduce the hang when bringing down the ath5k interface on -135.el5 and later kernels. It is not reproducible with -134.el5.
Nothing in -135.el5 strikes me as obviously related. I'll take a closer look at the iwlwifi one (which includes mac80211 changes)... Could you try reverting this patch?
linux-2.6-wireless-iwlwifi-booting-with-rf-kill-switch-enabled.patch
* Fri Mar 13 2009 Don Zickus <dzickus> [2.6.18-135.el5]
- [s390] iucv: failing cpu hot remove for inactive iucv (Hans-Joachim Picht) [485412]
- [s390] dasd: fix waitqueue for sleep_on_immediatly (Hans-Joachim Picht) [480161]
- [ide] increase timeouts in wait_drive_not_busy (Stanislaw Gruszka) [464039]
- [x86_64] mce: do not clear an unrecoverable error status (Aristeu Rozanski) [489692]
- [wireless] iwlwifi: booting with RF-kill switch enabled (John W. Linville) [482990]
- [net] put_cmsg: may cause application memory overflow (Jiri Pirko) [488367]
- [x86_64] fix gettimeoday TSC overflow issue (Prarit Bhargava) [467942]
- [net] ipv6: check hop limit setting in ancillary data (Jiri Pirko) [487406]
- [net] ipv6: check outgoing interface in all cases (Jiri Pirko) [486215]
- [acpi] disable GPEs at the start of resume (Matthew Garrett) [456302]
- [crypto] include crypto headers in kernel-devel (Neil Horman) [470929]
- [net] netxen: rebase for RHEL-5.4 (tcamuso) [485381]
- [misc] signal: modify locking to handle large loads (AMEET M. PARANJAPE) [487376]
- [kexec] add ability to dump log from vmcore file (Neil Horman) [485308]
- [fs] ext3: handle collisions in htree dirs (Eric Sandeen) [465626]
- [acpi] use vmalloc in acpi_system_read_dsdt (Prarit Bhargava) [480142]
- [misc] make ioctl.h compatible with userland (Jiri Pirko) [473947]
- [nfs] sunrpc: add sv_maxconn field to svc_serv (Jeff Layton) [468092]
- [nfs] lockd: set svc_serv->sv_maxconn to a better value (Jeff Layton) [468092]
- [mm] decrement reclaim_in_progress after an OOM kill (Larry Woodman) [488955]
- [misc] sysrq-t: display backtrace for runnable processes (Anton Arapov) [456588]
John, reverting linux-2.6-wireless-iwlwifi-booting-with-rf-kill-switch-enabled.patch helps here.
Dag and other reporters, I uploaded a test kernel with this patch reverted:
http://michich.fedorapeople.org/bz499999/kernel-2.6.18-148.el5.bz499999test.i686.rpm
http://michich.fedorapeople.org/bz499999/kernel-2.6.18-148.el5.bz499999test.x86_64.rpm
http://disk.jabbim.cz/michich@jabber.cz/kernel-2.6.18-148.el5.bz499999test.src.rpm
It's not the final fix of course, because we want iwlwifi working too.
One suspicious thing in the patch is that it uses cancel_rearming_delayed_work() on a non-keventd work.
Created attachment 344165: proposed patch
Not tested yet.
Created attachment 344183: tested patch
The previous patch turned the hang into infinite spinning, because cancel_rearming_delayed_work() assumes the work is pending. This patch works better. Upstream uses cancel_delayed_work_sync() here, so this patch uses cancel_delayed_work() followed by flush_workqueue() instead.
Dag, David, Dirk, could you please test this?
http://michich.fedorapeople.org/bz499999/kernel-2.6.18-148.el5.bz499999test2.i686.rpm
http://michich.fedorapeople.org/bz499999/kernel-2.6.18-148.el5.bz499999test2.x86_64.rpm
http://disk.jabbim.cz/michich@jabber.cz/kernel-2.6.18-148.el5.bz499999test2.src.rpm
I just installed and tried the i686 rpm, but I cannot get this kernel to boot. I get a kernel panic at CPU scaling, immediately after the "Checking for new hardware" line. The call trace mentions cpu_freq_governor, cpu_set_policy, etc., followed by:
Kernel panic - not syncing: Fatal exception
Oops, forgot to mention: I tried "pci=nomsi" (for my SATA drives) and "nomce" (since I have a Compaq).
OLD WORKING KERNEL: On the kernel I am running now, I must have both options on the kernel line or my laptop will not boot. Without "pci=nomsi" on the kernel line, I see this same kernel panic with the running kernel. Without "nomce", I get the "Machine Check Exception".
NEW TEST KERNEL: I tried the options separately, then together, and then with neither. The system either hangs with no error or gives a kernel panic, but always at the same place in boot. It almost seems that the "pci=nomsi" option is not taking effect, as I get the same error on the old kernel without it.
David, your booting problem is unfortunate. Please report it as a separate BZ if you haven't done so already. It is unlikely to be related to ath5k.
I tested with kernel-2.6.18-148.el5.i686.rpm, kernel-2.6.18-148.el5.bz499999test1.i686.rpm and kernel-2.6.18-148.el5.bz499999test2.i686.rpm, and my system hangs at "INIT: version 2.86 booting". With kernel-2.6.18-146.el5.i686.rpm my system boots, but the problem with the ath5k module is still there.
Correction: with kernel-2.6.18-148.el5.bz499999test2.i686.rpm my system hangs on "INIT: version 2.86 booting" but after a couple of minutes the system is booted. I can now use the ath5k module and initiate a proper shutdown.
Timing: With kernel-2.6.18-148.el5.bz499999test2.i686.rpm, I have my login screen after 10 minutes 44 seconds. With kernel-2.6.18-146.el5.i686.rpm, I have my login screen after 1 minute 20 seconds.
Dirk, so the patch fixes the wireless bug, but -148.el5 causes a big boot time regression on your machine compared to -146.el5. Please report it as a new BZ (with dmesg from both kernels and hw information). Meanwhile I'll post the wireless patch to rhkernel-list. Thanks, Michal
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Dirk, can you post the new BZ and also boot your machine with 'nmi_watchdog=2' to see if the hang turns into an NMI backtrace?
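
Since an earlier comment suspected that a boot option was not taking effect, it may help to confirm that nmi_watchdog=2 really reached the kernel before reading anything into the result. A minimal sketch, assuming a POSIX shell; the helper name has_boot_param and its optional second argument are made up for illustration and are not part of any Red Hat tool:

```shell
# has_boot_param PARAM [FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# FILE defaults to /proc/cmdline; the optional argument exists only so the
# check can be tried against a saved copy (hypothetical helper).
has_boot_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Pad with spaces so that "nomce" does not match inside a longer word.
    case " $(cat "$file") " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use after booting the test kernel:
#   has_boot_param nmi_watchdog=2 && echo "nmi_watchdog=2 was passed to the kernel"
```

The same check applies to "pci=nomsi" and "nomce". It only shows that the option was passed on the command line, not that the kernel acted on it.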
Hi Dirk, hopefully your hang looks like the one in bugzilla 501178.
I have just created a new bug (bug #501441) as requested, sorry for the delay. I have also listed this bug report as an external bug on that bug report.
With kernel-2.6.18-148.el5.bz499999test2.i686.rpm (and kernel-2.6.18-148.el5) my system freezes when the initscripts start the network (the mouse freezes and the keyboard Num Lock LED no longer lights up). I have the initscripts bring up the wireless network because this box (a TV system) logs in automatically and starts a TV application, and NetworkManager in RHEL5 does not allow password-less authentication using gnome-keyring.
Hardware:
01:0e.0 Ethernet controller: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
    Subsystem: Netgear Unknown device 5a00
    Flags: bus master, medium devsel, latency 168, IRQ 217
    Memory at fddf0000 (32-bit, non-prefetchable) [size=64K]
    Capabilities: [44] Power Management version 2
I don't know whether the problem is related to the fact that I start the hardware differently (via initscripts) or to the hardware being different altogether. I do not have this problem with kernel-2.6.18-144.el5 (but that one still crashes when the interface is brought down).
Dag, do you have the same problem booting with 2.6.18-147 and the patch in comment #9? There is a patch in -148 that we are currently tracking as a problem, so it would be nice to narrow down whether your issue is a wireless problem or one caused by the questionable patch in -148. If you could build it and test it, I would really appreciate it.
I created a new BZ: https://bugzilla.redhat.com/show_bug.cgi?id=501676
When I boot my system with "nmi_watchdog=2", I get a black screen and the system hangs...
in kernel-2.6.18-150.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
I downloaded and tested this kernel (kernel-2.6.18-150.el5). The kernel boots properly; however, the ath5k driver fails to work for my card. I get errors in syslog when I try to get a DHCP IP on my card:
/var/log/messages:
May 21 23:33:12 mobile-tek kernel: ACPI: PCI Interrupt 0000:07:00.0[A] -> Link [Z012] -> GSI 23 (level, low) -> IRQ 169
May 21 23:33:12 mobile-tek kernel: ath5k_pci 0000:07:00.0: registered as 'phy0'
May 21 23:33:12 mobile-tek kernel: ath5k phy0: Support for RF2425 is under development.
May 21 23:33:13 mobile-tek kernel: ath5k phy0: Atheros AR2425 chip found (MAC: 0xe2, PHY: 0x70)
May 21 23:33:13 mobile-tek NetworkManager: <info> wlan0: driver is 'ath5k_pci'.
May 21 23:33:13 mobile-tek NetworkManager: <info> wlan0: driver supports SSID scans (scan_capa 0x01).
May 21 23:33:13 mobile-tek NetworkManager: <info> Found new 802.11 WiFi device 'wlan0'.
May 21 23:33:13 mobile-tek NetworkManager: <info> (wlan0): exported as /org/freedesktop/Hal/devices/net_00_24_2b_a2_fe_e0_0
May 21 23:33:17 mobile-tek NetworkManager: <info> (wlan0): device state change: 1 -> 2
May 21 23:33:17 mobile-tek NetworkManager: <info> (wlan0): bringing up device.
May 21 23:33:17 mobile-tek kernel: ath5k phy0: gain calibration timeout (2412MHz)
May 21 23:33:17 mobile-tek kernel: ath5k phy0: unable to reset hardware: -11
May 21 23:33:17 mobile-tek NetworkManager: <WARN> nm_device_hw_bring_up(): (wlan0): device not up after timeout!
May 21 23:33:17 mobile-tek NetworkManager: <info> (wlan0): deactivating device (reason: 2).
May 21 23:33:17 mobile-tek NetworkManager: <info> (wlan0): device state change: 2 -> 3
May 21 23:33:17 mobile-tek kernel: ath5k phy0: gain calibration timeout (2412MHz)
May 21 23:33:17 mobile-tek kernel: ath5k phy0: unable to reset hardware: -11
Oh, the dreaded "ath5k phy0: gain calibration timeout". Sometimes I see this after boot on my laptop too. Once the card is in this state, it is stuck and not even a reboot helps. I have to power off completely to make it work again. It's an upstream bug and it happens even with the latest Fedora kernel. Until the bug is fixed, could you please power down and try again? It should happen quite rarely.
Upstream bugreport for the "gain calibration timeout" bug has a proposed patch from Bob Copeland, see http://bugzilla.kernel.org/show_bug.cgi?id=12080 . I will test it.
This fix works fine for me. I had some trouble recompiling -146 with the patch, so I am happy that -150 includes it. With -150 I no longer have the boot problems I reported above. Thanks for your help!
I am a bit confused. Michal, you mention a proposed patch; Dag, you say that you are happy that it is included in the -150 kernel. I was testing the -150 kernel when I ran across those errors. So, is the patch in this kernel or not? If so, then this gain calibration issue is clearly not resolved; if not, then shouldn't I wait for a patched kernel before retesting?
David, the original patch that fixes the "freeze" when bringing down the interface is included in -150. The patch that helps with your problem is not included. You can see this from the changelog of the packages (either by looking at the SPEC file or by running rpm -qp --changelog <rpm file>). The reason you see different behaviour (and still have a problem) is that our hardware is sufficiently different, even though we use the same driver. I would guess the developers prefer that you open a new bug report for this rather than use this one.
PS It would be good to provide the information you gave me in private: that your device does not hit this problem only occasionally, but that it is a reproducible problem.
PS2 Also, Alan Bartlett tried to port the patch you need to his kmod-ath5k package, but failed in his attempt. So I hope Red Hat can provide you with a patched and built kernel to test in your reproducible situation. Hang in there :-)
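
The changelog check mentioned in the comment above can be scripted. A minimal sketch, assuming the RHEL kernel changelog convention seen earlier in this report, where each entry ends with the Bugzilla ID in square brackets; the function name bz_entries is hypothetical, and the package file name in the usage comment is only an example:

```shell
# bz_entries BZ_ID
# Reads changelog text on stdin and prints the entries tagged "[BZ_ID]".
# Exit status is 0 if at least one entry matched, non-zero otherwise.
bz_entries() {
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -F "[$1]"
}

# Example use (the file name is an assumption):
#   rpm -qp --changelog kernel-2.6.18-150.el5.i686.rpm | bz_entries 499999
```

If nothing is printed, that build's changelog does not list a fix for the given bug number.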
(In reply to comment #33)
> I am a bit confused, Michal you say that this proposed patch, Dag you say that
> you are happy that it is included in the -150 kernel. I was testing the -150
> kernel when ran across those errors.
David, there are two different bugs:
(1) freeze when the ath5k interface is brought down
(2) ath5k gain calibration timeout
> So, is the patch in this kernel or not?
The patch for bug (1) is included in the -150 kernel. The patch for bug (2) is not included in any kernel version yet, not even upstream.
> If so then this gain calibration issue is clearly no resolved, if not then
> shouldn't I wait for a patched kernel before retesting?
Please test kernel -150 again and make sure you power down your laptop before trying (don't just reboot). Chances are you will not hit the gain calibration timeout, because its occurrence is random and it happens rarely.
Forgive me, I am very new to tracking kernel bugs. I completely missed that fact (two separate bugs). Michal, I have rebooted and also cold booted (completely turned off and started cold) many times, and I have never seen this kernel bring up this card. It is always the error above. As Dag said, this is NOT a once-in-a-while issue; it happens every time. I have yet to make the ath5k driver work at all. Currently I am using the madwifi-hal and madwifi-hal-module RPMs that I built from their latest snapshot. From what Dag has mentioned to me in another conversation, this may be an issue. Dag, you mentioned that some kernel modules get replaced by these drivers. When I test ath5k I take an extra (unneeded?) step: I uninstall the madwifi RPMs and then reinstall the kernel RPM using "--replacepkgs --replacefiles". I hope that is sufficient. Should I, as suggested, start yet another bug/issue?
CentOS 5.3, IBM Thinkpad T20 (i386)
[mpeters@atlantis ~]$ rpm -qa |grep '^kmod'
kmod-ath5k-0.5.1-1.el5.elrepo
kmod-mac80211-0.150-1.el5.elrepo
02:00.0 Ethernet controller: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
Installed last night. So far no freezing through several reboots. I was running the kernel-2.6.18-128.1.6.el5 kernel; I updated to .10 and booted it but haven't tried rebooting yet. Suspend does not work properly on this laptop, but that was an issue before (it did work properly in Fedora 8, but I don't think it ever has in CentOS).
Note: Under madwifi, when searching for a wireless network, the "Act" and "Link" lights would blink alternately. Once a connection was established, they would blink together. Now the Link light seems to stay solid with occasional fluctuation, and the Act light doesn't do anything.
Addendum to my last remark - No reboot issues with .10 kernel, manually bringing iface down and up again also is without issue. If I can find my pci version of this card I'll test on x86_64 as well.
I have now created yet another bug (bug #502542) to address the gain calibration issue. As to the reboot issue, using the -150 kernel I can again confirm that my laptop does not hang when the card is brought down (manually or during reboot).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-1243.html