This seems to be a known issue: under relatively heavy load (the easiest way to reproduce is to run few HD streaming flash videos from YouTube/Vimeo), the iwl3945 drops connection and does not reconnect. Cycling through rmmod/modprobe seems to help. The quick search https://bugzilla.redhat.com/buglist.cgi?quicksearch=REPLY_RXON seems to produce few related/duplicate issues, but as they all are reported for the Fedora 12 or earlier, I am opening this ticket. I had the same issue with Fedora 12, on pre-release kernel for 13 issue is still present. Kernel version: 2.6.33.3-72.fc13.x86_64 Adding the /var/log/messages for the iwl3945: May 6 12:36:39 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 6 12:36:39 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). May 6 12:36:40 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 6 12:36:40 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). May 6 12:36:40 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 6 12:36:40 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). May 6 12:36:41 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms. May 6 12:36:41 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_TX_PWR_TABLE_CMD: time out after 500ms. May 6 12:36:42 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms. May 6 12:36:42 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 6 12:36:42 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). May 6 12:36:43 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 6 12:36:43 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). May 6 12:36:43 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_TX_PWR_TABLE_CMD: time out after 500ms. May 6 12:36:47 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 6 12:36:47 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). May 6 12:36:48 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms. May 6 12:36:48 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_TX_PWR_TABLE_CMD: time out after 500ms. May 6 12:36:49 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_SCAN_CMD: time out after 500ms. May 6 12:36:49 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 6 12:36:49 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). [rmmod] May 6 12:36:49 miracle kernel: iwl3945 0000:02:00.0: PCI INT A disabled [modprobe] May 6 12:36:51 miracle kernel: iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 2.6.33.3-72.fc13.x86_64-kds May 6 12:36:51 miracle kernel: iwl3945: Copyright(c) 2003-2009 Intel Corporation May 6 12:36:51 miracle kernel: iwl3945 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 May 6 12:36:51 miracle kernel: iwl3945 0000:02:00.0: Tunable channels: 13 802.11bg, 23 802.11a channels May 6 12:36:51 miracle kernel: iwl3945 0000:02:00.0: Detected Intel Wireless WiFi Link 3945ABG ... May 6 12:36:51 miracle kernel: iwl3945 0000:02:00.0: firmware: requesting iwlwifi-3945-2.ucode ... May 6 12:36:51 miracle kernel: iwl3945 0000:02:00.0: loaded firmware version 15.32.2.9 May 6 12:36:51 miracle kernel: ADDRCONF(NETDEV_UP): wlan0: link is not ready ... May 6 12:36:54 miracle kernel: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
There seems to be the related thread in linux-wireless http://marc.info/?l=linux-wireless&m=127151315515173&w=2
Please create a file named /etc/modprobe.d/iwl3945.conf with this one line: options iwl3945 disable_hw_scan=1 After that, either reboot or execute this sequence: modprobe -r iwl3945 modprobe iwl3945 Does that produce any better results?
This seems to enhance stability a bit, but the issue still happens: May 13 01:30:10 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 13 01:30:10 miracle kernel: iwl3945 0000:02:00.0: Error clearing ASSOC_MSK on current configuration (-110). May 13 01:30:11 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 13 01:30:11 miracle kernel: iwl3945 0000:02:00.0: Error clearing ASSOC_MSK on current configuration (-110). [skipped repeating rows] May 13 01:30:18 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 13 01:30:18 miracle kernel: iwl3945 0000:02:00.0: Error clearing ASSOC_MSK on current configuration (-110). May 13 01:30:19 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_TX_PWR_TABLE_CMD: time out after 500ms. [skipped repeating rows] May 13 01:30:31 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 13 01:30:31 miracle kernel: iwl3945 0000:02:00.0: Error clearing ASSOC_MSK on current configuration (-110). May 13 01:30:32 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_TX_PWR_TABLE_CMD: time out after 500ms. May 13 01:30:32 miracle kernel: iwl3945 0000:02:00.0: No space in command queue May 13 01:30:32 miracle kernel: iwl3945 0000:02:00.0: Restarting adapter due to queue full May 13 01:30:32 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28 May 13 01:30:32 miracle kernel: iwl3945 0000:02:00.0: Error clearing ASSOC_MSK on current configuration (-28). Also, the new log messages started appearing in syslog with disable_hw_scan=1 : May 13 01:38:13 miracle kernel: iwl3945 0000:02:00.0: Failed to get channel info for channel 100 [0] May 13 01:38:14 miracle kernel: iwl3945 0000:02:00.0: Failed to get channel info for channel 104 [0] [skipped repeating rows] May 13 01:48:17 miracle kernel: iwl3945 0000:02:00.0: Failed to get channel info for channel 136 [0] May 13 01:48:18 miracle kernel: iwl3945 0000:02:00.0: Failed to get channel info for channel 140 [0]
There is an improvement - after same hiccup, iwl3945 reported restart of adapter and then managed to reconnect. IIRC, before it was not able to reconnect. Log below: May 13 14:23:45 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 13 14:23:45 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). May 13 14:23:45 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 13 14:23:45 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). [skipped repeating rows] May 13 14:23:58 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). May 13 14:23:58 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: time out after 500ms. May 13 14:23:58 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-110). May 13 14:23:58 miracle kernel: iwl3945 0000:02:00.0: No space in command queue May 13 14:23:58 miracle kernel: iwl3945 0000:02:00.0: Restarting adapter due to queue full May 13 14:23:58 miracle kernel: iwl3945 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -28 May 13 14:23:58 miracle kernel: iwl3945 0000:02:00.0: Error setting new configuration (-28). May 13 14:24:00 miracle kernel: iwl3945 0000:02:00.0: Failed to get channel info for channel 140 [0] [skipped some rows] May 13 14:24:09 miracle dhclient[16668]: Listening on LPF/wlan0/00:13:02:b8:20:df May 13 14:24:09 miracle dhclient[16668]: Sending on LPF/wlan0/00:13:02:b8:20:df May 13 14:24:09 miracle dhclient[16668]: Sending on Socket/fallback May 13 14:24:09 miracle dhclient[16668]: DHCPREQUEST on wlan0 to 255.255.255.255 port 67 May 13 14:24:09 miracle dhclient[16668]: DHCPACK from 10.0.0.138 May 13 14:24:09 miracle dhclient[16668]: bound to 10.0.0.2 -- renewal in 35749 seconds.
Looks like the hardware is stuck here ... the kernel you are running (as you discovered) has some safety in it to detect a stuck queue when it is full. In later kernels we have a new mechanism to hopefully do this detection earlier. The patch that accomplishes this is (taken from wireless-next-2.6): commit b74e31a9bc1013e69b85b139072485dc153453dd Author: Wey-Yi Guy <wey-yi.w.guy> Date: Mon Mar 1 17:23:50 2010 -0800 iwlwifi: Recover TX flow stall due to stuck queue Monitors the internal TX queues periodically. When a queue is stuck for some unknown conditions causing the throughput to drop and the transfer is stop, the driver will force firmware reload and bring the system back to normal operational state. The iwlwifi devices behave differently in this regard so this feature is made part of the ops infrastructure so we can have more control on how to monitor and recover from tx queue stall case per device. Signed-off-by: Trieu 'Andrew' Nguyen <trieux.t.nguyen> Signed-off-by: Wey-Yi Guy <wey-yi.w.guy> Signed-off-by: Reinette Chatre <reinette.chatre> Note that this patch requires firmware restart to be supported ("iwl_force_reset(priv, IWL_FW_RESET);"), but I believe you already ported this to FC13. One small additional change needed on top of this patch is to actually enable this feature for iwl3945 since we have not yet seen the need for this on iwl3945, but that obviously changed. I think it will be something like this (whitespace broken copy&pasted and will not apply to your repo ... just to get idea): diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.c b/drivers/net/wireless/iwlwifi/iwl-3945.c index 7cdda5b..8e68026 100644 --- a/drivers/net/wireless/iwlwifi/iwl-3945.c +++ b/drivers/net/wireless/iwlwifi/iwl-3945.c @@ -2700,6 +2700,7 @@ static struct iwl_lib_ops iwl3945_lib = { .config_ap = iwl3945_config_ap, .manage_ibss_station = iwl3945_manage_ibss_station, .check_plcp_health = iwl3945_good_plcp_health, + .recover_from_tx_stall = iwl_bg_monitor_recover, .debugfs_ops = { .rx_stats_read = iwl3945_ucode_rx_stats_read, @@ -2744,6 +2745,7 @@ static struct iwl_cfg iwl3945_bg_cfg = { .monitor_recover_period = IWL_MONITORING_PERIOD, .max_event_log_size = 512, .tx_power_by_driver = true, + .monitor_recover_period = IWL_MONITORING_PERIOD, }; static struct iwl_cfg iwl3945_abg_cfg = {
"iwlwifi: Recover TX flow stall due to stuck queue" is already in F-13. Also, it looks like we already have the '.monitor_recover_period = IWL_MONITORING_PERIOD,' bits, but I've added the '.recover_from_tx_stall = iwl_bg_monitor_recover,' bit to the test kernels here: http://koji.fedoraproject.org/koji/taskinfo?taskID=2197986 Kyryl, when the build finishes can you give those a try?
Had this new kernel running for about two days, the following was logged into the /var/log/messages: May 20 15:19:06 miracle kernel: iwl3945 0000:02:00.0: queue 2 stuck 3 time. Fw reload. May 20 15:19:06 miracle kernel: iwl3945 0000:02:00.0: On demand firmware reload .... May 21 22:42:16 miracle kernel: iwl3945 0000:02:00.0: queue 2 stuck 3 time. Fw reload. May 21 22:42:16 miracle kernel: iwl3945 0000:02:00.0: On demand firmware reload May 21 22:42:25 miracle kernel: iwl3945 0000:02:00.0: queue 4 stuck 3 time. Fw reload. May 21 22:42:25 miracle kernel: iwl3945 0000:02:00.0: On demand firmware reload May 21 22:43:10 miracle kernel: iwl3945 0000:02:00.0: queue 4 stuck 3 time. Fw reload. May 21 22:43:10 miracle kernel: iwl3945 0000:02:00.0: On demand firmware reload May 21 22:44:22 miracle kernel: iwl3945 0000:02:00.0: queue 4 stuck 3 time. Fw reload. May 21 22:44:22 miracle kernel: iwl3945 0000:02:00.0: On demand firmware reload All the above is for the case when option "disable_hw_scan=1" is not passed to module (i.e. hw scan is enabled). I didn't notice any wifi connection interruptions when this was logged, so things look very good! Thanks.
kernel-2.6.33.5-112.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/kernel-2.6.33.5-112.fc13
kernel-2.6.33.5-112.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.