I switched to the compat-wireless drivers provided by Stanislaw Gruszka to solve bug 648732. Although that bug went away, I'm having problems with the connection randomly dropping and NM also having problems reconnecting now and then. This is what I get in messages: May 31 13:32:38 mjolnir NetworkManager[1531]: <info> (wlan0): supplicant connection state: completed -> disconnected May 31 13:32:38 mjolnir kernel: [463111.750213] cfg80211: Calling CRDA to update world regulatory domain May 31 13:32:38 mjolnir kernel: [463111.756422] cfg80211: World regulatory domain updated: May 31 13:32:38 mjolnir kernel: [463111.756425] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) May 31 13:32:38 mjolnir kernel: [463111.756429] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) May 31 13:32:38 mjolnir kernel: [463111.756437] cfg80211: (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) May 31 13:32:38 mjolnir kernel: [463111.756440] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) May 31 13:32:38 mjolnir kernel: [463111.756443] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) May 31 13:32:38 mjolnir kernel: [463111.756446] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) May 31 13:32:38 mjolnir kernel: [463111.756459] cfg80211: Calling CRDA for country: SE May 31 13:32:38 mjolnir kernel: [463111.760733] cfg80211: Regulatory domain changed to country: SE May 31 13:32:38 mjolnir kernel: [463111.760737] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) May 31 13:32:38 mjolnir kernel: [463111.760740] cfg80211: (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm) May 31 13:32:38 mjolnir kernel: [463111.760743] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm) May 31 13:32:38 mjolnir kernel: [463111.760746] cfg80211: (5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm) May 31 13:32:38 mjolnir kernel: [463111.760749] cfg80211: (5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm) May 31 13:32:38 mjolnir NetworkManager[1531]: <info> (wlan0): supplicant connection state: disconnected -> scanning May 31 13:32:54 mjolnir NetworkManager[1531]: <info> (wlan0): device state change: 8 -> 3 (reason 11) May 31 13:32:54 mjolnir NetworkManager[1531]: <info> (wlan0): deactivating device (reason: 11). May 31 13:32:54 mjolnir NetworkManager[1531]: <info> (wlan0): canceled DHCP transaction, DHCP client pid 6447 After a while I get: May 31 13:33:20 mjolnir NetworkManager[1531]: <warn> Activation (wlan0/wireless): association took too long. Nothing from the kernel in between these where it reports association status. Retrying sometimes works, but I've had to reload the module at some points to reset the card/driver.
It's not clear for me, it does not happen on old version i.e. 2.6.35 and does on new 2.6.39 driver, or problem was before (I think you or maybe someone else reported something similar)? How frequent these drops are?
I did not notice it at least before the driver upgrade. :) I had (have?) some other disconnects, but those turned out to be caused by packet loss and a NetworkManager misfeature (gnome bug 643456). The association timeout has happened three times a day yesterday and today, with different APs and different standards (g one day and n the other). Looking for unprovoked disconnects, I can find 12 of these the last few days, 10 of which is with the g AP. Every single one of them is followed in the log by an update of the regulatory domain. Some of them has some info before the disconnect though. Connected to the n AP: May 30 08:07:46 mjolnir kernel: [370864.827321] iwl4965 0000:03:00.0: Queue 7 stuck for 2000 ms. May 30 08:07:46 mjolnir kernel: [370864.827330] iwl4965 0000:03:00.0: On demand firmware reload May 30 08:07:46 mjolnir kernel: [370864.828232] ieee80211 phy0: Hardware restart was requested May 30 08:07:46 mjolnir kernel: [370864.829936] iwl4965 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0x080033D8 May 30 08:07:46 mjolnir kernel: [370864.885831] iwl4965 0000:03:00.0: Hardware error detected. Restarting. May 30 08:07:46 mjolnir kernel: [370864.885836] iwl4965 0000:03:00.0: Loaded firmware version: 228.61.2.24 May 30 08:07:46 mjolnir kernel: [370864.885840] iwl4965 0000:03:00.0: Not valid error log pointer 0x00000000 for RT uCode May 30 08:07:46 mjolnir kernel: [370864.885842] iwl4965 0000:03:00.0: FH register values: May 30 08:07:46 mjolnir kernel: [370864.885860] iwl4965 0000:03:00.0: FH_RSCSR_CHNL0_STTS_WPTR_REG: 0X138b1c00 May 30 08:07:46 mjolnir kernel: [370864.885877] iwl4965 0000:03:00.0: FH_RSCSR_CHNL0_RBDCB_BASE_REG: 0X013845f0 May 30 08:07:46 mjolnir kernel: [370864.885895] iwl4965 0000:03:00.0: FH_RSCSR_CHNL0_WPTR: 0X000000f8 May 30 08:07:46 mjolnir kernel: [370864.885912] iwl4965 0000:03:00.0: FH_MEM_RCSR_CHNL0_CONFIG_REG: 0X80819000 May 30 08:07:46 mjolnir kernel: [370864.885929] iwl4965 0000:03:00.0: FH_MEM_RSSR_SHARED_CTRL_REG: 0X0000003c May 30 08:07:46 mjolnir kernel: [370864.885947] iwl4965 0000:03:00.0: FH_MEM_RSSR_RX_STATUS_REG: 0X07030000 May 30 08:07:46 mjolnir kernel: [370864.885964] iwl4965 0000:03:00.0: FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000 May 30 08:07:46 mjolnir kernel: [370864.885982] iwl4965 0000:03:00.0: FH_TSSR_TX_STATUS_REG: 0X07ff0000 May 30 08:07:46 mjolnir kernel: [370864.886015] iwl4965 0000:03:00.0: FH_TSSR_TX_ERROR_REG: 0X00000000 May 30 08:07:46 mjolnir kernel: [370864.886019] iwl4965 0000:03:00.0: Invalid event log pointer 0x00000000 for RT uCode May 30 08:07:46 mjolnir kernel: [370864.890037] ieee80211 phy0: Hardware restart was requested May 30 08:07:47 mjolnir kernel: [370866.207127] iwl4965 0000:03:00.0: Stopping AGG while state not ON or starting May 30 08:07:47 mjolnir kernel: [370866.207140] iwl4965 0000:03:00.0: queue number out of range: 0, must be 7 to 14 May 30 08:07:48 mjolnir kernel: [370866.707103] iwl4965 0000:03:00.0: Error sending REPLY_ADD_STA: time out after 500ms. May 30 08:07:48 mjolnir kernel: [370866.707116] HW problem - can not stop rx aggregation for tid 0 May 30 08:07:48 mjolnir kernel: [370866.707151] iwl4965 0000:03:00.0: Stopping AGG while state not ON or starting May 30 08:07:48 mjolnir kernel: [370866.707160] iwl4965 0000:03:00.0: queue number out of range: 0, must be 7 to 14 May 30 08:07:48 mjolnir kernel: [370866.707190] iwl4965 0000:03:00.0: Stopping AGG while state not ON or starting May 30 08:07:48 mjolnir kernel: [370866.707198] iwl4965 0000:03:00.0: queue number out of range: 0, must be 7 to 14 May 30 08:07:48 mjolnir kernel: [370866.707208] iwl4965 0000:03:00.0: Stopping AGG while state not ON or starting May 30 08:07:48 mjolnir kernel: [370866.707216] iwl4965 0000:03:00.0: queue number out of range: 0, must be 7 to 14 May 30 08:07:48 mjolnir kernel: [370866.709392] iwl4965 0000:03:00.0: index 0 not used in uCode key table. May 30 08:07:48 mjolnir kernel: [370867.209091] iwl4965 0000:03:00.0: Error sending REPLY_ADD_STA: time out after 500ms. May 30 08:07:48 mjolnir kernel: [370867.209109] ieee80211 phy0: failed to remove key (0, e0:91:f5:07:bd:10) from hardware (-110) May 30 08:07:50 mjolnir kernel: [370868.889096] iwl4965 0000:03:00.0: START_ALIVE timeout after 4000ms. May 30 08:07:50 mjolnir NetworkManager[1531]: <info> (wlan0): supplicant connection state: completed -> disconnected May 30 08:07:50 mjolnir kernel: [370869.163074] cfg80211: Calling CRDA to update world regulatory domain Connected to the g AP: May 30 11:58:13 mjolnir kernel: [378823.827636] iwl4965 0000:03:00.0: Queue 4 stuck for 2000 ms. May 30 11:58:13 mjolnir kernel: [378823.827643] iwl4965 0000:03:00.0: On demand firmware reload May 30 11:58:13 mjolnir kernel: [378823.835735] ieee80211 phy0: Hardware restart was requested May 30 11:58:13 mjolnir kernel: [378823.836015] iwl4965 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0x080033D8 May 30 11:58:13 mjolnir kernel: [378823.950608] iwl4965 0000:03:00.0: Hardware error detected. Restarting. May 30 11:58:13 mjolnir kernel: [378823.950617] iwl4965 0000:03:00.0: Loaded firmware version: 228.61.2.24 May 30 11:58:13 mjolnir kernel: [378823.950623] iwl4965 0000:03:00.0: Not valid error log pointer 0x00000000 for RT uCode May 30 11:58:13 mjolnir kernel: [378823.950627] iwl4965 0000:03:00.0: FH register values: May 30 11:58:13 mjolnir kernel: [378823.950647] iwl4965 0000:03:00.0: FH_RSCSR_CHNL0_STTS_WPTR_REG: 0X138b1c00 May 30 11:58:13 mjolnir kernel: [378823.950666] iwl4965 0000:03:00.0: FH_RSCSR_CHNL0_RBDCB_BASE_REG: 0X013845f0 May 30 11:58:13 mjolnir kernel: [378823.950686] iwl4965 0000:03:00.0: FH_RSCSR_CHNL0_WPTR: 0X000000f8 May 30 11:58:13 mjolnir kernel: [378823.950705] iwl4965 0000:03:00.0: FH_MEM_RCSR_CHNL0_CONFIG_REG: 0X80819000 May 30 11:58:13 mjolnir kernel: [378823.950725] iwl4965 0000:03:00.0: FH_MEM_RSSR_SHARED_CTRL_REG: 0X0000003c May 30 11:58:13 mjolnir kernel: [378823.950744] iwl4965 0000:03:00.0: FH_MEM_RSSR_RX_STATUS_REG: 0X07030000 May 30 11:58:13 mjolnir kernel: [378823.950763] iwl4965 0000:03:00.0: FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000 May 30 11:58:13 mjolnir kernel: [378823.950783] iwl4965 0000:03:00.0: FH_TSSR_TX_STATUS_REG: 0X07ff0000 May 30 11:58:13 mjolnir kernel: [378823.950802] iwl4965 0000:03:00.0: FH_TSSR_TX_ERROR_REG: 0X00000000 May 30 11:58:13 mjolnir kernel: [378823.950807] iwl4965 0000:03:00.0: Invalid event log pointer 0x00000000 for RT uCode May 30 11:58:13 mjolnir kernel: [378823.966400] ieee80211 phy0: Hardware restart was requested May 30 11:58:13 mjolnir kernel: [378824.100100] usb 1-4: new high speed USB device using ehci_hcd and address 11 May 30 11:58:13 mjolnir kernel: [378824.214701] usb 1-4: New USB device found, idVendor=04b3, idProduct=4485 May 30 11:58:13 mjolnir kernel: [378824.214708] usb 1-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0 May 30 11:58:13 mjolnir kernel: [378824.220229] hub 1-4:1.0: USB hub found May 30 11:58:13 mjolnir kernel: [378824.220321] hub 1-4:1.0: 4 ports detected May 30 11:58:15 mjolnir kernel: [378825.710071] iwl4965 0000:03:00.0: Error sending REPLY_WEPKEY: time out after 500ms. May 30 11:58:15 mjolnir kernel: [378825.710084] ieee80211 phy0: failed to remove key (0, ff:ff:ff:ff:ff:ff) from hardware (-110) May 30 11:58:17 mjolnir kernel: [378827.954070] iwl4965 0000:03:00.0: START_ALIVE timeout after 4000ms. May 30 11:58:17 mjolnir kernel: [378828.237910] cfg80211: Calling CRDA to update world regulatory domain May 30 11:58:17 mjolnir NetworkManager[1531]: <info> (wlan0): supplicant connection state: completed -> disconnected I have a few more like that if you want. Most of the entries are not preceded by anything relevant close by though.
> May 30 11:58:13 mjolnir kernel: [378823.827636] iwl4965 0000:03:00.0: Queue stuck for 2000 ms. > May 30 11:58:13 mjolnir kernel: [378823.827643] iwl4965 0000:03:00.0: On demand firmware reload This mean firmware hung, problem that is demmly hard to debug. I have some pending patches, that perhaps could help. Will prepare test package. If package do not help, I will ask you to capture verbose logs.
I'm not sure if it's related, but today I got a problem with massive packet loss (~70%). Nothing in dmesg, but "Tx excessive retries" in iwconfig ticks up as a steady pace. I can also see it trying lower and lower bit rates. It states a link quality of 60/70 and signal level of -50 dBm. Tried reassociating and then reloading the driver (to reset the firmware), but it had no effect. Switching to another SSID on the same AP and back again got things a bit more sane. I still see some packet loss and excessive retries, but sufficient packets get through to work now at least.
Created attachment 502577 [details] 0001-iwl4965-force-cts-to-self-protection.patch I did not prepared a package, but you can build it by yourself. Download packages from there: http://linuxwireless.org/en/users/Download/stable/#compat-wireless_2.6.39_stable_releases then please apply just attached 0001-iwl4965-force-cts-to-self-protection.patch patch, together with these two: http://marc.info/?l=linux-wireless&m=130694151802892&w=2 http://marc.info/?l=linux-wireless&m=130624736119019&w=2 and let me know it that fix anything.
Created attachment 502886 [details] send_tx_power_after_rxon_assoc.patch Another patch to test.
I won't be at the AP where I'm getting most of these issues until Tuesday, but I'll try to get this tested then. Should I apply all at once? Or first try without that last patch?
All at once. If things will not work situation would be clear. If will work, revert 0001-iwl4965-force-cts-to-self-protection.patch to see if it make any difference, 3 other patches have to be applied anyway, since they fix other bugs.
I just tested the modified drivers with the n AP here, and I'm still seeing the packet loss and excessive retries mentioned in comment 4. I'll be at the g AP tomorrow for testing with that one. Would it be possible to get the SRPM or spec file so I can build packages with the patches instead?
Is the packet loss perhaps bug 648732 in a new form? I tried switching back to the original drivers, and I get about the same packet loss as with the new ones. The difference being the "Received BA when not expected" messages instead of the excessive retries counter ticking up.
(you realized me that I volatile gpl :-) I put source rpm at http://people.redhat.com/sgruszka/compat-wireless/ I saw you CCed on upstream bug where I'm asking about test 5 patches that revert driver from 2.6.35 to 2.6.34. If you can, please do this, and tell which patch helps with some problem (if helps at all). On the meantime could you provide logs with debug=0x47ffffff when packages lost happen? (some description how to gather logs is here https://bugzilla.kernel.org/show_bug.cgi?id=35932#c2)
I haven't had time to play around with different drivers today unfortunately, but I have been seeing disconnects even with the original driver. This is what I get in dmesg: [65204.183144] No probe response from AP 00:1f:33:bd:8d:aa after 500ms, disconnecting. [65204.207110] cfg80211: Calling CRDA to update world regulatory domain [65204.211147] cfg80211: Calling CRDA for country: SE [65204.240432] cfg80211: Regulatory domain changed to country: SE [65204.240440] (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) [65204.240448] (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm) [65204.240456] (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm) [65204.240463] (5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm) [65204.240469] (5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm) [65243.522880] wlan0: authenticate with 00:1f:33:bd:8d:aa (try 1) [65243.524741] wlan0: authenticated [65243.524776] wlan0: associate with 00:1f:33:bd:8d:aa (try 1) [65243.527245] wlan0: RX AssocResp from 00:1f:33:bd:8d:aa (capab=0x411 status=0 aid=10) [65243.527250] wlan0: associated [67184.621734] iwlagn 0000:03:00.0: Microcode SW error detected. Restarting 0x82000000. [67184.621745] iwlagn 0000:03:00.0: Loaded firmware version: 228.61.2.24 [67184.621790] iwlagn 0000:03:00.0: Start IWL Error Log Dump: [67184.621798] iwlagn 0000:03:00.0: Status: 0x000213E4, count: 5 [67184.621994] iwlagn 0000:03:00.0: Desc Time data1 data2 line [67184.622007] iwlagn 0000:03:00.0: FH_ERROR (#12) 1949185081 0x00000008 0x03530000 208 [67184.622017] iwlagn 0000:03:00.0: pc blink1 blink2 ilink1 ilink2 hcmd [67184.622027] iwlagn 0000:03:00.0: 0x0046C 0x0A332 0x004C2 0x006DE 0x0A34C 0x48400A8 [67184.622036] iwlagn 0000:03:00.0: FH register values: [67184.622080] iwlagn 0000:03:00.0: FH_RSCSR_CHNL0_STTS_WPTR_REG: 0X13866600 [67184.622122] iwlagn 0000:03:00.0: FH_RSCSR_CHNL0_RBDCB_BASE_REG: 0X01385aa0 [67184.622165] iwlagn 0000:03:00.0: FH_RSCSR_CHNL0_WPTR: 0X00000010 [67184.622208] iwlagn 0000:03:00.0: FH_MEM_RCSR_CHNL0_CONFIG_REG: 0X00819004 [67184.622251] iwlagn 0000:03:00.0: FH_MEM_RSSR_SHARED_CTRL_REG: 0X0000003c [67184.622294] iwlagn 0000:03:00.0: FH_MEM_RSSR_RX_STATUS_REG: 0X03530000 [67184.622337] iwlagn 0000:03:00.0: FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000 [67184.622380] iwlagn 0000:03:00.0: FH_TSSR_TX_STATUS_REG: 0X07ff0002 [67184.622422] iwlagn 0000:03:00.0: FH_TSSR_TX_ERROR_REG: 0X00000000 [67184.622512] iwlagn 0000:03:00.0: Start IWL Event Log Dump: display last 20 entries [67184.622560] iwlagn 0000:03:00.0: EVT_LOGT:1949183706:0x00000222:0631 [67184.622600] iwlagn 0000:03:00.0: EVT_LOGT:1949183706:0x82c00126:0631 [67184.622638] iwlagn 0000:03:00.0: EVT_LOGT:1949183993:0x00000107:0106 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949183994:0x00000000:0301 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949184377:0x00000080:0313 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949184379:0x000006db:0501 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949184525:0x00000000:0320 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949184605:0x00000000:0320 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949184741:0x00000000:0320 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949184765:0x00000000:0320 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949184813:0x00000000:0320 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949185021:0x00000000:0320 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949185054:0x000002b3:0501 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949185067:0x00000000:0504 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949185069:0x00000000:0504 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949185069:0x83018bd3:0500 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949185070:0x83018bd3:0600 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949185071:0x00000000:0600 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949185076:0x000000c3:0310 [67184.622672] iwlagn 0000:03:00.0: EVT_LOGT:1949185084:0x00000000:0125 [71484.375149] No probe response from AP 00:1f:33:bd:8d:aa after 500ms, disconnecting. [71484.395793] cfg80211: Calling CRDA to update world regulatory domain [71484.395841] cfg80211: Calling CRDA for country: SE [71484.436412] cfg80211: Regulatory domain changed to country: SE [71484.436416] (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) [71484.436419] (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm) [71484.436421] (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm) [71484.436424] (5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm) [71484.436427] (5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm) [71516.274133] wlan0: authenticate with 00:1f:33:bd:8d:aa (try 1) [71516.276123] wlan0: authenticated [71516.276185] wlan0: associate with 00:1f:33:bd:8d:aa (try 1) [71516.279570] wlan0: RX AssocResp from 00:1f:33:bd:8d:aa (capab=0x411 status=0 aid=6) [71516.279579] wlan0: associated [75469.209117] No probe response from AP 00:1f:33:bd:8d:aa after 500ms, disconnecting. [75469.230111] cfg80211: Calling CRDA to update world regulatory domain [75469.230160] cfg80211: Calling CRDA for country: SE [75469.274189] cfg80211: Regulatory domain changed to country: SE [75469.274193] (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) [75469.274196] (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm) [75469.274199] (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm) [75469.274201] (5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm) [75469.274204] (5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm) [75508.638737] wlan0: authenticate with 00:1f:33:bd:8d:aa (try 1) [75508.640683] wlan0: authenticated [75508.640745] wlan0: associate with 00:1f:33:bd:8d:aa (try 1) [75508.643154] wlan0: RX AssocResp from 00:1f:33:bd:8d:aa (capab=0x411 status=0 aid=5) [75508.643163] wlan0: associated [76708.217124] No probe response from AP 00:1f:33:bd:8d:aa after 500ms, disconnecting. [76708.239245] cfg80211: Calling CRDA to update world regulatory domain [76708.239319] cfg80211: Calling CRDA for country: SE [76708.297327] cfg80211: Regulatory domain changed to country: SE [76708.297331] (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) [76708.297334] (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm) [76708.297337] (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm) [76708.297339] (5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm) [76708.297342] (5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm) [76747.605137] wlan0: authenticate with 00:1f:33:bd:8d:aa (try 1) [76747.606967] wlan0: authenticated [76747.606997] wlan0: associate with 00:1f:33:bd:8d:aa (try 1) [76747.609450] wlan0: RX AssocResp from 00:1f:33:bd:8d:aa (capab=0x411 status=0 aid=10) [76747.609454] wlan0: associated I'm starting to get a bit worried here that I cannot return to an older driver to get back to a working state (with g at least). :/ I can (hopefully) have a look at the debug stuff tonight as the n AP is my home one.
You could also try bt_coex_active=0 module parameter [note this is iwlcore (or iwl_legacy on > 2.6.39) parameter]. Also others iwlagn (iwl4965) parameters swcrypto=1 and 11n_disable=1 . Info if any of that helps would be useful.
Created attachment 503486 [details] disable_plcp_check.patch Another test patch.
I've built a package with all 5 patches referenced so far. ~ [drzeus@mjolnir]$ rpm -q kmod-compat-wireless kmod-compat-wireless-2.6.39-1.fc14.2.x86_64 ~ [drzeus@mjolnir]$ modinfo iwl4965 | grep srcversion srcversion: 7C4B79A8193F76F7CDD0157 I've also added bt_coex_active=0 and swcrypto=1. I have not disabled n as that is what I'm using on my home AP where I'm seeing the massive packet loss. The situation seems to be a lot better in the initial testing I've done so far. I've seen two brief periods of packet loss, but it goes back to a sane state after that. Previously it has kept hovering at a packet loss of 10-30% once things have gone bad. I'll keep monitoring things and also see if I still get disconnects with the g AP. Many thanks for your attention to these issues.
Some more results from today's usage: I did not see any bad disconnects whilst on the g AP today. I got one "queue stuck", but it was just after the machine resumed from suspend, so I don't think that counts as this issue. There was one disconnect where it managed to reconnect directly though: Jun 8 13:11:01 mjolnir NetworkManager[1490]: <info> (wlan0): supplicant connection state: completed -> disconnected Jun 8 13:11:01 mjolnir kernel: [14598.745501] cfg80211: Calling CRDA to update world regulatory domain Jun 8 13:11:01 mjolnir kernel: [14598.781569] cfg80211: World regulatory domain updated: ... Back home on the n AP, I see packet loss now and then. Unfortunately it is just temporary so I'm having a hard time catching it for debug logging. I'll keep on trying though.
Created attachment 504982 [details] kernel.log.gz Finally! Didn't realise I needed to rebuild the module to get debug output, so I missed one opportunity. But today I caught it with a proper build. Attached is the kernel log from when debug was set to 0x47ffffff and I was seeing about 6% packet loss with ping -f: [root@mjolnir ~]# ping -f asgard PING asgard.ossman.eu (10.8.0.254) 56(84) bytes of data. ...........................................................................................................................................................................................................................................................................................................^ --- asgard.ossman.eu ping statistics --- 4502 packets transmitted, 4208 received, 6% packet loss, time 25891ms rtt min/avg/max/mdev = 0.887/7.620/73.324/12.176 ms, pipe 6, ipg/ewma 5.752/4.496 ms It wasn't one of the spectacular failures of about 50% packet loss, but hopefully it is enough so you can see what the hw is upset about.
Got a bad disconnect today with the g ap: Jun 17 14:17:28 mjolnir kernel: [147197.323524] cfg80211: Calling CRDA to update world regulatory domain Jun 17 14:17:28 mjolnir NetworkManager[1537]: <info> (wlan0): supplicant connection state: completed -> disconnected Jun 17 14:17:28 mjolnir kernel: [147197.355615] cfg80211: World regulatory domain updated: ... Jun 17 14:17:47 mjolnir NetworkManager[1537]: <info> (wlan0): supplicant connection state: disconnected -> scanning Jun 17 14:18:10 mjolnir NetworkManager[1537]: <warn> Activation (wlan0/wireless): association took too long. This is with all patches and the parameters above (minus disabling n): [root@mjolnir log]# grep . /sys/module/iwl4965/parameters/* /sys/module/iwl4965/parameters/11n_disable:0 /sys/module/iwl4965/parameters/amsdu_size_8K:1 /sys/module/iwl4965/parameters/debug:0 /sys/module/iwl4965/parameters/fw_restart:1 /sys/module/iwl4965/parameters/queues_num:0 /sys/module/iwl4965/parameters/swcrypto:1 [root@mjolnir log]# grep . /sys/module/iwl_legacy/parameters/* /sys/module/iwl_legacy/parameters/bt_coex_active:N /sys/module/iwl_legacy/parameters/led_mode:0 Is there anything more I should test, or should I wait until you can have a look at the earlier debug log?
Some sensitivity/calibrate messages look suspicious in the log: iwl4965_sens_energy_cck norm FA 204800 < min FA 512000, iwl4965_sens_auto_corr_ofdm norm FA 63283200 > max FA 5119550 FA - mean false alarm i.e when DSP chip take noise as valid signal, and realize lately this is a noise. I do not have messages like that in my log. Anyway, I have to analyse the logs and calibration code in more detailed to find possible bugs, that will take some time. From your site, could you force using some different channel and see if it does make different, issue could be also caused by radio pollution, however I do not think that make disconnect, so for sure we have some bugs in driver or firmware.
I've switched the n AP from "Auto", which at least currently meant channel 1, to forced channel 8. It is ticking up a few "excessive retries" immediately, but not enough to register as any measurable packet loss. I'll see how things progress... And just to clarify: n AP (home): packet loss, no noticeable disconnect issues g AP (work): no noticeable packet loss, random disconnects
Vacation time, so I haven't been doing any testing for a while. Things are back to normal this week though, and I've made one important discovery. Whilst at work, one of my colleagues was also having problems with wifi disconnects. After keeping an eye on it, it turns out that he sees the exact same behaviour as me, and at the exact same times. He's running Fedora 15 and a Intel 6xxx card. So, as these issues are happening on two independent machines at exactly the same time, I'd say it's far more likely that the problem is in that AP than the clients. For now, let's ignore the disconnect issues on the g AP and focus on the packet loss with the n AP.
Thanks for update. I'm currently cleaning up iwlegacy driver, after that should be able to better analyse sensitivity/calibration code which most likely is related with package loss. On the meantime could you go back to F-14 (2.6.35.13-93 or newer) kernel, which should now have "Received BA when not expect" fix, and see if package lost does not happen or happen there. Regarding n AP disconnect, this is driver problem too, because is should not have Microcode error and be able to reconnect again automatically.
„So, as these issues are happening on two independent machines at exactly the same time, I'd say it's far more likely that the problem is in that AP than the clients.“ I have disconnection problems with iwl3945 and I also thought it's the AP because it happened almost at the same time... but the interesting thing was that it was almost exactly every 10 minutes. So it might have been just a coincidence that the two disconnected at the same time as the DHCP logs were contradicting it later.
Just an update, I've been running 2.6.35.14-95.fc14.x86_64 for about a month now and so far I have yet to have a single major problem. I have not a single "Received BA..." in the logs, and I have never seen "excessive retries" tick up above 0. I have had the occasional hiccup where things seem to experience some packet loss. But they have been very brief and therefore impossible to diagnose. I have not done any attempts at provoking the bug with this kernel (yet ;)).
(In reply to comment #24) > I have never seen "excessive retries" tick up above 0. On older kernel this statistic is not passed to iwconfig, I do not remember when exactly that was changed, but pretty sure it was after 2.6.35. On older kernels "excessive retries" this can be read by: mount -t debugfs debug /sys/kernel/debug/ cat /sys/kernel/debug/ieee80211/phy0/statistics/failed_count
Hmm... things might no be so peachy then. I'm having a bit of packet loss on and off last few days (just switched to 2.6.35.14-96.fc14.x86_64, but that might be a coincidence), and I see failed_count ticking up: [root@mjolnir ~]# cat /sys/kernel/debug/ieee80211/phy0/statistics/failed_count 259394
Not sure if this tells you anything, but I was running "iw dev wlan0 link" during the PL period, and "tx bitrate" kept showing "1 MBit/s" intermittently. This behaviour seemed to stop about the same time as the PL issue went away.
I've upgraded to Fedora 16 (3.1.0-7.fc16.x86_64), and the problems unfortunately are present there as well. I have a whole bunch of "Received BA when not expected" and "Tx excessive retries" ticks up like crazy. :/
I'm also seeing these next to the BA messages: [20243.044306] iwl4965 0000:03:00.0: BUG_ON idx doesn't match seq control idx=31, seq_idx=2592, seq=41472 Can't say I've noticed those before.
We do not change almost anything in iwl4965 driver between 2.6.39 and 3.1, I think these new (old) problems happens because of changes in mac80211 and/or NetworkManager/wpa_supplicant. I do not have any remedy yet :-( You may try 11n_disable=1 and swcrypto=1 modules options.
(In reply to comment #30) > We do not change almost anything in iwl4965 driver between 2.6.39 and 3.1, I mean almost nothing was changed
Currently the benefits outweigh the downsides, so I'm persevering for now. I'm able to get by mostly, so I just wanted to do a status report after the upgrade. Many thanks for your continuing efforts with this.
I've switched to a Thinkpad 420s now with a iwl 6205. Still seeing issues with this card as well: [45577.293103] iwlwifi 0000:03:00.0: Queue 11 stuck for 10000 ms. [45577.293116] iwlwifi 0000:03:00.0: Current SW read_ptr 59 write_ptr 26 [45577.293197] iwlwifi 0000:03:00.0: Current HW read_ptr 59 write_ptr 26 [45577.293205] iwlwifi 0000:03:00.0: On demand firmware reload [45577.293988] ieee80211 phy0: Hardware restart was requested [45577.294382] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S [45577.294625] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 I need to reload the module after this to get it working. Should I open a separate bug as it is different hw?
Yes, currently 6205 is handled by different driver.
Pierre, do you still own old hardware for eventual testing?
Yup. It's still sitting here as a backup laptop. :) It lacks a disk right now though, so I'd need some heads up when you have some tests to run. Regarding the new card, do you have an existing bug for that or should I open a new one?
There are lot's off "queue stuck" problems reported, but they generally mean firmware hung, what can have different reasons. Can be one or other firmware bug or driver not talking nice to firmware in one or other place, some start to happen at 2.6.38 other at 3.1 or 3.2, etc. So it's hard to classify them. I think the best would be open a fresh bug report.
There are also cases when watchdog itself couse problems. You can check if wd_disable=1 helps.
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.
Dave, please fix your script you do not have to ask 3 times :-) There is little chance that this bug is fixed ...
apologies for that, bugzilla proxy error.
I've been using the old laptop for some time now, and as a part of that I tried disabling N on the card. Unfortunately that does not make the problems go away. They are noticeably less severe, but I still get spikes of packet loss now and then. Just FYI.
What is AP model and settings? Perhaps I'll try to order the same device. But before that can you check if updating AP firmware would help by a chance ?
It's a Netgear WNAP210 configured as follows: * 11ng * SSID broadcast on * Channel 8/2.447GHz * MCS Index: Best * Channel width: Dynamic (20/40) * Guard interval: Auto * Output power: Full I'm also using the VLAN feature fully with two SSID:s configured. I've upgraded it to the latest firmware just now. We'll see if that changes anything.
Updated kernel to 3.3.4-1.fc16.x86_64 today and still facing the problem of: [15775.970067] iwlwifi 0000:09:00.0: Queue 12 stuck for 10000 ms. [15775.970072] iwlwifi 0000:09:00.0: Current SW read_ptr 88 write_ptr 171 [15775.970101] iwlwifi 0000:09:00.0: Current HW read_ptr 88 write_ptr 171 [15775.970103] iwlwifi 0000:09:00.0: On demand firmware reload [15775.970454] ieee80211 phy0: Hardware restart was requested [15775.970518] iwlwifi 0000:09:00.0: L1 Disabled; Enabling L0S [15775.977198] iwlwifi 0000:09:00.0: Radio type=0x2-0x2-0x1 Will try with wd_disable=1 now, as advised above. In the meantime people affected by this issue might want to check out a little script I've crafted: https://gist.github.com/2595849
David, this bug track 4965 problems (iwlegacy driver). Above "queue stuck" is on some other adapter (handled by iwlwifi driver), you should add your comment to bug 805285. Thanks.
I'm still getting some packet loss, so I'm afraid the firmware upgrade of the access point did not help. :/
Same Issue here: Jun 20 20:23:08 th1nkpad kernel: [ 89.038021] cfg80211: (5470000 KHz - 5725000 KHz @ 40000 KHz), (N/A, 2698 mBm) Jun 20 20:23:08 th1nkpad NetworkManager[758]: <info> (wlan0): supplicant interface state: associating -> associated Jun 20 20:23:09 th1nkpad NetworkManager[758]: <info> (wlan0): supplicant interface state: associated -> completed Jun 20 20:23:17 th1nkpad kernel: [ 97.796140] iwlwifi 0000:02:00.0: Queue 12 stuck for 2000 ms. Jun 20 20:23:17 th1nkpad kernel: [ 97.796147] iwlwifi 0000:02:00.0: Current SW read_ptr 123 write_ptr 205 Jun 20 20:23:17 th1nkpad kernel: [ 97.796200] iwlwifi 0000:02:00.0: Current HW read_ptr 123 write_ptr 205 Jun 20 20:23:17 th1nkpad kernel: [ 97.796203] iwlwifi 0000:02:00.0: On demand firmware reload Jun 20 20:23:17 th1nkpad kernel: [ 97.796687] ieee80211 phy0: Hardware restart was requested Jun 20 20:23:17 th1nkpad kernel: [ 97.796779] iwlwifi 0000:02:00.0: L1 Enabled; Disabling L0S Jun 20 20:23:17 th1nkpad kernel: [ 97.803493] iwlwifi 0000:02:00.0: Radio type=0x0-0x3-0x1 Tested Kernels: kernel-3.4.2-4.fc17.x86_64 kernel-3.4.0-1.fc17.x86_64 kernel-3.1.0-7.fc16.x86_64 only one that works without interrupting the wlan-connection is 3.1.0-7.
Same issue with me on Fedora 17 with latest kernel kernel-3.4.3-1.fc17.x86_64 I see below error message in /var/log/messages Jun 25 23:27:04 npatil kernel: [ 4132.439244] iwlwifi 0000:02:00.0: Queue 2 stuck for 2000 ms. Jun 25 23:27:04 npatil kernel: [ 4132.439251] iwlwifi 0000:02:00.0: Current SW read_ptr 197 write_ptr 205 Jun 25 23:27:04 npatil kernel: [ 4132.439305] iwlwifi 0000:02:00.0: Current HW read_ptr 197 write_ptr 205 Jun 25 23:27:04 npatil kernel: [ 4132.439309] iwlwifi 0000:02:00.0: On demand firmware reload Jun 25 23:27:04 npatil kernel: [ 4132.439727] ieee80211 phy0: Hardware restart was requested Jun 25 23:27:04 npatil kernel: [ 4132.439816] iwlwifi 0000:02:00.0: L1 Enabled; Disabling L0S Jun 25 23:27:04 npatil kernel: [ 4132.440005] iwlwifi 0000:02:00.0: Radio type=0x0-0x3-0x1
You have different hardware than 4965, you should CC yourself to bug 833117 instead.
Well bad news: that is the result of buying CRAP. This is a problem Intel is aware of for many years now and they are doing NOTHING (search for iwl4965 with same problems). Intel doesn't mean quality any more..same as Thinkpads don't hold the good old IBM-quality-standard anymore. Buy a card with Atheros-Chipset and you'll be happy. The only way to get this fixed is to change the CLOSED-SOURCE firmware..well..good luck (;
> Well bad news: that is the result of buying CRAP. This is a problem Intel is > aware of for many years now and they are doing NOTHING (search for iwl4965 > with same problems). Any links, citations, or just misusing bugzilla as a general discussion forum?
(In reply to comment #54) > > Well bad news: that is the result of buying CRAP. This is a problem Intel is > > aware of for many years now and they are doing NOTHING (search for iwl4965 > > with same problems). > > Any links, citations, or just misusing bugzilla as a general discussion > forum? Just google for iwl4965 with the pasted error-messaged. All those bug-listings you find end up in..nothing. The oldest post is from 2009 this shows everything.
# Mass update to all open bugs. Kernel 3.6.2-1.fc16 has just been pushed to updates. This update is a significant rebase from the previous version. Please retest with this kernel, and let us know if your problem has been fixed. In the event that you have upgraded to a newer release and the bug you reported is still present, please change the version field to the newest release you have encountered the issue with. Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered. If you are not the original bug reporter and you still experience this bug, please file a new report, as it is possible that you may be seeing a different problem. (Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).
I tried reenabling n again. Mostly works fine, but now and then all traffic just stops. No errors in dmesg. Reassociating is sufficient to get it working again.
(In reply to comment #33) > I've switched to a Thinkpad 420s now with a iwl 6205. Still seeing issues > with this card as well: > > [45577.293103] iwlwifi 0000:03:00.0: Queue 11 stuck for 10000 ms. > [45577.293116] iwlwifi 0000:03:00.0: Current SW read_ptr 59 write_ptr 26 > [45577.293197] iwlwifi 0000:03:00.0: Current HW read_ptr 59 write_ptr 26 > [45577.293205] iwlwifi 0000:03:00.0: On demand firmware reload > [45577.293988] ieee80211 phy0: Hardware restart was requested > [45577.294382] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S > [45577.294625] iwlwifi 0000:03:00.0: Radio type=0x1-0x2-0x0 > > I need to reload the module after this to get it working. > > Should I open a separate bug as it is different hw? Hi Pierre, I also have a t420s with the same issue... my log looks similar to yours. did you find a fix? thanks in advance Juan Mijares
(In reply to comment #58) > > Hi Pierre, I also have a t420s with the same issue... my log looks similar > to yours. did you find a fix? thanks in advance > Sorry no. I had problems with the t420s so I'm no longer using it.
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior to Fedora 17's end of life. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.