Description of problem:
When I try to connect my laptop to the wifi network at the Comfort Inn hotel, I get the following error:

WARNING: at drivers/net/wireless/iwlwifi/iwl-core.c:482 iwl_check_rxon_cmd+0x211/0x21f [iwlagn]()

time: Sun Oct 23 20:32:04 2011

backtrace:
:WARNING: at drivers/net/wireless/iwlwifi/iwl-core.c:482 iwl_check_rxon_cmd+0x211/0x21f [iwlagn]()
:Hardware name: 4384BP8
:Invalid RXON (0x40), channel 6

My guess is that the WARNING is correct, in that there probably is a value here that is not supported by my hardware. The problem is that when one looks at the code, one sees:

    if ((rxon->flags & (RXON_FLG_CCK_MSK | RXON_FLG_SHORT_SLOT_MSK)) ==
        (RXON_FLG_CCK_MSK | RXON_FLG_SHORT_SLOT_MSK)) {
        IWL_WARN(priv, "CCK and short slot\n");
        errors |= BIT(7);
    }

So even though this is intended to be a warning that would probably not even prevent a successful wifi connection, it is recorded as an error. Consequently, between NetworkManager and the kernel, I get an endless loop of errors that I can only stop by physically turning off my wifi card. (Even then it takes about 10 minutes for the abort messages to stop appearing after I turn my wifi card off.)

Version-Release number of selected component (if applicable):
kernel-2.6.40.4-5.fc15.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Check in to the Comfort Inn on Sanderson, in Raleigh NC
2. Select to connect to comfortinn on your wifi network.
3. Watch the errors.

Actual results:
Errors stream over and over, and you never connect to the wifi network.

Expected results:
A fairly harmless warning is added to the system log, and the connection proceeds as normal.

Additional info:
I tested under Windows, and I was able to connect to the comfortinn wifi, so this is not a hardware issue. However, it could be a limitation in the Linux driver for the hardware.
Created attachment 529949
Abort log.

I tried to submit this with report-gtk, but that utility consistently failed.
Created attachment 530432
patch that disables the abort

This is a patch that simply disables returning the error. I'm using the wifi connection right now, with absolutely no noticeable problems, to submit this bugzilla. So this suggests that the messages really should be warnings. Possibly, though, this patch is not the correct way to solve the problem, as it is probably reasonable for the function call to return a status that indicates there is a *potential* problem. But once that status is returned, it is incorrect to abort and generate a stack trace rather than to still try connecting.

What is the correct site to submit this problem upstream?
Just in passing I mentioned this problem to two people at the office yesterday. Of the two people I mentioned it to one said he saw the same problem with some of the wifi connections he has tried using. So based on that anecdotal evidence I would say the problem is actually fairly common.
Koji build in progress:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3465467
http://koji.fedoraproject.org/koji/taskinfo?taskID=3465466

When I installed the version of the kernel I built with mock, it gave me all sorts of warnings about missing firmware. Since I'm not using any of the hardware it was warning about, I can probably safely ignore those warnings. Still, I'm hoping the koji build does not have the same problem, so others can use the kernel build.
Short slot is an 802.11g feature, while CCK is an 802.11b modulation. So the presence of both would seem to be a contradiction. That said, I'm not sure what would cause that indication or what it would really mean. Your experience suggests that the indication in the RXON command can be ignored. Wey-yi, the current upstream code seems about the same as the code Bill is patching. Obviously there are problems with that particular patch, but perhaps there is something we can learn here to apply upstream?
Oops, it looks like the reason I had to comment out all the errors is that I was referencing the wrong block. The block of code where the failure occurs is:

    if (le16_to_cpu(rxon->assoc_id) > 2007) {
        IWL_WARN(priv, "aid > 2007\n");
        errors |= BIT(6);
    }

So the association ID is what causes the problem. Chances are that if I were to comment out just this one line of code, I would achieve equally positive results. I guess the thing to understand is why ignoring the association ID works, and whether there is a less restrictive test that would detect only the instances where this really would cause an abort later in the code anyway.
(In reply to comment #5)
> Short slot is an 802.11g feature, while CCK is an 802.11b modulation. So the
> presence of both would seem to be a contradiction. That said, I'm not sure
> what would cause that indication or what it would really mean. Your experience
> suggests that the indication in the RXON command can be ignored.
> Wey-yi, the current upstream code seems about the same as the code Bill is
> patching. Obviously there are problems with that particular patch, but perhaps
> there is something we can learn here to apply upstream?

Yes, for sure it is an issue in the code. Sorry about it; I will make sure it is addressed.

Thanks
Wey
This information from the kern.log file will probably help:

Oct 27 18:02:34 briemersw kernel: [10717.032031] wlan0: RX AssocResp from 5c:0e:8b:85:e7:20 (capab=0x401 status=0 aid=16383)

16383 > 2007, which is why the test is failing. I was searching through the code to try to figure out how this value is actually used, but I couldn't find anything else that looked relevant.
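For reference, 16383 is 0x3fff, i.e. all fourteen low bits set. In 802.11 the AID occupies the low 14 bits of a 16-bit field whose two top bits are set, and valid AIDs run from 1 to 2007. The sketch below only illustrates that arithmetic; the helper names are hypothetical, and the idea that the AP sent a raw field of 0xffff is an assumption, not something the log confirms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest association ID allowed by 802.11 (valid range is 1..2007). */
#define MAX_VALID_AID 2007u

/* Hypothetical helper: keep only the low 14 bits of the raw AID field. */
static uint16_t aid_low14(uint16_t raw_aid)
{
    return (uint16_t)(raw_aid & 0x3fffu);
}

/* Hypothetical helper: true when the AID lies in the valid range. */
static bool aid_in_range(uint16_t aid)
{
    return aid >= 1 && aid <= MAX_VALID_AID;
}
```

Under that assumption, `aid_low14(0xffff)` gives 16383, and `aid_in_range(16383)` is false, which matches the "aid > 2007" warning.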
BTW, I've been discussing this with Johannes Berg via e-mail, since his e-mail address is listed in the code. Otherwise, I would not have known to look for the assoc_id in the kernel logs, or have realized that the bits are counted from 0, not 1, so 0x40 is bit 6, not bit 7.
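A small sketch of the bit numbering, for anyone else decoding "Invalid RXON (0x40)". The `BIT` macro here is a local stand-in written for this example (the kernel has its own definition), and `lowest_set_bit` is a hypothetical helper.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro; bits are counted from 0. */
#define BIT(n) (1u << (n))

/* Return the index of the lowest set bit, counting from 0, or -1 if none. */
static int lowest_set_bit(uint32_t mask)
{
    for (int i = 0; i < 32; i++) {
        if (mask & BIT(i))
            return i;
    }
    return -1;
}
```

With this, `lowest_set_bit(0x40)` is 6, so the reported mask corresponds to `errors |= BIT(6)` (the "aid > 2007" check), while `BIT(7)` would have shown up as 0x80.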
I've been tracing through the ieee80211 code to try to determine how the connection can work with a bogus value. First off, I notice an error in the test: the value it should compare to is 0x2007, not 2007. E.g. in ieee80211_softmac.c:

    assoc->aid = cpu_to_le16(ieee->assoc_id);
    if (ieee->assoc_id == 0x2007)
        ieee->assoc_id = 0;
    else
        ieee->assoc_id++;

This of course only has an impact if softmac is used. Is that always true for wireless?

Next, I see that when the value is actually used, not all of the bits are used:

    hdr->aid = cpu_to_le16(ieee->assoc_id | 0xc000);

So in this case a value of 16383 has 1 added to it and becomes 16384 = 0x4000. When this is assigned to the header it becomes:

    0x4000 | 0xc000 => 0xc000

i.e. equivalent to what would have been used after an assoc_id value of 0x2007 wraps to 0. So a better test might be:

    if ((le16_to_cpu(rxon->assoc_id) != 0x2007) &&
        (((le16_to_cpu(rxon->assoc_id) + 1) & 0x3fff) > 0x2007))
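To double-check the arithmetic above, here is a minimal sketch that mirrors the quoted expressions on plain `uint16_t` values. The function names are hypothetical, and this only verifies the numbers; it says nothing about whether that code path is the one the driver actually uses. Note that in C `>` binds more tightly than `&`, so the mask in the proposed test needs its own parentheses to mean what is intended.

```c
#include <stdint.h>

/* Mirrors the quoted increment: wrap to 0 after 0x2007, else add 1. */
static uint16_t next_assoc_id(uint16_t assoc_id)
{
    return (assoc_id == 0x2007) ? 0 : (uint16_t)(assoc_id + 1);
}

/* Mirrors the quoted header assignment: set the two top bits. */
static uint16_t header_aid(uint16_t assoc_id)
{
    return (uint16_t)(assoc_id | 0xc000);
}

/* The proposed test, with explicit parentheses around the mask. */
static int proposed_test(uint16_t aid)
{
    return (aid != 0x2007) && (((aid + 1) & 0x3fff) > 0x2007);
}
```

Here `next_assoc_id(16383)` is 0x4000 and `header_aid(0x4000)` is 0xc000, the same as `header_aid(0)`. With the parentheses in place, `proposed_test(16383)` evaluates to 0, so the proposed test would not flag the value seen in the log.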
I guess we should keep it on the bug ... As I said to Bill in email, he was looking at the wrong code (ieee80211_softmac.c AP side code? where does that even exist?). I sent a patch to mac80211 to make it not send down invalid AID values and disable powersave since there's no way PS can work with this bogus AID. http://mid.gmane.org/1319795987.8931.7.camel@jlt3.sipsolutions.net
I'm doing a build of Johannes' patch right now.

http://koji.fedoraproject.org/koji/taskinfo?taskID=3468180

I'll be checking out of my hotel in a few minutes, but maybe I can come back to the hotel lobby to test it at lunch time. If not, I have a colleague who has been experiencing a similar-sounding problem. If it turns out to be the same problem, he should be able to test it.
It looks like I rebuilt the wrong source RPM. I should have added a build number or such to it, so I could tell the difference...
I just built the correct rpm and I'm posting with it now. I was not 100% positive my colleague was seeing the same problem, as my patch to comment out the errors had commented out all errors. So I changed the value 2007 to 1, so that my home network would produce the same error. It looks like the patch successfully resolves the issue.
Created attachment 531239
Johannes Berg's patch to disable powersave and reset aid value

Johannes Berg's patch to disable powersave and reset the aid value to 0 when it is greater than the maximum allowed value.
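The behaviour described above can be sketched roughly as follows. This is not the attached patch, just an illustration of the idea; the struct, its fields, and the function name are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest association ID allowed by 802.11. */
#define MAX_VALID_AID 2007u

/* Hypothetical per-connection state, for illustration only. */
struct assoc_state {
    uint16_t aid;
    bool powersave_allowed;
};

/* Store the AID from the association response. If it exceeds the
 * maximum, fall back to 0 and disable powersave, since powersave
 * cannot work without a usable AID. */
static void apply_assoc_aid(struct assoc_state *st, uint16_t aid)
{
    if (aid > MAX_VALID_AID) {
        st->aid = 0;
        st->powersave_allowed = false;
    } else {
        st->aid = aid;
        st->powersave_allowed = true;
    }
}
```

With the value from the log, `apply_assoc_aid(&st, 16383)` leaves `st.aid` at 0 and `st.powersave_allowed` false, so a driver-side "aid > 2007" check would no longer see an out-of-range value.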
Thanks Bill!
Koji builds: x86_64: http://koji.fedoraproject.org/koji/taskinfo?taskID=3480131 i386: http://koji.fedoraproject.org/koji/taskinfo?taskID=3480143
This was fixed in 3.2.