Bug 785561

Summary: iwlwifi is broken in kernels newer than 3.1.0-7 (dataloss)
Product: [Fedora] Fedora Reporter: shamim.islam
Component: kernel Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 16 CC: bowe, davej, gansalmon, gfdsa, i.grok, itamar, jlayton, jnansi, jonathan, j.romildo, jruemker, kengert, kernel-maint, madhu.chinakonda, matt, mhuhtala, redhat, ricardo.arguello, satellitgo, sergio, sgruszka, turchi, twaugh, wey-yi.w.guy
Target Milestone: --- Keywords: Regression, Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-09 05:19:49 UTC Type: ---
Attachments:
Description (Flags)
tid check (none)
fix rekey (none)
iwlwifi with auto_agg=0 wd_disable=1 kernel oops (none)
iwlwifi oops requiring module unload/load to resolve... (none)
iwlwifi oops during wakeup from suspend (none)

Description shamim.islam 2012-01-29 15:15:21 UTC
Description of problem:
The iwlwifi driver connects to 802.11N with WPA2 Personal.
Within 10-15 minutes (or longer with less network traffic), the wireless connection will no longer look up names, ping, or connect to any IP address other than its own (127.0.0.1 and the wlan0 address).
"rmmod iwlwifi" followed by "modprobe iwlwifi" restores all wireless capabilities.


ASUS X53S/K53SV Core i7 
Intel Corporation Centrino Wireless-N 100 (0280:8086:08ae)

Version-Release number of selected component (if applicable):
kernel-3.2.2-1.fc16.x86_64

How reproducible:
Simply login, connect, surf web and voila

Steps to Reproduce:
1. Start Fedora
2. Open Mozilla or any other web browser
3. Start browsing multiple sites that require DNS lookup (makes the connection drop faster)
4. Connection stops working.
  
Actual results:
"Looking up" message for host name provided and eventually times out

Expected results:
Actual site page will load

Additional info:
/etc/modprobe/iwlwifi.conf containing "options iwlwifi 11n_disable=0" did not help.
"rmmod iwlwifi" followed by "modprobe iwlwifi" restores services.

Comment 1 wey-yi.w.guy 2012-01-30 16:09:23 UTC
Created attachment 558393 [details]
tid check

Does this patch from Emmanuel help?

Thanks
Wey

Comment 2 John W. Linville 2012-01-30 18:18:43 UTC
Test kernel w/ above patch building here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3746937

Please give them a test and report the results here -- thanks!

Comment 3 shamim.islam 2012-01-31 05:28:22 UTC
Seems to be working - what was the fix?

Comment 4 shamim.islam 2012-01-31 14:27:43 UTC
Ok - not really working. What seems to be happening now is that the connection stops working and then restores itself. It's stuttering. If I increase the throughput load to watching a few videos, eventually the DNS lookup for the video site alone fails. When I hit F5 to refresh, the site lookup works (because the connection has restored itself). When I watch the load of pages, I can see the stuttering where there are long pauses between different stages of the page load (Looking up, Connected, Waiting for).

I have a 20 MBps connection (speedtest.net) and on my HP machine, I have no stuttering. Also on the Windows 7 platform, I did not have any stuttering on the X53S.

I now do not have to rmmod/modprobe, but it's still running a lot slower than my connection would suggest it should.

Thanks

Comment 5 Tim Waugh 2012-01-31 14:45:33 UTC
I've been using the test build from comment #2 for 3 hours 30 mins now, including heavy throughput for over 1 hour of that.  No problems seen here.

Comment 6 shamim.islam 2012-01-31 15:10:35 UTC
Tim,

I have been running it for the past 12h. I've seen the slowdown pattern since I have my HP sitting right next to the X53S. It has to be heavy throughput with lots of changing sites. The slowdown shows up in the DNS lookup, when the connection stops or slows between transitions (Looking up/Connected/Waiting for) and then restores itself. Can you run a similar comparison?

Thanks.

Comment 7 wey-yi.w.guy 2012-01-31 16:10:12 UTC
(In reply to comment #3)
> Seems to be working - what was the fix?

An issue was introduced that causes QoS and non-QoS frames to be mixed. This patch fixes that.

Wey

Comment 8 John W. Linville 2012-01-31 18:41:17 UTC
It's possible that Tim is really seeing the issue from bug 785239 and that
Shamim is seeing something different (or additional)...?

Comment 9 shamim.islam 2012-02-11 04:29:17 UTC
There is definitely something going on with the iwlwifi driver. I have my HP laptop (Phenom II 3GHz) using the ath9k driver running the same kernel as my X53 ASUS (Core i7, 2.9GHz w/ Turbo) using the iwlwifi driver.

Both are connecting to the same AP. iwlwifi connects at 54MB max and drops to 1MBit regularly. Ath9k reports a steady 150MBit connection. The AP is configured for 802.11N (WPA-PSK).

Speedtest.net shows similar results. the ath9k downloads at around 20-22MB, and uploads around 1.5MB. The iwlwifi downloads at 5-6MB, and uploads at 1.5MB. The upload speed is artificially throttled by my cable company. The download speeds should be identical.

So now what? Any additional information I can provide that can help track down the problem? The iwlwifi should not be connecting so slow for sure.

Comment 10 shamim.islam 2012-02-11 04:38:26 UTC
P.S. I have upgraded to kernel 3.2.5-3.fc16.x86_64 on both machines. Any suggestions?

Comment 11 wey-yi.w.guy 2012-02-11 17:28:35 UTC
Shamim,

First, please check whether you have the "11n_disable=1" module parameter set. If so, you will not see 11n throughput.

Second, what throughput are you seeing with open security? (Do not use WPA-PSK.)

Also, there is a patch that addresses the security rekey issue. I will attach it here; please give it a try.

Thanks
Wey

Comment 12 wey-yi.w.guy 2012-02-11 17:29:31 UTC
Created attachment 561116 [details]
fix rekey

Comment 13 shamim.islam 2012-02-12 02:01:29 UTC
Umm. Any chance I can check throughput by doing something OTHER than opening up my AP? My ex-gf trashed my office and I really can't isolate my AP from the rest of my network without stepping on broken glass right now. And I'm way too paranoid to open up an AP w/o isolating it first.

I don't think I have the 11n_disable=1 anywhere - I did use a 11n_disable=0 if I remember correctly - should I remove that?
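
One way to check for a stray 11n_disable setting without guessing is to search the modprobe configuration for iwlwifi option lines. The following is only a sketch: find_module_options is a made-up helper name, and it assumes the options live in *.conf files under /etc/modprobe.d (the default for the second argument).

```shell
# Hypothetical helper: list every "options <module> ..." line found in the
# *.conf files of a modprobe configuration directory.
# $1 = module name, $2 = directory (assumed default: /etc/modprobe.d)
find_module_options() {
    mod="$1"
    dir="${2:-/etc/modprobe.d}"
    for f in "$dir"/*.conf; do
        [ -r "$f" ] || continue
        # Passing /dev/null as a second file makes grep prefix each match
        # with the name of the file it came from.
        grep -E "^[[:space:]]*options[[:space:]]+$mod([[:space:]]|\$)" "$f" /dev/null
    done
    return 0
}

# Usage: find_module_options iwlwifi
```

If this prints nothing, no option line for the module was found in that directory, and the driver should be running with its built-in defaults.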

And lastly - umm - no idea how to test the patch . I don't have a kernel development toolchain. :(

Sorry I'm not as capable as I'd like to be. I'm more a linux admin than a linux developer. 

Suggestions?

Comment 14 John W. Linville 2012-02-13 15:19:55 UTC
*** Bug 789595 has been marked as a duplicate of this bug. ***

Comment 15 John W. Linville 2012-02-13 18:22:21 UTC
Test kernels with patch from comment 12 are building here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3787164

When the build finishes, please give them a try and post the results here...thanks!

Comment 16 shamim.islam 2012-02-17 16:23:57 UTC
Sorry - I've been away for a couple of days. When I click the link, I don't see where I can download the kernel to test. Apologies if I'm asking a dumb question.

Comment 17 John W. Linville 2012-02-17 18:19:49 UTC
You follow the link for your architecture (probably x86_64).  Then you find the link for the kernel RPMs there.

Comment 18 Bowe Strickland 2012-02-21 16:45:46 UTC
Created attachment 564749 [details]
iwlwifi with auto_agg=0 wd_disable=1 kernel oops

Comment 19 Bowe Strickland 2012-02-21 16:50:15 UTC
FWIW, i've had same iwlwifi headaches as rest of world. 

on uptodate F16 (3.2.5-3.fc16.x86_64), i've tried following:


>> [root@catbus ~]# head /etc/modprobe.d/local.conf 
>> options iwlwifi auto_agg=0 wd_disable=1

life has been almost perfectly stable (no more "Queue 10 stuck for 2000 ms." and
"On demand firmware reload" as before), but did have a new symptom: the attached kernel oops, when network stalled for about 20 secs but then reset on its own.

Comment 20 Bowe Strickland 2012-02-22 13:27:49 UTC
I've been running the kernel from comment 15 now for about 20 hours, and iwlwifi has been stable.

Comment 21 shamim.islam 2012-02-22 18:31:14 UTC
iwlwifi core i7: Test kernel gives me 5MB/s. 3.2.6-3 gives me 2.5 MB/s. 
ath9k AMD Phenom II: 3.2.6-3 gives me 18 MB/s.

What does the auto_agg and wd_disable do?

Comment 22 shamim.islam 2012-02-22 18:36:35 UTC
Nm. Adding auto_agg=0 and wd_disable=1 slows down my network connection to molasses.

0.6 MB/s

Comment 23 Kai Engert (:kaie) (inactive account) 2012-02-23 12:57:51 UTC
I cannot successfully use "hg clone" of a large tree when connecting using iwlwifi. I also experienced data corruption shown by "hg verify", not sure how the wifi could have caused this.

I was successful (no problem) using the original 3.1.0-7 kernel using a F16 live DVD.

With all later kernels that I tried, 3.1.10, 3.2.1, 3.2.5, 3.2.7, I have the problem.

iwlwifi, Centrino Ultimate-N 6300 (rev 35)

Comment 24 shamim.islam 2012-02-25 08:16:37 UTC
So maybe I should go back to the 3.1.0-7 kernel. If it solves problems, I'm all for it. Any issues using vmplayer with 3.1.0-7??

I see a ray of light using the 3.2.7-1 - I am getting 5MB/s throughput downloading and 1.0MB/s upload. At least that's the same as the bz kernel I tried.

I will attempt kernel-3.1.0-7 and report back.

Comment 25 shamim.islam 2012-02-25 08:35:14 UTC
Confirmed. Kernel 3.1.0-7 performs PERFECTLY with iwlwifi on my X53SV core i7.

Download: 18.68MB/s
Upload: 1.87MB/s

As expected. Only outperformed by my ath9k on my HP Phenom II 3GHz (21.95 MB/s download, 1.89 MB/s upload).

Can someone please shed some light on what really changed after kernel 3.1.0-7 or when the change took place that is causing all this weirdness?

Thanks

Comment 26 Kai Engert (:kaie) (inactive account) 2012-02-25 11:45:31 UTC
Unfortunately going back to that older kernel isn't an option for me, because it has other issues, such as a one minute delay each time I boot up.

As a temporary measure until iwlwifi is fixed, and because I need 5 GHz with my home network, I've ordered an external USB Wifi adapter, a Trendnet TEW-664UB, which works out of the box with Fedora 16 (rt2800usb) and has been stable for me.

We should download both old and new kernel sources, and diff the iwlwifi driver code.

Comment 27 Kai Engert (:kaie) (inactive account) 2012-02-25 12:12:44 UTC
(In reply to comment #26)
> We should download both old and new kernel sources, and diff the iwlwifi driver
> code.

I did. Unfortunately the differences are huge :/

The 3.1 kernel has the iwlwifi driver sources only in a single place.

The 3.2 kernel has it in two places, in addition in a directory named compat-wireless-3.3-rc1-2

Looking at the strings contained in the binary iwlwifi.ko module 
  strings iwlwifi.ko  |grep -i iwlwifi
contained in the 3.2.7 kernel it appears that:

The iwlwifi module in the 3.2.7 was built using directory
  kernel-3.2.fc16/compat-wireless-3.3-rc1-2/drivers/net/wireless/iwlwifi

NOT using
  kernel-3.2.fc16/linux-3.2.i686/drivers/net/wireless/iwlwifi

Comment 28 Kai Engert (:kaie) (inactive account) 2012-02-25 13:02:13 UTC
After further reading of the spec, it seems to be expected that the code in compat-wireless is used.

Using the link in comment 15, I can no longer find RPMs. I assume they have already been deleted automatically. I'm trying to create a kernel myself with the patch from comment 12 applied to help testing.

Comment 29 Kai Engert (:kaie) (inactive account) 2012-02-25 14:12:41 UTC
I confirm the patch from comment 12 fixes the problems on my system (applied to the latest kernel 3.2.7-1.fc16 )

I propose to urgently push this patch as an update.

This bug causes not only slowness, I got all sorts of data corruption when working with network data. Applications assumed they received data from the network correctly, but obviously worked with broken/corrupted data.

Comment 30 shamim.islam 2012-02-25 14:49:30 UTC
Having run 3.1.0-7 for the past 6h, I again attempted the speed test.

Results for the core i7 iwlwifi are:
12.93 Mbps down
1.91 Mbps Up
16.66 Mbps down
1.90 Mbps up
14.27 Mbps down
1.91 Mbps up

Which is on par with what the ath9k on the Phenom II 3Ghz is reporting:
20.96 Mbps down
1.96 Mbps up
13.62 Mbps down
1.89 Mbps up
15.66 Mbps down
1.90 Mbps up

The first test was done on the ath9k, 3 on the iwlwifi, followed by 2 more on the ath9k.

I am going to redownload the test kernel from comment 12 from the link provided and see what kind of results I get.

Now that I know what is possible, I have a better reference model.

Comment 31 Kai Engert (:kaie) (inactive account) 2012-02-25 15:00:57 UTC
(In reply to comment #30)
> I am going to redownload the test kernel from comment 12 from the link provided
> and see what kind of results I get.

That test kernel might have been expired, I couldn't find it anymore.

Comment 32 shamim.islam 2012-02-25 15:49:46 UTC
I had a copy on a USB stick.

uname -r: 3.2.5-3.bz785561.1.fc16.x86_64

Speed test results:
20.08 Mbps down
1.89 Mbps up
19.68 Mbps down
1.89 Mbps up
22.51 Mbps down
1.88 Mbps up
19.38 Mbps down
1.90 Mbps up
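
To compare kernels on more than a single run, the download figures from each batch of speed tests can be averaged. This is a small sketch only; mean is a made-up helper name, and the numbers fed to it are the download results quoted in this thread.

```shell
# Hypothetical helper: average the first column of the numbers read from
# stdin and print the result to two decimals.
mean() {
    awk '$1 ~ /^[0-9.]+$/ { sum += $1; n++ }
         END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Download figures (Mbps) reported for the patched test kernel:
printf '%s\n' 20.08 19.68 22.51 19.38 | mean
```

This prints 20.41. The three iwlwifi download runs reported for 3.1.0-7 in comment 30 (12.93, 16.66, 14.27) average 14.62 by the same calculation.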

I *am* seeing a pattern that looks similar to the ath9k driver in the transmission rate during the test itself so to me this is good news.

I will test again in 6h or so and see what I get.

Comment 33 wey-yi.w.guy 2012-02-25 17:56:10 UTC
(In reply to comment #28)
> After further reading of the spec, it seems to be expected that the code in
> compat-wireless is used.
> Using the link in comment 15, I can no longer find RPMs. I assume they have
> already been deleted automatically. I'm trying the create a kernel myself with
> the patch from comment 12 applied to help testing.

the patch in comment 12 is merged into the wireless-testing tree already
http://git.kernel.org/?p=linux/kernel/git/linville/wireless-testing.git;a=summary

Wey

Comment 34 shamim.islam 2012-02-25 18:07:12 UTC
Clearing flags for the moment.

Comment 35 shamim.islam 2012-02-25 18:09:41 UTC
Can you please tell us when/which kernel release this ends up in? Currently, I have version locked my kernel files to 3.1.0-7. Hopefully the performance I've seen continues.

Thanks.

Comment 36 wey-yi.w.guy 2012-02-25 18:46:24 UTC
John merged this patch into the wireless-testing tree 3 days ago

author Johannes Berg <johannes.berg>
 Fri, 17 Feb 2012 17:47:14 +0000 (09:47 -0800)
committer John W. Linville <linville>
 Tue, 21 Feb 2012 19:45:26 +0000 (14:45 -0500)
commit 5dcbf480473f6c3f06ad2426b7517038a2a18911
tree 66d2cbefee018ff46d499e0aeab573aa94558353
parent 7be081539e540517d5e1fcbf96b8080074afbf08

iwlwifi: fix key removal

When trying to remove a key, we always send key
flags just setting the key type, not including
the multicast flag and the key ID. As a result,
whenever any key was removed, the unicast key 0
would be removed, causing a complete connection
loss after the second rekey (the first doesn't
cause a key removal). Fix the key removal code
to include the key ID and multicast flag, thus
removing the correct key.

Cc: stable.org
Reported-by: Alexander Schnaidt <alex.schnaidt>
Tested-by: Alexander Schnaidt <alex.schnaidt>

Comment 37 John W. Linville 2012-02-27 14:41:35 UTC
And I added it to the f16 kernel on Friday as well.  I expect that it will be part of the next f16 build and subsequent update.

Comment 38 shamim.islam 2012-02-27 15:14:21 UTC
Houston, we have a problem. I've been using the 3.2.5-3 from comment 12 quite heavily, and I'm seeing something worrisome. Throughput seems to be fine, but I am still getting hiccoughs. My N-router is solid. It never goes down.

But over the period of using 3.2.5-3 solid with VMWare on top and heavy mail traffic over VPN, my VPN (which NEVER goes down when used the same way on the ath9k), has dropped over and over every 10 minutes, and then eventually, my iwlwifi refused to connect to my N-only router.

I am now going back to 3.1.0-7. As it stands, I can't use the 3.2.5-3 test kernel solution.

My real question is: if it wasn't broken, what did we gain by making the changes we did? Was there something the changes fixed?

Thanks.

Comment 39 shamim.islam 2012-02-27 15:16:50 UTC
3.2.5-3 from comment 12 is not production ready. There is still a significant problem.

Comment 40 Bowe Strickland 2012-02-27 18:57:15 UTC
Created attachment 566120 [details]
iwlwifi oops requiring module unload/load to resolve...

iwlwifi oops during normal use

Comment 41 Bowe Strickland 2012-02-27 18:58:25 UTC
Created attachment 566121 [details]
iwlwifi oops during wakeup from suspend

this oops happened during wake up from suspend, rebooted to resolve.

Comment 42 Bowe Strickland 2012-02-27 18:59:48 UTC
although candidate kernel was vast improvement, i've been collecting oops with candidate kernel as well, about once a day over the weekend, attached.

other than these, performance has been good.

Comment 43 Bowe Strickland 2012-02-27 19:01:50 UTC
re: comment 21: auto_agg=0 and wd_disable=1 are probably red herrings.  they were my fumbling attempts to regain stability...
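
Whether options such as auto_agg and wd_disable actually took effect can be checked against the loaded module, since the kernel exposes readable module parameters under /sys/module/<module>/parameters. The sketch below assumes that layout; show_module_params is a made-up helper name, and its optional second argument exists only so it can be tried against a scratch directory. Only parameters the driver marks as readable will show up.

```shell
# Hypothetical helper: print "name=value" for each readable parameter of a
# loaded module.
# $1 = module name, $2 = base directory (assumed default: /sys/module)
show_module_params() {
    mod="$1"
    base="${2:-/sys/module}"
    dir="$base/$mod/parameters"
    if [ ! -d "$dir" ]; then
        echo "no parameters directory for $mod" >&2
        return 1
    fi
    for f in "$dir"/*; do
        [ -r "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
    return 0
}

# Usage: show_module_params iwlwifi
```

If the module is not loaded, the function reports that on stderr and returns non-zero.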

Comment 44 wey-yi.w.guy 2012-02-27 20:14:34 UTC
"auto_agg=0" : do not start aggregation until traffic reaches the pre-defined threshold

"wd_disable=1" : do not reload the firmware when the driver detects that the firmware has not been responding for a period of time

Comment 45 shamim.islam 2012-02-27 20:26:07 UTC
So it seems like I'm wondering why we changed from iwlwifi in 3.1.0-7 which worked to something that isn't working and how we managed to get here. Not sure if anyone else is. But for the moment, in production, I can only use 3.1.0-7.

What problem are we solving by making the changes? Just wondering aloud.

Comment 46 John W. Linville 2012-02-27 20:37:20 UTC
"It works for me, so why change anything that might help someone else?"

New hardware happens.

Comment 47 shamim.islam 2012-02-27 20:47:43 UTC
Ahhh - it was a new hardware thing. Got it. Just wondering if we had something driving the change. Guess we'll see if we can resolve the problem then. Looking forward to the next test kernel. Let me know.

Comment 48 shamim.islam 2012-02-28 04:34:01 UTC
Ok - found another interesting tidbit. Sleep/hibernate causes weirdness to take place - hibernate in particular. Sleep not so much. When I come back from hibernate, I have to rmmod and modprobe the driver. So to help things along, for iwlwifi, I've cobbled together the following to be placed in /etc/pm/sleep.d/30-iwlwifi - not sure what else can be done or if this is something that can be fixed in the driver.

Thoughts and suggestions on how to handle it in the driver would be appreciated - this is truly a kludge since I don't know for sure which iwlwifi drivers are loaded.

Also, in 3.1.0-7, the driver is iwlagn.ko. Later kernels have iwlwifi.ko.

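#!/bin/sh
# File: "/etc/pm/sleep.d/30-iwlwifi".
# pm-utils hook: unload the iwlwifi modules before sleep, reload them on wake.
CMD="$1"
REV=$(uname -r)
for FILE in /lib/modules/"$REV"/kernel/drivers/net/wireless/iwlwifi/*.ko; do
        # Skip the unexpanded pattern if the directory holds no modules.
        [ -e "$FILE" ] || continue
        # Strip the directory and the literal ".ko" suffix to get the module name.
        IWLWIFI=$(basename "$FILE" .ko)
        echo "$IWLWIFI"
        case "$CMD" in
                hibernate|suspend)
                        # remove iwlwifi items
                        rmmod "$IWLWIFI"
                        ;;
                resume|thaw)
                        # add iwlwifi items
                        modprobe "$IWLWIFI"
                        ;;
        esac
done
# End of File: "/etc/pm/sleep.d/30-iwlwifi".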

Comment 49 Mikko Huhtala 2012-03-02 15:15:12 UTC
I'm using the latest 3.2 kernel from Koji, 3.2.9-1.fc16.x86_64, which includes a patch for the key exchange. I just got another dropped connection to the wireless box. The connection came back up after reloading the iwlwifi module and restarting NetworkManager. The wireless chip is
 
05:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 5100 AGN [Shiloh] Network Connection [8086:4237]

in a Lenovo Thinkpad SL510. So, alas, the problem is still there.
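
The reload workaround that keeps coming up in this thread (rmmod followed by modprobe) can be wrapped in a small function. This is a sketch only: reload_module, run, and the DRY_RUN variable are made-up names, and the real commands need root.

```shell
# Hypothetical wrapper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: unload a kernel module and load it again.
# modprobe only runs if rmmod succeeded.
reload_module() {
    mod="$1"
    run rmmod "$mod" && run modprobe "$mod"
}

# Usage (as root): reload_module iwlwifi
# Preview only:    DRY_RUN=1; reload_module iwlwifi
```

With DRY_RUN=1 the function prints the two commands it would run, which is a safe way to try it on a machine where dropping the connection is not acceptable.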

Comment 50 Euloiix 2012-03-04 02:33:44 UTC
Hi there, 

I am having the same issue: the connection freezes randomly after some time with the same configuration (iwlwifi driver, Intel 5100). 

I am willing to try the patches provided and do my part to help but have to spend some time on the doc first as, even if I basically know  how to do it, I am not yet very clear on how to play with kernel versions and patches. 

But I had one question I wanted to ask, even if I realize here is not a forum and may not be the best place to do so: 
I tried to install fedora 15 as I have read the problem appeared from kernel 3.2 and onwards, but I still had the issue. 

Can someone explain to me why, as nearly everyone here seems to be able to revert without having the wireless problem anymore?

Comment 51 shamim.islam 2012-03-04 02:55:04 UTC
From what's been explained to me, new hardware required the changes that were put into iwlwifi.ko. The older kernel used a different filename without the changes for the new hardware, in the driver called iwlagn.ko. Somewhere along the 3.2.x kernel series, the iwlagn was replaced. I happened to install Fedora 16 after the kernel 3.2.x that started causing the problem. Since someone pointed out that 3.1.0-7 had no problems, I reverted back to it and voila.

The kernel runs the CPU and provides services for the OS. There are incremental improvements which those of us using 3.1.0-7 in this forum are not able to take advantage of. But those of us still using 3.1.0-7 do so because wireless stability is more important to us than discovering what the missing features are. We are hoping to get a kernel that has a stable iwlwifi with good throughput like 3.1.0-7 so that we can all quit following this bug, upgrade to that kernel and go on with our lives. :)

Comment 52 Euloiix 2012-03-06 18:57:48 UTC
Thanks for the tips and explanations. 
I'll try again then, but I was quite sure to still have the problem, even after having reverted back to 3.1.something.

Comment 53 Euloiix 2012-03-06 19:37:43 UTC
When we say: revert back to 3.1.0-7, we mean select this kernel in grub, right ? 
Because when I do so, I still have the Wifi disconnection problem.
I get nothing in dmesg. 

Here is the output of lshw:


~]$ lshw -c Network
  *-network
       description: Wireless interface
       product: WiFi Link 5100
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:05:00.0
       logical name: wlan0
       version: 00
       serial: 00:24:d6:42:e1:a8
       width: 64 bits
       clock: 33MHz
       capabilities: bus_master cap_list ethernet physical wireless
       configuration: broadcast=yes driver=iwlagn driverversion=3.1.0-7.fc16.x86_64 firmware=8.83.5.1 build 33692 ip=192.168.0.100 latency=0 multicast=yes wireless=IEEE 802.11abgn
       resources: irq:49 memory:d6d00000-d6d01fff


Am I having a different bug or ... ??

Comment 54 shamim.islam 2012-03-07 03:41:43 UTC
Not sure. All of us so far in this forum seem to be reporting marginal success with 3.2.5-3 with some issues, and 100% success with 3.1.0-7. I'm not the expert though.

Comment 55 Stanislaw Gruszka 2012-03-08 19:58:30 UTC
shamim, is 3.1.0-7 the latest working kernel and 3.1.1-1 (http://koji.fedoraproject.org/koji/buildinfo?buildID=273752) the first broken one? Or did the regression happen between 3.1.x and 3.2.x? If the regression happened between 3.1.a and 3.1.b, it would be rather easy to find the patch that broke things. A regression between 3.1.x and 3.2.x would be much harder to resolve.
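
Narrowing down the first broken build is a bisection over an ordered list of kernels. The sketch below shows the idea under a strong assumption: every build before the first bad one is good and every build after it is bad. first_bad is a made-up helper name, and the check command passed to it (returning 0 for a good build) stands in for whatever manual test is used, such as booting that kernel and running a large transfer.

```shell
# Hypothetical helper: binary search for the first "bad" build.
# $1 = name of a command that returns 0 when the given build is good;
# remaining arguments = builds, oldest first.
first_bad() {
    check="$1"; shift
    lo=1              # lowest index that may still be the first bad build
    hi=$(($# + 1))    # sentinel meaning "no bad build found"
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"
        if "$check" "$candidate"; then
            lo=$((mid + 1))   # good: the first bad build is later
        else
            hi=$mid           # bad: the first bad build is here or earlier
        fi
    done
    if [ "$lo" -le "$#" ]; then
        eval "echo \"\${$lo}\""
    else
        return 1              # every build tested good
    fi
}

# Usage: first_bad my_check 3.1.0-7 3.1.1-1 3.1.2 3.1.4 3.1.5 3.1.6
```

Each step halves the remaining range, so six builds need at most three test boots. If results are not monotonic (for example, a build that passes on one installation and fails on another), the assumption does not hold and the answer is not meaningful.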

Comment 57 Kai Engert (:kaie) (inactive account) 2012-03-16 18:27:58 UTC
I just spent a lot of time testing the various kernels.
It's unfortunate that my test case requires quite a lot of data to be transferred for the error to show up (hg clone mozilla).

(I tried to interrupt my testing of the 1.4 GB tree early, after 200 MB, in the hope that 200 MB would mean a success, but that was a wrong conclusion. I ended up testing a lot of kernels because of my incorrect assumption...)

Anyway, here are the results:

I reconfirmed that the new kernel 3.2.9-2.fc16 works for me (a full 1.4 GB hg clone succeeded).

I tried several kernels from comment 55 and comment 56. They are all broken.

I tested
- 3.1.1
- 3.1.2
- 3.1.4
- 3.1.5
- 3.1.6
which are all broken ("hg clone" aborts reporting inconsistency).

Then I tried another test, which has a confusing result.
I found kernel 3.1.0-7 at 
  http://koji.fedoraproject.org/koji/buildinfo?buildID=271832
and I believe that's the one included in the Fedora 16 live DVD.

To my surprise, 3.1.0-7 is broken, too!

This means in summary:

- using a F16 live DVD with 3.1.0-7 I had success

- using my installation of F16 with all updates installed,
  I get a failure with 3.1.0-7

- I also get a failure with any later kernels, up to 3.2.7

- 3.2.9 works fine for me.


What could potentially explain the different test results between 3.1.0-7 when using live DVD or installed up-to-date system? Maybe a different firmware?

Comment 58 Kai Engert (:kaie) (inactive account) 2012-03-16 18:30:03 UTC
When using the live DVD I used whatever 32 bit kernel flavor was the default.

With the installed F16, I always used the PAE kernel.

Comment 59 Mikko Huhtala 2012-03-17 06:52:05 UTC
@Kai Engert

Are you using 3.2.9 for your daily life? I couldn't find a test case like a hg clone, the connection would just drop seemingly at random after a while. Sometimes I was able to download several gigabytes of stuff over a period of hours before it happened. 3.2.9 seemed to work better than the preceding 3.1 kernels, i.e. the connection would get dropped later, but it did get dropped. I didn't test this systematically, however. Anyway, I have not got a single dropped connection with 3.1.0-7, and this is after a couple of weeks of daily use. All of this is on x86_64. Is there a difference between 32-bit and 64-bit systems? I haven't run 32-bit F16 on the problem machine at all.

Comment 60 Kai Engert (:kaie) (inactive account) 2012-03-18 09:37:41 UTC
> Are you using 3.2.9 for your daily life? 

Yes

Comment 61 Euloiix 2012-03-20 13:55:01 UTC
As Kai, I had problems with 3.1.0-7 and have none with 3.2.9 which I have been using for nearly a week now. 
I think I have been disconnected once in the entire week, and that was during the first hour. 

Thank you very much for the quick fix, even if it seems not fully effective yet for everyone.

Comment 62 Dave Jones 2012-03-22 16:52:33 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 63 Dave Jones 2012-03-22 16:56:27 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 64 Dave Jones 2012-03-22 17:07:18 UTC
[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 65 Kai Engert (:kaie) (inactive account) 2012-03-22 18:35:52 UTC
3.3.0-4.fc16.i686.PAE looks good to me, my test "hg clone mozilla" succeeded.

Comment 66 shamim.islam 2012-03-23 15:03:56 UTC
Hi all. I don't know about the i686 kernels or the PAE kernels. I've only been using the x86_64 version. To me a kernel is broken not only if my connection drops but also if the connection speed on SpeedTest.net is significantly below that of my comparable HP.

Machines in question:
HP AMD Phenom II 3GHz 8GB Ram 1TB HDD
Asus X53SV core i7 2630 QM 2.0-2.9GHz 8GB Ram 1TB HDD

I stopped testing at 3.2.9-1 when my speed dropped to less than half of my Phenom machine. On my 64-bit 3.1.0-7, I get 75%-80% throughput of my HP which is usable. I have not seen any kernel provide more than that throughput.

I will try the 3.3.0-4 and report back. Just please keep in mind broken is not just dropped connections but also slowdowns.

Comment 67 shamim.islam 2012-03-23 20:05:47 UTC
I am unable to test kernel 3.3.0-4 since it will not let me compile any kernel modules. And the maintainer responsible for assigning the problem didn't read the problem report far enough to determine that the reason my NVIDIA drivers won't install is that the kernel-headers or kernel-devel or even the kernel RPM is missing files. I have verified that kernel module compilation is not working in vmplayer either. Anyone that has asked about the missing files has been misdirected to install src.rpm files and create a custom kernel, or to find a precompiled kernel module instead of using dkms (akmod), or has been ignored.

I will test as soon as the maintainer realizes there is a problem with kernel 3.3.0-4 or everyone with a module compilation error in 3.3.0-4 boycotts the kernel until it is obsoleted.

Looking forward to when I can test.

Comment 68 John Ruemker 2012-03-24 23:38:49 UTC
3.3.0-4 is still demonstrating frequent firmware reloads and terrible performance for me

03:00.0 Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35)

[  460.854378] iwlwifi 0000:03:00.0: Queue 2 stuck for 2000 ms.
[  460.854383] iwlwifi 0000:03:00.0: Current SW read_ptr 26 write_ptr 42
[  460.854435] iwlwifi 0000:03:00.0: Current HW read_ptr 26 write_ptr 42
[  460.854438] iwlwifi 0000:03:00.0: On demand firmware reload
[  460.854853] ieee80211 phy2: Hardware restart was requested
[  460.854927] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
[  460.855105] iwlwifi 0000:03:00.0: Radio type=0x0-0x3-0x1
[  675.067735] iwlwifi 0000:03:00.0: Queue 2 stuck for 2000 ms.
[  675.067740] iwlwifi 0000:03:00.0: Current SW read_ptr 128 write_ptr 140
[  675.067792] iwlwifi 0000:03:00.0: Current HW read_ptr 128 write_ptr 140
[  675.067795] iwlwifi 0000:03:00.0: On demand firmware reload
[  675.068192] ieee80211 phy2: Hardware restart was requested
[  675.068260] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
[  675.068452] iwlwifi 0000:03:00.0: Radio type=0x0-0x3-0x1
[  758.689344] iwlwifi 0000:03:00.0: Queue 2 stuck for 2000 ms.
[  758.689348] iwlwifi 0000:03:00.0: Current SW read_ptr 86 write_ptr 92
[  758.689399] iwlwifi 0000:03:00.0: Current HW read_ptr 86 write_ptr 92
[  758.689401] iwlwifi 0000:03:00.0: On demand firmware reload
[  758.689803] ieee80211 phy2: Hardware restart was requested
[  758.689866] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
[  758.690033] iwlwifi 0000:03:00.0: Radio type=0x0-0x3-0x1
[  759.252619] iwlwifi 0000:03:00.0: Queue 2 stuck for 2000 ms.
[  759.252623] iwlwifi 0000:03:00.0: Current SW read_ptr 0 write_ptr 1
[  759.252675] iwlwifi 0000:03:00.0: Current HW read_ptr 0 write_ptr 1
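
To put a number on how often this happens, the stuck-queue and firmware-reload messages can be counted from the kernel log. This is a sketch only; summarise_iwlwifi_log is a made-up helper name, and the patterns are taken from the message text quoted above.

```shell
# Hypothetical helper: count iwlwifi stuck-queue and firmware-reload
# messages in kernel log text read from stdin.
summarise_iwlwifi_log() {
    awk '
        /iwlwifi.*Queue [0-9]+ stuck for/    { stuck++ }
        /iwlwifi.*On demand firmware reload/ { reloads++ }
        END { printf "stuck=%d reloads=%d\n", stuck, reloads }
    '
}

# Usage: dmesg | summarise_iwlwifi_log
```

Fed the excerpt above, it reports four stuck-queue events and three firmware reloads within roughly five minutes of uptime.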

Comment 69 shamim.islam 2012-03-27 02:58:46 UTC
Confirmed. Kernel 3.3.0-4 is broken on my core i7 2730 QM (x86_64).

Comment 70 Matt Kinni 2012-03-30 13:28:46 UTC
i've got the same kernel, the same network card and the same dmesg log output

(In reply to comment #68)
> 3.3.0-4 is still demonstrating frequent firmware reloads and terrible
> performance for me
> 
> 03:00.0 Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35)
>

Comment 71 shamim.islam 2012-04-02 04:06:49 UTC
A new kernel is out and we're still hosed.

Kernel 3.3.0-8 is broken on my core i7 2730 QM (x86_64).

Kernel 3.3.0-8 runs at 75% capacity for the wifi link compared to my 3.1.0-7.

Kernel 3.1.0-7 runs at 98% capacity for the wifi link compared to my HP Phenom II.

I'm going back to kernel-3.1.0-7. 

Is it time to maybe figure out that the re-implementation of the wifi driver is in error and that what ever new features were added should be compartmentalized so that it does not affect the original design/throughput?

In other words, when do we throw in the towel for a buggy revamp and start over?

We've gone through so many kernels with no change in performance that even comes close to matching 3.1.0-7.

Shouldn't we take the best lessons of 3.1.0-7 and implement based on those principles instead of continuing this way?

No offence to anyone. I'm just asking the question because I don't understand how we can be broken for so long, and so many of us have to use the old kernel and it's not a problem that we're trying to solve the wrong way.

Thanks.

Comment 72 Stanislaw Gruszka 2012-04-02 07:53:18 UTC
Perhaps you should complain directly to Intel (ilw.com or http://bugzilla.intellinuxwireless.org/). In Fedora we provide the driver that we get from upstream, which is developed and maintained by Intel.

Comment 73 Euloiix 2012-04-07 19:03:14 UTC
I am very sorry to read that the bug is following you through the kernel updates. 
But for me, for example, kernel 3.1.0-7 was so broken that I could not stay connected for more than 15 min, while yours was working just fine. 

Now since 3.2.9, the problem is fixed for me, while yours has become buggy. 

Hence, do not think: 'things were working great, why don't we revert back', as it seems not to be as simple as that if you look at the "average user". 

I sincerely hope a fix will be found for everyone quickly.

Comment 74 shamim.islam 2012-04-08 14:58:22 UTC
Either that or I need to find a way to replace the Intel wifi card in my x53sv with an Atheros one like in my HP Phenom II.

Comment 75 shamim.islam 2012-04-10 18:17:06 UTC
Confirmed. Kernel 3.3.1-3.fc16.x86_64 is broken just like the first even though throughput is better. Multiple DNS calls over a short period of time require the module to be unloaded and reloaded as it stops providing data to applications. I get stuck on the "looking up XYZ".

Back to kernel 3.1.0-7.

Comment 76 shamim.islam 2012-04-10 18:37:46 UTC
P.S. I was also uploading a couple of hundred megabytes.

Comment 77 shamim.islam 2012-06-08 03:36:44 UTC
Kernel 3.3.7-1.fc16.x86_64 is the first non-broken kernel. I am getting equivalent speeds on my AMD Phenom II 3.0 GHz and my Core i7 (iwlagn/iwlwifi).

Can anyone else confirm? Time to close this bug?

Thanks!

Comment 78 shamim.islam 2012-06-08 04:24:41 UTC
P.S. I've had no problems and the wifi shuts down during sleep properly too.

Comment 79 Euloiix 2012-06-08 06:23:39 UTC
Ah ah ah, I am really glad Red Hat eventually solved it for you also. 
Very good ! 

From the activity of this post, it seems to me you were the only one left :) 

Good job Red Hat, now let's enjoy Fedora !

Comment 80 Stanislaw Gruszka 2012-06-08 08:43:23 UTC
To be honest, we did not work on fixing this; the fixes come from Intel. 

Also, it is kind of strange that 3.3.7 fixed the problem, because it does not include any iwlwifi/mac80211 fixes. The latest 3.3 kernel which has iwlwifi/mac80211 fixes is 3.3.5; it includes these patches:

> 32ff66c2368d12c815d4a2a290c0e6825e6eb024 iwlwifi: use 6000G2B for 6030 device series
> b4308e17eee55ef3ff85da506e80902e3bc1b6c2 iwlwifi: fix hardware queue programming
> 6a2c73fba70c4b59b0060a80c634912af12a8bd2 iwlwifi: use correct released ucode version
> 8aef975a543c0d567dc2fe2eb527dd1f5cfc1e72 iwlwifi: do not nulify ctx->vif on reset
> a8abc1d0160641c79fbaafeedb57506beb3780e4 mac80211: fix AP mode EAP tx for VLAN stations

Shamim, does your AMD machine by any chance include an Atheros device handled by the atl1c driver?

Comment 81 shamim.islam 2012-06-08 18:48:46 UTC
Problems were only with iwlwifi - iwlagn (3.1.0-7) worked on my core i7. iwlwifi kept losing data and was VERY slow and kept dropping connections after 3.1.0-7.

My AMD always used ath9k and was running at top speed.

/proc/modules follows:

ath9k 134768 0 - Live 0xffffffffa054a000
mac80211 496450 1 ath9k, Live 0xffffffffa04b1000
ath9k_common 13600 1 ath9k, Live 0xffffffffa03ed000
ath9k_hw 408211 2 ath9k,ath9k_common, Live 0xffffffffa042a000
ath 23089 3 ath9k,ath9k_common,ath9k_hw, Live 0xffffffffa03b4000
cfg80211 195558 3 ath9k,mac80211,ath, Live 0xffffffffa03f9000
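
A listing like this can be checked mechanically when comparing machines. The sketch below assumes the usual /proc/modules field order (name, size, reference count, modules using it, state, address); the function name parse_proc_modules and the dictionary layout are illustrative only and not part of any existing tool.

```python
def parse_proc_modules(text):
    """Parse /proc/modules-style text into a dict keyed by module name."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        name, size, refcount, users, state, address = fields[:6]
        # The fourth field is "-" when no other module uses this one;
        # otherwise it is a comma-separated list with a trailing comma.
        used_by = [] if users == "-" else [u for u in users.split(",") if u]
        modules[name] = {
            "size": int(size),
            "refcount": int(refcount),
            "used_by": used_by,
            "state": state,
            "address": address,
        }
    return modules


# Sample copied from three of the lines above.
SAMPLE = """\
ath9k 134768 0 - Live 0xffffffffa054a000
mac80211 496450 1 ath9k, Live 0xffffffffa04b1000
cfg80211 195558 3 ath9k,mac80211,ath, Live 0xffffffffa03f9000
"""

if __name__ == "__main__":
    mods = parse_proc_modules(SAMPLE)
    # Report which wireless driver of interest is loaded, if any.
    for driver in ("iwlwifi", "iwlagn", "ath9k"):
        print(driver, "loaded" if driver in mods else "not loaded")
```

On a live system the same function could be given the contents of /proc/modules directly, for example open("/proc/modules").read().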

Comment 82 Sergio Basto 2012-06-09 02:06:07 UTC
(In reply to comment #81)
> Problems were only with iwlwifi 

Maybe this should be reported on 
http://bugzilla.intellinuxwireless.org/

Comment 83 shamim.islam 2012-06-09 02:59:10 UTC
Sergio, 

Seriously? 

I reported a problem with my Fedora kernel upgrade MONTHS ago. The kernel UPGRADE worked perfectly on my AMD and failed on my core i7. These awesome people helped me find a workaround that allowed me to use my core i7 for work instead of being stuck on an AMD that was slowly failing due to overheating. 

While using the workaround, after much testing with multiple versions of the Fedora kernel amongst all of us, I finally found a working kernel upgrade, which now makes everything work even better. 

I report the success story that the KERNEL 3.3.7-1 NOW WORKS.

And you have the AUDACITY MONTHS later to marginalize EVERYONE that contributed to finding the workaround, to simply suggest I should report to a different mailing list?

What would you suggest I report on that other mailing list?

I am at a loss for words. If you feel like I am flamebaiting, I apologize in advance as I am just VERY FRUSTRATED at your ignorance of the MAGNITUDE of the hubris of your statement.

Your comments are inappropriate, inconsiderate to all here and not thought through. Perhaps an apology is in order to all the people that have worked so hard to deal with the broken wifi driver in the kernel, and to those that have spent countless hours testing.

Or perhaps you could find something more constructive to offer.

To all that supported this effort to help document the kernel driver issue and the workaround and lasted through all the intermediate kernel testing - a deep thanks from myself for sure.

So long and thanks for all the fish.

Comment 84 Sergio Basto 2012-06-09 03:22:43 UTC
(In reply to comment #83)
> While using the workaround, after much testing with multiple versions of the
> Fedora kernel amongst all of us, I finally found a working kernel
> upgrade, which now makes everything work even better. 
> 
> I report the success story that the KERNEL 3.3.7-1 NOW WORKS.
> 
> And you have the AUDACITY MONTHS later to marginalize EVERYONE that contributed
> to finding the workaround, to simply suggest I should report to a different
> mailing list?
> 
> What would you suggest I report on that other mailing list?

Not a mailing list, a Bugzilla from Intel specifically for iwlwifi on Linux. 
But now I don't understand: is the bug fixed or not? In comment #81 you say "kept losing data". 
 
> I am at a loss for words. If you feel like I am flamebaiting, I apologize in
> advance as I am just VERY FRUSTRATED at your ignorance of the MAGNITUDE of
> the hubris of your statement.

OK, so what? Should I read the whole report before answering?

> Your comments are not appropriate, inconsiderate to all here and not thought
> through. Perhaps an apology to all the people that have worked so hard to
> not only deal with the broken wifi driver in the kernel, but also those that
> have spent countless hours testing is in order.
> 
> Or perhaps you could find something more constructive to offer.
> 
> To all that supported this effort to help document the kernel driver issue
> and the workaround and lasted through all the intermediate kernel testing -
> a deep thanks from myself for sure.
> 
> So long and thanks for all the fish.

Comment 85 Stanislaw Gruszka 2012-06-09 05:19:49 UTC
(In reply to comment #81)
> Problems were only with iwlwifi - iwlagn (3.1.0-7) worked on my core i7.
> iwlwifi kept losing data and was VERY slow and kept dropping connections
> after 3.1.0-7.
> 
> My AMD always used ath9k and was running at top speed.
Ah, OK, so my question was really about the i7 machine: does it have an atl1c Atheros Ethernet device? If so, that could explain why 3.3.7 fixed the problem, and would show that the issue was not so obvious or easy to fix.

Anyway, I'm happy that things work for you now. I'm closing the bug.