Bug 606393

Summary: [ath9k] Network software dropping connections, terminating downloads
Product: [Fedora] Fedora
Reporter: Paul Lambert <eb30750>
Component: kernel
Assignee: John W. Linville <linville>
Status: CLOSED WORKSFORME
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 13
CC: anton, dcbw, dougsland, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mcgrof, michel
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-01-18 15:04:12 UTC
Attachments:
Description                    Flags
output of dmesg                none
output of /var/log/messages    none

Description Paul Lambert 2010-06-21 14:22:18 UTC
Description of problem: This is the same problem that was reported in bug 551861 for FE-12


Version-Release number of selected component (if applicable):


How reproducible: Any sizeable download will not complete.  An i386 netinstX install cannot finish; it terminates prematurely.


Steps to Reproduce:
1.
2.
3.
  
Actual results:
Network delays, no ARP addresses, network communication just stops on large downloads.

Expected results:


Additional info: FE-13 has been a catastrophe.  It is obvious that the network developer did not include the FE-12 network fixes in the FE-13 release.  Many bugs that were fixed in FE-12 are showing up again in FE-13.  The nss-softokn is another example of bugs being carried forward into FE-13 that were previously fixed.

Comment 1 Paul Lambert 2010-06-23 19:03:01 UTC
Highly likely that this network issue is tied to this kernel oops cpu crash.  Wlan0 dropping data frames.  But again, this was an issue on previous versions but resolved in FE-12.  Bugs that were fixed have been reintroduced into FE-13.  I have used the automatic bug filer to report the kernel oops bug.  

WARNING: at net/mac80211/tx.c:553 invoke_tx_handlers+0x59a/0xbfe [mac80211]()
Hardware name: HP Pavilion dv7 Notebook PC
wlan0: Dropped data frame as no usable bitrate found while scanning and associated. Target station: 00:1d:7e:16:e9:16 on 5 GHz band
Modules linked in: bluetooth tun aes_x86_64 aes_generic fuse ipt_MASQUERADE iptable_nat nf_nat bridge stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table xt_physdev ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_amd kvm uinput snd_hda_codec_atihdmi arc4 ecb snd_hda_codec_idt ath9k ath9k_common mac80211 snd_hda_intel ath9k_hw snd_hda_codec uvcvideo snd_hwdep videodev ath v4l1_compat snd_seq sdhci_pci snd_seq_device snd_pcm hp_wmi sdhci v4l2_compat_ioctl32 cfg80211 snd_timer jmb38x_ms rfkill r8169 memstick mmc_core joydev snd shpchp edac_core microcode mii edac_mce_amd soundcore wmi snd_page_alloc k10temp i2c_piix4 hp_accel lirc_ene0100 lis3lv02d lirc_dev input_polldev ata_generic pata_acpi pata_atiixp video output radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 835, comm: phy0 Not tainted 2.6.33.5-124.fc13.x86_64 #1
Call Trace:
[<ffffffff8104b54c>] warn_slowpath_common+0x77/0x8f
[<ffffffff8104b5b1>] warn_slowpath_fmt+0x3c/0x3e
[<ffffffffa02bac68>] invoke_tx_handlers+0x59a/0xbfe [mac80211]
[<ffffffffa02aae8f>] ? sta_info_get+0x2f/0x44 [mac80211]
[<ffffffffa02ba622>] ? ieee80211_tx_prepare+0x2dc/0x323 [mac80211]
[<ffffffffa02bb49f>] ieee80211_tx+0x6d/0x1d3 [mac80211]
[<ffffffff81381edc>] ? skb_release_data+0xc4/0xc9
[<ffffffff813828db>] ? pskb_expand_head+0xed/0x170
[<ffffffffa02bb7ec>] ieee80211_xmit+0x1e7/0x206 [mac80211]
[<ffffffff81381fe4>] ? __alloc_skb+0x7b/0x16b
[<ffffffffa02bb855>] ieee80211_tx_skb+0x4a/0x51 [mac80211]
[<ffffffffa02b0375>] ieee80211_send_nullfunc+0xfc/0x105 [mac80211]
[<ffffffffa02b03c9>] ieee80211_dynamic_ps_enable_work+0x4b/0x84 [mac80211]
[<ffffffff81060d31>] worker_thread+0x1a4/0x232
[<ffffffffa02b037e>] ? ieee80211_dynamic_ps_enable_work+0x0/0x84 [mac80211]
[<ffffffff8106480b>] ? autoremove_wake_function+0x0/0x34
[<ffffffff81060b8d>] ? worker_thread+0x0/0x232
[<ffffffff810643bb>] kthread+0x7a/0x82
[<ffffffff8100a924>] kernel_thread_helper+0x4/0x10
[<ffffffff81064341>] ? kthread+0x0/0x82
[<ffffffff8100a920>] ? kernel_thread_helper+0x0/0x10

Comment 2 Dan Williams 2010-06-28 19:39:57 UTC
Clearly the driver.  If it's not passing traffic properly, it may certainly be responsible for frequently dropping network connections.

Comment 3 John W. Linville 2010-06-28 20:02:31 UTC
Can you recreate this issue _after_ issuing the following command?

iwconfig wlan0 power timeout 0

Comment 4 John W. Linville 2010-07-14 15:58:08 UTC
Any word on this?

Comment 5 John W. Linville 2010-08-05 17:22:41 UTC
Ping?

Comment 6 Paul Lambert 2010-08-06 12:27:24 UTC
I am currently running kernel version kernel.x86_64 2.6.33.6-147.2.4.fc13.  I did not encounter this issue for several weeks and thought one of the recent upgrades had eliminated it.  However, I was performing a substantial batch of upgrades last week and encountered one dropped packet error, so the bug still lives.

Comment 7 John W. Linville 2010-08-10 18:50:48 UTC
Based on the backtrace, I surmise that the device is exiting dynamic power saving mode and attempting to transmit a frame to notify the AP that it is awake.  Instead, it is hitting this check in ieee80211_tx_h_rate_ctrl:

        /*
         * Lets not bother rate control if we're associated and cannot
         * talk to the sta. This should not happen.
         */
        if (WARN(test_bit(SCAN_SW_SCANNING, &tx->local->scanning) &&
                 (sta_flags & WLAN_STA_ASSOC) &&
                 !rate_usable_index_exists(sband, &tx->sta->sta),
                 "%s: Dropped data frame as no usable bitrate found while "
                 "scanning and associated. Target station: "
                 "%pM on %d GHz band\n",
                 tx->sdata->name, hdr->addr1,
                 tx->channel->band ? 5 : 2))
                return TX_DROP;

Unfortunately, I don't completely understand what this check is doing -- hopefully Johannes has some advice.  Aside from the warning message, are you experiencing any other problems (e.g. dropped connections)?
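
For readers following the quoted check, its condition can be modeled outside the kernel. The sketch below is a simplified user-space model, not the mac80211 code: the names `band`, `sband_model`, `sta_model`, `usable_rate_exists`, and `should_drop` are hypothetical, and it assumes that "usable" means at least one bit is set in a per-band bitmap of the station's supported rates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the mac80211 structures. */
enum band { BAND_2GHZ = 0, BAND_5GHZ = 1, NUM_BANDS };

struct sband_model {
    enum band band;          /* band the frame is about to go out on */
    unsigned int n_bitrates; /* number of legacy bitrates defined for it */
};

struct sta_model {
    uint32_t supp_rates[NUM_BANDS]; /* one bit per rate index, per band */
};

/* Assumption: a rate index is usable if its bit is set for this band. */
static bool usable_rate_exists(const struct sband_model *sband,
                               const struct sta_model *sta)
{
    for (unsigned int i = 0; i < sband->n_bitrates && i < 32; i++)
        if (sta->supp_rates[sband->band] & (UINT32_C(1) << i))
            return true;
    return false;
}

/* Same shape as the quoted condition: drop only when a software scan
 * is running, the station is associated, and no rate is usable. */
static bool should_drop(bool sw_scanning, bool associated,
                        const struct sband_model *sband,
                        const struct sta_model *sta)
{
    return sw_scanning && associated && !usable_rate_exists(sband, sta);
}
```

Under these assumptions, a station whose bitmap is populated only for 2.4 GHz makes `should_drop` true for a 5 GHz band during a scan and false for the 2.4 GHz band, which is one way the "no usable bitrate" message could arise.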

Comment 8 Johannes Berg 2010-08-10 19:08:21 UTC
Luis added that check; it was something about catching frames that were about to be transmitted on the wrong band. I've never seen this, but I guess it's related to scanning.

Comment 9 Luis R. Rodriguez 2010-08-10 22:02:27 UTC
Indeed. The warning is designed to catch bugs in drivers or the wireless core when a frame is about to be transmitted on an incorrect band; this typically happened while scanning. I remember that I added the check, and that before adding it I fixed the original issue that was causing the problem, but I forget exactly where that issue was.

The reason for the detailed print is so you can see if the bit rate is indeed valid, and if the target peer is on the right band that it says you are trying to communicate under.
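
To compare the peer and band across several occurrences of the warning, the two fields can be pulled out of each logged line. This is a minimal sketch that assumes the message keeps the wording shown in comment 1; `parse_dropped_frame` is a hypothetical helper, not part of any existing tool.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the peer MAC address and the band (in GHz)
 * from one "Dropped data frame" warning line.
 * Returns 1 on success and 0 if the line does not match. */
static int parse_dropped_frame(const char *line, char mac[18], int *band_ghz)
{
    static const char marker[] = "Target station: ";
    const char *p = strstr(line, marker);

    if (!p)
        return 0;
    p += sizeof(marker) - 1;

    /* %17s reads at most 17 non-space characters, the length of a
     * colon-separated MAC address, and leaves room for the terminator. */
    if (sscanf(p, "%17s on %d GHz band", mac, band_ghz) != 2)
        return 0;

    return strlen(mac) == 17;
}
```

Applied to the line from comment 1, this would yield the station address 00:1d:7e:16:e9:16 and band 5, which can then be checked against the band the access point is actually configured for.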

Comment 10 Mike Gahagan 2010-08-13 18:08:49 UTC
I'm experiencing similar problems on a Lenovo Z61t. According to lspci, it has a:

03:00.0 Network controller: Atheros Communications Inc. AR5008 Wireless Network Adapter (rev 01)

Most of the time when this occurs I see NetworkManager try to reconnect, and after a minute or so it just gives up; it shows wireless as still active but never shows any APs available. Suspending/resuming the machine seems to make it occur more frequently.

I also had this problem in F12, usually disabling networking and re-enabling it in NetworkManager would bring it back. Now with F13 I find I have to stop NetworkManager, unload the ath9k module, re-load ath9k then start NetworkManager and I'll have wireless again that lasts anywhere from 30 min. to a few hours.

Comment 11 Michel Lind 2010-09-21 16:22:36 UTC
Similar problem here on a Sony Vaio W. With F-13 the problem occurs -- frequent disconnects etc. -- but at a manageable level. Now that the machine has been upgraded to F-14 (starting with alpha and updated with yum since) it's currently at a point where pings from the machine are reliable enough, but downloads stall rather frequently, and pings from the outside world to the machine suffer > 50% packet loss.

I attempted to run SSH on the machine and the response is really sluggish, even for keyboard input -- consistent with the packet loss rate.


$ lspci | grep Atheros | grep Wireless
02:00.0 Network controller: Atheros Communications Inc. AR9285 Wireless Network Adapter (PCI-Express) (rev 01)
$ uname -a
Linux iris.localdomain 2.6.35.4-28.fc14.x86_64 #1 SMP Wed Sep 15 01:56:54 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment 12 John W. Linville 2011-01-13 19:05:20 UTC
Been a long time...does this problem still happen with current updated Fedora kernels?

Comment 13 Paul Lambert 2011-01-13 19:47:47 UTC
I have not lost all wireless ethernet connections for some time.  I am now using F-14 and it seems to be very stable.  However, after upgrading to kernel 10-72, network performance has taken a dive and there are times when Firefox freezes for over a minute before the network is again handling traffic.  Can you provide more detailed instructions on which error logs would yield the information needed to pinpoint the bug related to this report?

Comment 14 John W. Linville 2011-01-13 20:10:18 UTC
Difficult to say...of course, dmesg output (and/or /var/log/messages contents) is usually a good place to start. :-)

Comment 15 Paul Lambert 2011-01-13 20:22:30 UTC
There are many aborts regarding nsplugwrapper in the messages log.  I have attached the output of dmesg and /var/log/messages

Comment 16 Paul Lambert 2011-01-13 20:23:15 UTC
Created attachment 473415
output of dmesg

Comment 17 Paul Lambert 2011-01-13 20:24:10 UTC
Created attachment 473418
output of /var/log/messages

Comment 18 John W. Linville 2011-01-17 14:07:44 UTC
I see a lot of flashplayer crashes in those logs.  Do you have problems when you are not using flash?

Comment 19 Paul Lambert 2011-01-18 00:26:37 UTC
Since I have not observed this bug recently, I believe we can assume that the degraded network performance when using FF was related to the flashplayer crashes. 

This being the case we can close this bug report since neither network issue has been experienced of late.

Comment 20 John W. Linville 2011-01-18 15:04:12 UTC
Closing on basis of comment 19...