Bug 1082690 - Unable to copy large files over wifi
Summary: Unable to copy large files over wifi
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 20
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-03-31 15:57 UTC by Dano
Modified: 2014-06-10 02:44 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-10 02:44:16 UTC
Type: Bug
Embargoed:


Attachments
dmesg before fault, just rebooted. (67.55 KB, text/plain), 2014-03-31 15:57 UTC, Dano
dmesg after wifi was down for 10 minutes. (67.65 KB, text/plain), 2014-03-31 15:58 UTC, Dano
dmesg after manual reset of lan. (Off then ON) (69.77 KB, text/plain), 2014-03-31 15:59 UTC, Dano
messages file while this mess is going on. (117.39 KB, text/plain), 2014-03-31 16:02 UTC, Dano
dmesg run level 2 (67.64 KB, text/plain), 2014-04-02 17:38 UTC, Dano
The latest message file at run level 2. (98.69 KB, text/plain), 2014-04-02 17:40 UTC, Dano
dmesg result after failure. (72.09 KB, text/plain), 2014-05-23 19:13 UTC, Dano
journalctl -k (104.08 KB, text/plain), 2014-05-23 19:21 UTC, Dano
Comment (81.67 KB, text/plain), 2014-05-30 19:27 UTC, mobdim

Description Dano 2014-03-31 15:57:33 UTC
Created attachment 880859 [details]
dmesg before fault, just rebooted.

Description of problem:
When copying a large file over wifi using either scp or sftp, the transfer fails before it completes.

Version-Release number of selected component (if applicable):
uname -a
Linux d830 3.13.7-100.fc19.x86_64 #1 SMP Mon Mar 24 21:53:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Every time.

Steps to Reproduce:
1. Open a terminal window.
2. Start an scp transfer of a large file (e.g. Fedora-Live-Desktop-x86_64-19-1.iso).

Actual results:
The transfer stalls before the file is completed.

Expected results:
The file should be correctly copied.

Additional info:
There is some inconsistency as to when it fails. Sometimes it fails at less than 10 MB, sometimes as much as 200 MB gets transferred before it fails.
There doesn't seem to be any useful information in dmesg or messages, but they are attached anyway.

lspci | grep WLAN
0c:00.0 Network controller: Broadcom Corporation BCM4311 802.11b/g WLAN (rev 01)

Comment 1 Dano 2014-03-31 15:58:54 UTC
Created attachment 880860 [details]
dmesg after wifi was down for 10 minutes.

Comment 2 Dano 2014-03-31 15:59:59 UTC
Created attachment 880861 [details]
dmesg after manual reset of lan. (Off then ON)

Comment 3 Dano 2014-03-31 16:02:28 UTC
Created attachment 880862 [details]
messages file while this mess is going on.

With reference to the messages file, the fault occurred at 10:53.

Comment 4 Dano 2014-03-31 16:13:18 UTC
The router is a Linksys E1200, Firmware Version: 1.0.03. (latest available)

Comment 5 Jirka Klimes 2014-04-01 14:36:52 UTC
I'm not sure what could be the reason.

But there is a disconnection:
Mar 31 11:03:57 d830 NetworkManager[452]: <info> (wlan0): device state change: activated -> unavailable (reason 'none') [100 20 0]
Mar 31 11:03:57 d830 NetworkManager[452]: <info> (wlan0): deactivating device (reason 'none') [0]
Mar 31 11:03:57 d830 kernel: [ 1104.226553] wlan0: deauthenticating from c0:c1:c0:dc:08:33 by local choice (reason=3)
Mar 31 11:03:57 d830 NetworkManager[452]: <info> (wlan0): canceled DHCP transaction, DHCP client pid 759
Mar 31 11:03:57 d830 kernel: [ 1104.228316] cfg80211: Calling CRDA to update world regulatory domain

And later:
Mar 31 11:03:58 d830 NetworkManager[452]: <info> WiFi hardware radio set disabled
Mar 31 11:03:58 d830 NetworkManager[452]: <info> WiFi now disabled by radio killswitch


Is that really triggered by copying large files? Did you do anything with the machine while copying?
This seems suspicious to me:
Mar 31 10:47:14 d830 systemd-logind[453]: Removed session c1.
And before that there are some gnome-session problems.


I can also see this:
Mar 31 10:59:00 d830 kernel: [  806.286135] perf samples too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

https://bbs.archlinux.org/viewtopic.php?id=170471

Comment 6 Dano 2014-04-01 20:59:35 UTC
The NetworkManager messages at 11:03 were because I manually reset the wifi to get the connection back. Note that the wifi had already been down for 10 minutes. The fault occurred at 10:53.

The only thing going on is scp in a terminal window on the gnome3 desktop.

The "perf samples too long" message happened 6 minutes after the failure. I have no idea if it's related. At that time I was copying the messages file and capturing dmesg output, again in the terminal window.

As I mentioned there is no entry in the log at the time of failure. Is there a way to enable more verbose logging?
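The timeline described in this comment can be checked against the log lines quoted in comment 5. The following is a minimal sketch, not part of the original report. It assumes the fault happened at exactly 10:53:00 (the report only gives "10:53") and that the year is 2014, since syslog timestamps omit it. The helper name offset_from_fault is made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: the report only says "10:53", so the seconds are taken as 00.
FAULT = datetime(2014, 3, 31, 10, 53, 0)

# Log lines quoted in comment 5.
LOG_LINES = [
    "Mar 31 10:47:14 d830 systemd-logind[453]: Removed session c1.",
    "Mar 31 10:59:00 d830 kernel: [  806.286135] perf samples too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 50000",
    "Mar 31 11:03:57 d830 NetworkManager[452]: <info> (wlan0): deactivating device (reason 'none') [0]",
]


def offset_from_fault(line, year=2014):
    """Return the signed offset of a syslog line's timestamp from FAULT."""
    stamp = " ".join(line.split()[:3])  # e.g. "Mar 31 10:59:00"
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when - FAULT


for line in LOG_LINES:
    minutes = offset_from_fault(line).total_seconds() / 60
    print(f"{minutes:+6.1f} min  {line[16:70]}")
```

Under those assumptions, the session removal comes before the fault, the perf message six minutes after it, and the NetworkManager deactivation roughly eleven minutes after it, which matches the manual reset described above.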

Comment 7 Dano 2014-04-02 11:55:33 UTC
Did more tests:
I tried copying the file over NFS to see if it would work. It did not, and it failed in the same manner (nothing in the logs). This test should exonerate SSH as a possible cause.

Being curious as to why NetworkManager doesn't see this failure, I installed wireshark to monitor the wlan. During an scp transfer from my server to my laptop, many packets fly by, as expected. When the failure happens, the last packets are a half dozen retransmit request packets from the server. However, network traffic from other systems on the network is still captured. Various IGMP and ARP packets to and from my server, office desktop, printer and router can be seen. This would indicate that the wlan is still up and the hardware driver is still working properly, which is probably why NetworkManager is still happy.

I found that this problem is not restricted to scp, sftp or NFS, but generally occurs any time there is a sustained high bandwidth demand on the wlan, such as web sites with large amounts of data, or when downloading torrents.

Could this be an IP stack issue? Perhaps a buffer overrun situation?

I wish I had some logs to send you or even better, if you could replicate the problem on your system.

Let me know if I can help in any way.
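One way to probe the buffer overrun question raised above would be to compare the per-interface error counters in /proc/net/dev before and after a stalled transfer. The sketch below is an illustration only, not something run in this report. The function names are made up, the SAMPLE values are hypothetical, and it is an assumption that the b43 driver would record the stall in these counters at all.

```python
# Column order of /proc/net/dev after the "iface:" prefix.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_net_dev(text, iface):
    """Return rx/tx counter dicts for one interface, or None if it is absent."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            values = [int(v) for v in rest.split()]
            return {
                "rx": dict(zip(RX_FIELDS, values[:8])),
                "tx": dict(zip(TX_FIELDS, values[8:16])),
            }
    return None


def error_counters(stats):
    """Keep only the counters that hint at errors, drops or FIFO overruns."""
    keys = ("errs", "drop", "fifo")
    return {f"{d}_{k}": stats[d][k] for d in ("rx", "tx") for k in keys}


# Hypothetical sample in the /proc/net/dev layout; the numbers are invented.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
 wlan0: 55574528   41230    0   12    3     0          0         0  2097152   20110    0    0    0     0       0          0
"""

print(error_counters(parse_net_dev(SAMPLE, "wlan0")))
```

On a live system the text would come from reading /proc/net/dev once before the transfer and once after the stall. Counters that do not move would point away from a simple overrun on the host side.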

Comment 8 Dano 2014-04-02 17:38:55 UTC
Created attachment 881914 [details]
dmesg run level 2

Tried copying the file at run level 2 since this would take a lot of baggage out of the way. The transfer still failed.

$ who -r
     run-level 2  2014-04-02 13:21

$ scp server:Fedora-Live-Desktop-x86_64-19-1.iso .
Fedora-Live-Desktop-x86_64-19-1.iso             5%   53MB   0.0KB/s - stalled -
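For repeated test runs, the stalled progress line shown above could be detected mechanically. This is a minimal sketch, assuming the progress meter always has the layout seen in this report. The regular expression and the parse_progress helper are illustrative and not part of scp. Note that scp only draws its progress meter when attached to a terminal, so the output would have to be captured from a terminal session.

```python
import re

# Matches progress lines in the layout seen in this report, e.g.
# "Fedora-Live-Desktop-x86_64-19-1.iso   5%   53MB   0.0KB/s - stalled -"
PROGRESS = re.compile(
    r"^(?P<name>\S+)\s+(?P<percent>\d+)%\s+"
    r"(?P<done>\d+(?:\.\d+)?)(?P<unit>[KMG]?B)\s+"
    r"(?P<rate>\d+(?:\.\d+)?)(?P<rate_unit>[KMG]?B)/s\s+(?P<status>.*)$"
)


def parse_progress(line):
    """Parse one scp progress line; return None if the layout does not match."""
    m = PROGRESS.match(line.strip())
    if m is None:
        return None
    return {
        "name": m["name"],
        "percent": int(m["percent"]),
        "transferred": f'{m["done"]}{m["unit"]}',
        "stalled": "stalled" in m["status"],
    }


# The line reported in comment 8.
LINE = "Fedora-Live-Desktop-x86_64-19-1.iso             5%   53MB   0.0KB/s - stalled -"
print(parse_progress(LINE))
```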

Comment 9 Dano 2014-04-02 17:40:09 UTC
Created attachment 881915 [details]
The latest message file at run level 2.

Comment 10 Dan Williams 2014-04-04 18:46:18 UTC
All the logs from comment 8 and comment 9 look OK.  NetworkManager connects and everything looks like it's fine.  The 'dmesg' output also looks OK, there are no errors and the association with the AP goes well.

The problem is likely either in the kernel driver or the b43 open-source firmware, or with rate control.  Over to the kernel to diagnose that.

Comment 11 Dan Williams 2014-04-04 18:47:40 UTC
[   11.405110] ssb: Found chip with id 0x4311, rev 0x01 and package 0x00
[   11.405125] ssb: Core 0 found: ChipCommon (cc 0x800, rev 0x11, vendor 0x4243)
[   11.405137] ssb: Core 1 found: IEEE 802.11 (cc 0x812, rev 0x0A, vendor 0x4243)
[   11.405149] ssb: Core 2 found: USB 1.1 Host (cc 0x817, rev 0x03, vendor 0x4243)
[   11.405161] ssb: Core 3 found: PCI-E (cc 0x820, rev 0x01, vendor 0x4243)
...
[   13.002630] b43-phy0: Broadcom 4311 WLAN found (core revision 10)
[   13.018046] b43-phy0: Found PHY: Analog 4, Type 2 (G), Revision 8
[   13.025200] b43 ssb0:0: Direct firmware load failed with error -2
[   13.025206] b43 ssb0:0: Falling back to user helper
[   13.025231] b43 ssb0:0: Direct firmware load failed with error -2
[   13.025236] b43 ssb0:0: Falling back to user helper
[   13.025329] Broadcom 43xx driver loaded [ Features: PMNLS ]
[   13.510253] b43 ssb0:0: Direct firmware load failed with error -2
[   13.510261] b43 ssb0:0: Falling back to user helper
[   13.552659] cfg80211: Calling CRDA for country: CA
[   13.554969] cfg80211: Regulatory domain changed to country: CA
[   13.554974] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[   13.554977] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
[   13.554980] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 1700 mBm)
[   13.554982] cfg80211:   (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[   13.554985] cfg80211:   (5490000 KHz - 5710000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[   13.554987] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)
[   13.606739] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
...
[   32.276097] b43-phy0: Loading OpenSource firmware version 410.31754
[   32.276106] b43-phy0: Hardware crypto acceleration not supported by firmware
[   32.325089] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   33.914709] wlan0: authenticate with c0:c1:c0:dc:08:33
[   33.932312] wlan0: send auth to c0:c1:c0:dc:08:33 (try 1/3)
[   33.933949] wlan0: authenticated
[   33.935026] wlan0: associate with c0:c1:c0:dc:08:33 (try 1/3)
[   33.938639] wlan0: RX AssocResp from c0:c1:c0:dc:08:33 (capab=0x411 status=0 aid=3)
[   33.938953] wlan0: associated
[   33.938983] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
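The items comment 10 points at (driver, firmware and rate control) can be pulled out of a dmesg dump like the excerpt above with a short script. This is a sketch under the assumption that the message wording matches the lines quoted here. The pattern set and the summarize helper are made up for illustration.

```python
import re

# Patterns based on the message wording in the excerpt above.
PATTERNS = {
    "chip": re.compile(r"b43-phy\d+: Broadcom (\d+) WLAN found"),
    "firmware": re.compile(r"b43-phy\d+: Loading (\w+) firmware version ([\d.]+)"),
    "rate_control": re.compile(r"Selected rate control algorithm '([^']+)'"),
}


def summarize(dmesg_text):
    """Collect chip, firmware and rate control details from dmesg output."""
    summary = {}
    for line in dmesg_text.splitlines():
        for key, pattern in PATTERNS.items():
            m = pattern.search(line)
            if m and key not in summary:
                summary[key] = " ".join(m.groups())
    # Count how often the kernel fell back from direct firmware loading.
    summary["direct_load_failures"] = dmesg_text.count("Direct firmware load failed")
    return summary


# Four lines copied from the excerpt above.
SAMPLE = """\
[   13.002630] b43-phy0: Broadcom 4311 WLAN found (core revision 10)
[   13.025200] b43 ssb0:0: Direct firmware load failed with error -2
[   13.606739] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   32.276097] b43-phy0: Loading OpenSource firmware version 410.31754
"""

print(summarize(SAMPLE))
```

Running the same summary over dmesg output from different kernels or firmware packages would make it easier to see which of the three candidates changed between a working and a failing setup.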

Comment 12 Dano 2014-05-09 01:07:00 UTC
Just upgraded to latest kernel.
Problem still occurs.

$ uname -a
Linux d830 3.13.11-100.fc19.x86_64 #1 SMP Wed Apr 23 20:10:57 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Comment 13 Justin M. Forbes 2014-05-21 19:31:29 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.14.4-100.fc19.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20.

If you experience different issues, please open a new bug report for those.

Comment 14 Dano 2014-05-22 13:32:56 UTC
Upgraded to new kernel as requested.

$ uname -a
Linux d830 3.14.4-100.fc19.x86_64 #1 SMP Tue May 13 15:00:26 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

scp over wifi still fails miserably in both the graphical and multi-user run levels.

Comment 15 Dano 2014-05-22 18:27:35 UTC
Installed and booted the debug kernel.

$ uname -a 
Linux d830 3.14.4-100.fc19.x86_64.debug #1 SMP Tue May 13 14:46:53 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

When scp fails, nothing is written in the message file.

Comment 16 Dano 2014-05-23 19:13:57 UTC
Created attachment 898788 [details]
dmesg result after failure.

Comment 17 Dano 2014-05-23 19:21:17 UTC
Created attachment 898790 [details]
journalctl -k

Did a fresh net-install of fc20.

$ uname -a
Linux d830 3.14.4-200.fc20.x86_64 #1 SMP Tue May 13 13:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

The only thing I did was set up SSH keys, copy my host file over, and attempt the dreaded scp file copy. This was the result.

scp server:Fedora-Live-Desktop-x86_64-19-1.iso .
Fedora-Live-Desktop-x86_64-19-1.iso            15%  149MB   0.0KB/s - stalled

$ lspci | grep WLAN
0c:00.0 Network controller: Broadcom Corporation BCM4311 802.11b/g WLAN (rev 01)

Comment 18 mobdim 2014-05-30 19:27:06 UTC
Created attachment 915912 [details]
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).

Comment 19 Dano 2014-06-10 02:44:16 UTC
Replaced my Broadcom wifi card with an Intel ($10 on ebay) 

$ lspci | grep Network
0c:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)

The wifi connection is now solid and about 30% faster.

The Broadcom card made a satisfying crackling sound when I crushed it in my bench vise.

Problem solved.

