
Bug 962211

Summary: brcmsmac divide-by-zero (?) in brcms_c_calc_frame_time
Product: [Fedora] Fedora
Component: kernel
Version: 19
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Reporter: Daniel Stone <daniel>
Assignee: John Greene <jogreene>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: fedora-kernel-wireless-brcm80211, gansalmon, itamar, jonathan, kernel-maint, kmaraas, madhu.chinakonda, phaber
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-07-31 04:22:39 EDT
Attachments: OOPS output (flags: none)

Description Daniel Stone 2013-05-12 11:31:38 EDT
Every so often (at least hourly today) with 3.9.1-301.fc19, I see what seems to be a divide-by-zero in brcms_c_calc_frame_time(), called from brcms_c_compute_rtscts_dur().  This is on a mid-2012 Ivy Bridge MacBook Air, with a BRCM43224 chip.

Probably notable is that I have an _extremely_ marginal connection to the AP: if I move out of my bedroom, I can't even see it beaconing anymore, and I get quite frequent and long dropouts.  So this is probably quite pathological in terms of retry/etc behaviour.

As it's a hard panic, I don't have any saved OOPSes, but I hope you enjoy the attached picture, with all the quality of a five-year-old phonecam, because it was taken with a five-year-old phone.

Please let me know if there's anything more I can do. Unfortunately I can't get the kernel source for any further debugging, since this connection is so bad that it rarely lasts long enough before OOPSing to download much of anything ...
Comment 1 Josh Boyer 2013-05-13 09:16:48 EDT
*** Bug 962212 has been marked as a duplicate of this bug. ***
Comment 2 Josh Boyer 2013-05-13 09:19:06 EDT
Daniel, the picture was never attached.
Comment 3 Daniel Stone 2013-05-13 09:55:43 EDT
Created attachment 747240 [details]
OOPS output

Ahr, let's try again.
Comment 4 John Greene 2013-05-13 13:42:35 EDT
Daniel, 
Good picture, good enough anyway.  
The division is by the rate, which would have to be zero, right?  A quick scan doesn't show a specific fix upstream yet.  I will check some other places, wireless-testing too.  Does the issue go away when you have a good signal to the AP, or does it only happen when the signal is marginal?

Are you able to try other kernel versions?  Even an older one might give us a new data point.
Comment 5 Daniel Stone 2013-05-13 13:48:07 EDT
Thanks.  So far at the office I haven't seen this occur at all: even with a really terrible AP which drops association every 15 minutes or so.  But I've got a good link while it lasts, so the rate thing seems fairly likely.

Since this is a fresh F19 install, I haven't got any other kernels on here, I'm afraid.
Comment 6 John Greene 2013-05-14 14:54:19 EDT
OK, but it seems you are able to reproduce it easily.  If so, perhaps I can get a test kernel to you to gather some data.  Are you able to do this?
Comment 7 Daniel Stone 2013-05-14 15:15:31 EDT
Sure, but the sooner the better; as soon as BT fix my DSL (any day now ...) I'm going to stop paying £5/day for this terrible wifi.
Comment 8 Piotr Haber 2013-05-15 04:38:19 EDT
I took a quick look at this issue, and it seems that the only division by zero that can happen in brcms_c_calc_frame_time is at line 638 or 645 (kNdps being zero). That would mean mcs_2_rate returned 0, which can happen if the MCS is 32 and a 40 MHz channel is in use.

Would you be able to test a simple kernel patch?
Could you also provide scan results ('iw wlanx scan', indicating which AP you are connected to) and the output of 'iw wlanx link'?
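
To illustrate the suspected failure mode, here is a minimal stand-alone sketch. It is not the brcmsmac code: the function name, the rate table and its values, and the symbol time are all made-up assumptions. It only shows how a rate lookup that yields 0 turns into a zero divisor, and how a guard before the division avoids the fault.

```c
#include <stdint.h>

/* Hypothetical rate table in kbit/s, indexed by a made-up rate index.
 * Index 2 is deliberately 0 to stand in for a lookup that yields no
 * valid rate. These are NOT the driver's real values. */
static const uint32_t demo_rate_kbps[] = { 6500, 13000, 0 };

#define DEMO_SYMBOL_TIME_US 4u  /* assumed symbol duration in microseconds */

/* Stores the frame duration in *out_us and returns 0 on success.
 * Returns -1 if the index is out of range or the looked-up rate would
 * make the divisor zero. */
static int demo_frame_time_us(unsigned idx, uint32_t frame_bytes,
                              uint32_t *out_us)
{
    if (idx >= sizeof(demo_rate_kbps) / sizeof(demo_rate_kbps[0]))
        return -1;

    /* Data bits carried per symbol: rate (kbit/s) * symbol time (us) / 1000.
     * A rate of 0 gives 0 here, which is the divisor below. */
    uint32_t bits_per_symbol =
        demo_rate_kbps[idx] * DEMO_SYMBOL_TIME_US / 1000u;
    if (bits_per_symbol == 0)
        return -1;  /* guard: refuse to divide by zero */

    uint32_t bits = frame_bytes * 8u;
    /* Round up to a whole number of symbols. */
    uint32_t symbols = (bits + bits_per_symbol - 1u) / bits_per_symbol;

    *out_us = symbols * DEMO_SYMBOL_TIME_US;
    return 0;
}
```

Without the guard, calling this with index 2 would perform an integer division by zero, which in kernel context on x86_64 shows up as a divide error oops. Whether a real fix should reject the rate, fall back to a default, or prevent the zero rate from being selected in the first place is a separate question this sketch does not answer.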
Comment 9 Daniel Stone 2013-05-15 05:32:55 EDT
Sure thing, happy to test patches etc.  FWIW - haven't seen it at all today or last night.

Connected to 8a:d1:5e:b6:9f:28 (on wlp2s0)
	SSID: BTWiFi
	freq: 2412
	RX: 47364391 bytes (54165 packets)
	TX: 6451717 bytes (42215 packets)
	signal: -88 dBm
	tx bitrate: 26.0 MBit/s MCS 3

	bss flags:	short-preamble short-slot-time
	dtim period:	0
	beacon int:	100

BSS 8a:d1:5e:b6:9f:28 (on wlp2s0) -- associated
        TSF: 228209755134 usec (2d, 15:23:29)
        freq: 2412
        beacon interval: 100
        capability: ESS ShortSlotTime (0x0401)
        signal: -88.00 dBm
        last seen: 140 ms ago
        Information elements from Probe Response frame:
        SSID: BTWiFi
        Supported rates: 1.0* 2.0* 5.5* 11.0* 18.0 24.0 36.0 54.0
        DS Parameter set: channel 1
        ERP: <no flags>
        Extended supported rates: 6.0 9.0 12.0 48.0
        HT capabilities:
                Capabilities: 0x18fc
                        HT20
                        SM Power Save disabled
                        RX Greenfield
                        RX HT20 SGI
                        RX HT40 SGI
                        TX STBC
                        No RX STBC
                        Max AMSDU length: 7935 bytes
                        DSSS/CCK HT40
                Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
                Minimum RX AMPDU time spacing: 8 usec (0x06)
                HT RX MCS rate indexes supported: 0-15
                HT TX MCS rate indexes are undefined
        HT operation:
                 * primary channel: 1
                 * secondary channel offset: no secondary
                 * STA channel width: 20 MHz
                 * RIFS: 0
                 * HT protection: nonmember
                 * non-GF present: 0
                 * OBSS non-GF present: 1
                 * dual beacon: 0
                 * dual CTS protection: 0
                 * STBC beacon: 0
                 * L-SIG TXOP Prot: 0
                 * PCO active: 0
                 * PCO phase: 0
        WMM:     * Parameter version 1
                 * u-APSD
                 * BE: CW 15-1023, AIFSN 3
                 * BK: CW 15-1023, AIFSN 7
                 * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
                 * VO: CW 3-7, AIFSN 2, TXOP 1504 usec
Comment 10 Daniel Stone 2013-05-15 06:00:56 EDT
Right, and just as I was trying to upload the comment saying I'd be happy to test it but it hadn't happened in a couple of days, it happens again!

Connected to 8a:d1:5e:b6:9f:28 (on wlp2s0)
	SSID: BTWiFi
	freq: 2412
	RX: 37826503 bytes (44934 packets)
	TX: 3986981 bytes (29538 packets)
	signal: -79 dBm
	tx bitrate: 1.0 MBit/s

	bss flags:	short-preamble short-slot-time
	dtim period:	0
	beacon int:	100

BSS 8a:d1:5e:b6:9f:28 (on wlp2s0) -- associated
        TSF: 229265948007 usec (2d, 15:41:05)
        freq: 2412
        beacon interval: 100
        capability: ESS ShortSlotTime (0x0401)
        signal: -78.00 dBm
        last seen: 125 ms ago
        Information elements from Probe Response frame:
        SSID: BTWiFi
        Supported rates: 1.0* 2.0* 5.5* 11.0* 18.0 24.0 36.0 54.0
        DS Parameter set: channel 1
        ERP: <no flags>
        Extended supported rates: 6.0 9.0 12.0 48.0
        HT capabilities:
                Capabilities: 0x18fc
                        HT20
                        SM Power Save disabled
                        RX Greenfield
                        RX HT20 SGI
                        RX HT40 SGI
                        TX STBC
                        No RX STBC
                        Max AMSDU length: 7935 bytes
                        DSSS/CCK HT40
                Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
                Minimum RX AMPDU time spacing: 8 usec (0x06)
                HT RX MCS rate indexes supported: 0-15
                HT TX MCS rate indexes are undefined
        HT operation:
                 * primary channel: 1
                 * secondary channel offset: no secondary
                 * STA channel width: 20 MHz
                 * RIFS: 0
                 * HT protection: no
                 * non-GF present: 0
                 * OBSS non-GF present: 0
                 * dual beacon: 0
                 * dual CTS protection: 0
                 * STBC beacon: 0
                 * L-SIG TXOP Prot: 0
                 * PCO active: 0
                 * PCO phase: 0
        WMM:     * Parameter version 1
                 * u-APSD
                 * BE: CW 15-1023, AIFSN 3
                 * BK: CW 15-1023, AIFSN 7
                 * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
                 * VO: CW 3-7, AIFSN 2, TXOP 1504 usec
Comment 11 John Greene 2013-05-15 11:45:29 EDT
Piotr, thanks for jumping in.  I'm on a release now but monitoring your work here.  I will be happy to assist in getting any fixes released here or upstream.
Comment 12 Daniel Stone 2013-05-15 13:15:34 EDT
Oh, presumably relevant is dmesg being spammed with:

May 12 21:18:02 nightslugs kernel: [ 2344.279539] brcmsmac bcma0:0: brcms_c_ampdu_dotxstatus_complete: ampdu tx phy error (0x10)
May 12 21:18:04 nightslugs kernel: [ 2345.751751] brcmsmac bcma0:0: phyerr 0x10, rate 0x14
May 12 21:18:12 nightslugs kernel: [ 2353.972395] brcmsmac bcma0:0: brcms_c_ampdu_dotxstatus_complete: ampdu tx phy error (0x1)
May 12 21:18:18 nightslugs kernel: [ 2360.125246] brcmsmac bcma0:0: phyerr 0x1, rate 0x14

The rate changes between a number of values.
Comment 13 Kjartan Maraas 2013-05-25 14:27:39 EDT
I see the same thing here and would be happy to test fixes.
Comment 14 Piotr Haber 2013-07-31 04:22:39 EDT
There is more discussion of this problem in bug 989269.
Could you apply the patches and provide the logs (if the problem is still reproducible)?

*** This bug has been marked as a duplicate of bug 989269 ***