Bug 233011
Summary: | bcm43xx_mac80211 crashes into debugger on 12" powerbook | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Will Woods <wwoods>
Component: | kernel | Assignee: | John W. Linville <linville>
Status: | CLOSED RAWHIDE | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Priority: | medium
Version: | rawhide | CC: | cebbert, davej, dwmw2, larry.finger
Hardware: | powerpc | OS: | Linux
Keywords: | Reopened | Bug Blocks: | 235704
Fixed In Version: | 2.6.23-0.202.rc8.fc8 | Doc Type: | Bug Fix
Last Closed: | 2007-09-25 21:26:02 UTC | |
Description
Will Woods
2007-03-19 21:12:11 UTC
Tried kernel-2.6.20-1.3016.2.1.fc7.jwltest.6.ppc.rpm, with the same results - Machine Check in ssb_pci_read16 while doing bcm43xx_phy_initb5.

At https://lists.berlios.de/pipermail/bcm43xx-dev/2007-March/004324.html there's a hack which helps the kernel recover from these machine checks, for bcm43xx. You could do the same in the bcm43xx_mac80211 driver. The mac80211 driver is a little behind with the fixes we've put into bcm43xx recently.

Note: I mean you can do the same for bcm43xx_mac80211 in order to find the places we're poking at bad registers and fix them. I wouldn't recommend _shipping_ anything like that :)

I have added an experimental patch to the test kernels here: http://people.redhat.com/linville/kernels/fc7/
These are based on portions of the patches that fixed the softmac version of the driver. Please give them a try and post the results here...thanks!

Crashes in the same place with kernel 2.6.20-1.3059.fc7 and 2.6.20-2.1.fc7.jwltest.8 - same NIP and everything.

Created attachment 152576 [details]
dwmw2-mac80211.patch
mac80211 version of patch from comment 2

Will, please apply this patch to a current rawhide kernel and reproduce the issue. Let me know if you need assistance!

Patch applied to kernel 3062. It still defaults to using bcm43xx instead of bcm43xx_mac80211. That doesn't work because the firmware is too new. But after I rmmod both and modprobe bcm43xx_mac80211 - holy cow, my wireless works!
Which presents a problem: how do I know which register access broke stuff when it's not breaking stuff anymore? Falling back to the normal 3062 kernel it still breaks in the usual way, so I'm not really sure what to report. I can capture the copious spew from the kernel when the wireless works, if it helps, but AFAICT no machine check happens at all. Any advice on what phy_write/phy_read calls would be interesting?
FWIW you can have both versions of firmware present in /lib/firmware and use the 'fwpostfix' module option to make the mac80211 driver look for the v4 firmware. I have this in /etc/modprobe.conf:

    options bcm43xx-mac80211 fwpostfix=-v4

My v4 firmware is /lib/firmware/bcm43xx_*-v4.fw

There's a case to be made for having the fwpostfix option set by _default_ for the mac80211 module, and having bcm43xx_fwcutter generate files with the appropriate filenames by default.

Created attachment 152683 [details]
dwmw2-bcm43xx-mac80211-debug.patch
Undoubtedly better version directly from dwmw2...
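The fwpostfix setup described earlier in this thread can be sanity-checked before loading the module. The following is only a sketch: it assumes the driver requests files named `bcm43xx_<base><postfix>.fw`, which matches the `/lib/firmware/bcm43xx_*-v4.fw` pattern quoted above, and the base names used in the example (`microcode5`, `pcm5`) are illustrative rather than a complete list of what any given card needs.

```python
def firmware_name(base, postfix=""):
    """Build a firmware file name, assuming the bcm43xx_<base><postfix>.fw scheme."""
    return f"bcm43xx_{base}{postfix}.fw"


def missing_firmware(available, bases, postfix=""):
    """Return the expected firmware file names that are not in `available`.

    `available` is a list of file names (for example from os.listdir on
    /lib/firmware); `bases` are the hypothetical base names to look for.
    """
    present = set(available)
    expected = [firmware_name(base, postfix) for base in bases]
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Illustrative directory listing, not taken from a real system.
    listing = [
        "bcm43xx_microcode5.fw",
        "bcm43xx_microcode5-v4.fw",
        "bcm43xx_pcm5.fw",
    ]
    print(missing_firmware(listing, ["microcode5", "pcm5"], postfix="-v4"))
```

Running it with the same postfix that is set in /etc/modprobe.conf shows which v4 files would still need to be generated.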
Okay, I've applied that patch and I've got some interesting output in /var/log/messages and dmesg. I'm assuming that the "Read phy reg %x; got 0xFFFF" messages are the interesting ones. Here are the registers listed:

41a 41b 420 429 42b 481 482 488 489 48c 496 4a0 4a1 4a2 4a5 4a8 4ab 814 815

There's also a good amount of this message:

IN from bad port ed2773fe at ec157dbc

Here's a sample trace:

Badness at arch/powerpc/kernel/traps.c:347
Call Trace:
[CB8E1A40] [C0008C18] show_stack+0x50/0x184 (unreliable)
[CB8E1A60] [C012C358] report_bug+0x84/0xc8
[CB8E1A70] [C02CBD54] __kprobes_text_start+0x154/0x538
[CB8E1AC0] [C00136D4] ret_from_except_full+0x0/0x4c
--- Exception: 700 at machine_check_exception+0x170/0x2f0
    LR = machine_check_exception+0x15c/0x2f0
[CB8E1BA0] [C00136D4] ret_from_except_full+0x0/0x4c
--- Exception: 200 at ssb_modexit+0x28/0x1740 [ssb]
    LR = bcm43xx_phy_read+0x78/0xb4 [bcm43xx_mac80211]
[CB8E1C60] [00004C00] 0x4c00 (unreliable)
[CB8E1C70] [EC323024] bcm43xx_phy_read+0x78/0xb4 [bcm43xx_mac80211]
[CB8E1C80] [EC3240EC] bcm43xx_phy_agcsetup+0x324/0x580 [bcm43xx_mac80211]
[CB8E1CA0] [EC32A214] bcm43xx_phy_inita+0xb68/0xdf0 [bcm43xx_mac80211]
[CB8E1CC0] [EC32B30C] bcm43xx_phy_initg+0x50/0xcdc [bcm43xx_mac80211]
[CB8E1D30] [EC32C0DC] bcm43xx_phy_early_init+0x144/0x164 [bcm43xx_mac80211]
[CB8E1D50] [EC31F2E0] bcm43xx_wireless_core_init+0x4b4/0xac4 [bcm43xx_mac80211]
[CB8E1DA0] [EC320C9C] bcm43xx_add_interface+0x68/0x12c [bcm43xx_mac80211]
[CB8E1DC0] [EC247CF0] ieee80211_open+0x250/0x3d0 [mac80211]
[CB8E1E00] [C02533A4] dev_open+0x60/0xc8
[CB8E1E20] [C0251124] dev_change_flags+0x70/0x148
[CB8E1E40] [C029DB58] devinet_ioctl+0x274/0x6b8
[CB8E1EA0] [C029E478] inet_ioctl+0xa8/0xdc
[CB8E1EB0] [C0244ED4] sock_ioctl+0x248/0x284
[CB8E1ED0] [C00A6058] do_ioctl+0x38/0x84
[CB8E1EE0] [C00A6484] vfs_ioctl+0x3e0/0x414
[CB8E1F10] [C00A6520] sys_ioctl+0x68/0x98
[CB8E1F40] [C0013078] ret_from_syscall+0x0/0x38
--- Exception: c00 at 0xf3f5008
    LR = 0xf3f4fa0
Read phy reg 481; got 0xFFFF.
IN from bad port ed2773fe at ec157dbc

Most of them seem to be approximately the same thing, but from various addresses in bcm43xx_phy_agcsetup or bcm43xx_radio_init2050.

Created attachment 152718 [details]
kernel messages while running dwmw's patch
generated with sed -rne 's/^Apr 16 .* kernel: //p' /var/log/messages
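For anyone repeating this, the register list above can be pulled out of the log automatically instead of by eye. This is a rough sketch, not part of any patch in this bug: it assumes the message format is exactly the "Read phy reg %x; got 0xFFFF" text quoted above, and it strips the syslog prefix the same way the sed expression does. The sample log lines (timestamp and hostname) are made up for illustration.

```python
import re

# Everything up to and including "kernel: ", like the sed expression above.
SYSLOG_PREFIX = re.compile(r"^.*? kernel: ")
# Message format as quoted from the debug patch output in this report.
BAD_READ = re.compile(r"Read phy reg ([0-9a-fA-F]+); got 0xFFFF")


def bad_phy_registers(lines):
    """Return the sorted, de-duplicated PHY registers that read back 0xFFFF."""
    found = set()
    for line in lines:
        message = SYSLOG_PREFIX.sub("", line, count=1)
        for reg in BAD_READ.findall(message):
            found.add(int(reg, 16))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical log lines; only the message text follows the report.
    sample = [
        "Apr 16 10:00:00 localhost kernel: Read phy reg 481; got 0xFFFF.",
        "Apr 16 10:00:00 localhost kernel: IN from bad port ed2773fe at ec157dbc",
        "Apr 16 10:00:01 localhost kernel: Read phy reg 41a; got 0xFFFF.",
        "Apr 16 10:00:02 localhost kernel: Read phy reg 481; got 0xFFFF.",
    ]
    print(" ".join(format(reg, "x") for reg in bad_phy_registers(sample)))
```

A read of 0xFFFF is what a bus typically returns when nothing answers at that address, which is why these reads are the ones worth collecting.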
Will, please try the latest rawhide kernel you can grab (straight from brew if earlier than 4/20) and try to recreate this...thanks!

It works! Hooray! I still haven't quite figured out how to make it stop loading the old bcm43xx driver, so I just blacklisted it. The wireless comes right up now, just like it should. I'm closing this now. Thanks!

b43_legacy, in current rawhide, appears to have the same problem as originally described here. I'll attach the log but the symptom is the same - "Oops: Machine check, sig: 7 [#1]" while in b43legacy_phy_initb5 -> b43legacy_phy_read -> ssb_pci_read16.

Created attachment 193621 [details]
portion of /var/log/messages showing crash dump
One pertinent detail was not listed in the original post, but I'm assuming that this is a BCM4306/r that has a rev 1 PHY. If not, please let me know.

dmesg says:

b43legacy-phy0: Broadcom 4306 WLAN found
b43legacy-phy0 debug: Found PHY: Analog 1, Type 2, Revision 1
b43legacy-phy0 debug: Found Radio: Manuf 0x17F, Version 0x2050, Revision 2
b43legacy-phy0 debug: Radio turned off

So I guess that's a yes.

Created attachment 195141 [details]
Trial patch for problem
Please try this patch to see if it fixes your problem.
Thanks, Larry
Created attachment 196261 [details]
Patch that may resolve some endian issues
This patch fixes most of the warnings issued by sparse on the b43legacy code.
Some of them are endian-related and may resolve the problem with ppc
architecture.
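To illustrate why endian annotations matter on this hardware: PCI device registers are little-endian, while the PowerBook's PowerPC CPU is big-endian, so a 16-bit value read without a byte swap comes out with its bytes reversed. The sketch below is only a model of that effect in Python; it is not the content of the attached patch, and the function names are made up for the example. The value 0x2050 is borrowed from the radio version in the dmesg output above.

```python
def decode_reg16(raw, byteorder):
    """Interpret two raw bytes as a 16-bit value in the given byte order."""
    return int.from_bytes(raw, byteorder)


def swab16(value):
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


if __name__ == "__main__":
    # Bytes as a little-endian device would lay out the value 0x2050.
    raw = (0x2050).to_bytes(2, "little")

    correct = decode_reg16(raw, "little")  # what the driver should see
    naive = decode_reg16(raw, "big")       # big-endian CPU, no conversion

    print(hex(correct), hex(naive), hex(swab16(naive)))
```

On a little-endian machine the naive read happens to give the right answer, which is how this class of bug can go unnoticed until the code runs on ppc.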
Everything seems fine with current rawhide kernel (2.6.23-rc8). Closing.