Bug 233011 - bcm43xx_mac80211 crashes into debugger on 12" powerbook
Summary: bcm43xx_mac80211 crashes into debugger on 12" powerbook
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: powerpc
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: John W. Linville
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: F8Target
Reported: 2007-03-19 21:12 UTC by Will Woods
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version: 2.6.23-0.202.rc8.fc8
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-09-25 21:26:02 UTC
Type: ---
Embargoed:


Attachments
dwmw2-mac80211.patch (1.51 KB, patch)
2007-04-13 18:46 UTC, John W. Linville
dwmw2-bcm43xx-mac80211-debug.patch (2.67 KB, patch)
2007-04-16 13:35 UTC, John W. Linville
kernel messages while running dwmw's patch (210.53 KB, text/plain)
2007-04-16 21:29 UTC, Will Woods
portion of /var/log/messages showing crash dump (4.22 KB, text/plain)
2007-09-12 16:21 UTC, Will Woods
Trial patch for problem (1.17 KB, patch)
2007-09-13 19:39 UTC, Larry Finger
Patch that may resolve some endian issues (5.12 KB, patch)
2007-09-14 21:37 UTC, Larry Finger

Description Will Woods 2007-03-19 21:12:11 UTC
When NetworkManager starts on my 12" powerbook, the kernel (2.6.20-1.2997.fc7)
crashes into the debugger. Info follows:

Mar 19 16:04:55 albook kernel: bcm43xx_mac80211: Adding Interface type 2
Mar 19 16:04:55 albook kernel: bcm43xx_mac80211: Found PHY: Analog 1, Type 2,
Revision 1
Mar 19 16:04:55 albook kernel: bcm43xx_mac80211: Found Radio: Manuf 0x17F,
Version 0x2050, Revision 2
Mar 19 16:54:12 albook kernel: Machine check in kernel mode.
Mar 19 16:54:12 albook kernel: Caused by (from SRR1=149030): Transfer error ack
signal
Mar 19 16:54:12 albook kernel: BUG: soft lockup detected on CPU#0!
Mar 19 16:54:12 albook kernel: Call Trace:
Mar 19 16:54:12 albook kernel: [E39A5BB0] [C0008C18] show_stack+0x50/0x184
(unreliable)
Mar 19 16:54:12 albook kernel: [E39A5BD0] [C0065060] softlockup_tick+0xb4/0xd0
Mar 19 16:54:12 albook kernel: [E39A5BF0] [C003DE30] run_local_timers+0x18/0x28
Mar 19 16:54:12 albook kernel: [E39A5C00] [C003DE80] update_process_times+0x40/0x7c
Mar 19 16:54:12 albook kernel: [E39A5C10] [C001023C] timer_interrupt+0xcc/0x580
Mar 19 16:54:12 albook kernel: --- Exception: 0 at 0xc1c65b10
Mar 19 16:54:12 albook kernel:     LR = 0x0
Mar 19 16:54:12 albook kernel: [E39A5C80] [C00136B8] ret_from_except+0x0/0x14
(unreliable)
Mar 19 16:54:12 albook kernel: --- Exception: 901 at handle_IRQ_event+0x30/0xa0
Mar 19 16:54:12 albook kernel:     LR = handle_fasteoi_irq+0xc4/0x128
Mar 19 16:54:12 albook kernel: [E39A5D40] [00000000] 0x0 (unreliable)
Mar 19 16:54:12 albook kernel: [E39A5D60] [C0066AB4] handle_fasteoi_irq+0xc4/0x128
Mar 19 16:54:12 albook kernel: [E39A5D80] [C0006828] do_IRQ+0x8c/0xcc
Mar 19 16:54:12 albook kernel: [E39A5D90] [C00136B8] ret_from_except+0x0/0x14
Mar 19 16:54:12 albook kernel: --- Exception: 501 at _spin_unlock_irq+0x1c/0x2c
Mar 19 16:54:12 albook kernel:     LR = _spin_unlock_irq+0x10/0x2c
Mar 19 16:54:12 albook kernel: [E39A5E60] [C02C83F0] schedule+0x664/0x6f0
Mar 19 16:54:12 albook kernel: [E39A5E90] [C0034CE8] do_syslog+0x12c/0x4b8
Mar 19 16:54:12 albook kernel: [E39A5EE0] [C00DF040] kmsg_read+0x50/0x64
Mar 19 16:54:12 albook kernel: [E39A5EF0] [C0097C4C] vfs_read+0xec/0x1c8
Mar 19 16:54:12 albook kernel: [E39A5F10] [C00980AC] sys_read+0x4c/0x8c
Mar 19 16:54:12 albook kernel: [E39A5F40] [C0013010] ret_from_syscall+0x0/0x38
Mar 19 16:54:12 albook kernel: --- Exception: c00 at 0x1ff387d4
Mar 19 16:54:12 albook kernel:     LR = 0x2000241c
Mar 19 16:54:12 albook kernel: Oops: Machine check, sig: 7 [#1]
Mar 19 16:54:12 albook kernel: 
Mar 19 16:54:12 albook kernel: Modules linked in: sg(U) scsi_mod(U) autofs4(U)
hidp(U) hci_usb(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ip_tables(U)
xt_tcpudp(U) ip6t_REJECT(U) ip6table_filter(U) ip6_tables(U) x_tables(U)
ipv6(U) nls_utf8(U) hfsplus(U) parport_pc(U) lp(U) parport(U)
snd_aoa_i2sbus(U) snd_powermac(U) snd_seq_dummy(U) snd_seq_oss(U)
snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U)
snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) snd(U)
soundcore(U) ide_cd(U) cdrom(U) snd_aoa_soundbus(U) fw_ohci(U) fw_core(U)
bcm43xx(U) ieee80211softmac(U) ieee80211(U) ieee80211_crypt(U) arc4(U) ecb(U)
blkcipher(U) rc80211_simple(U) bcm43xx_mac80211(U) ssb(U) mac80211(U)
cfg80211(U) sungem(U) sungem_phy(U) dm_snapshot(U) dm_zero(U) dm_mirror(U)
dm_mod(U) ext3(U) jbd(U) mbcache(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Mar 19 16:54:12 albook kernel: NIP: EC130834 LR: EC487FD4 CTR: EC1307E0
Mar 19 16:54:12 albook kernel: REGS: de65d8c0 TRAP: 0200   Not tainted 
(2.6.20-1.2997.fc7)
Mar 19 16:54:12 albook kernel: MSR: 00149030 <EE,ME,IR,DR>  CR: 24000422  XER:
20000000
Mar 19 16:54:12 albook kernel: TASK = df172680[2400] 'NetworkManager' THREAD:
de65c000
Mar 19 16:54:12 albook kernel: GPR00: 0000DD80 DE65D970 DF172680 C0BA2D40
C0BA2E2C 0000042B 00000002 FFFF659F 
Mar 19 16:54:12 albook kernel: GPR08: 00000000 EA098000 24000084 00001032
00000000 1006AF82 00000000 00000000 
Mar 19 16:54:12 albook kernel: GPR16: 00000000 00000000 00000000 00000000
00000000 00000000 E5787120 C0BA2D40 
Mar 19 16:54:12 albook kernel: GPR24: 00000001 00000002 00000000 00000002
E5709254 000000C7 000003FE C0BA2D40 
Mar 19 16:54:12 albook kernel: NIP [EC130834] ssb_pci_read16+0x54/0x74 [ssb]
Mar 19 16:54:12 albook kernel: LR [EC487FD4] bcm43xx_phy_read+0x7c/0x94
[bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: Call Trace:
Mar 19 16:54:12 albook kernel: [DE65D970] [000000C7] 0xc7 (unreliable)
Mar 19 16:54:12 albook kernel: [DE65D980] [EC487FD4] bcm43xx_phy_read+0x7c/0x94
[bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65D990] [EC48FE30]
bcm43xx_phy_initb5+0x16c/0x4a0 [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65D9B0] [EC49018C]
bcm43xx_phy_initg+0x28/0xc5c [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65DA20] [EC490F04]
bcm43xx_phy_early_init+0x144/0x164 [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65DA40] [EC484284]
bcm43xx_wireless_core_init+0x4cc/0xae4 [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65DA90] [EC485C48]
bcm43xx_add_interface+0x68/0x12c [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65DAB0] [EC247DAC] ieee80211_open+0x250/0x3b0
[mac80211]
Mar 19 16:54:12 albook kernel: [DE65DAF0] [C0251FA8] dev_open+0x60/0xc8
Mar 19 16:54:12 albook kernel: [DE65DB10] [C024FD28] dev_change_flags+0x70/0x148
Mar 19 16:54:12 albook kernel: [DE65DB30] [C025B500] rtnl_setlink+0x278/0x3f0
Mar 19 16:54:12 albook kernel: [DE65DBC0] [C025AC80] rtnetlink_rcv_msg+0x218/0x244
Mar 19 16:54:12 albook kernel: [DE65DBF0] [C0269F4C] netlink_run_queue+0x78/0x148
Mar 19 16:54:12 albook kernel: [DE65DC20] [C025A9E0] rtnetlink_rcv+0x40/0x6c
Mar 19 16:54:12 albook kernel: [DE65DC50] [C026A57C] netlink_data_ready+0x28/0x84
Mar 19 16:54:12 albook kernel: [DE65DC60] [C0268FB0] netlink_sendskb+0x34/0x70
Mar 19 16:54:12 albook kernel: [DE65DC80] [C026A538] netlink_sendmsg+0x2a0/0x2bc
Mar 19 16:54:12 albook kernel: [DE65DCD0] [C0244498] sock_sendmsg+0xd0/0x100
Mar 19 16:54:12 albook kernel: [DE65DDC0] [C0244690] sys_sendmsg+0x1c8/0x254
Mar 19 16:54:12 albook kernel: [DE65DF00] [C0245BD4] sys_socketcall+0x1c4/0x1fc
Mar 19 16:54:12 albook kernel: [DE65DF40] [C0013010] ret_from_syscall+0x0/0x38
Mar 19 16:54:12 albook kernel: --- Exception: c00 at 0xf1dc62c
Mar 19 16:54:12 albook kernel:     LR = 0xfb63838
Mar 19 16:54:12 albook kernel: Instruction dump:
Mar 19 16:54:12 albook kernel: 7fe3fb78 7f804800 41be0018 4bfffe39 2f830000
38600000 6063ffff 409e0020 
Mar 19 16:54:12 albook kernel: 813f0000 7c09f214 7c0004ac 7c00062c <0c000000>
4c00012c 5403043e 80010014

Comment 1 Will Woods 2007-04-04 19:44:59 UTC
Tried kernel-2.6.20-1.3016.2.1.fc7.jwltest.6.ppc.rpm, with the same results -
Machine Check in ssb_pci_read16 while doing bcm43xx_phy_initb5.

Comment 2 David Woodhouse 2007-04-05 18:47:12 UTC
At https://lists.berlios.de/pipermail/bcm43xx-dev/2007-March/004324.html there's
a hack which helps the kernel recover from these machine checks, for bcm43xx.
You could do the same in the bcm43xx_mac80211 driver.

The mac80211 driver is a little behind with the fixes we've put into bcm43xx
recently.

Comment 3 David Woodhouse 2007-04-05 18:49:13 UTC
Note: I mean you can do the same for bcm43xx_mac80211 in order to find the
places we're poking at bad registers and fix them. I wouldn't recommend
_shipping_ anything like that :)

Comment 4 John W. Linville 2007-04-09 20:48:13 UTC
I have added an experimental patch to the test kernels here:

   http://people.redhat.com/linville/kernels/fc7/

These are based on portions of the patches that fixed the softmac version of 
the driver.  Please give them a try and post the results here...thanks!

Comment 5 Will Woods 2007-04-12 21:17:34 UTC
Crashes in the same place with kernel 2.6.20-1.3059.fc7 and
2.6.20-2.1.fc7.jwltest.8 - same NIP and everything.

Comment 6 John W. Linville 2007-04-13 18:46:17 UTC
Created attachment 152576
dwmw2-mac80211.patch

mac80211 version of patch from comment 2

Comment 7 John W. Linville 2007-04-13 19:42:59 UTC
Will, please apply this patch to a current rawhide kernel and reproduce the 
issue.  Let me know if you need assistance!

Comment 8 Will Woods 2007-04-13 22:42:35 UTC
Patch applied to kernel 3062. It still defaults to using bcm43xx instead of
bcm43xx_mac80211. That doesn't work because the firmware is too new. But after I
rmmod both and modprobe bcm43xx_mac80211 - holy cow, my wireless works!

Which presents a problem: how do I know which register access broke stuff when
it's not breaking stuff anymore?

Falling back to the normal 3062 kernel, it still breaks in the usual way, so I'm
not really sure what to report. I can capture the copious spew from the kernel
when the wireless works, if it helps, but AFAICT no machine check happens at all.

Any advice on what phy_write/phy_read calls would be interesting?

Comment 9 David Woodhouse 2007-04-16 01:12:06 UTC
FWIW you can have both versions of firmware present in /lib/firmware and use the
'fwpostfix' module option to make the mac80211 driver look for the v4 firmware.

I have this in /etc/modprobe.conf:
        options bcm43xx-mac80211 fwpostfix=-v4

My v4 firmware is /lib/firmware/bcm43xx_*-v4.fw

There's a case to be made for having the fwpostfix option set by _default_ for
the mac80211 module, and having bcm43xx_fwcutter generate files with the
appropriate filenames by default.
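
To check which firmware files in a directory carry a given postfix, something like the following Python sketch may help. The function name `firmware_with_postfix` is hypothetical, and the `bcm43xx_*-v4.fw` naming is simply the pattern quoted in this comment; nothing here is part of the driver or of bcm43xx_fwcutter.

```python
import glob
import os


def firmware_with_postfix(firmware_dir, postfix):
    """Return sorted basenames of files named bcm43xx_*<postfix>.fw in firmware_dir.

    The naming scheme is an assumption taken from the comment above
    (e.g. postfix "-v4" matches bcm43xx_*-v4.fw).
    """
    pattern = os.path.join(
        glob.escape(firmware_dir),
        "bcm43xx_*" + glob.escape(postfix) + ".fw",
    )
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    # List the v4 firmware files that fwpostfix=-v4 would be expected to find.
    for name in firmware_with_postfix("/lib/firmware", "-v4"):
        print(name)
```

An empty result for the postfix you pass via `fwpostfix` would suggest the mac80211 driver has nothing to load under that name.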

Comment 10 John W. Linville 2007-04-16 13:35:28 UTC
Created attachment 152683
dwmw2-bcm43xx-mac80211-debug.patch

Undoubtedly better version directly from dwmw2...

Comment 11 Will Woods 2007-04-16 21:25:44 UTC
Okay, I've applied that patch and I've got some interesting output in
/var/log/messages and dmesg. I'm assuming that the "Read phy reg %x; got 0xFFFF"
messages are the interesting ones. Here are the registers listed:

41a 41b 420 429 42b 481 482 488 489 48c 496 4a0 4a1 4a2 4a5 4a8 4ab 814 815

There's also a good amount of this message:

IN from bad port ed2773fe at ec157dbc

Here's a sample trace:
Badness at arch/powerpc/kernel/traps.c:347
Call Trace:
[CB8E1A40] [C0008C18] show_stack+0x50/0x184 (unreliable)
[CB8E1A60] [C012C358] report_bug+0x84/0xc8
[CB8E1A70] [C02CBD54] __kprobes_text_start+0x154/0x538
[CB8E1AC0] [C00136D4] ret_from_except_full+0x0/0x4c
--- Exception: 700 at machine_check_exception+0x170/0x2f0
    LR = machine_check_exception+0x15c/0x2f0
[CB8E1BA0] [C00136D4] ret_from_except_full+0x0/0x4c
--- Exception: 200 at ssb_modexit+0x28/0x1740 [ssb]
    LR = bcm43xx_phy_read+0x78/0xb4 [bcm43xx_mac80211]
[CB8E1C60] [00004C00] 0x4c00 (unreliable)
[CB8E1C70] [EC323024] bcm43xx_phy_read+0x78/0xb4 [bcm43xx_mac80211]
[CB8E1C80] [EC3240EC] bcm43xx_phy_agcsetup+0x324/0x580 [bcm43xx_mac80211]
[CB8E1CA0] [EC32A214] bcm43xx_phy_inita+0xb68/0xdf0 [bcm43xx_mac80211]
[CB8E1CC0] [EC32B30C] bcm43xx_phy_initg+0x50/0xcdc [bcm43xx_mac80211]
[CB8E1D30] [EC32C0DC] bcm43xx_phy_early_init+0x144/0x164 [bcm43xx_mac80211]
[CB8E1D50] [EC31F2E0] bcm43xx_wireless_core_init+0x4b4/0xac4 [bcm43xx_mac80211]
[CB8E1DA0] [EC320C9C] bcm43xx_add_interface+0x68/0x12c [bcm43xx_mac80211]
[CB8E1DC0] [EC247CF0] ieee80211_open+0x250/0x3d0 [mac80211]
[CB8E1E00] [C02533A4] dev_open+0x60/0xc8
[CB8E1E20] [C0251124] dev_change_flags+0x70/0x148
[CB8E1E40] [C029DB58] devinet_ioctl+0x274/0x6b8
[CB8E1EA0] [C029E478] inet_ioctl+0xa8/0xdc
[CB8E1EB0] [C0244ED4] sock_ioctl+0x248/0x284
[CB8E1ED0] [C00A6058] do_ioctl+0x38/0x84
[CB8E1EE0] [C00A6484] vfs_ioctl+0x3e0/0x414
[CB8E1F10] [C00A6520] sys_ioctl+0x68/0x98
[CB8E1F40] [C0013078] ret_from_syscall+0x0/0x38
--- Exception: c00 at 0xf3f5008
    LR = 0xf3f4fa0
Read phy reg 481; got 0xFFFF.
IN from bad port ed2773fe at ec157dbc

Most of them seem to be approximately the same thing, but from various
addresses in bcm43xx_phy_agcsetup or bcm43xx_radio_init2050.
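
A register list like the one above can be pulled out of the log mechanically. The following is a minimal Python sketch, assuming the debug patch prints lines of the form "Read phy reg 481; got 0xFFFF." with the register in hex (as the `%x` in the message suggests); the function name `bad_phy_registers` is hypothetical.

```python
import re

# Matches lines such as "Read phy reg 481; got 0xFFFF." (format taken from the
# trace above; the register is assumed to be printed in hexadecimal).
BAD_READ = re.compile(r"Read phy reg ([0-9a-fA-F]+); got 0xFFFF")


def bad_phy_registers(lines):
    """Return the distinct PHY registers whose read returned 0xFFFF, ascending."""
    regs = set()
    for line in lines:
        match = BAD_READ.search(line)
        if match:
            regs.add(int(match.group(1), 16))
    return sorted(regs)


if __name__ == "__main__":
    import sys

    # Usage: python bad_regs.py < /var/log/messages
    for reg in bad_phy_registers(sys.stdin):
        print(format(reg, "x"))
```

Because the pattern is searched anywhere in the line, this works on both raw syslog lines and lines with the timestamp prefix already stripped.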

Comment 12 Will Woods 2007-04-16 21:29:29 UTC
Created attachment 152718
kernel messages while running dwmw's patch

generated with sed -rne 's/^Apr 16 .* kernel: //p' /var/log/messages
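
For anyone without sed handy, roughly the same filtering can be sketched in Python. This is only an approximation of the command above: it keeps lines that start with "Apr 16" and contain " kernel: ", and drops everything up to and including the last " kernel: ". The function name `strip_kernel_prefix` is hypothetical.

```python
import re

# Same pattern as the sed expression: greedy ".*" means the prefix extends to
# the last " kernel: " on the line.
PREFIX = re.compile(r"^Apr 16 .* kernel: ")


def strip_kernel_prefix(lines):
    """Keep only lines where the prefix matched, with the prefix removed."""
    kept = []
    for line in lines:
        stripped, count = PREFIX.subn("", line, count=1)
        if count:  # like sed -n with s///p: print only on substitution
            kept.append(stripped)
    return kept


if __name__ == "__main__":
    import sys

    # Usage: python strip_prefix.py < /var/log/messages
    sys.stdout.writelines(strip_kernel_prefix(sys.stdin))
```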

Comment 13 John W. Linville 2007-04-19 18:37:41 UTC
Will, please try the latest rawhide kernel you can grab (straight from brew if 
earlier than 4/20) and try to recreate this...thanks!

Comment 14 Will Woods 2007-04-20 20:43:44 UTC
It works! Hooray!

I still haven't quite figured out how to make it stop loading the old bcm43xx
driver, so I just blacklisted it. The wireless comes right up now, just like it
should. 

I'm closing this now. Thanks!

Comment 15 Will Woods 2007-09-12 16:20:12 UTC
b43_legacy, in current rawhide, appears to have the same problem as originally
described here. I'll attach the log but the symptom is the same - "Oops: Machine
check, sig: 7 [#1]" while in b43legacy_phy_initb5 -> b43legacy_phy_read ->
ssb_pci_read16.

Comment 16 Will Woods 2007-09-12 16:21:31 UTC
Created attachment 193621
portion of /var/log/messages showing crash dump

Comment 17 Larry Finger 2007-09-12 20:17:09 UTC
One pertinent detail was not listed in the original post, but I'm assuming that
this is a BCM4306/r that has a rev 1 PHY. If not, please let me know.

Comment 18 Will Woods 2007-09-12 20:24:17 UTC
dmesg says:

b43legacy-phy0: Broadcom 4306 WLAN found
b43legacy-phy0 debug: Found PHY: Analog 1, Type 2, Revision 1
b43legacy-phy0 debug: Found Radio: Manuf 0x17F, Version 0x2050, Revision 2
b43legacy-phy0 debug: Radio turned off

So I guess that's a yes.
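
The same check can be scripted against dmesg output. This is a minimal Python sketch, assuming the driver prints a line of the form "Found PHY: Analog 1, Type 2, Revision 1" as shown above and that the numbers are decimal; the function name `parse_phy` is hypothetical.

```python
import re

# Line format taken from the dmesg output quoted above; treating the fields as
# decimal integers is an assumption.
PHY_LINE = re.compile(r"Found PHY: Analog (\d+), Type (\d+), Revision (\d+)")


def parse_phy(line):
    """Return the PHY fields from a 'Found PHY' line, or None if it does not match."""
    match = PHY_LINE.search(line)
    if match is None:
        return None
    analog, phy_type, revision = (int(group) for group in match.groups())
    return {"analog": analog, "type": phy_type, "revision": revision}


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python parse_phy.py
    for line in sys.stdin:
        info = parse_phy(line)
        if info is not None:
            print(info)
```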

Comment 19 Larry Finger 2007-09-13 19:39:49 UTC
Created attachment 195141
Trial patch for problem

Please try this patch to see if it fixes your problem.

Thanks,  Larry

Comment 20 Larry Finger 2007-09-14 21:37:45 UTC
Created attachment 196261
Patch that may resolve some endian issues

This patch fixes most of the warnings issued by sparse on the b43legacy code.
Some of them are endian-related and may resolve the problem on the ppc
architecture.

Comment 21 Will Woods 2007-09-25 21:26:02 UTC
Everything seems fine with current rawhide kernel (2.6.23-rc8). Closing.

