When NetworkManager starts on my 12" powerbook, the kernel (2.6.20-1.2997.fc7) crashes into the debugger. Info follows:
Mar 19 16:04:55 albook kernel: bcm43xx_mac80211: Adding Interface type 2
Mar 19 16:04:55 albook kernel: bcm43xx_mac80211: Found PHY: Analog 1, Type 2, Revision 1
Mar 19 16:04:55 albook kernel: bcm43xx_mac80211: Found Radio: Manuf 0x17F, Version 0x2050, Revision 2
Mar 19 16:54:12 albook kernel: Machine check in kernel mode.
Mar 19 16:54:12 albook kernel: Caused by (from SRR1=149030): Transfer error ack signal
Mar 19 16:54:12 albook kernel: BUG: soft lockup detected on CPU#0!
Mar 19 16:54:12 albook kernel: Call Trace:
Mar 19 16:54:12 albook kernel: [E39A5BB0] [C0008C18] show_stack+0x50/0x184 (unreliable)
Mar 19 16:54:12 albook kernel: [E39A5BD0] [C0065060] softlockup_tick+0xb4/0xd0
Mar 19 16:54:12 albook kernel: [E39A5BF0] [C003DE30] run_local_timers+0x18/0x28
Mar 19 16:54:12 albook kernel: [E39A5C00] [C003DE80] update_process_times+0x40/0x7c
Mar 19 16:54:12 albook kernel: [E39A5C10] [C001023C] timer_interrupt+0xcc/0x580
Mar 19 16:54:12 albook kernel: --- Exception: 0 at 0xc1c65b10
Mar 19 16:54:12 albook kernel: LR = 0x0
Mar 19 16:54:12 albook kernel: [E39A5C80] [C00136B8] ret_from_except+0x0/0x14 (unreliable)
Mar 19 16:54:12 albook kernel: --- Exception: 901 at handle_IRQ_event+0x30/0xa0
Mar 19 16:54:12 albook kernel: LR = handle_fasteoi_irq+0xc4/0x128
Mar 19 16:54:12 albook kernel: [E39A5D40] [00000000] 0x0 (unreliable)
Mar 19 16:54:12 albook kernel: [E39A5D60] [C0066AB4] handle_fasteoi_irq+0xc4/0x128
Mar 19 16:54:12 albook kernel: [E39A5D80] [C0006828] do_IRQ+0x8c/0xcc
Mar 19 16:54:12 albook kernel: [E39A5D90] [C00136B8] ret_from_except+0x0/0x14
Mar 19 16:54:12 albook kernel: --- Exception: 501 at _spin_unlock_irq+0x1c/0x2c
Mar 19 16:54:12 albook kernel: LR = _spin_unlock_irq+0x10/0x2c
Mar 19 16:54:12 albook kernel: [E39A5E60] [C02C83F0] schedule+0x664/0x6f0
Mar 19 16:54:12 albook kernel: [E39A5E90] [C0034CE8] do_syslog+0x12c/0x4b8
Mar 19 16:54:12 albook kernel: [E39A5EE0] [C00DF040] kmsg_read+0x50/0x64
Mar 19 16:54:12 albook kernel: [E39A5EF0] [C0097C4C] vfs_read+0xec/0x1c8
Mar 19 16:54:12 albook kernel: [E39A5F10] [C00980AC] sys_read+0x4c/0x8c
Mar 19 16:54:12 albook kernel: [E39A5F40] [C0013010] ret_from_syscall+0x0/0x38
Mar 19 16:54:12 albook kernel: --- Exception: c00 at 0x1ff387d4
Mar 19 16:54:12 albook kernel: LR = 0x2000241c
Mar 19 16:54:12 albook kernel: Oops: Machine check, sig: 7 [#1]
Mar 19 16:54:12 albook kernel:
Mar 19 16:54:12 albook kernel: Modules linked in: sg(U) scsi_mod(U) autofs4(U) hidp(U) hci_usb(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ip_tables(U) xt_tcpudp(U) ip6t_REJECT(U) ip6table_filter(U) ip6_tables(U) x_tables(U) ipv6(U) nls_utf8(U) hfsplus(U) parport_pc(U) lp(U) parport(U) snd_aoa_i2sbus(U) snd_powermac(U) snd_seq_dummy(U) snd_seq_oss(U) snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U) snd_mixer_oss(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) snd(U) soundcore(U) ide_cd(U) cdrom(U) snd_aoa_soundbus(U) fw_ohci(U) fw_core(U) bcm43xx(U) ieee80211softmac(U) ieee80211(U) ieee80211_crypt(U) arc4(U) ecb(U) blkcipher(U) rc80211_simple(U) bcm43xx_mac80211(U) ssb(U) mac80211(U) cfg80211(U) sungem(U) sungem_phy(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) ext3(U) jbd(U) mbcache(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Mar 19 16:54:12 albook kernel: NIP: EC130834 LR: EC487FD4 CTR: EC1307E0
Mar 19 16:54:12 albook kernel: REGS: de65d8c0 TRAP: 0200 Not tainted (2.6.20-1.2997.fc7)
Mar 19 16:54:12 albook kernel: MSR: 00149030 <EE,ME,IR,DR> CR: 24000422 XER: 20000000
Mar 19 16:54:12 albook kernel: TASK = df172680[2400] 'NetworkManager' THREAD: de65c000
Mar 19 16:54:12 albook kernel: GPR00: 0000DD80 DE65D970 DF172680 C0BA2D40 C0BA2E2C 0000042B 00000002 FFFF659F
Mar 19 16:54:12 albook kernel: GPR08: 00000000 EA098000 24000084 00001032 00000000 1006AF82 00000000 00000000
Mar 19 16:54:12 albook kernel: GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 E5787120 C0BA2D40
Mar 19 16:54:12 albook kernel: GPR24: 00000001 00000002 00000000 00000002 E5709254 000000C7 000003FE C0BA2D40
Mar 19 16:54:12 albook kernel: NIP [EC130834] ssb_pci_read16+0x54/0x74 [ssb]
Mar 19 16:54:12 albook kernel: LR [EC487FD4] bcm43xx_phy_read+0x7c/0x94 [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: Call Trace:
Mar 19 16:54:12 albook kernel: [DE65D970] [000000C7] 0xc7 (unreliable)
Mar 19 16:54:12 albook kernel: [DE65D980] [EC487FD4] bcm43xx_phy_read+0x7c/0x94 [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65D990] [EC48FE30] bcm43xx_phy_initb5+0x16c/0x4a0 [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65D9B0] [EC49018C] bcm43xx_phy_initg+0x28/0xc5c [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65DA20] [EC490F04] bcm43xx_phy_early_init+0x144/0x164 [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65DA40] [EC484284] bcm43xx_wireless_core_init+0x4cc/0xae4 [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65DA90] [EC485C48] bcm43xx_add_interface+0x68/0x12c [bcm43xx_mac80211]
Mar 19 16:54:12 albook kernel: [DE65DAB0] [EC247DAC] ieee80211_open+0x250/0x3b0 [mac80211]
Mar 19 16:54:12 albook kernel: [DE65DAF0] [C0251FA8] dev_open+0x60/0xc8
Mar 19 16:54:12 albook kernel: [DE65DB10] [C024FD28] dev_change_flags+0x70/0x148
Mar 19 16:54:12 albook kernel: [DE65DB30] [C025B500] rtnl_setlink+0x278/0x3f0
Mar 19 16:54:12 albook kernel: [DE65DBC0] [C025AC80] rtnetlink_rcv_msg+0x218/0x244
Mar 19 16:54:12 albook kernel: [DE65DBF0] [C0269F4C] netlink_run_queue+0x78/0x148
Mar 19 16:54:12 albook kernel: [DE65DC20] [C025A9E0] rtnetlink_rcv+0x40/0x6c
Mar 19 16:54:12 albook kernel: [DE65DC50] [C026A57C] netlink_data_ready+0x28/0x84
Mar 19 16:54:12 albook kernel: [DE65DC60] [C0268FB0] netlink_sendskb+0x34/0x70
Mar 19 16:54:12 albook kernel: [DE65DC80] [C026A538] netlink_sendmsg+0x2a0/0x2bc
Mar 19 16:54:12 albook kernel: [DE65DCD0] [C0244498] sock_sendmsg+0xd0/0x100
Mar 19 16:54:12 albook kernel: [DE65DDC0] [C0244690] sys_sendmsg+0x1c8/0x254
Mar 19 16:54:12 albook kernel: [DE65DF00] [C0245BD4] sys_socketcall+0x1c4/0x1fc
Mar 19 16:54:12 albook kernel: [DE65DF40] [C0013010] ret_from_syscall+0x0/0x38
Mar 19 16:54:12 albook kernel: --- Exception: c00 at 0xf1dc62c
Mar 19 16:54:12 albook kernel: LR = 0xfb63838
Mar 19 16:54:12 albook kernel: Instruction dump:
Mar 19 16:54:12 albook kernel: 7fe3fb78 7f804800 41be0018 4bfffe39 2f830000 38600000 6063ffff 409e0020
Mar 19 16:54:12 albook kernel: 813f0000 7c09f214 7c0004ac 7c00062c <0c000000> 4c00012c 5403043e 80010014
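For anyone triaging a trace like this with a script, here is a minimal Python sketch (not part of the original report) that pulls the symbol and module out of each stack frame. The names FRAME_RE, parse_frames and frames_in_module are made up for illustration, and the regular expression is only an assumption based on the frame format visible in the oops above. Frames that show a bare address instead of symbol+offset are skipped.

```python
import re

# Assumed frame format, taken from the oops text, e.g.
# "[DE65D980] [EC487FD4] bcm43xx_phy_read+0x7c/0x94 [bcm43xx_mac80211]"
# The lookahead keeps the stack pointer of a following frame from being
# mistaken for a module name when the log has been flattened onto one line.
FRAME_RE = re.compile(
    r"\[([0-9A-Fa-f]{8})\] \[([0-9A-Fa-f]{8})\] "
    r"(\w+)\+0x[0-9A-Fa-f]+/0x[0-9A-Fa-f]+"
    r"(?: \[(?![0-9A-Fa-f]{8}\])(\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, module) pairs in order of appearance.

    module is None for symbols that belong to the core kernel.
    """
    return [(m.group(3), m.group(4)) for m in FRAME_RE.finditer(log_text)]


def frames_in_module(log_text, module):
    """Return only the symbols that belong to the given module."""
    return [sym for sym, mod in parse_frames(log_text) if mod == module]


# Three frames copied from the trace above.
sample = (
    "Mar 19 16:54:12 albook kernel: [DE65D980] [EC487FD4] "
    "bcm43xx_phy_read+0x7c/0x94 [bcm43xx_mac80211] "
    "Mar 19 16:54:12 albook kernel: [DE65D990] [EC48FE30] "
    "bcm43xx_phy_initb5+0x16c/0x4a0 [bcm43xx_mac80211] "
    "Mar 19 16:54:12 albook kernel: [DE65DAF0] [C0251FA8] dev_open+0x60/0xc8"
)
print(frames_in_module(sample, "bcm43xx_mac80211"))
```

Filtering on bcm43xx_mac80211 leaves just the driver's own part of the call chain, which makes it easier to compare two crashes frame by frame.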
Tried kernel-2.6.20-1.3016.2.1.fc7.jwltest.6.ppc.rpm, with the same results - Machine Check in ssb_pci_read16 while doing bcm43xx_phy_initb5.
At https://lists.berlios.de/pipermail/bcm43xx-dev/2007-March/004324.html there's a hack which helps the kernel recover from these machine checks, for bcm43xx. You could do the same in the bcm43xx_mac80211 driver. The mac80211 driver is a little behind with the fixes we've put into bcm43xx recently.
Note: I mean you can do the same for bcm43xx_mac80211 in order to find the places we're poking at bad registers and fix them. I wouldn't recommend _shipping_ anything like that :)
I have added an experimental patch to the test kernels here: http://people.redhat.com/linville/kernels/fc7/ These are based on portions of the patches that fixed the softmac version of the driver. Please give them a try and post the results here...thanks!
Crashes in the same place with kernel 2.6.20-1.3059.fc7 and 2.6.20-2.1.fc7.jwltest.8 - same NIP and everything.
Created attachment 152576: dwmw2-mac80211.patch
mac80211 version of patch from comment 2
Will, please apply this patch to a current rawhide kernel and reproduce the issue. Let me know if you need assistance!
Patch applied to kernel 3062. It still defaults to using bcm43xx instead of bcm43xx_mac80211. That doesn't work because the firmware is too new. But after I rmmod both and modprobe bcm43xx_mac80211 - holy cow, my wireless works! Which presents a problem: how do I know which register access broke stuff when it's not breaking stuff anymore? Falling back to the normal 3062 kernel it still breaks in the usual way, so I'm not really sure what to report. I can capture the copious spew from the kernel when the wireless works, if it helps, but AFAICT no machine check happens at all. Any advice on which phy_write/phy_read calls would be interesting?
FWIW you can have both versions of firmware present in /lib/firmware and use the 'fwpostfix' module option to make the mac80211 driver look for the v4 firmware. I have this in /etc/modprobe.conf:
options bcm43xx-mac80211 fwpostfix=-v4
My v4 firmware is /lib/firmware/bcm43xx_*-v4.fw
There's a case to be made for having the fwpostfix option set by _default_ for the mac80211 module, and having bcm43xx_fwcutter generate files with the appropriate filenames by default.
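To check which firmware files carry a given postfix before setting fwpostfix, something like the following Python sketch could be used. It is only an illustration: the function name firmware_with_postfix is made up, and the only thing taken from the comment above is the /lib/firmware/bcm43xx_*-v4.fw naming pattern. How the driver itself builds the filename is not shown here.

```python
import glob
import os


def firmware_with_postfix(firmware_dir, postfix):
    """List firmware file names matching bcm43xx_*<postfix>.fw in a directory.

    The pattern mirrors the layout described in the comment above; it is an
    assumption about naming, not the driver's actual lookup code.
    """
    pattern = os.path.join(
        glob.escape(firmware_dir), "bcm43xx_*" + glob.escape(postfix) + ".fw"
    )
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


if __name__ == "__main__":
    # Prints nothing if no matching files (or no such directory) exist.
    for name in firmware_with_postfix("/lib/firmware", "-v4"):
        print(name)
```

An empty result for "-v4" would suggest the v4 files are missing or named differently from what the modprobe option expects.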
Created attachment 152683: dwmw2-bcm43xx-mac80211-debug.patch
Undoubtedly better version directly from dwmw2...
Okay, I've applied that patch and I've got some interesting output in /var/log/messages and dmesg. I'm assuming that the "Read phy reg %x; got 0xFFFF" messages are the interesting ones. Here are the registers listed:
41a 41b 420 429 42b 481 482 488 489 48c 496 4a0 4a1 4a2 4a5 4a8 4ab 814 815
There's also a good amount of this message:
IN from bad port ed2773fe at ec157dbc
Here's a sample trace:
Badness at arch/powerpc/kernel/traps.c:347
Call Trace:
[CB8E1A40] [C0008C18] show_stack+0x50/0x184 (unreliable)
[CB8E1A60] [C012C358] report_bug+0x84/0xc8
[CB8E1A70] [C02CBD54] __kprobes_text_start+0x154/0x538
[CB8E1AC0] [C00136D4] ret_from_except_full+0x0/0x4c
--- Exception: 700 at machine_check_exception+0x170/0x2f0
LR = machine_check_exception+0x15c/0x2f0
[CB8E1BA0] [C00136D4] ret_from_except_full+0x0/0x4c
--- Exception: 200 at ssb_modexit+0x28/0x1740 [ssb]
LR = bcm43xx_phy_read+0x78/0xb4 [bcm43xx_mac80211]
[CB8E1C60] [00004C00] 0x4c00 (unreliable)
[CB8E1C70] [EC323024] bcm43xx_phy_read+0x78/0xb4 [bcm43xx_mac80211]
[CB8E1C80] [EC3240EC] bcm43xx_phy_agcsetup+0x324/0x580 [bcm43xx_mac80211]
[CB8E1CA0] [EC32A214] bcm43xx_phy_inita+0xb68/0xdf0 [bcm43xx_mac80211]
[CB8E1CC0] [EC32B30C] bcm43xx_phy_initg+0x50/0xcdc [bcm43xx_mac80211]
[CB8E1D30] [EC32C0DC] bcm43xx_phy_early_init+0x144/0x164 [bcm43xx_mac80211]
[CB8E1D50] [EC31F2E0] bcm43xx_wireless_core_init+0x4b4/0xac4 [bcm43xx_mac80211]
[CB8E1DA0] [EC320C9C] bcm43xx_add_interface+0x68/0x12c [bcm43xx_mac80211]
[CB8E1DC0] [EC247CF0] ieee80211_open+0x250/0x3d0 [mac80211]
[CB8E1E00] [C02533A4] dev_open+0x60/0xc8
[CB8E1E20] [C0251124] dev_change_flags+0x70/0x148
[CB8E1E40] [C029DB58] devinet_ioctl+0x274/0x6b8
[CB8E1EA0] [C029E478] inet_ioctl+0xa8/0xdc
[CB8E1EB0] [C0244ED4] sock_ioctl+0x248/0x284
[CB8E1ED0] [C00A6058] do_ioctl+0x38/0x84
[CB8E1EE0] [C00A6484] vfs_ioctl+0x3e0/0x414
[CB8E1F10] [C00A6520] sys_ioctl+0x68/0x98
[CB8E1F40] [C0013078] ret_from_syscall+0x0/0x38
--- Exception: c00 at 0xf3f5008
LR = 0xf3f4fa0
Read phy reg 481; got 0xFFFF.
IN from bad port ed2773fe at ec157dbc
Most of them seem to be approximately the same thing, but from various addresses in bcm43xx_phy_agcsetup or bcm43xx_radio_init2050.
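A register list like the one above can be collected from the log automatically. The following is a minimal Python sketch, assuming the debug patch prints exactly the "Read phy reg %x; got 0x..." format quoted in this comment. The names READ_RE and registers_reading are made up for illustration.

```python
import re

# Assumed message format, copied from the debug output quoted above:
# "Read phy reg 481; got 0xFFFF."
READ_RE = re.compile(r"Read phy reg ([0-9A-Fa-f]+); got\s+0x([0-9A-Fa-f]+)")


def registers_reading(log_text, value=0xFFFF):
    """Return the sorted, de-duplicated PHY registers whose read gave `value`."""
    seen = {
        int(reg, 16)
        for reg, got in READ_RE.findall(log_text)
        if int(got, 16) == value
    }
    return sorted(seen)


sample = (
    "Read phy reg 481; got 0xFFFF. IN from bad port ed2773fe at ec157dbc "
    "Read phy reg 41a; got 0xFFFF. Read phy reg 481; got 0xFFFF."
)
print([format(reg, "x") for reg in registers_reading(sample)])  # ['41a', '481']
```

Reads of 0xFFFF are singled out because that is the value the comment above treats as suspicious; a different value can be passed if another pattern turns up.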
Created attachment 152718: kernel messages while running dwmw's patch
Generated with: sed -rne 's/^Apr 16 .* kernel: //p' /var/log/messages
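For reference, the sed command keeps only the lines that carry the "Apr 16 ... kernel: " prefix and strips that prefix. A rough Python equivalent, as a sketch only (the names PREFIX_RE and kernel_messages are made up, and the sample lines below are invented for the demonstration):

```python
import re

# Same pattern as the sed expression: greedy match up to the last " kernel: ".
PREFIX_RE = re.compile(r"^Apr 16 .* kernel: ")


def kernel_messages(lines):
    """Yield kernel lines with the syslog prefix removed; drop everything else.

    Mirrors `sed -n 's/.../p'`: a line is emitted only if the substitution
    actually matched.
    """
    for line in lines:
        stripped, count = PREFIX_RE.subn("", line, count=1)
        if count:
            yield stripped.rstrip("\n")


# Invented sample lines in the syslog layout seen earlier in this report.
sample_lines = [
    "Apr 16 09:15:01 albook kernel: Read phy reg 481; got 0xFFFF.\n",
    "Apr 16 09:15:02 albook NetworkManager: not a kernel line\n",
]
for message in kernel_messages(sample_lines):
    print(message)
```

To run it on the real log, pass an open file object for /var/log/messages instead of the sample list.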
Will, please try the latest rawhide kernel you can grab (straight from brew if earlier than 4/20) and try to recreate this...thanks!
It works! Hooray! I still haven't quite figured out how to make it stop loading the old bcm43xx driver, so I just blacklisted it. The wireless comes right up now, just like it should. I'm closing this now. Thanks!
b43_legacy, in current rawhide, appears to have the same problem as originally described here. I'll attach the log but the symptom is the same - "Oops: Machine check, sig: 7 [#1]" while in b43legacy_phy_initb5 -> b43legacy_phy_read -> ssb_pci_read16.
Created attachment 193621: portion of /var/log/messages showing crash dump
One pertinent detail was not listed in the original post, but I'm assuming that this is a BCM4306/r that has a rev 1 PHY. If not, please let me know.
dmesg says:
b43legacy-phy0: Broadcom 4306 WLAN found
b43legacy-phy0 debug: Found PHY: Analog 1, Type 2, Revision 1
b43legacy-phy0 debug: Found Radio: Manuf 0x17F, Version 0x2050, Revision 2
b43legacy-phy0 debug: Radio turned off
So I guess that's a yes.
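The PHY revision asked about above can be read straight off the "Found PHY" line. Here is a minimal Python sketch, assuming the line format shown in this dmesg output. The names PHY_RE and parse_phy are made up for illustration, and nothing here interprets what the Analog or Type numbers mean.

```python
import re

# Assumed line format, copied from the dmesg output above:
# "b43legacy-phy0 debug: Found PHY: Analog 1, Type 2, Revision 1"
PHY_RE = re.compile(r"Found PHY: Analog (\d+), Type (\d+), Revision (\d+)")


def parse_phy(dmesg_text):
    """Return the first reported PHY as a dict, or None if no line matches."""
    match = PHY_RE.search(dmesg_text)
    if match is None:
        return None
    analog, phy_type, revision = (int(group) for group in match.groups())
    return {"analog": analog, "type": phy_type, "revision": revision}


line = "b43legacy-phy0 debug: Found PHY: Analog 1, Type 2, Revision 1"
phy = parse_phy(line)
print(phy["revision"] == 1)  # True for the card in this report
```

The same pattern also matches the bcm43xx_mac80211 line from the original report, since only the prefix differs.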
Created attachment 195141: Trial patch for problem
Please try this patch to see if it fixes your problem. Thanks, Larry
Created attachment 196261: Patch that may resolve some endian issues
This patch fixes most of the warnings issued by sparse on the b43legacy code. Some of them are endian-related, so the patch may resolve the problem on the ppc architecture.
Everything seems fine with current rawhide kernel (2.6.23-rc8). Closing.