Bug 190776 - 2.6.16-1.2185_FC6 spinlocks CPU #0
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: powerpc Linux
Priority: medium
Severity: medium
Assigned To: David Woodhouse
QA Contact: Brian Brock
Duplicates: 190592
Reported: 2006-05-04 21:13 EDT by Joshua Wulf
Modified: 2014-10-19 18:54 EDT (History)
Doc Type: Bug Fix
Last Closed: 2006-05-05 12:34:28 EDT

Description Joshua Wulf 2006-05-04 21:13:38 EDT
Description of problem: When booting this kernel, the machine locks up with a
spinlock error.


Version-Release number of selected component (if applicable):
2.6.16-1.2185_FC6

How reproducible:
100%

Steps to Reproduce:
1. Boot this kernel on an iBook G4 1.33 GHz
2. Watch it lock up after udev
  
Actual results:
Locks up

Expected results:
I guess it should boot up like normal

Additional info:
Comment 1 David Woodhouse 2006-05-04 21:46:17 EDT
That's a sucky message. Wot no backtrace?

Thank $DEITY for xmon. In the 2.6.16-1.2187 kernel, it's a spinlock at
ieee80211softmac_start_scan+0x58/0xd4 from
ieee80211softmac_assoc_work+0x380/0x52x from run_workqueue.
Comment 2 David Woodhouse 2006-05-04 22:08:21 EDT
Quite possibly caused by my linux-2.6-bcm43xx-assoc-on-startup.patch which 
adds a schedule_work(&bcm->softmac->associnfo.work); in bcm43xx_init_board() to
make sure we actually associate when the link is brought up.

Did this ever get fixed properly in softmac?

Building a test kernel now to verify...
Comment 3 David Woodhouse 2006-05-04 22:29:10 EDT
Yeah, removing that patch fixes the reported problem. 

Leaves us with a machine check in bcm43xx_phy_read+0x1c/0x2c from
bcm43xx_phy_initg+0xe04/0xe54
bcm43xx_phy_calibrate+0xe8/0x118
bcm43xx_init_board+0x2f8/0x624
dev_open dev_change_flags devinet_ioctl blah...
Comment 4 Joshua Wulf 2006-05-04 22:33:44 EDT
Does this mean that bcm43xx will load up without manual intervention with this
kernel? At the moment it doesn't load on my machine without a modprobe.
Comment 5 David Woodhouse 2006-05-04 22:46:22 EDT
We deliberately prevented it from autoloading in FC5 because it was a bit too
new and exciting. It's loaded automatically in rawhide though -- and that's what
is killing your machine. I've removed the patch from CVS, so after the next
build it won't die like that -- it'll die differently, as shown in comment #3.
Comment 6 David Woodhouse 2006-05-04 22:52:07 EDT
I can't see anything obvious which has changed recently in softmac or bcm43xx
which should cause this. Nothing changed upstream since April 26th. 

Which was the latest rawhide kernel that worked?
Comment 7 David Woodhouse 2006-05-05 11:28:34 EDT
Looks like it never worked since it was merged with Linus' tree. Our FC5 kernel
actually had a slightly older snapshot, from just before it got broken.

I can 'fix' it by doing this...

--- bcm43xx_phy.c.orig	2006-05-05 16:26:43.000000000 +0100
+++ bcm43xx_phy.c	2006-05-05 16:27:28.000000000 +0100
@@ -1288,10 +1288,14 @@ static void bcm43xx_phy_initg(struct bcm
 		bcm43xx_phy_write(bcm, 0x0805, 0x3230);
 	bcm43xx_phy_init_pctl(bcm);
 	if (bcm->chip_id == 0x4306 && bcm->chip_package != 2) {
+		printk("Would kill you now. chip_package %d\n",
+			bcm->chip_package);
+#if 0
 		bcm43xx_phy_write(bcm, 0x0429,
 				  bcm43xx_phy_read(bcm, 0x0429) & 0xBFFF);
 		bcm43xx_phy_write(bcm, 0x04C3,
 				  bcm43xx_phy_read(bcm, 0x04C3) & 0x7FFF);
+#endif
 	}
 }
 

We still have the problem that it doesn't associate on 'ifconfig up' though,
since I had to remove the patch which fixes that.
Comment 8 David Woodhouse 2006-05-05 11:51:00 EDT
Going back to the original problem... the offending spinlock isn't sm->lock. 

It's sm->ieee->dev->xmit_lock (which is locked in netif_tx_disable()).
Comment 9 David Woodhouse 2006-05-05 12:17:50 EDT
And it's fixed if I rediff the original patch so that it doesn't get misapplied.
Comment 10 David Woodhouse 2006-05-05 12:34:28 EDT
Should both be fixed in kernel-2_6_16-1_2194_FC6.

I've sent the patch for the machine check upstream, and I've also re-sent the
(re-diffed) patch to associate on startup.
Comment 11 David Woodhouse 2006-05-07 08:23:46 EDT
*** Bug 190592 has been marked as a duplicate of this bug. ***
Comment 12 Steve Grubb 2006-05-08 09:07:59 EDT
I have updated to the 2196 kernel. It does not lock up with bad magic like the
previous versions, but networking doesn't work either. The 2139 kernel does
work. I first noticed the problem in the 2174 build. So, somewhere between 2139
& 2174 the problem was introduced.

The error I get is "Error: Microcode "bcm43xx_microcode5.fw" not available or
load failed."
Comment 13 David Woodhouse 2006-05-08 09:17:52 EDT
(In reply to comment #12)
> The error I get is "Error: Microcode "bcm43xx_microcode5.fw" not available or
> load failed."

It's failing to load the firmware. Does /lib/firmware/bcm43xx_microcode5.fw exist?
Comment 14 Steve Grubb 2006-05-08 09:23:17 EDT
No. Locate does not show that file anywhere on my system. Is it supposed to be
packaged?
Comment 15 David Woodhouse 2006-05-08 09:34:42 EDT
No, it's not packaged. It's firmware which needs to be extracted from the MacOS
or Windows driver. Install bcm43xx-fwcutter and follow the instructions therein.

In comment #12 you said that the 2139 kernel does work. Are you telling me that
you had the bcm43xx driver working _without_ having the firmware for it
installed? I find that unlikely.
Comment 16 Steve Grubb 2006-05-08 10:04:48 EDT
Problem turned out to be a device re-ordering problem. Adding HWADDR to eth0
fixed it.
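
For reference, a minimal sketch of what such a change might look like. It assumes the usual Fedora interface file /etc/sysconfig/network-scripts/ifcfg-eth0 (the location is an assumption, it is not stated above), and the MAC address shown is a placeholder, not a value from this report.

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 (assumed location)
DEVICE=eth0
# Placeholder MAC address; substitute the address of the card
# that should be known as eth0.
HWADDR=00:11:22:33:44:55
```

With HWADDR set, the network scripts tie the name eth0 to the card with that address, so the name no longer depends on the order in which the drivers load.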
