Bug 190776

Summary:	2.6.16-1.2185_FC6 spinlocks CPU #0
Product:	[Fedora] Fedora	Reporter:	Joshua Wulf <jwulf>
Component:	kernel	Assignee:	David Woodhouse <dwmw2>
Status:	CLOSED RAWHIDE	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	rawhide	CC:	davej, lcarlon, linville, sgrubb, wtogami
Target Milestone:	---
Target Release:	---
Hardware:	powerpc
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2006-05-05 16:34:28 UTC	Type:	---

Description Joshua Wulf 2006-05-05 01:13:38 UTC

Description of problem: When booting this kernel, the machine locks up with a
spinlock error.


Version-Release number of selected component (if applicable):
2.6.16-1.2185_FC6

How reproducible:
100%

Steps to Reproduce:
1. Boot this kernel on an iBook G4 1.33GHz
2. Watch it lock up after udev
  
Actual results:
Locks up

Expected results:
I guess it should boot up like normal

Additional info:

Comment 1 David Woodhouse 2006-05-05 01:46:17 UTC

That's a sucky message. Wot no backtrace?

Thank $DEITY for xmon. In the 2.6.16-1.2187 kernel, it's a spinlock at
ieee80211softmac_start_scan+0x58/0xd4 from
ieee80211softmac_assoc_work+0x380/0x52x from run_workqueue.

Comment 2 David Woodhouse 2006-05-05 02:08:21 UTC

Quite possibly caused by my linux-2.6-bcm43xx-assoc-on-startup.patch which 
adds a schedule_work(&bcm->softmac->associnfo.work); in bcm43xx_init_board() to
make sure we actually associate when the link is brought up.

Did this ever get fixed properly in softmac?

Building a test kernel now to verify...

Comment 3 David Woodhouse 2006-05-05 02:29:10 UTC

Yeah, removing that patch fixes the reported problem. 

Leaves us with a machine check in bcm43xx_phy_read+0x1c/0x2c from
bcm43xx_phy_initg+0xe04/0xe54
bcm43xx_phy_calibrate+0xe8/0x118
bcm43xx_init_board+0x2f8/0x624
dev_open dev_change_flags devinet_ioctl blah...

Comment 4 Joshua Wulf 2006-05-05 02:33:44 UTC

Does this mean that bcm43xx will load up without manual intervention with this
kernel? At the moment it doesn't load on my machine without a modprobe.

Comment 5 David Woodhouse 2006-05-05 02:46:22 UTC

We deliberately prevented it from autoloading in FC5 because it was a bit too
new and exciting. It's loaded automatically in rawhide though -- and that's what
is killing your machine. I've removed the patch from CVS, so after the next
build it won't die like that -- it'll die differently, as shown in comment #3.

Comment 6 David Woodhouse 2006-05-05 02:52:07 UTC

I can't see anything obvious which has changed recently in softmac or bcm43xx
which should cause this. Nothing changed upstream since April 26th. 

Which was the latest rawhide kernel that worked?

Comment 7 David Woodhouse 2006-05-05 15:28:34 UTC

Looks like it never worked since it was merged with Linus' tree. Our FC5 kernel
actually had a slightly older snapshot, from just before it got broken.

I can 'fix' it by doing this...

--- bcm43xx_phy.c.orig	2006-05-05 16:26:43.000000000 +0100
+++ bcm43xx_phy.c	2006-05-05 16:27:28.000000000 +0100
@@ -1288,10 +1288,14 @@ static void bcm43xx_phy_initg(struct bcm
 		bcm43xx_phy_write(bcm, 0x0805, 0x3230);
 	bcm43xx_phy_init_pctl(bcm);
 	if (bcm->chip_id == 0x4306 && bcm->chip_package != 2) {
+		printk("Would kill you now. chip_package %d\n",
+			bcm->chip_package);
+#if 0
 		bcm43xx_phy_write(bcm, 0x0429,
 				  bcm43xx_phy_read(bcm, 0x0429) & 0xBFFF);
 		bcm43xx_phy_write(bcm, 0x04C3,
 				  bcm43xx_phy_read(bcm, 0x04C3) & 0x7FFF);
+#endif
 	}
 }
 

We still have the problem that it doesn't associate on 'ifconfig up' though,
since I had to remove the patch which fixes that.

Comment 8 David Woodhouse 2006-05-05 15:51:00 UTC

Going back to the original problem... the offending spinlock isn't sm->lock. 

It's sm->ieee->dev->xmit_lock (which is locked in netif_tx_disable()).

Comment 9 David Woodhouse 2006-05-05 16:17:50 UTC

And it's fixed if I rediff the original patch so that it doesn't get misapplied.

Comment 10 David Woodhouse 2006-05-05 16:34:28 UTC

Should both be fixed in kernel-2_6_16-1_2194_FC6.

I've sent the patch for the machine check upstream, and I've also re-sent the
(re-diffed) patch to associate on startup.

Comment 11 David Woodhouse 2006-05-07 12:23:46 UTC

*** Bug 190592 has been marked as a duplicate of this bug. ***

Comment 12 Steve Grubb 2006-05-08 13:07:59 UTC

I have updated to the 2196 kernel. It does not lock up with bad magic like the
previous versions, but networking doesn't work either. The 2139 kernel does
work. I first noticed the problem in the 2174 build. So, somewhere between 2139
& 2174 the problem was introduced.

The error I get is "Error: Microcode "bcm43xx_microcode5.fw" not available or
load failed."

Comment 13 David Woodhouse 2006-05-08 13:17:52 UTC

(In reply to comment #12)
> The error I get is "Error: Microcode "bcm43xx_microcode5.fw" not available or
> load failed."

It's failing to load the firmware. Does /lib/firmware/bcm43xx_microcode5.fw exist?

Comment 14 Steve Grubb 2006-05-08 13:23:17 UTC

No. Locate does not show that file anywhere on my system. Is it supposed to be
packaged?

Comment 15 David Woodhouse 2006-05-08 13:34:42 UTC

No, it's not packaged. It's firmware which needs to be extracted from the MacOS
or Windows driver. Install bcm43xx-fwcutter and follow the instructions therein.

In comment #12 you said that the 2139 kernel does work. Are you telling me that
you had the bcm43xx driver working _without_ having the firmware for it
installed? I find that unlikely.

Comment 16 Steve Grubb 2006-05-08 14:04:48 UTC

Problem turned out to be a device re-ordering problem. Adding HWADDR to eth0
fixed it.