Bug 533746

Summary: Fedora 12 livecd freezes at udev on Acer Aspire One D250
Product: [Fedora] Fedora Reporter: Vlad Dimitriu <vlad>
Component: kernel Assignee: John W. Linville <linville>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent Docs Contact:
Priority: low    
Version: 12 CC: acathrow, airlied, awilliam, beland, bruno, caldodge, dougsland, dxm523, gansalmon, hafflys, itamar, j.a.watson, jphuc, jvsmith, kernel-maint, kurt, larry.finger, linville, luke, M8R-7fin56, mads, ml
Target Milestone: --- Keywords: CommonBugs, Patch, Triaged
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard: https://fedoraproject.org/wiki/Common_F12_bugs#aspire-one-ssb-hang
Fixed In Version: kernel-2.6.32.10-90.fc12 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 574895 576278 (view as bug list) Environment:
Last Closed: 2010-03-26 23:38:49 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 574895, 576278    
Attachments:
Description Flags
Output of "lspci -vvv" none
Output of smoltSendProfile none
ssb_check_for_sprom.patch none
0001-ssb-do-not-read-SPROM-if-it-does-not-exist.patch none
0001-ssb-do-not-read-SPROM-if-it-does-not-exist.patch none

Description Vlad Dimitriu 2009-11-08 22:25:55 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:

Boot the livecd from an SD card.

Write the image to the SD card:
dd if=F12-Beta-i686-Live-KDE.iso of=/dev/sdb
 or lxde-i386-20091107.20.iso

Steps to Reproduce:
1. Write the ISO image to the SD card:
dd if=F12-Beta-i686-Live-KDE.iso of=/dev/sdb
 or lxde-i386-20091107.20.iso
2. Boot from the USB multicard, with or without combinations of acpi=off pci=noacpi security=off intel_iommu=off, and without quiet ...
  
Actual results:
After the udev daemon starts there are 3 seconds of disk activity and then the system freezes (No CapsLock).

Expected results:
To boot the Fedora 12 livecd on the Acer Aspire One D250 and do a proper install.

Additional info:
On the SATA harddisk there are two ext4 partitions with Ubuntu Karmic.

Comment 1 Anonymous account 2009-11-09 03:01:54 UTC
I've had this problem on my PC with the Beta 2 LiveCD (non-KDE), the Beta 2 LXDE LiveCD, and a couple of LXDE nightlies since then (most recent being the 5th, those using LiveUSB creator and liveiso-to-usb).  I posted about it in a similar bug, but received no response.

Comment 2 Bug Zapper 2009-11-16 15:19:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Peter Robinson 2009-11-22 01:51:11 UTC
I now have a pretty paperweight (Acer Aspire One D250). These comments are for when this bug is addressed:

Please do subsequent testing on the D250 model, see below.  Acer has changed 'something' for this model.

Not specifically related to this bug, but I've read everyone's glowing statements about F10, F11 and F12 on the Acer Aspire one.  Those comments do not apply to the D250 model.

F10:  Live boots, installs fine. But the NIC and wireless h/w don't even appear to this OS.  Period.  No network connectivity possible even obtaining and compiling in the tg3 drivers manually.

F11:  live boots, installs fine.  The 10/100 NIC at least appears - you see the MAC address.  Wireless doesn't even appear.  Still no network access.  F11 seems slower than F10 - even keyboard responsiveness.

Willing to be a guinea pig for any rush patches to F12 assist!

Comment 4 Peter Robinson 2009-11-22 02:24:06 UTC
Install DVD i386 gets to 'waiting for hardware to initialize...'

F12 Live I've tried removing 'quiet rhgb' and all combinations of 'noprobe nomodeset acpi=off acpi=noirq noacpi pci=noacpi noapm nodma nolapic noapic nolapic_timer'

Still no luck.

Comment 5 Calvin Dodge 2009-11-24 18:22:21 UTC
I hate to be a "me, too" guy, but I have the same experience while trying to install F12 on my D250-1165 (which hangs during "initializing hardware").

Some other versions of Linux will boot on this (Gparted 0.4.8.6, Knoppix 6.0, F11 install (it just doesn't recognize the built-in NIC)). I hope that provides some clue as to what's happening.

Comment 6 Stephen Haffly 2009-11-27 22:52:35 UTC
Please add me to the list.  I can't even get the live distro on an SD card to boot, let alone install.

Comment 7 Christopher Beland 2009-11-30 08:14:23 UTC
I also have an Aspire One D250, and I can confirm that the LiveCD locks up at the udev step with desktop-i386-20091129.00.iso (Fedora 13 Rawhide) as well as Fedora-12-i686-Live.iso, but I can boot up and log in with Fedora-11-i686-Live.iso.  There's only the original Windows partition on this machine.

Comment 8 Christopher Beland 2009-12-01 15:21:55 UTC
I did some testing (with original F11 RPMs - no updates except kernel-firmware-2.6.30.9-100.fc11.noarch), and it appears this problem was introduced in the rebase to the 2.6.30 kernel. kernel-2.6.29.6-217.2.6.fc11.i586 boots OK, but as reported, kernel-2.6.30.5-43.fc11.i586 wedges during udev startup, badly enough that pressing Caps Lock doesn't affect the corresponding LED.

It looks like other hardware configurations also stopped booting after this update; see: https://admin.fedoraproject.org/updates/F11/FEDORA-2009-9167

Comment 9 Dave Airlie 2009-12-01 21:06:41 UTC
can you try ignore_loglevel on boot and see what the last printed thing is?

Comment 10 Christopher Beland 2009-12-02 17:23:16 UTC
Booting Fedora-12-i686-Live.iso without rhgb and quiet, but with ignore_loglevel, the last lines printed are:

Starting udev: udev: starting version 145
ACPI: WMI: Mapper loaded
b43-pci-bridge 000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
b43-pci-bridge 000:01:00.0: setting latency timer to 64
intel_rng: FWH not detected

Comment 11 Christopher Beland 2009-12-08 17:54:04 UTC
Created attachment 376960 [details]
Output of "lspci -vvv"

Doing the same thing with desktop-i386-20091203.16.iso (which has kernel-2.6.32-0.65.rc8.git5) I get only:

Starting udev: udev: starting version 147
ACPI: WMI: Mapper loaded
b43-pci-bridge 000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
b43-pci-bridge 000:01:00.0: setting latency timer to 64

I'll attach hardware profile info obtained from booting under F11.

Comment 12 Christopher Beland 2009-12-08 17:55:07 UTC
Created attachment 376962 [details]
Output of smoltSendProfile

Comment 13 Adam Williamson 2009-12-09 17:07:48 UTC
http://fedoraproject.org/wiki/Acer_Aspire_One suggests that the kernel parameter 'ssb.blacklist=1' helps with the AO751h model - could you try that with this model and see if it's maybe the same?

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 14 Christopher Beland 2009-12-09 18:37:55 UTC
Yes, ssb.blacklist=1 enables my F12 LiveUSB image to boot.  Networking isn't working out of the box, but it wasn't working on F11 either.

Comment 15 Vlad Dimitriu 2009-12-09 19:31:10 UTC
I used unetbootin to write the ISO image on the SD Card.
I booted successfully, but the install stopped with "no root device found".

Comment 16 Stephen Haffly 2009-12-10 03:34:21 UTC
I used livecd-iso-to-disk to write the ISO to the SD card after both unetbootin and the Fedora liveusb-creator failed to create the SD card properly.

Adding the ssb.blacklist=1 gives an error that the parameter is not recognized and is being ignored, but F12 will boot. Wired networking is functional out-of-box, but wireless is not.

I did a quick download of gparted to resize the NTFS partition and I am working on installing F12 now.  Thanks Adam.

Comment 17 Stephen Haffly 2009-12-10 05:39:47 UTC
See this thread on Fedora Forum about getting wireless to work.  I just tried it on a fresh F12 installation on the Acer Aspire D250, and it works.  Be sure to add ssb and b43 to the /etc/modprobe.d/blacklist.conf file too.

http://forums.fedoraforum.org/showthread.php?t=234055&highlight=ssb.blacklist%3D1

Comment 18 Stephen Haffly 2009-12-10 05:42:28 UTC
Correction:  It seems that installing kmod-wl from rpmfusion creates a broadcom-wl-blacklist.conf file in which bcm43xx, ssb, b43, and ndiswrapper are all specified, so manually entering them in the blacklist.conf file is probably not needed.
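
For reference, the blacklist entries discussed in comments 17 and 18 can be sketched as below. This is only a minimal sketch, not the contents of the kmod-wl file: the helper name blacklist_lines and the target file name are hypothetical, and only the ssb and b43 entries are shown.

```shell
# Sketch: print modprobe.d "blacklist" lines for the given module names.
# blacklist_lines is a hypothetical helper, not part of any package.
blacklist_lines() {
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod"
    done
}

# Preview what would be written (nothing on disk is changed):
blacklist_lines ssb b43

# To apply, as root (the file name is an example only):
#   blacklist_lines ssb b43 > /etc/modprobe.d/ssb-workaround.conf
```

Keeping the entries in a separate file under /etc/modprobe.d makes the workaround easy to remove again once a fixed kernel is installed.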

Comment 19 Calvin Dodge 2009-12-10 14:59:16 UTC
(In reply to comment #13)
> http://fedoraproject.org/wiki/Acer_Aspire_One suggests that the kernel
> parameter 'ssb.blacklist=1' helps with the AO751h model - could you try that
> with this model and see if it's maybe the same?
> 

Yes, that did the trick with my D250-1165.

I then ran into the "the installer has tried to mount image 1" problem with my flash drive, but merely added "askmethod" to the boot line, then pointed the computer to an NFS share (yes, the F12 install kernel recognizes the NIC).

It's installing right now. I'll post again on the install's success when it's done.

Comment 20 Christopher Beland 2009-12-10 18:16:59 UTC
Assuming it's the same cause, Fedora-12-i386-netinst.iso gets stuck after the "detecting hardware..." line is printed, wedging with the same symptoms (Caps Lock doesn't work).  Using "ssb.blacklist=1" or "noprobe" gets past the hang.

But the installer assumes I'm doing a hard drive installation and asks me which partition the installation image is.  When I try to force a URL install by using the Back button, it cannot detect either the wired or wireless network interfaces.  I do have an Ethernet cable plugged in that I tested with a different machine, so I'm not sure why the wired interface isn't working for me but is working for Calvin.

Comment 21 Adam Williamson 2009-12-11 21:31:18 UTC
your systems may have different wired ethernet adapters, I suppose. the 'ssb' module is vital not just for the b43 module (for Broadcom BCM43xx wireless controllers) but also for the b44 module (for Broadcom BCM44xx wired controllers). If your wired controller happens to use the b44 module, blacklisting ssb will cause it not to work...

kernel folks, looks like ssb is busted.

can anyone try loading ssb *after* booting and see if you get some fun logs? or even if it works (that'd be annoying)?


Comment 22 Adam Williamson 2009-12-11 21:33:12 UTC
"Adding the ssb.blacklist=1 gives an error that the parameter is not recognized
and is being ignored"

this is normal, btw. See http://bugzilla.kernel.org/show_bug.cgi?id=14164 .


Comment 23 Christopher Beland 2009-12-12 22:09:23 UTC
When I do "modprobe ssb" with kernel-2.6.31.6-166.fc12.i686 and while logged in under Gnome, the system wedges hard enough that Caps Lock doesn't work, no error messages are printed on the screen, and nothing is added to /var/log/messages.

Comment 24 Jason Smith 2010-01-07 20:01:43 UTC
*** Bug 551747 has been marked as a duplicate of this bug. ***

Comment 25 Jason Smith 2010-01-07 20:17:59 UTC
I have an HP Mini 311 with the same Broadcom Corporation BCM4312 802.11b/g (rev
01). Here's what I've found. I downloaded the kernel rpm builds from
https://admin.fedoraproject.org/updates/search/kernel?_csrf_token=b76346f9c9e9fbef7773431aed20e2accef5059a

kernel 2.6.29.4-167.fc11.i686.PAE default F11 kernel boots fine with no issues.
kernel 2.6.29.6-213.fc11.i686.PAE an updated F11 kernel boots w/o issue.
kernel 2.6.29.6-217.2.16.fc11.i686.PAE another updated F11 kernel boots w/o
issue.
kernel 2.6.30.5-43.fc11 locks the system up hard at starting udev. If I add
ssb.blacklist=1 when booting 2.6.30.5-43.fc11.i686.PAE, the system boots w/o
issue.

Hope this helps.

Comment 26 Jason Smith 2010-01-08 23:03:02 UTC
Using kernel 2.6.32.3-10.fc12.i686.PAE downloaded from koji my 311 system still doesn't boot except with adding ssb to the blocklist. 

Just some more information that I don't see in this report. Using lspci -vvvn for the wireless card in the 311 the first line is

03:00.0 0280: 14e4:4315 (rev 01)

According to http://wireless.kernel.org/en/users/Drivers/b43#Known_PCI_devices 14e4:4315 should be supported with 2.6.32 or later.
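
To check whether another machine has the same device, one can look for the 14e4:4315 ID in "lspci -n" output. A minimal sketch; the helper name has_bcm4312 is made up for illustration, and the sample line is the one quoted above.

```shell
# Succeeds if any line on stdin (in "lspci -n" format) mentions 14e4:4315.
# has_bcm4312 is a hypothetical helper name.
has_bcm4312() {
    grep -qi '14e4:4315'
}

# Demonstration with the line quoted in this comment:
sample='03:00.0 0280: 14e4:4315 (rev 01)'
if printf '%s\n' "$sample" | has_bcm4312; then
    echo "14e4:4315 found"
else
    echo "14e4:4315 not found"
fi

# On a real machine:
#   lspci -n | has_bcm4312 && echo "14e4:4315 found"
```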

Comment 27 Adam Williamson 2010-01-11 18:11:27 UTC
has anyone looked to see if they can get any kind of useful traceback from loading ssb? perhaps by booting with it blacklisted then manually loading it after boot ('modprobe ssb')? thanks.




Comment 28 Jason Smith 2010-01-11 19:59:46 UTC
Adam, thanks for your help. Once I added only 'blacklist ssb' to /etc/modprobe.d/anaconda.conf I could do a modprobe ssb w/o getting the 'Unknown parameter...' in /var/log/messages. The problem is I don't get any useful information as my machine locks up hard. Same as on boot if I comment out the 'blacklist ssb' line in /etc/modprobe.d/anaconda.conf. I did this running 2.6.32.3-10.fc12.i686.PAE.

Comment 29 Adam Williamson 2010-01-12 23:19:01 UTC
well, thanks for trying :( so you get nothing at all relevant in /var/log/messages from the time you tried the modprobe and saw the lockup?




Comment 30 Jason Smith 2010-01-13 04:06:21 UTC
(In reply to comment #29)
> well, thanks for trying :( so you get nothing at all relevant in
> /var/log/messages from the time you tried the modprobe and saw the lockup?

Nothing when doing modprobe ssb. Just a quick hard lockup. Doing a modprobe b43, I get a few lines of lib80211 stuff, then some frequency information, and that's all.

Comment 31 Kurt Seifried 2010-02-07 08:22:15 UTC
Acer D250 with a 40gig Intel X25-V SSD, Fedora 12 from DVD (external USB drive) fails to install (waiting for hardware to initialize...), CentOS 5.4 i386 DVD installs fine.

Comment 32 Christopher Beland 2010-02-12 21:59:37 UTC
Nominating as F13 beta blocker, because this is a serious regression (if hardware-specific) and listed on Common Bugs.  I assume it could be fixed using older software or at least hacked so that the machines boot with the problem hardware disabled.  Fix by beta would allow time for adequate testing.

Comment 33 John W. Linville 2010-03-12 18:57:07 UTC
*** Bug 532369 has been marked as a duplicate of this bug. ***

Comment 34 John W. Linville 2010-03-12 19:11:36 UTC
Can the Acer Aspire One D250 users confirm that the hang when loading the ssb driver occurs even after removing b43-openfwwf (indeed, with no /lib/firmware/b43 at all) and even when running a 2.6.32-based kernel?  FWIW, has anyone tried a 2.6.33 (or later) kernel yet?

Comment 35 Adam Williamson 2010-03-12 20:21:33 UTC
Note - testing with 2.6.33 can easily be achieved by trying a Fedora 13 image, either the Alpha or the nightly builds at http://alt.fedoraproject.org/pub/alt/nightly-composes/desktop/ .




Comment 36 Kurt Seifried 2010-03-12 21:11:11 UTC
Ok got the i386 iso (verified sha256sum), booted, choosing "Boot" option and the logo fills up most of the way and then dies, DVD drive spins down. ctrl-alt-del doesn't work. Power cycled laptop. Tried again, chose "verify and boot", same deal, logo fills up most of the way, system stalls and becomes non responsive, DVD drive spins down. Also tried letting it do automatic boot, same results.

I guess that means it is still broken on the D250 =(.

Comment 37 Christopher Beland 2010-03-12 21:17:24 UTC
I tried desktop-i386-20100310.20.iso.  Got stuck at the udev stage again, until I added ssb.blacklist=1 to the boot parameters.

Comment 38 Christopher Beland 2010-03-13 00:27:38 UTC
With kernel-2.6.32.9-70.fc12.i686, even after removing b43-openfwwf-5.2-3.fc12.noarch and confirming that /lib/firmware/b43 does not exist, I tried rebooting and got the hang again until I added ssb.blacklist=1 as a boot parameter.

Comment 39 Larry Finger 2010-03-13 00:45:19 UTC
After booting with ssb blacklisted, is it possible to 'modprobe -v ssb'? This command requires root privilege (sudo) and may not be in the default path (I'm not a Fedora user.). If it works, please check the tail of dmesg for any output.

If that works, then try 'modprobe -v mac80211'. Again check dmesg output.

Finally, if that works, try 'modprobe -v b43' and check dmesg output.

I have downloaded the 32-bit live CD and will be trying it.
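
The three steps above can be strung together as in the sketch below. This is an illustration only: the function name try_modules, the DRY_RUN switch and the choice of 20 dmesg lines are assumptions made for this example. Note that other comments in this report say the first step (ssb) freezes affected machines outright, so expect to lose the session when running it for real.

```shell
# Load the modules one at a time and show the tail of dmesg after each.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# and run as root to actually execute them.
DRY_RUN=${DRY_RUN:-1}

try_modules() {
    for mod in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "modprobe -v $mod && dmesg | tail -n 20"
        else
            modprobe -v "$mod" || return 1   # stop at the first failure
            dmesg | tail -n 20
        fi
    done
}

try_modules ssb mac80211 b43
```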

Comment 40 Adam Williamson 2010-03-13 01:11:53 UTC
larry: I already asked that earlier. Several people have replied that attempting to load ssb freezes the machine.




Comment 41 Larry Finger 2010-03-13 01:40:49 UTC
Sorry, I missed that info. Is there any possibility of either serial or network console to get any dump info?

It seems that a 'modprobe -v b43' does not lock up instantly (Comment #30). Could someone try issuing that command and immediately switch to a logging console? Is that Ctrl-Alt-F10 on Fedora, or is it somewhere else? We might get some info that way.

On my i686 system with both b43 and b43legacy devices, the Live CD booted just fine and I was able to connect with the b43 using firmware from the openfwwf project. In any case, ssb loaded without error.

Comment 42 Adam Williamson 2010-03-13 01:55:17 UTC
larry: this affects specific models - so far we know for sure of the Acer Aspire One 751h and D250. Many other systems with Broadcom adapters are known to boot successfully.




Comment 43 Larry Finger 2010-03-13 02:44:58 UTC
Have I told you I hate Netbooks? There are a number of machines that generate DMA errors when trying to use the BCM4312 devices, although none of them crashed on booting, and we have been unable to discover the reason. All reported so far will work if the kernel is configured for PIO rather than DMA.

Perhaps the diagnostics from these systems will help with the others. Any info from the crash would be really useful.

Comment 44 Christopher Beland 2010-03-13 03:45:31 UTC
With desktop-i386-20100310.20.iso (which looks like kernel-2.6.33-1), if I add ignore_loglevel to the kernel parameters, I'm now getting:

>>
Starting udev: udev: starting version 151
ACPI: WMI: Mapper loaded
b43-pci-bridge 000:01:00.0: PCI INT A -> GSI (level, low) -> IRQ 16
b43-pci-bridge 000:01:00.0: setting latency timer to 64
ssb: Core 0 found: ChipCommon (cc 0x800, rev 0x16, vendor 0x4243)
ssb: Core 1 found: IEE 802.11 (cc 0x812, rev 0x0F, vendor 0x4243)
ssb: Core 2 found: PCMCIA (cc 0x08D, rev 0x0A, vendor 0x4243)
ssb: Core 3 found: PCI-E (cc 0x820, rev 0x09, vendor 0x4243)
<<

That's transcribed from a photo.  This netbook has no serial cable, only USB, monitor, audio, and Ethernet that's not working because of all this Broadcom craziness.

Comment 45 Christopher Beland 2010-03-13 04:02:06 UTC
I played around with "modprobe -v b43"; all I learned was that it wedges the machine hard when it reaches the insmod step that loads ssb.ko (which I had to do manually because it was confused by the "blacklist" parameter).

Comment 46 Larry Finger 2010-03-13 04:14:17 UTC
Doing the modprobe won't help unless you get to the logging console before it freezes. It may not show anything, but it might.

BTW, the Broadcom wl driver should work. The article at 
http://fedoramobile.org/fc-wireless/broadcom-linux-sta-driver gives the F9 and
F10 links for sneakernetting the necessary RPM file. It will taint your kernel, but you will have networking.

Comment 47 Larry Finger 2010-03-13 04:35:48 UTC
Question for the people with this problem - are any of you in the Kansas City, MO area?

Comment 48 Christopher Beland 2010-03-13 07:45:31 UTC
I can set up networking that works in F12 using F12 RPMs from RPMFusion, but that blacklists ssb as a side effect.  I was tailing /var/log/messages while testing, and if the kernel didn't have time to get output from there back to the screen, I certainly wouldn't have made it to a different virtual terminal.  Normally kernel oopses during boot would be printed directly to the screen where I could see them, but in this case I don't see anything, so I'm thinking the kernel might not survive long enough to even spit out an error.  The hard wedge after insmod is pretty instantaneous.

Comment 49 Jason Smith 2010-03-14 02:11:38 UTC
On my HP Mini 311 that I posted about previously and using the F13 Alpha release the only way I can get the machine to boot is by adding ssb.blacklist=1 to the boot command.

Comment 50 John W. Linville 2010-03-15 15:31:11 UTC
FWIW, having rebuilt 2.6.32.9-67.fc12 locally with B43 (and B43LEGACY) disabled (along with the B43_PCI_BRIDGE) I was able to load ssb without a hang on the HP Mini here (which hangs w/ the stock configuration).  Trying a build now (taking forever, damned netbooks) w/ B43_PIO=y...

Comment 51 John W. Linville 2010-03-15 15:40:48 UTC
...and that still locks-up tight on modprobe.  I'm not sure if that really points at the problem, since the SSB code won't be exercised (much) w/o b43 to use it.  It is a shame that we can't plug a b44 adapter into these netbooks...

I guess I'll try some "printk" debugging to pinpoint the failure...

Comment 52 Larry Finger 2010-03-15 15:51:48 UTC
I'm confused. At first I thought that modprobing ssb was OK if b43 was not available, now I'm not sure. What is the exact configuration in Comment 51?

Comment 53 Michael Buesch 2010-03-15 16:00:34 UTC
Trying random kernels over and over again certainly is not going to fix the issue. There have been basically no changes in the bootup/initialization code for months (years?).
I think somebody has to insert a fair amount of printks to find the place where it hangs.
Also keep in mind that the PCI-E core code _is_ broken. We know that from debugging of the DMA problem. Just as a hint.

Comment 54 John W. Linville 2010-03-15 17:29:28 UTC
Larry, modprobing ssb is fine if you have disabled b43.  If you enable b43 it will lock-up when modprobing ssb (most likely due to the subsequent load of b43), even if using B43_PIO=y -- sorry if that was unclear.

Michael, the "random kernels" is to try to determine a rough cut for pinpointing the problem.  I could start with a printk in start_kernel, but I suspect it may take a while to pinpoint things that way. :-)

Comment 55 Michael Buesch 2010-03-15 17:43:50 UTC
(In reply to comment #54)
> Michael, the "random kernels" is to try to determine a rough cut for
> pinpointing the problem.

Well, is this a regression? Didn't sound like one to me.

> I could start with a printk in start_kernel, but I
> suspect it may take a while to pinpoint things that way. :-)    

That would be pretty silly, because we know that the problem is within the ssb or b43 initialization code. I think that codepath is small enough to add some printks to track down the point of failure. The very first thing would be to check whether the failure occurs in ssb or b43, because I think that is still unclear. Once we got a _rough_ pointer to where the failure is, we can add more specific printks to find the exact place.

Comment 56 Larry Finger 2010-03-15 18:07:49 UTC
I think it is a "regression" only in that LP PHYs were not supported before 2.6.32, thus the problem exists in .32, but not in .31. I have assumed that John's system has PCI ID of 14e4:4315 - at least the other reporters have that card. If you have some other card, its identity is important.

John:

If you do a 'sleep 5; modprobe ssb' and switch to the logging console, does any logging output show up?

If you do have the 4315, does this patch keep the system from freezing?

Index: wireless-testing/drivers/net/wireless/b43/main.c
===================================================================
--- wireless-testing.orig/drivers/net/wireless/b43/main.c
+++ wireless-testing/drivers/net/wireless/b43/main.c
@@ -4024,7 +4024,7 @@ static int b43_phy_versioning(struct b43
 #endif
 #ifdef CONFIG_B43_PHY_LP
        case B43_PHYTYPE_LP:
-               if (phy_rev > 2)
+//             if (phy_rev > 2)
                        unsupported = 1;
                break;
 #endif

Comment 57 J.A. Watson 2010-03-15 21:08:16 UTC
I have just run into the same problem, when loading F13 Alpha.  Once for certain, and once very probably but I am still trying to verify it.

For certain: HP Pavilion dm1-1020ez, Broadcom 4315 wireless adapter.  The Live Image freezes on boot every time.  Adding ssb.blacklist=1 to the boot command (at Adam's suggestion) fixes the freeze, and the Live Image boots and installs ok.  The installed image likewise freezes on boot, so I added the blacklist to the kernel line in the grub menu.lst file.  It then boots without trouble, but of course the Broadcom adapter doesn't work.
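
Adding the parameter to a kernel line can be sketched as below. This is a hedged illustration: the helper name add_boot_param and the sample kernel line are made up, and the real line in menu.lst will differ from machine to machine.

```shell
# Append a kernel parameter to a boot line unless it is already present.
# $1 = existing kernel line, $2 = parameter to add
add_boot_param() {
    case " $1 " in
        *" $2 "*) printf '%s\n' "$1" ;;        # already there, leave as-is
        *)        printf '%s %s\n' "$1" "$2" ;;
    esac
}

# Example kernel line (invented for illustration):
line='kernel /vmlinuz-2.6.33 ro root=/dev/sda1 rhgb quiet'
add_boot_param "$line" ssb.blacklist=1
```

The helper only prints the new line; editing menu.lst itself is left to the reader, since a mistake there can leave the machine unbootable.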

Probable: HP Pavilion dv2-1010ez, Atheros 9285 wireless adapter.  The Live Image hangs on boot intermittently, not always.  Once I got it to boot and install, the installed image doesn't seem to hang at all, at least so far.

Comment 58 Adam Williamson 2010-03-15 21:15:19 UTC
j.a: your 'probable' is some other bug, this is specific to Broadcom chipsets (the ssb driver is only used with Broadcom chipsets). Just FYI, you can actually get the wireless working with ssb blacklisted by using the proprietary Broadcom driver, wl, which I think is available in RPMFusion.




Comment 59 Daniel Meyerholt 2010-03-16 10:29:16 UTC
Hi,
I can also confirm this bug on an HP Compaq 615. Same symptoms as described above. Not tied to Fedora, though. I am running 64-bit Debian and can only use Broadcom wl for WLAN. BUT for some reason the b43 driver sometimes does work (one boot in 20 or so; I rebooted a lot while trying to install), but I cannot relate it to any special event, whether it is running Windows or the proprietary drivers. As far as I could see on the bcm43xx list, there seem to be some problems related to the ssb code writing to memory ranges it should not :)
Would be glad to help somehow, but I am not much of a kernel hacker ;) Comment 44 reflects what I have seen, too. Afterwards the screen turns black, no kernel oops.

Comment 60 John W. Linville 2010-03-16 20:21:47 UTC
Larry, loading ssb locks it up tight -- nothing on the console, nothing on netconsole, nothing at all.  As for the LP phy patch from comment 56, that does _not_ avoid the hang.

I'm still poking at it, but I have yet to narrow-down the failing line -- still trying...

Comment 61 Larry Finger 2010-03-17 16:05:10 UTC
AFAIK, these are single CPU computers, thus we probably have a locking problem, or an infinite loop.

One way that might help eliminate a lot of steps would be to build a kernel with MMIO tracing enabled (CONFIG_MMIOTRACE=y). In one console, enter the following (as root):

echo 5600 > /sys/kernel/debug/tracing/buffer_size_kb
echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/trace_pipe &

In a second console, 'sleep 10; modprobe b43' and switch back to the first console before the system freezes. The last screenful of trace messages before the freeze should help us determine where it was in the initialization process.
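
Before trying the steps above, it may be worth confirming that the running kernel offers the mmiotrace tracer at all. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug as in the commands above; the helper name tracer_available is made up for this example.

```shell
# Succeeds if tracer $1 is listed in file $2 (space-separated tracer names,
# the format used by the tracing directory's available_tracers file).
tracer_available() {
    tr ' ' '\n' < "$2" | grep -qx "$1"
}

tracers=/sys/kernel/debug/tracing/available_tracers
if [ -r "$tracers" ] && tracer_available mmiotrace "$tracers"; then
    echo "mmiotrace is available"
else
    echo "mmiotrace not available (CONFIG_MMIOTRACE unset, debugfs not mounted, or not root)"
fi
```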

Comment 62 Michael Buesch 2010-03-17 16:41:43 UTC
> AFAIK, these are single CPU computers, thus we probably have a locking problem,

On UP it is extremely unlikely to have locking problems that lock up the complete machine, because spinlocks don't exist. Also, in the initialization code there's basically no concurrency. But it's trivial to rule out locking problems by enabling lockdep.

> or an infinite loop.

I think that is rather unlikely, too.

I think it is just locking up on an invalid bus access. So the kernel tries to read (or possibly write) a register that does not exist and thus the device does not respond on the bus. That would lock up the CPU in hardware.
The Broadcom 43xx device is _known_ to behave undefined (lockups) on access of dangling registers.

> echo 5600 > /sys/kernel/debug/tracing/buffer_size_kb
> echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
> cat /sys/kernel/debug/tracing/trace_pipe &

This might help a bit, but keep in mind that it will most likely not lead you to the actual line of code that locks up the whole machine. Userspace is involved in the "cat" and if it's not able to print the contents of trace_pipe (because the CPU is busy running b43's initialization code), it won't print everything (or anything at all).
So for this to work you need a preemptible kernel. And even then it won't print everything that you want to see (if it prints something at all).

I think the only way to properly debug this is to insert synchronous printks into the code. But I think I already said that a few times...

(If it really locks up the CPU on an invalid bus access, a kernel level debugger also won't help.)

Comment 63 Larry Finger 2010-03-17 18:31:04 UTC
Once again I demonstrate my ignorance of the entire subject of locking.

I thought we had all the dangling register references worked out of b43. The x86 architecture has been fairly forgiving by returning all ones in reads of non-existent registers. Many were found because the PPC arch generates a machine check. At one point, I had instrumented all the ssb register read code to report the condition. In addition, all the instances that we had earlier were for registers that did not exist on older hardware. I would not expect such problems on new stuff.

I will put my checks back in to see if I find a read of a nonexistent register on my 4315.

Comment 64 Michael Buesch 2010-03-17 18:48:01 UTC
> I will put my checks back in to see if I find a read of a nonexistent register
> on my 4315.

It is a lot easier to sprinkle printks over the code to first get an idea of what is going on at all. Nobody (including me) knows:
- Where it crashes (We don't even know for sure which module it crashes in, yet).
- What kind of crash occurs (loop, MMIO, lock, whatever etc...)

I think these two things are the very first things one needs to find out _before_ anything else is done.

The printk sprinkling can be done in several iterations. Think of it like a git-bisect. It's a very fast way of narrowing down this type of hard-to-track-down-bug to a few hundred lines of code. For the first patch version I would just add a few printks to determine whether it hangs in SSB or b43.

I do not know if it hangs on some kind of MMIO access, of course. It's just a theory. So it's probably not a good idea to waste a lot of time searching for some "nonexistent registers". (I don't know how you'd correctly do that anyway).

Comment 65 John W. Linville 2010-03-17 19:27:25 UTC
I'm doing the printk thing, currently narrowed down to call of ssb_pci_sprom_get in drivers/ssb/pci.c -- still iterating to pinpoint it further.

Comment 66 Michael Buesch 2010-03-17 20:11:49 UTC
Thanks John for tracking that down.

So this might be one of these devices that completely lack an SPROM (for whatever braindamaged reasons). I had reports of them in the past, but they didn't lock up back then.

We had a few discussions on how to handle these devices back then and it basically boiled down to a solution using the firmware loading mechanism.
It would basically work this way: A userspace script generates an SPROM image and stores it in /lib/firmware for the ssb kernel module to fetch via firmware loading mechanism. The main task of the userspace script is to generate a MAC address in the SPROM image. We cannot generate the SPROM inside of the kernel, because there's no sane way to generate a sufficiently unique MAC address that's also constant across reboots, kernel- or hardwarechanges. So the SPROM needs to be generated _once_ and then be stored on HDD.

I don't have an implementation for that, nor do I plan to do one, however.
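
To make the "generate once, then store" requirement concrete, here is a rough sketch of only the MAC address part. It is not the tool discussed above. The function names and the one-file-per-MAC storage are hypothetical, the SPROM image layout is deliberately left out, and the use of a random locally administered address is an assumption that the thread does not confirm.

```c
#include <stdint.h>
#include <stdio.h>

/* Turn six arbitrary bytes into a locally administered, unicast MAC. */
static void make_local_mac(const uint8_t rnd[6], uint8_t mac[6])
{
	for (int i = 0; i < 6; i++)
		mac[i] = rnd[i];
	mac[0] |= 0x02;            /* set the locally-administered bit */
	mac[0] &= (uint8_t)~0x01;  /* clear the multicast bit */
}

/* Load the MAC from `path`, or create and store it on the first run so
 * that it stays constant across reboots. Returns 0 on success, -1 on
 * an I/O error. */
static int load_or_create_mac(const char *path, uint8_t mac[6])
{
	FILE *f = fopen(path, "rb");
	if (f) {
		size_t got = fread(mac, 1, 6, f);
		fclose(f);
		if (got == 6)
			return 0;  /* already generated earlier: reuse it */
	}

	uint8_t rnd[6];
	f = fopen("/dev/urandom", "rb");
	if (!f)
		return -1;
	size_t n = fread(rnd, 1, 6, f);
	fclose(f);
	if (n != 6)
		return -1;

	make_local_mac(rnd, mac);

	f = fopen(path, "wb");
	if (!f)
		return -1;
	n = fwrite(mac, 1, 6, f);
	if (fclose(f) != 0 || n != 6)
		return -1;
	return 0;
}
```

A real tool would write the address into a full SPROM image rather than a bare six-byte file.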

Comment 67 Larry Finger 2010-03-17 21:12:54 UTC
John: Your comment rang a bell in my mind as well.

In my RE work, I have come across a routine named is_sprom_available() that returns a bool. I have not finished understanding the routine, but there is a section that refers to BCM4312 devices, i.e. those with PCI ID 14e4:4315.

To preserve clean-room conditions, I will not be able to write a patch for you, but I can give you a prescription (I don't think Michael will help here either.):

In struct ssb_bus, you should add a u32 to contain the chipcommon status.

In ssb_bus_scan() where the ssb routine reads the chipcommon id, revision, and capabilities, you should read the register with offset 0x2C (SSB_CHIPCO_CHIPSTAT) and save the result in the new word in ssb_bus. Ultimately, this read will be conditional on the rev >= 11, but that will be true for you.

Before the ssb code reads the SPROM in ssb_pci_sprom_get(), check if the status word from above & 3 is not equal to 2. If that is true, your device does not have an SPROM and we need to do some fixup similar to what Michael described above. For now, you can return ENOMEM or ENODEVICE.

If this Q&D patch keeps your machine from freezing, I'll work on a complete set of specs for a proper is_sprom_available() and Gabor or Rafal will be able to put together a set of patches and a userland utility to create a suitable SPROM replacement file in /lib/firmware.

Comment 68 Michael Buesch 2010-03-18 16:36:06 UTC
> In struct ssb_bus, you should add a u32 to contain the chipcommon status.

Please don't put the variable into the bus structure, but into the chipcommon data structure.

> For now, you can return ENOMEM or ENODEVICE.

Yeah, as a quick fix for the fatal hang this is an acceptable workaround.

> Gabor or Rafal will be able to
> put together a set of patches and a userland utility to create a suitable SPROM
> replacement file in /lib/firmware.

I will put that stuff into the b43-tools package, of course. But I'm currently unable to write these tools. I accept patches, of course. Note that you need to implement an asynchronous firmware (sprom) fetching mechanism using the asynchronous firmware library functions.
For the userspace tool it's probably best to extend the ssb-sprom tool to support generating a valid sprom.

Comment 69 John W. Linville 2010-03-18 18:41:25 UTC
http://bcm-v4.sipsolutions.net/802.11/IsSpromAvailable

Comment 70 John W. Linville 2010-03-18 18:43:22 UTC
Created attachment 401103 [details]
ssb_check_for_sprom.patch

Haven't tested this one yet, but an open-coded check just of the chipcommon status avoided the crash on the box here.  Also it still avoids a working device as well, but better not to crash... :-)

Comment 71 Larry Finger 2010-03-18 19:09:32 UTC
Michael and I are working out the details of the user-space and kernel components of supplying a virtual SPROM image, but that shouldn't take long.

Your patch looks fine. I probably would have coded it as

		return ((bus->chipco.status & 0x3) != 2);

rather than

		if ((bus->chipco.status & 0x3) != 2)
			return true;
		else
			return false;

We should also supply some defines for all those magic numbers, but that is also a matter of taste.

Thanks for the grunt work on this problem.

Comment 72 John W. Linville 2010-03-18 19:17:15 UTC
Scratch build w/ above patch is (or will be) available here:

http://koji.fedoraproject.org/koji/taskinfo?taskID=2061828

Again, the wireless still won't work but (hopefully) it won't crash on load of
ssb.ko...

Re: style -- I was debating about that.  Honestly, both ways seem a bit awkward/ugly. :-(  Do you have any suggestions for naming the magic numbers?  Or might they already be defined somewhere?

Comment 73 Michael Buesch 2010-03-18 19:23:34 UTC
(In reply to comment #70)
> Created an attachment (id=401103) [details]
> ssb_check_for_sprom.patch
> 
> Haven't tested this one yet, but an open-coded check just of the chipcommon
> status avoided the crash on the box here.  Also it still avoids a working
> device as well, but better not to crash... :-)    

Please send patches for review via email. It's a pain to comment on patches here.

The patch has a few problems:

1) Don't read chipstat if it doesn't exist (it only exists on chipcommon rev >= 11). We know what happens on reading registers that don't exist. ;)
2) You are checking the chip revision where you should check the chipcommon core revision.
3) Please create a defined name for chipcommon capability 0x40000000. It obviously is a "SPROM-present" capability flag.

Just as a sidenote: The patch does have a potential for creating regressions, IMO. I think we should not blindly apply it without any testing on a fair amount of devices.

I also support Larry's comment.
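
Review points 1) to 3) can be sketched in userspace roughly as follows. This is not the patch under review: `struct fake_chipco`, the fake register window and `CHIPSTAT_MIN_CC_REV` are hypothetical. The 0x2C offset comes from comment 67, and `SSB_CHIPCO_CAP_SPROM` is the name later proposed in comment 77 for capability bit 0x40000000.

```c
#include <stdbool.h>
#include <stdint.h>

#define SSB_CHIPCO_CHIPSTAT   0x2C        /* register offset, per comment 67 */
#define SSB_CHIPCO_CAP_SPROM  0x40000000  /* "SPROM-present" capability bit */
#define CHIPSTAT_MIN_CC_REV   11          /* hypothetical name for the rev limit */

/* Hypothetical stand-in for the chipcommon core state. */
struct fake_chipco {
	uint8_t core_rev;       /* chipcommon CORE revision, not the chip revision */
	uint32_t capabilities;
	uint32_t status;        /* cached chipstat, left at 0 if never read */
	const uint32_t *regs;   /* fake register window, indexed by offset / 4 */
	int chipstat_reads;     /* counts register accesses, for demonstration */
};

/* Cache chipstat only when the core is new enough to have the register. */
static void fake_chipco_init(struct fake_chipco *cc)
{
	cc->status = 0;
	if (cc->core_rev >= CHIPSTAT_MIN_CC_REV) {
		cc->status = cc->regs[SSB_CHIPCO_CHIPSTAT / 4];
		cc->chipstat_reads++;
	}
}

/* Test the capability bit by name instead of by magic number. */
static bool fake_has_sprom_cap(const struct fake_chipco *cc)
{
	return (cc->capabilities & SSB_CHIPCO_CAP_SPROM) != 0;
}
```

With a core revision of 10 the register is never touched; with revision 0x16 (the value in the dmesg output of comment 80) it is read exactly once.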

Comment 74 Adam Williamson 2010-03-18 19:34:26 UTC
John: with this patch, should the wired networking now work on systems which have b44 wired adapters? previously, because you had to blacklist ssb, you also lost wired functionality on systems whose wired adapter is also broadcom...



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 75 Michael Buesch 2010-03-18 19:39:58 UTC
(In reply to comment #74)
> John: with this patch, should the wired networking now work on systems which
> have b44 wired adapters? previously, because you had to blacklist ssb, you also
> lost wired functionality on systems whose wired adapter is also broadcom...

Most likely, yes. b44 should most likely work with this patch.
At least we don't know of b44 devices without SPROM. You should simply try it.

Comment 76 Adam Williamson 2010-03-18 19:56:34 UTC
I can't, I don't have an affected system. I'm just trying to track the issue for documentation purposes.




Comment 77 Larry Finger 2010-03-18 21:27:34 UTC
John:

Some defines for the magic numbers:

0x40000000 can be called SSB_CHIPCO_CAP_SPROM and defined in include/linux/ssb/ssb_driver_chipcommon.h

The 0x40 in the 0x4322 branch is BCM4322_SPROM_PRESENT (Note: I simplified the specs.).

The 0x1 in the 0x4325 branch is BCM4325_SPROM_PRESENT.

Comment 78 John W. Linville 2010-03-19 00:23:35 UTC
Cool, thanks Larry...any suggestions for the numbers in the 0x4312 branch?  I suspect that it is similar to the mappings for SSB_CHIPCO_CHST_4325_SPROM_OTP_SEL and the related definitions below it...?

Comment 79 Larry Finger 2010-03-19 04:41:51 UTC
The 3 is SSB_CHIPCO_CHST_4325_SPROM_OTP_SEL.

The 2 is SSB_CHIPCO_CHST_4325_OTP_SEL.

These two definitions have 4325 in them because they originated from the N PHY code, but still apply to the 4312.
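
With those two names, the one-line expression from comment 71 could be written as in the sketch below. The numeric values are the ones given in this thread (3 and 2). The helper name `chipstat_4312_check` is hypothetical, and the sketch only reproduces the comparison. How its result maps to "SPROM available" is defined by the IsSpromAvailable spec linked in comment 69 and is not restated here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as given in comments 71 and 79. */
#define SSB_CHIPCO_CHST_4325_SPROM_OTP_SEL  0x00000003  /* mask of the select field */
#define SSB_CHIPCO_CHST_4325_OTP_SEL        0x00000002  /* field value compared against */

/* Hypothetical helper: same result as (status & 0x3) != 2. */
static bool chipstat_4312_check(uint32_t status)
{
	return (status & SSB_CHIPCO_CHST_4325_SPROM_OTP_SEL) !=
	       SSB_CHIPCO_CHST_4325_OTP_SEL;
}
```

Only the two low bits of the status word take part in the comparison, so any higher bits are ignored.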

Comment 80 Daniel Meyerholt 2010-03-19 11:43:19 UTC
Larry's patch did work for me. modprobe b43, which loads ssb, does not hang any more. Here's the dmesg after modprobe:

b43-pci-bridge 0000:06:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
b43-pci-bridge 0000:06:00.0: setting latency timer to 64
ssb: Core 0 found: ChipCommon (cc 0x800, rev 0x16, vendor 0x4243)
ssb: Core 1 found: IEEE 802.11 (cc 0x812, rev 0x0F, vendor 0x4243)
ssb: Core 2 found: PCMCIA (cc 0x80D, rev 0x0A, vendor 0x4243)
ssb: Core 3 found: PCI-E (cc 0x820, rev 0x09, vendor 0x4243)
b43-pci-bridge 0000:06:00.0: PCI INT A disabled
Broadcom 43xx driver loaded [ Features: PL, Firmware-ID: FW13 ]

thanks a lot, looking forward to the SPROM userspace stuff

Comment 81 Daniel Meyerholt 2010-03-19 11:47:29 UTC
Sorry, it is actually John's patch ;) BTW, my uname -a, if it matters somehow:

Linux mobilemog-ng 2.6.34-rc1-next-20100319+ #1 SMP Fri Mar 19 11:40:44 CET 2010 x86_64 GNU/Linux

Comment 82 John W. Linville 2010-03-19 20:47:54 UTC
Created attachment 401347 [details]
0001-ssb-do-not-read-SPROM-if-it-does-not-exist.patch

Comment 83 John W. Linville 2010-03-19 22:14:29 UTC
Created attachment 401358 [details]
0001-ssb-do-not-read-SPROM-if-it-does-not-exist.patch

Comment 85 Fedora Update System 2010-03-23 14:56:34 UTC
kernel-2.6.32.10-90.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/kernel-2.6.32.10-90.fc12

Comment 86 Adam Williamson 2010-03-23 21:42:36 UTC


Comment 87 Fedora Update System 2010-03-24 23:40:42 UTC
kernel-2.6.32.10-90.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.32.10-90.fc12

Comment 88 Adam Williamson 2010-03-26 17:10:41 UTC
Reporters - could someone please test this with Fedora 13 Beta RC1 (that's the release candidate _to be the Beta_) - it's at http://serverbeach1.fedoraproject.org/pub/alt/stage/13-Beta.RC1/Fedora/ ? We need to confirm that the fix is in for F13 Beta. Thanks.




Comment 89 Kurt Seifried 2010-03-26 23:18:31 UTC
OK, on the Acer D250 the installer runs (so we're already ahead). I ran the install and got a dependency error during package selection (I assume some packages are out of sync, no biggie). It said it was installing the boot loader, ejected the DVD and stopped responding. After a reboot, the bootloader had not been installed, but at least the install ran, so that's good.

Comment 90 Adam Williamson 2010-03-26 23:38:49 UTC
those are both known bugs - https://bugzilla.redhat.com/show_bug.cgi?id=577196 and https://bugzilla.redhat.com/show_bug.cgi?id=577100 - so it looks like you confirm this bug is fixed. thanks!




Comment 91 Fedora Update System 2010-03-30 02:24:10 UTC
kernel-2.6.32.10-90.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 92 Jean-Pierre Huc 2010-04-01 16:56:50 UTC
I am running on an Acer Aspire 5630 laptop. I am now booted with 2.6.32.9-70.fc12.i686.PAE and all is well: I have both wireless and eth0 working, no problem!

I upgraded to vmlinuz-2.6.32.10-90.fc12.i686.PAE using yum. On reboot it hung at udev, so after reading the above description I added ssb.blacklist=1 to grub menu.list and rebooted.

This booted and did not hang at udev, but eth0 is NOT seen and so FAILS. The wireless interface, an "iwl3945" on this Aspire 5630, works.


Here is an extract of lspci -vvv for the network interface card

------------------------------

06:01.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
	Subsystem: Acer Incorporated [ALI] Device 0090
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64
	Interrupt: pin A routed to IRQ 21
	Region 0: Memory at d0000000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [40] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
	Kernel driver in use: b44
	Kernel modules: b44

------------------------------

I am new to this bugzilla interface. Maybe I needed to file a new bug for the
Acer Aspire 5630, but this seemed related to the above.

Can I help with some more info? Just ask! If not, please point me in the right direction to help get this moving along.

Anyway, thank you for your support.

Comment 93 John W. Linville 2010-04-01 17:07:29 UTC
Jean-Pierre Huc, the problem you describe is covered by bug 578217.  The fix is available in kernel-2.6.32.10-94.fc12.

Comment 94 Larry Finger 2010-04-01 17:25:47 UTC
Unfortunately, that fix is wrong.

I'm not a Fedora user. Where do I look to see what changed between 2.6.32.9-70.fc12 and 2.6.32.10-90.fc12?

Comment 95 Adam Williamson 2010-04-01 17:41:27 UTC
In the changelog, as you do on any RPM-based distribution: rpm -q --changelog <packagename>. You may want to pipe it through head: rpm -q --changelog <packagename> | head -100 . It'll be pretty long.




Comment 96 Larry Finger 2010-04-01 18:44:34 UTC
Remember, I am NOT a Fedora user. I don't have that rpm, and I could not install it if I wanted to. AFAIK, the command you gave me needs the rpm to be installed.

There is nothing that should affect b44 or ssb in the changes between 2.6.32.9 and 2.6.32.10 as distributed by kernel.org. I just need to have a look at the Fedora changes.

Comment 97 John W. Linville 2010-04-01 18:56:42 UTC
The changelog is viewable at the link here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=163138

Comment 98 Larry Finger 2010-04-01 19:07:16 UTC
Thanks John. It looks as if Jean-Pierre's problem is yet another fallout from the incorrect SSB-SPROM "fixes". I don't see any other changes that might affect ssb or b44.

Comment 99 John W. Linville 2010-04-01 19:23:04 UTC
Right -- the -94.fc12 kernels should work for him.  They assume that a device w/o chipcommon will have an SPROM.  I don't know if that is completely valid, but it seems to be working for now.
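
The assumption described here can be sketched as a simple guard. This is a hypothetical userspace illustration, not the code in the -94.fc12 kernel: the struct and function names are made up, and `chip_reports_sprom` stands in for whatever chip-specific check applies when a chipcommon core exists.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ssb core and bus structures. */
struct fake_core {
	uint8_t revision;
};

struct fake_bus {
	struct fake_core *chipco_dev;  /* NULL when the bus has no chipcommon core */
	bool chip_reports_sprom;       /* placeholder for the chip-specific check */
};

/* A bus without chipcommon is assumed to have an SPROM, so the
 * chip-specific check is only consulted when the core exists. */
static bool fake_sprom_assumed_present(const struct fake_bus *bus)
{
	if (bus->chipco_dev == NULL)
		return true;
	return bus->chip_reports_sprom;
}
```

The point of the guard is ordering: nothing that belongs to chipcommon is looked at until the core is known to exist.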

Comment 100 Jean-Pierre Huc 2010-04-01 20:48:23 UTC
Thanks for the accuracy of your reply; bug 578217 it is.
I am writing this reply from within the -94.fc12 PAE kernel using the b44 NIC, so all is well!

If I can be of assistance, please feel free to ask. Thanks, John.

Comment 101 John W. Linville 2010-04-07 13:13:06 UTC
*** Bug 575470 has been marked as a duplicate of this bug. ***