Description of problem:
F8 (KDE) Live-CD hangs in udev on boot.
How reproducible:
On booting, the Live CD gets as far as displaying 'udev', but the green 'OK' never appears. There is some brief CD activity (and also hard disk activity, which seems strange to me) and then indefinite silence. The Live CD boots successfully on another machine.
Additional info:
I have also booted without 'rhgb' and 'quiet' and with 'udevinfo' and 'vga=791'. udevinfo results in a huge amount of text output after starting udev. I have attached a picture of the last few lines. I can attempt to record some of the preceding lines if that would be helpful.
Created attachment 260051 [details] Shot of udevinfo output
This is a kernel module crashing the kernel. The picture would be useful, if only I could read the last modprobe lines :)
I too am having this problem; hopefully my description can help in tracking this down. I have two boxes, one a desktop, the other a laptop. Fedora 8 works flawlessly on the desktop; the issue is on the laptop. I first tried the "Live" version of the CD; the machine hung at the 'udev' step. I then downloaded and burned the full install disc; I was able to go through the entire install process fine, but it hung in the same place (udev). From googling around, I've tried all sorts of kernel options, which had NO discernible effect:
    cbiosize=4096, cbmemsize=128M, floppy.allowed_drive_mask=0, acpi=off, noapic, nohz=off, nolapic, nolapic_timer, pci=nomsi, nommconf, hires=off, pnpacpi=off
All of those options (in seemingly countless combinations) had no impact; the box would hang as described above. This morning I tried two more options; these got further, in the sense that something was printed after udev 'stopped', after about 10 seconds of disk activity. The two options (tried separately) were:
    pci=noacpi
    bcm43xx.blacklist=yes
The message was the same both times:
    CPU0: Machine Check Exception 0000000000000004
    Bank 4: b200000000070f0f
    Kernel panic - not syncing: cpu context corrupt
I don't have a working Linux instance on this box right now, so here's some non-lspci hardware info. My Netgear WG511T card has been unplugged during these attempts; I don't use the built-in Broadcom wireless.
    Hewlett-Packard Pavilion zv5200 (DP299AV) F.34
    Board: Compal 08A0 32.41
    Bus Clock: 133 megahertz
    BIOS: Hewlett-Packard F.34 12/23/2004
    Athlon64 3200
    1024 Megabytes Installed Memory (512MBx2)
    TOSHIBA DVD-ROM SD-R2512 [CD-ROM drive]
    HITACHI_DK23FA-80 [Hard drive] (80.03 GB) -- drive 0, s/n 26R734, rev 00M3A0A0, SMART Status: Healthy
    Standard floppy disk controller
    NVIDIA nForce3 Parallel ATA Controller (v2.6)
    NVIDIA GeForce4 440 Go 64M [Display adapter]
    SoundMAX Integrated Digital Audio
    Texas Instruments PCI-1620 CardBus Controller with UltraMedia (2x)
    Standard Enhanced PCI to USB Host Controller
    Standard OpenHCD USB Host Controller (2x)
    1394 Net Adapter
    Broadcom 802.11b
    NETGEAR 108 Mbps Wireless PC Card WG511T
    Nortel IPSECSHM Adapter
    Realtek RTL8139 Family PCI Fast Ethernet NIC
    Texas Instruments OHCI Compliant IEEE 1394 Host Controller
    Microsoft AC Adapter
    Microsoft ACPI-Compliant Control Method Battery
    USB Human Interface Device
    Standard 101/102-Key or Microsoft Natural PS/2 Keyboard
    Alps Pointing-device [Mouse]
    HID-compliant mouse
    USB Root Hub (3x)
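The machine check values quoted in the report above can be read bit by bit. The following is a minimal sketch, assuming the "Bank 4" value is a raw x86 MCi_STATUS register and using only the architectural flag bits (63 down to 57); the function name and the returned dictionary layout are made up for illustration, and the model-specific fields are extracted but not interpreted.

```python
# Architectural flag bits of an x86 MCi_STATUS register (bit number -> name).
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC register is valid
    58: "ADDRV",  # MCi_ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(value):
    """Split a 64-bit MCi_STATUS value into flag names and error-code fields."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (value >> bit) & 1
    ]
    return {
        "flags": flags,
        "mca_error_code": value & 0xFFFF,               # bits 15:0
        "model_specific_code": (value >> 16) & 0xFFFF,  # bits 31:16
    }


decoded = decode_mci_status(0xB200000000070F0F)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

The PCC flag being set is consistent with the "cpu context corrupt" panic text; what the remaining code fields mean depends on the CPU model and is not decoded here.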
I had a problem where my system began to hang on the udev line, as described in the first post, after upgrading to 2.6.23 in Fedora 7 (which should be similar to the version on the F8 Live CD). It previously worked with a 2.6.22 kernel. I was able to solve it by blacklisting the b43 module and associated modules. Note that blacklisting b43 alone did not solve the problem, and that bcm43xx no longer exists in kernels newer than 2.6.21. My Broadcom card is detected as:
    0f:00.0 0280: 14e4:4328 (rev 01)
In /etc/modprobe.d/blacklist add:
    blacklist b43
    blacklist sbs
    blacklist mac80211
    blacklist cfg80211
In /etc/modprobe.conf add:
    alias b43 off
    alias sbs off
    alias mac80211 off
    alias cfg80211 off
Use "install xyz /bin/true" instead of "alias xyz off". "alias xyz off" is deprecated, IIRC.
Yep, I had started to suspect the bcm43xx device. Here's the output of 'dmesg | grep 43' on my existing FC6 installation:
    bcm43xx driver
    bcm43xx: Chip ID 0x4301, rev 0x0
    bcm43xx: Number of cores: 5
    bcm43xx: Core 0: ID 0x812, rev 0x2, vendor 0x4243
    bcm43xx: Core 1: ID 0x80d, rev 0x0, vendor 0x4243
    bcm43xx: Core 2: ID 0x806, rev 0x2, vendor 0x4243
    bcm43xx: Core 3: ID 0x807, rev 0x1, vendor 0x4243
    bcm43xx: Core 4: ID 0x804, rev 0x3, vendor 0x4243
    bcm43xx: PHY connected
    bcm43xx: Detected PHY: Analog: 0, Type 1, Revision 4
    bcm43xx: Detected Radio: ID: 2205317f (Manuf: 17f Ver: 2053 Rev: 2)
    bcm43xx: Radio turned off
    bcm43xx: Radio turned off
The Core 1-4 lines seem to match the modprobe lines in the lower half of the screenshot (if you zoom in a bit and squint; I can upload better-resolution crops of those lines if you need them). I guess that 'ssb' in the screenshot stands for "Sonics Silicon Backplane".
And last night I found out about the excellent but not yet well documented new option to blacklist modules at boot time. So when I booted with the 'blacklist=b43legacy' kernel option everything worked fine. I did notice that dmesg mentioned a b44 device. A similar problem is mentioned on http://fedoraproject.org/wiki/JeremyKatz/Laptops - in the section for the HP Pavilion zd7000.
Same here. My Asus laptop hung on boot. lspci shows:
    Network controller: Broadcom Corporation BCM4303 802.11b Wireless LAN Controller (rev 02)
and Fedora 8 hangs at "Starting udev" right after installation. For some reason the "blacklist=b43legacy" option did not work for me, so I booted an old Knoppix CD I had around and added a single line to /etc/modprobe.d/blacklist:
    blacklist b43legacy
After that Fedora boots fine. By the way, while searching for similar bugs, I found bug #383281, which is also related to b43legacy. Maybe the driver hangs because it can't find the firmware file?
Yes, I have since discovered that the blacklist option is not a kernel option: it only works for the LiveCD, where it adds a line to /etc/modprobe.conf to blacklist the module. I don't know whether this works with anaconda too.
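For an installed system, where the LiveCD boot option does not apply, the blacklist entries discussed in the comments above have to be written into the modprobe configuration by hand. The following is a small sketch that only generates the lines; the helper name is hypothetical, and pairing each "blacklist" entry with an "install ... /bin/true" entry is an assumption based on the suggestion above to prefer that form over "alias ... off".

```python
def blacklist_lines(modules):
    """Return modprobe configuration lines that keep each module from loading.

    'blacklist NAME' is the form used in /etc/modprobe.d/blacklist above;
    'install NAME /bin/true' replaces the deprecated 'alias NAME off' form.
    """
    lines = []
    for name in modules:
        lines.append(f"blacklist {name}")
        lines.append(f"install {name} /bin/true")
    return lines


# Example: the single module that one reporter blacklisted from a rescue CD.
print("\n".join(blacklist_lines(["b43legacy"])))
```

The output is meant to be reviewed and then appended to the configuration file by hand (for example from a rescue CD, as one reporter did); the sketch deliberately does not touch any file itself.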
I now want to use my wireless card, so I downloaded and extracted the version 3 firmware from openwrt.org, as recommended by the b43-fwcutter package. It still freezes at boot (now with kernel 2.6.23.8-63.fc8), but I've discovered I can produce a freeze at runtime by loading the b43legacy module. First I removed the b44 module with 'modprobe -rv'. I don't quite understand what this is: is it an Ethernet or a wireless driver? Either way, I don't think it should be loaded. This also unloaded the ssb driver. Then:
    $ modprobe -v b43legacy
    insmod /lib/modules/2.6.23.8-65.fc8/kernel/drivers/input/input-polldev.ko
    insmod /lib/modules/2.6.23.8-65.fc8/kernel/net/wireless/cfg80211.ko
    insmod /lib/modules/2.6.23.8-65.fc8/kernel/net/mac80211/mac80211.ko
    insmod /lib/modules/2.6.23.8-65.fc8/kernel/net/rfkill/rfkill.ko
    insmod /lib/modules/2.6.23.8-65.fc8/kernel/drivers/ssb/ssb.ko
And then the system freezes. I'm not sure where to go from here. What information do you need to pin this down? Is it failing while loading ssb, or after?
Please attach the output of running 'lspci -n'. Also, any chance you can capture the output of 'Alt-SysRq-T' during the hang?
Created attachment 285841 [details] output of lspci -n
No, I turned SysRq on (and checked beforehand that e.g. Alt-SysRq-h worked), but it didn't respond to SysRq at all during the hang. All I have is the last two lines from /var/log/messages, corresponding to when I removed b44 and attempted to load b43legacy, respectively:
    Dec 12 16:44:16 mostin kernel: ACPI: PCI interrupt for device 0000:00:09.0 disabled
    Dec 12 16:44:41 mostin kernel: ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
Can you try a netconsole or a serial console (if you still have a serial port)?
Created attachment 289736 [details] /var/log/messages modprobing b43legacy No luck with the netconsole, since the pegasus USB ethernet driver doesn't seem to have netpoll support. I have a serial interface on this computer, but no really good choices for where to put the other end (of the cable I don't currently own). But I can boot into my FC6 installation and look into F8's /var/log. Attached are the contents of /var/log/messages after I:
* blacklisted b43legacy, b44 and ssb in modprobe.conf
* booted with options "udevinfo modprobedebug debug 3"
* enabled sysrq and set the loglevel to 9
* ran 'modprobe -v b43legacy'
The last lines in the file correspond to the last thing I saw on the console. I hope there are some clues in there somewhere.
Created attachment 289737 [details] dmesg corresponding to the previous message, for reference
> Dec 16 22:29:48 mostin kernel: b43legacy-phy0: Broadcom 4301 WLAN found
OK. That's the last thing you see before the lockup? Then I guess it possibly crashes somewhere in the b43 attach stage. Can you try sprinkling a few printk() calls into drivers/net/wireless/b43legacy/main.c:b43legacy_probe(), so that we can see whether the machine survives executing that function or crashes somewhere inside it?
Created attachment 289739 [details] /var/log/messages for 'modprobe b44' followed by 'modprobe b43legacy' I'll have a crack at the printk thing tomorrow. Since b44 was getting loaded for some reason (and involves loading ssb), I also tried repeating the exercise by:
* booting as before
* Alt+SysRq+9
* 'modprobe -v b44'
* Alt+SysRq+h (purely to put an intervening line in the logs)
* 'modprobe -v b43legacy'
End results as before.
Created attachment 290296 [details] printk'd output tracing progress through b43legacy_probe After having crazy amounts of trouble even getting the stock kernel SRPM to rebuild, for reasons that are beyond me, I've started using mock and have gone through a couple of iterations of debugging kernels. Attached is the output on loading the b43legacy module. Hopefully it's self-explanatory; the main point is that the kernel gets lost after b43legacy_probe -> b43legacy_one_core_attach -> b43legacy_wireless_core_attach. I'll continue digging inside b43legacy_wireless_core_attach to try to find out how far it gets.
Created attachment 290653 [details] Reconstructed log tracing progress through b43legacy and ssb functions Attached is a log showing how far 'modprobe -v b43legacy' gets before giving up the ghost. The log is reconstructed from the contents of /var/log/messages and a photo of the screen (timestamps ending in :xx appeared on the console but never made it to file). If I'm reading it right, the kernel gets lost at the start of ssb_device_disable, in drivers/ssb/main.c, at the line "if (ssb_read32(dev, SSB_TMSLOW) & SSB_TMSLOW_RESET)". This is as close to the metal as I've got - there didn't seem to be much point in adding printks inside the read32 or write32 functions. Any ideas?
Created attachment 290654 [details] patch to 2.6.23.8-63 kernel adding copious printks to b43legacy and ssb module loading And here's the patch I used, in case it helps to read the output.
Created attachment 290662 [details] Add device software state Can you try this patch please? The problem you describe is rather strange. If this patch fixes it, I'd call this a silicon bug.
If the patch does _not_ fix it (which I guess), please add printks down in the low level ssb_read32() function. There's also a lot of complex and tricky code down there.
I have a Sony Vaio desktop with a Linksys WMP11 (bcm4303) PCI card that I'm installing F8 on, and I'm seeing the same problem. I can boot successfully on the LiveCD using the option:
    b43legacy.blacklist=yes
The applicable lspci output is:
    02:0b.0 Network controller: Broadcom Corporation BCM4303 802.11b Wireless LAN Controller (rev 02)
With the '-n' option:
    02:0b.0 0280: 14e4:4301 (rev 02)
While I have installed the system, updated to current and loaded the b43legacy firmware (using b43-fwcutter, version 3 firmware), I can't boot with the card installed. Normally, this would be the only network connection on this machine (I relocated the machine to get an Ethernet connection to update). I just wanted to let folks know that this is impacting desktops as well as laptops. It would be nice if the installed system honored the blacklist options as well, for such situations.
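Since several reporters identify their card by the numeric IDs from 'lspci -n', the lookup can be sketched as a small parser. This is an illustrative sketch only: the function name is hypothetical, the regular expression assumes the "slot class: vendor:device (rev NN)" layout quoted in this report, and the third sample line is a made-up non-Broadcom entry added for contrast.

```python
import re

# Matches lines such as "02:0b.0 0280: 14e4:4301 (rev 02)".
LSPCI_N = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]+)\))?"
)


def find_devices(lspci_output, vendor="14e4"):
    """Return (slot, device_id, revision) for every device of the given vendor.

    14e4 is the Broadcom vendor ID seen in this report.
    """
    found = []
    for line in lspci_output.splitlines():
        match = LSPCI_N.match(line.strip())
        if match and match.group("vendor") == vendor:
            found.append((match.group("slot"), match.group("device"), match.group("rev")))
    return found


sample = """02:0b.0 0280: 14e4:4301 (rev 02)
0f:00.0 0280: 14e4:4328 (rev 01)
00:0a.0 0200: 10ec:8139 (rev 10)"""  # last line is invented for contrast

for slot, device, rev in find_devices(sample):
    print(slot, device, rev)
```

In this thread, device ID 4301 corresponds to the cards that hang with b43legacy, while 4328 is the card from the earlier b43 report.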
Mace, what kernel are you using?
I'm using kernel 2.6.23.9-85.fc8. Is there another one you'd like me to test?
Could you try this one? http://koji.fedoraproject.org/koji/buildinfo?buildID=31896
Sorry, that kernel (2.6.23.14-111.fc8) made no difference.
Created attachment 293058 [details] Reconstructed log tracing through b43legacy/main.c, ssb/main.c and ssb/pci.c Your "device software state" patch doesn't fix the freeze, but it does move it to a different place. Here's a reconstructed log (*) with my original b43legacy/ssb printk patch (fixed up for the device software state patch) and some printks in drivers/ssb/pci.c. As far as I understand the trace, the kernel now avoids the ssb_read32 in ssb_device_disable (and succeeds at a write32 in ssb_device_enable) but fails in ioread32 in the read32 called from ssb_flush_tmslow. I'm going to re-compile again to try and confirm this for sure. I also looked into lib/iomap.c which I assume just calls readl. I don't know where to find readl or whether it's worth me digging in to. * The two lines in brackets are guesses, based on earlier logs - they weren't caught in /var/log/messages and scrolled off the screenshot.
Created attachment 293059 [details] patch to 2.6.23.9-85 adding printks in ssb/pci.c For reference
Are you sure the device works at all? Did you try with other operating systems?
Yes, it works on Windows Me and on FC6, with the old bcm43xx stack.
I've now confirmed that the freeze can be isolated in the read32 called by ssb_flush_tmslow - I see a printk before the ioread32 but nothing after the return to ssb_flush_tmslow. I also enabled CONFIG_SSB_DEBUG to see if the power state tracking made a difference, but it doesn't.
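The printk technique used above (a message before a call, and no message after it returns) can be checked mechanically once the log is reconstructed. The sketch below assumes a hypothetical marker format, "enter NAME" and "exit NAME", which is not the format of the attached patches; it reports the calls that were entered but never returned, outermost first.

```python
def unfinished_calls(log_lines, enter_tag="enter ", exit_tag="exit "):
    """Return the names of calls that logged an entry but no matching exit.

    The marker format is an assumption for illustration; real printk output
    would need its own tags.
    """
    stack = []
    for line in log_lines:
        if enter_tag in line:
            stack.append(line.split(enter_tag, 1)[1].strip())
        elif exit_tag in line:
            name = line.split(exit_tag, 1)[1].strip()
            if stack and stack[-1] == name:
                stack.pop()
    return stack


# Hypothetical log shaped like the sequence described in this report.
log = [
    "kernel: trace: enter ssb_device_enable",
    "kernel: trace: enter write32",
    "kernel: trace: exit write32",
    "kernel: trace: enter ssb_flush_tmslow",
    "kernel: trace: enter read32",
]

print(" -> ".join(unfinished_calls(log)))
```

The innermost entry of the result is the last function known to have started, which is where the next round of printks would go.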
I suspect a bug in the ssb PCI crystal powerup routine. I'm not sure where it is exactly, yet.
I rechecked the PCI powerup code and it really is the same as the bcm43xx code (although it's structured slightly differently). So I'm not sure how I can help you.
Well I have nothing specific to offer, but I have to ask if this problem persists in recent kernels?
Still no luck with the latest stable kernel-2.6.24.3-12.fc8. I wonder if any of the subsequent koji builds are relevant; if nothing else, it looks like b43legacy is a little more verbose.
(In reply to comment #37)
> it looks like b43legacy is a little more verbose.
So what does it say?
Is it possible to get exactly the same card from somewhere? I'd like to debug this here in my testbed. This is pretty much impossible to debug remotely.
Sorry, what I meant is that I had peeked at the koji build logs and expected that b43legacy would be more verbose with future kernels. I can see that there should be more info printed out by b43legacydbg calls. But I've now tried kernel-2.6.24.3-22.fc8 and there's no more information than before. I see that CONFIG_B43LEGACY_DEBUG=y in the Fedora default config file; should I be seeing any more output than I already am? Here's what I have now, just as a reminder:
    Mar 14 00:59:35 mostin kernel: SysRq : Changing Loglevel
    Mar 14 00:59:35 mostin kernel: Loglevel set to 9
    Mar 14 00:59:44 mostin kernel: SysRq : Emergency Sync
    Mar 14 00:59:44 mostin kernel: Emergency Sync complete
    Mar 14 01:00:12 mostin kernel: ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
    Mar 14 01:00:12 mostin kernel: ACPI: PCI Interrupt 0000:00:09.0[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
    Mar 14 01:00:12 mostin kernel: ssb: Sonics Silicon Backplane found on PCI device 0000:00:09.0
    Mar 14 01:00:12 mostin kernel: b43legacy-phy0: Broadcom 4301 WLAN found
Created attachment 298154 [details] Can you test this?
Created attachment 298194 [details] extract from /var/log/messages showing traceback Crashes in ssb_bus_may_powerdown, with a traceback.
Created attachment 298210 [details] [PATCH v2] [RFT] b43legacy: fix bcm4303 crash Please test this.
Created attachment 298595 [details] Console output for patch v2 Some success with Patch v2 applied to the 2.6.24.3-22.fc8 kernel! It doesn't crash or freeze, which is news. It doesn't seem to know what to do with the firmware, though. Here's the console log; /var/log/messages is similar, with fewer debug messages and more NetworkManager output.
Created attachment 301430 [details] [PATCH RFT] b43legacy: fix initvals loading on bcm4303
Comment on attachment 301430 [details] [PATCH RFT] b43legacy: fix initvals loading on bcm4303 Please test this. Thank you for your reports.
Created attachment 301806 [details] dmesg output from slightly more successful b43legacy loading Loads the firmware now, but doesn't quite work yet. It's a PCI card so I'm fairly sure there's no hardware switch.
The "LEDs: Unknown behaviour" bit sounds like you have unknown (or corrupted) SPROM data. I don't know if those LED messages are a problem, but they suggest that anything else taken from your SPROM may be suspect...? Hopefully Stefano can suggest how to proceed.
Created attachment 301817 [details] bcm4301-rfkill-hack.patch Try this to see if we are just reading the RF-Kill status incorrectly?
Created attachment 302013 [details] extracts from /var/log/messages The output about the hardware switch is gone, but it isn't any better. Attached are some lines cherry-picked from /var/log/messages, for context.
Well, it was a long shot. I think it still seems likely that the driver does not understand your SPROM data. Whether or not that is the root of the problem is beyond me ATM. Hopefully Stefano has more ideas?
Created attachment 302273 [details] [PATCH] ssb-pcicore: Fix IRQ TPS flag handling This patch by Larry Finger makes the RX path work. I'm still investigating why TX doesn't work.
Problem still exists in kernel-2.6.24.5-85.fc8, dmesg output as in attachment #301806 [details] but without the last three lines after "Radio hardware status changed to DISABLED". But at least the f8 kernel boots now, without b43legacy blacklisted.
Works for me with current wireless-testing tree. In particular, patch: [PATCH V2] ssb: Fix case where board flags are unset in SPROM by Larry Finger is needed.
Can you replicate this with current F8 kernels? http://koji.fedoraproject.org/koji/buildinfo?buildID=50245
No substantial change on 2.6.25.4-10.fc8.
Created attachment 308869 [details] dmesg on 2.6.25.4-10.fc8 with a little more debug info
With kernel-2.6.25.6-27.fc8 everything works for me. Hurray! Asus L5800C laptop with BCM4303:
    00:0c.0 Network controller: Broadcom Corporation BCM4303 802.11b Wireless LAN Controller (rev 02)
        Subsystem: ASUSTeK Computer Inc. WL-103b Wireless LAN PC Card
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at e4000000 (32-bit, non-prefetchable) [size=8K]
        Capabilities: [40] Power Management version 2
            Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
            Status: D0 PME-Enable- DSel=0 DScale=2 PME+
Can others try -27.fc8? Does it resolve the issue for others as well?
Righty ho. It actually does work now. I tried again and realised that I hadn't set up wlan0 in system-config-network, or I hadn't stared hard enough at the network-manager applet, or both. So I went back through the old kernels I still have installed to see when it started working:
    2.6.24.4-64.fc8 - fails to boot in udev, the original problem
    2.6.24.5-85.fc8 - boots, NetworkManager finds the local network on a scan but can't connect.
    2.6.24.7-92.fc8 - boots, scans and connects!
The only remaining wrinkle is that "Connection Information" on the nm-applet tells me I'm connected at only 1Mb/s. I'd expect 802.11b to suck, but don't know if it's sucking more than it should. Thanks for the fix.
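The walk back through installed kernels described above amounts to ordering the releases and finding the first one that works. Here is a minimal sketch of that idea, assuming Fedora-style release strings such as "2.6.24.5-85.fc8"; the helper names are hypothetical, and "works" is taken here to mean "boots, scans and connects".

```python
import re


def version_key(release):
    """Turn '2.6.24.5-85.fc8' into (2, 6, 24, 5, 85) for numeric sorting."""
    numeric_part = release.split(".fc")[0]  # drop the distribution suffix
    return tuple(int(n) for n in re.findall(r"\d+", numeric_part))


def first_working(results):
    """Return the oldest release whose result is True, or None if none works."""
    for release in sorted(results, key=version_key):
        if results[release]:
            return release
    return None


# Results as reported above; only the last kernel both boots and connects.
results = {
    "2.6.24.7-92.fc8": True,
    "2.6.24.4-64.fc8": False,
    "2.6.24.5-85.fc8": False,
}

print(first_working(results))
```

Sorting on integer tuples rather than raw strings avoids misordering releases once a component reaches two digits.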
Closing based on "boots, scans and connects!" -- the nm-applet speed info may just reflect that the rate control algorithm has not yet scaled-up the speed or it may be some issue with nm-applet. You may want to open a new bug if that problem persists.