Bug 754305
| Summary: | Kernel deadlocks when both brcm80211 and atl1c modules installed | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Darryl Bond <darryl.bond> |
| Component: | kernel | Assignee: | John W. Linville <linville> |
| Status: | CLOSED WORKSFORME | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 16 | CC: | brcm80211-dev-list, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, redhat |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-07-13 23:27:52 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Darryl Bond
2011-11-16 02:34:23 UTC
Does blacklisting the brcmsmac driver also allow for a successful boot?

Yes, it works fine with just the atl1c.

I did some more testing yesterday. If I boot with the atl1c blacklisted and probe it in after I have logged in and connected over wireless, it runs fine off the wireless for several minutes, but eventually locks solid.

Could you boot as you describe in comment 2, then capture the contents of /proc/interrupts (before it locks up!) and post that here?

Wireless operating and logged in:
[root@Netbook ~]# cat /proc/interrupts
CPU0 CPU1
0: 141 1 IO-APIC-edge timer
1: 18 352 IO-APIC-edge i8042
7: 1 0 IO-APIC-edge
8: 1 0 IO-APIC-edge rtc0
9: 1 227 IO-APIC-fasteoi acpi
12: 90 3544 IO-APIC-edge i8042
16: 1 330 IO-APIC-fasteoi snd_hda_intel
17: 1 36 IO-APIC-fasteoi ehci_hcd:usb1, ehci_hcd:usb2
18: 0 0 IO-APIC-fasteoi ohci_hcd:usb3, ohci_hcd:usb4
19: 364 19104 IO-APIC-fasteoi ahci, brcmsmac
40: 153 11592 PCI-MSI-edge radeon
41: 1 106 PCI-MSI-edge snd_hda_intel
NMI: 4 4 Non-maskable interrupts
LOC: 50459 59470 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 4 4 Performance monitoring interrupts
IWI: 0 0 IRQ work interrupts
RES: 17456 13181 Rescheduling interrupts
CAL: 251 147 Function call interrupts
TLB: 782 700 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 1 1 Machine check polls
ERR: 1
MIS: 0
[root@Netbook ~]# modprobe atl1c
[root@Netbook ~]# cat /proc/interrupts
CPU0 CPU1
0: 141 1 IO-APIC-edge timer
1: 18 352 IO-APIC-edge i8042
7: 1 0 IO-APIC-edge
8: 1 0 IO-APIC-edge rtc0
9: 1 227 IO-APIC-fasteoi acpi
12: 90 3544 IO-APIC-edge i8042
16: 1 330 IO-APIC-fasteoi snd_hda_intel
17: 1 36 IO-APIC-fasteoi ehci_hcd:usb1, ehci_hcd:usb2
18: 0 0 IO-APIC-fasteoi ohci_hcd:usb3, ohci_hcd:usb4
19: 417 20185 IO-APIC-fasteoi ahci, brcmsmac
40: 167 13021 PCI-MSI-edge radeon
41: 1 106 PCI-MSI-edge snd_hda_intel
42: 0 0 PCI-MSI-edge eth1
NMI: 4 4 Non-maskable interrupts
LOC: 52039 62372 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 4 4 Performance monitoring interrupts
IWI: 0 0 IRQ work interrupts
RES: 17858 13560 Rescheduling interrupts
CAL: 259 159 Function call interrupts
TLB: 788 707 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 1 1 Machine check polls
ERR: 1
MIS: 0
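
Comparing the two /proc/interrupts captures by eye is error-prone, so here is a minimal Python sketch that diffs two snapshots and reports only the lines whose totals changed or that are new. The function names `parse_interrupts` and `diff_interrupts` are hypothetical helpers written for this illustration, and the sample rows are an excerpt of the captures above with column spacing restored by hand.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        # Per-CPU counters come first; rows such as ERR/MIS carry only one.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        desc = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), desc)
    return table


def diff_interrupts(before, after):
    """Return {label: (delta, description)} for new or changed lines."""
    changes = {}
    for label, (count, desc) in after.items():
        old = before[label][0] if label in before else 0
        if label not in before or count != old:
            changes[label] = (count - old, desc)
    return changes


# Excerpt of the two captures in this report (before and after modprobe atl1c).
BEFORE = """\
           CPU0       CPU1
 18:          0          0   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4
 19:        364      19104   IO-APIC-fasteoi   ahci, brcmsmac
 40:        153      11592   PCI-MSI-edge      radeon
"""
AFTER = """\
           CPU0       CPU1
 18:          0          0   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4
 19:        417      20185   IO-APIC-fasteoi   ahci, brcmsmac
 40:        167      13021   PCI-MSI-edge      radeon
 42:          0          0   PCI-MSI-edge      eth1
"""

changes = diff_interrupts(parse_interrupts(BEFORE), parse_interrupts(AFTER))
for label, (delta, desc) in changes.items():
    print(f"{label:>4}: +{delta:<8} {desc}")
```

On a live system the two strings would instead come from reading /proc/interrupts before and after loading the module. In the captures above, the only new line is the MSI vector 42 for eth1, which has not fired yet.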
Tail end of dmesg with both modules inserted:
[ 30.782124] wlan0: authenticate with 00:13:10:92:16:60 (try 1)
[ 30.785431] wlan0: authenticated
[ 30.789455] wlan0: associate with 00:13:10:92:16:60 (try 1)
[ 30.796398] wlan0: RX AssocResp from 00:13:10:92:16:60 (capab=0x411 status=0 aid=8)
[ 30.796412] wlan0: associated
[ 30.798105] ieee80211 phy0: brcms_ops_bss_info_changed: qos enabled: true (implement)
[ 30.798121] ieee80211 phy0: brcmsmac: brcms_ops_bss_info_changed: associated
[ 30.798135] ieee80211 phy0: changing basic rates failed: -22
[ 30.798147] ieee80211 phy0: brcms_ops_bss_info_changed: arp filtering: enabled true, count 0 (implement)
[ 30.800092] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[ 34.708703] hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
[ 38.857025] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
[ 38.857056] SELinux: initialized (dev dm-2, type ext4), uses xattr
[ 41.618310] wlan0: no IPv6 routers present
[ 51.005793] ieee80211 phy0: brcms_ops_bss_info_changed: arp filtering: enabled true, count 1 (implement)
[ 86.864226] fuse init (API version 7.17)
[ 86.900567] SELinux: initialized (dev fusectl, type fusectl), uses genfs_contexts
[ 86.919754] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
[ 95.615216] lp: driver loaded but no devices found
[ 95.706620] ppdev: user-space parallel port driver
[ 137.512375] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 137.697883] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 275.206922] atl1c 0000:06:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 275.207057] atl1c 0000:06:00.0: setting latency timer to 64
[ 275.309420] atl1c 0000:06:00.0: version 1.0.1.0-NAPI
[ 275.347928] udevd[1535]: renamed network interface eth0 to eth1
[ 275.384415] atl1c 0000:06:00.0: irq 42 for MSI/MSI-X
[ 275.473859] ADDRCONF(NETDEV_UP): eth1: link is not ready

From "Arend van Spriel" <arend>: "I had a look in our driver and it is pretty straightforward. It checks whether the interrupt is ours by looking at the chip interrupt status and returns accordingly. In atl1c a do-while loop is used in the ISR, with a number of possible code paths changing the return value, so I don't dare to predict that behaviour. Not seeing an obvious mistake there, though."

I see the same thing on an Acer Aspire One using the same drivers in Fedora 15. Additional data:
wifi on, ether plugged in: works fine
wifi off, ether unplugged: works fine
wifi on, ether unplugged, boot order HDD first: hangs
wifi on, ether unplugged, boot order network first: works fine

Note that with the network first in the boot order, the BIOS accesses the ethernet controller to search for a PXE boot image. This implies to me that the likely culprit is the atl1c driver not initializing something. So setting the network first in the boot order will work around the problem, although it seems potentially dangerous if you're plugged into a public ethernet.

Works for me too.
* Updated to 3.1.5 kernel
* Removed blacklist from atl1c
* Reboot
* Deadlocks at logon screen
* Restart
* Change BIOS to PXE boot first
* Cable unplugged so PXE fails immediately
* Continues to boot from HD
* Boot to login works fine
* Both atl1c and wireless modules installed together

Updated to 3.1.6-1.f16.x86_64. The fault seems to have gone away. I no longer need to blacklist one or the other, nor do I need to enable PXE boot. I notice that the brcmsmac module versions changed while the atl1c has not.

Oops, yes it does. Just not every time. It locked up again. I suppose I had 6 successful boots beforehand, though.

Somewhere around 3.4.3 this bug was fixed. It no longer requires PXE boot enabled to prevent the deadlocks.