Bug 180953 - SoftMAC scanning appears to cause a hard lockup
SoftMAC scanning appears to cause a hard lockup
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel (Show other bugs)
rawhide
i386 Linux
medium Severity high
: ---
: ---
Assigned To: John W. Linville
Brian Brock
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2006-02-10 14:18 EST by Bernard Johnson
Modified: 2007-11-30 17:11 EST (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-07-12 01:52:06 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Bernard Johnson 2006-02-10 14:18:10 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060208 Fedora/1.5.0.1-2.1 Firefox/1.5.0.1

Description of problem:
As new bcm43xx/SoftMAC changes come out in the kernel, I'm trying them with my laptop to try to get wireless working.  Today's rawhide kernel causes a hard lockup when I execute the following commands:

ifconfig eth1 192.168.1.150
iwconfig eth1 key 0 restricted ffffffffff
iwconfig eth1 essid mynetwork
(wait two seconds - hard lockup)



Version-Release number of selected component (if applicable):
kernel-2.6.15-1.1928_FC5

How reproducible:
Always

Steps to Reproduce:
1.  ifconfig eth1 192.168.1.150
2.  iwconfig eth1 key 0 restricted ffffffffff
3.  iwconfig eth1 essid mynetwork
4.  wait two seconds - hard lockup
  

Actual Results:  The last two things I see in my messages file are:

Feb 10 11:59:31 localhost kernel: SoftMAC: Start scanning with channel: 1
Feb 10 11:59:31 localhost kernel: SoftMAC: Scanning 14 channels

then the system locks.

Expected Results:  I don't expect this to necessarily work, as I've been unable to get my bcm43xx wireless working, but I don't expect a hard lockup either.

Additional info:
Comment 1 Bernard Johnson 2006-02-16 04:06:09 EST
I am not experiencing this with my current kernel:
Linux localhost.localdomain 2.6.15-1.1948_FC5 #1 Mon Feb 13 21:05:36 EST 2006
i686 i686 i386 GNU/Linux
Comment 2 Bernard Johnson 2006-02-17 19:17:54 EST
Although, I am getting a rare random lockup.  I haven't been able to produce any
relevent log lines or debug info for it yet though.
Comment 3 John W. Linville 2006-02-20 11:58:09 EST
You may wish to try the (FC4-based) fedora-netdev kernels: 
 
   http://people.redhat.com/linville/kernels/fedora-netdev/ 
 
FC4.netdev.10 in particular has some fairly late softmac updates from 
upstream.  Please give those kernels a try and post the results here...thanks! 
Comment 4 Bernard Johnson 2006-02-22 16:34:07 EST
When I upgraded my kernel to 2.6.15-1.1948_FC5 the SoftMAC lockups ceased.  But,
at the time, there was a random rare lockup.  Since then, I've upgraded to the
most current rawhide kernel (as of today, 2.6.15-1.1975), and now I have neither
of these problems.
Comment 5 Bernard Johnson 2006-03-24 03:36:56 EST
This bug is back.  And it's 100% reproducible.  However, it only happens if I'm
using NetworkManager now.  If I bring up eth1 manually, everything seems to
work.  I'm using it that way now to make this comment.  I've include
/var/log/messages.  The last line is the last thing recorded before it locks up.

uname -a
Linux localhost.localdomain 2.6.16-1.2083_FC6 #1 Wed Mar 22 21:31:28 EST 2006
i686 i686 i386 GNU/Linux

Mar 24 01:12:21 localhost kernel: SoftMAC: Start scanning with channel: 1
Mar 24 01:12:21 localhost kernel: SoftMAC: Scanning 14 channels
Mar 24 01:12:21 localhost kernel: SoftMAC: Scanning finished
Mar 24 01:12:41 localhost kernel: SoftMAC: Start scanning with channel: 1
Mar 24 01:12:41 localhost kernel: SoftMAC: Scanning 14 channels
Mar 24 01:12:41 localhost kernel: SoftMAC: Scanning finished
Mar 24 01:12:57 localhost dhcdbd: Started up.
Mar 24 01:12:58 localhost dhcdbd: message_handler: message handler not found
under /com/redhat/dhcp/eth1 for sub-path eth1.dbus.get.reason
Mar 24 01:13:03 localhost kernel: bcm43xx: Radio turned off
Mar 24 01:13:03 localhost kernel: bcm43xx: DMA 0x0260 (RX) max used slots: 1/64
Mar 24 01:13:03 localhost kernel: bcm43xx: DMA 0x0200 (RX) max used slots: 2/64
Mar 24 01:13:03 localhost kernel: bcm43xx: DMA 0x0260 (TX) max used slots:
0/512Mar 24 01:13:03 localhost kernel: bcm43xx: DMA 0x0240 (TX) max used slots:
0/512Mar 24 01:13:03 localhost kernel: bcm43xx: DMA 0x0220 (TX) max used slots:
2/512Mar 24 01:13:03 localhost kernel: bcm43xx: DMA 0x0200 (TX) max used slots:
0/512Mar 24 01:13:03 localhost kernel: bcm43xx: Radio turned on
Mar 24 01:13:03 localhost kernel: bcm43xx: Chip initialized
Mar 24 01:13:03 localhost kernel: bcm43xx: DMA initialized
Mar 24 01:13:03 localhost kernel: bcm43xx: 80211 cores initialized
Mar 24 01:13:03 localhost kernel: bcm43xx: TODO: Incomplete code in
keymac_write() at drivers/net/wireless/bcm43xx/bcm43xx_main.c:1254
Mar 24 01:13:03 localhost last message repeated 7 times
Mar 24 01:13:03 localhost kernel: bcm43xx: TODO: Incomplete code in
keymac_write() at drivers/net/wireless/bcm43xx/bcm43xx_main.c:1256
Mar 24 01:13:03 localhost last message repeated 3 times
Mar 24 01:13:03 localhost kernel: bcm43xx: Keys cleared
Mar 24 01:13:03 localhost kernel: SoftMAC: Queueing Authentication Request to
00:13:10:a8:78:9d
Mar 24 01:13:03 localhost kernel: SoftMAC: cannot associate without being
authenticated, requested authentication
Mar 24 01:13:03 localhost kernel: SoftMAC: Sent Authentication Request to
00:0f:0f:a8:87:fd.
Mar 24 01:13:03 localhost kernel: bcm43xx: set security called
Mar 24 01:13:03 localhost kernel: bcm43xx:    .active_key = 0
Mar 24 01:13:03 localhost kernel: bcm43xx:    .level = 1
Mar 24 01:13:03 localhost kernel: bcm43xx:    .enabled = 1
Mar 24 01:13:03 localhost kernel: bcm43xx:    .encrypt = 1
Mar 24 01:13:03 localhost kernel: SoftMAC: cannot associate without being
authenticated, requested authentication
Mar 24 01:13:07 localhost dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port
67 interval 5
Mar 24 01:13:12 localhost dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port
67 interval 12
Mar 24 01:13:15 localhost kernel: SoftMAC: Sent Authentication Request to
00:13:10:a8:78:9d.
Mar 24 01:13:24 localhost dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port
67 interval 8
Mar 24 01:13:27 localhost kernel: SoftMAC: Sent Authentication Request to
00:13:10:a8:78:9d.
Mar 24 01:13:32 localhost dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port
67 interval 10
Mar 24 01:13:33 localhost kernel: bcm43xx: Radio turned off
Mar 24 01:13:33 localhost kernel: bcm43xx: DMA 0x0260 (RX) max used slots: 1/64
Mar 24 01:13:33 localhost kernel: bcm43xx: DMA 0x0200 (RX) max used slots: 1/64
Mar 24 01:13:33 localhost kernel: bcm43xx: DMA 0x0260 (TX) max used slots:
0/512Mar 24 01:13:33 localhost kernel: bcm43xx: DMA 0x0240 (TX) max used slots:
0/512Mar 24 01:13:33 localhost kernel: bcm43xx: DMA 0x0220 (TX) max used slots:
2/512Mar 24 01:13:33 localhost kernel: bcm43xx: DMA 0x0200 (TX) max used slots:
0/512Mar 24 01:13:33 localhost kernel: bcm43xx: Radio turned on
Mar 24 01:13:33 localhost kernel: bcm43xx: Chip initialized
Mar 24 01:13:33 localhost kernel: bcm43xx: DMA initialized
Mar 24 01:13:33 localhost kernel: bcm43xx: 80211 cores initialized
Mar 24 01:13:33 localhost kernel: bcm43xx: TODO: Incomplete code in
keymac_write() at drivers/net/wireless/bcm43xx/bcm43xx_main.c:1254
Mar 24 01:13:33 localhost last message repeated 7 times
Mar 24 01:13:33 localhost kernel: bcm43xx: TODO: Incomplete code in
keymac_write() at drivers/net/wireless/bcm43xx/bcm43xx_main.c:1256
Mar 24 01:13:33 localhost last message repeated 3 times
Mar 24 01:13:33 localhost kernel: bcm43xx: Keys cleared
Mar 24 01:13:33 localhost kernel: SoftMAC: Associate: Scanning for networks first.
Mar 24 01:13:33 localhost kernel: SoftMAC: Start scanning with channel: 1
Mar 24 01:13:33 localhost kernel: SoftMAC: Scanning 14 channels
Mar 24 01:13:33 localhost kernel: bcm43xx: set security called
Mar 24 01:13:33 localhost kernel: bcm43xx:    .active_key = 0
Mar 24 01:13:33 localhost kernel: bcm43xx:    .level = 1
Mar 24 01:13:33 localhost kernel: bcm43xx:    .enabled = 1
Mar 24 01:13:33 localhost kernel: bcm43xx:    .encrypt = 1
Mar 24 01:13:33 localhost kernel: SoftMAC: Associate: Scanning for networks first.
Mar 24 01:13:33 localhost kernel: SoftMAC: Associate: failed to initiate scan.
Is device up?
Mar 24 01:13:33 localhost kernel: SoftMAC: Scanning finished
Comment 6 Bernard Johnson 2006-03-24 04:02:47 EST
Ok, here's a clue.  If I execute these commands first:

ip link set eth1 down
ip link set eth1 up
iwconfig eth1 rate 11M essid mynetwork key ffffffffff
sleep 5
dhclient eth1

and then execute nm-applet, I don't get any lockup and everything working fine.
Comment 7 David Woodhouse 2006-03-24 05:19:28 EST
Can you hook up a serial console and try to reproduce, or at least make it lock
up while it's at text VT instead of in X -- so you have a chance of seeing its
final messages?

Also enable the NMI watchdog, and try using SysRq-P to see where it's locked up.
Comment 8 David Woodhouse 2006-03-24 05:21:01 EST
Also try the 2.6.16-1.2070 or later kernels -- I've changed some things related
to initialisation, although nothing which is specifically expected to address this.
Comment 9 David Woodhouse 2006-03-24 06:09:44 EST
Sorry, should have noticed that you _are_ using the latest kernels. 

The problem here looks like something is taking the device down while
NetworkManager is scanning. It looks like dhclient is running, and failing to
get an address. Then it takes the device down during the scan. When we attempt
to set the device back to its original channel at the end of the scan, the
bcm43xx driver locks up the machine by trying to poke at the device while it's
disabled. 

There's a patch at
http://david.woodhou.se/linux-2.6-bcm43xx-set-chan-lockup.patch which should be
in the next FC-5 and rawhide kernel builds (2071 and 2088, resp.)
Comment 10 Bernard Johnson 2006-03-25 14:11:29 EST
Installed 2088 and it seems to have fixed the problem.  I can now startup
nm-applet right away with no problem.

In regards to your comment #8 to making some changes related to initialization -
Have you done anything that might fix the the lockup in bug #184440?  I still
get the duplicate packets when using the bcm43xx driver, but I've not been able
to make it lockup with the 2088 kernel.
Comment 11 Bernard Johnson 2006-03-27 16:05:38 EST
Well, yesterday was my (un)lucky day.  I managed to get kernel 2088 to lockup
several times.  I can now tell it's definately related to getting a dhcp
address.  Even when I went to the manual method, I had to run dhclient several
times before it managed to get an address.  Once it had acquired an address, I
was able to run nm-applet again and everything was fine.

Today, I turned on the nmi watchdog, but couldn't get SysRq-P to work (well, it
may be my notebook with "special" little blue key magic to get to sysrq I
haven't found).

I removed all files in /var/lib/dhclient and then started nm-applet.

However, I did switch to a text console, and this was the last thing I saw:

954193>] ieee80211_rx_mgt+0x2ca/0x2e7 [ieee80211]
 [<c020575f>] __add_entropy_words+0x58/0x178 [<c0205b37>] extract_buf+0x95/0xc8
 [<c013e9e3>] get_page_from_freelist+0x1cf/0x335 [<c01519da>]
cache_alloc_debugcheck_after+0xb8/0xea
 [<c0152936>] cache_alloc_refill+0x217/0x5b0 [<c015194f>]
cache_alloc_debugcheck_after+0x2d/0xea
 [<e09d917a>] bcm43xx_rx+0x4a3/0x51d [bcm43xx] [<e09e637a>]
bcm43xx_dma_rx+0x306/0x3d7 [bcm43xx]
 [<e09d8189>] bcm43xx_interrupt_tasklet+0x61b/0x8ed [bcm43xx] [<c012cca3>]
ktime_get_ts+0x11/0x3a
 [<c012ccda>] ktime_get+0xe/0x2c [<c0117127>] try_to_wake_up+0xf0/0xfa
 [<c012221d>] run_timer_softirq+0x17b/0x186 [<c011eb86>] tasklet_action+0x37/0x57
 [<c011ec32>] __do_softirq+0x37/0x7f [<c0104c31>] do_softirq+0x38/0x3f
 =======================
 [<c0104bf0>] do_IRQ+0x6e/0x77 [<c010358e>] common_interrupt+0x1a/0x20
 [<c01e007b>] fb_try_mode+0x58/0x88 [<c01fda1f>] acpi_safe_halt+0x15/0x1b
 [<c01fdbb2>] acpi_processor_idle+0x137/0x329 [<c0101e4e>] cpu_idle+0x3a/0x4f
 [<c03a96b1>] start_kernel+0x29a/0x29c <0>Code: 00 00 00 84 c0 7f 67 31 f6 31 ff
eb 14 8b 03 c7 03 00 00 00 00 84 c0 7f 55 f3 90 83 c6 01 83 d7 00 69 05 88 53 34
c0 fa 00 00 00 <83> ff 00 77 04 39 c6 72 d9 85 c9 74 cf e8 e4 1e f5 ff ba 00 f0
Continuing in 1 seconds.
console shuts up


If you still want me to try to attach a serial console, I can give that a try...
Might take some effort though because it doesn't have a serial port ;)
Comment 12 Jon Masters 2006-05-01 18:00:44 EDT
FYI, I experience similar problems with my Powerbook (recent rawhide kernels).
In my case it's rare and random unless I'm in an area with a lot of APs - then
the laptop will last at most 15-20 minutes before it locks hard. So far, I can
really only reproduce this in conference type environments.
Comment 13 John W. Linville 2006-07-11 10:23:36 EDT
Well, this one has grown a beard...can someone confirm that this is still a 
problem w/ current FC5 and/or rawhide kernels?
Comment 14 Bernard Johnson 2006-07-12 01:52:06 EDT
Using the conditions in comment #11, I am no longer able to reproduce this bug
with a fully updated rawhide system.

Note You need to log in before you can comment on or make changes to this bug.