Bug 470225 - wireless disconnects and system lockups with iwl3945 and iwl4965
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 10
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
URL: http://www.intellinuxwireless.org/bug...
Keywords: Reopened
Duplicates: 471464 472372
Depends On:
Blocks: F10Target
 
Reported: 2008-11-06 06:53 EST by Kris
Modified: 2009-11-19 04:21 EST
CC List: 14 users

Doc Type: Bug Fix
Last Closed: 2009-11-19 03:26:24 EST


Attachments
/var/log/messages (363.37 KB, text/plain)
2008-11-06 06:53 EST, Kris
iwlwifi-intel-bug-1822.patch (2.73 KB, patch)
2009-01-15 13:54 EST, John W. Linville
/var/log/messages showing disconnects (5.27 KB, text/plain)
2009-01-28 09:33 EST, Kevin R. Page

Description Kris 2008-11-06 06:53:42 EST
Created attachment 322699
/var/log/messages

Description of problem:

I get random kernel panics when using PPTP.

Version-Release number of selected component (if applicable):

Kernel version: tested and reproducible on 2.6.27.4-39.fc10.x86_64 through 2.6.27.4-68.fc10.x86_64.

pptp version: 1.7.2, release 3.fc10




How reproducible:

Use PPTP and wait, usually a few minutes. It happens even if I don't use the network heavily.


Actual results:


Expected results:


Additional info:

I suspect this may be related to the MPPE encryption module, as it's the only kernel-space part of PPTP as far as I know.
It could also be my Wi-Fi driver, but I doubt it, as it only happens when I use PPTP.
Attached is my /var/log/messages
Comment 1 Dave Jones 2008-11-06 10:53:22 EST
Interesting log.

* selinux denials.  If you've booted at all with selinux=0 at some point, you'll want to run touch /.autorelabel and reboot.  It'll take a while, but you should then get all the correct file contexts.  If you still see the selinux denial messages, please file bugs on them, as they shouldn't be there.

* Lots of packet loss. I guess that's the nature of whatever your ppp link connects to.

* Finally, a warning from the wireless stack.

I don't see an immediate correlation between the wireless & the ppp session, so it could be coincidence, but I'll leave it to John for deeper analysis.
Comment 2 Kris 2008-11-06 18:20:45 EST
Hmm... I had (and still have) SELinux set to permissive mode, so should this still be a problem?

Also, I think we should rule out the chance of it being coincidence, as I use my laptop a lot, and the panics only happen at my evil school where we use PPTP :(
Comment 3 John W. Linville 2008-11-14 09:27:20 EST
*** Bug 471464 has been marked as a duplicate of this bug. ***
Comment 4 John W. Linville 2008-11-14 09:28:58 EST
What driver is in use here?  Can we have the output from lsmod and/or lspci?
Comment 5 Ben Gamari 2008-11-14 14:32:43 EST
For me it's iwl3945.

$ lspci -v
0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)
	Subsystem: Intel Corporation Device 1020
	Flags: bus master, fast devsel, latency 0, IRQ 17
	Memory at fe8ff000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: iwl3945
	Kernel modules: iwl3945
Comment 6 Kris 2008-11-15 14:16:52 EST
iwl3945 here too.

$ sudo lspci -v
05:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)
	Subsystem: Intel Corporation PRO/Wireless 3945ABG Network Connection
	Flags: fast devsel, IRQ 18
	Memory at f0200000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Count=1/1 Enable-
	Capabilities: [e0] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSVoil-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSVoil-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSVoil-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		AERCap:	First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140] Device Serial Number 50-6f-43-ff-ff-bf-1c-00
	Kernel driver in use: iwl3945
	Kernel modules: iwl3945


$ lsmod (when not using pptp, and thus the mppe module is not shown)
Module                  Size  Used by
nls_utf8               10240  1 
fuse                   60992  2 
bridge                 56224  0 
stp                    10756  1 bridge
bnep                   22016  2 
sco                    19204  2 
l2cap                  28544  3 bnep
bluetooth              60068  5 bnep,sco,l2cap
sunrpc                191208  3 
ipv6                  287272  36 
cpufreq_ondemand       15504  1 
acpi_cpufreq           17552  2 
freq_table             12928  2 cpufreq_ondemand,acpi_cpufreq
dm_multipath           23704  0 
uinput                 16128  0 
arc4                   10240  2 
i2c_i801               17820  0 
ecb                    11264  2 
crypto_blkcipher       24196  1 ecb
wmi                    14912  0 
snd_hda_intel         476576  3 
i2c_core               29216  1 i2c_i801
iwl3945               151780  0 
video                  28316  0 
lirc_ite8709           15364  0 
r8169                  40964  0 
serio_raw              14084  0 
lirc_dev               20408  1 lirc_ite8709
mii                    13056  1 r8169
iTCO_wdt               20176  0 
iTCO_vendor_support    11652  1 iTCO_wdt
joydev                 19328  0 
snd_seq_dummy          11396  0 
snd_pcsp               18940  1 
snd_seq_oss            39104  0 
snd_seq_midi_event     14848  1 snd_seq_oss
snd_seq                61968  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
output                 11264  1 video
snd_seq_device         15380  3 snd_seq_dummy,snd_seq_oss,snd_seq
snd_pcm_oss            52224  0 
snd_mixer_oss          23168  1 snd_pcm_oss
snd_pcm                85512  3 snd_hda_intel,snd_pcsp,snd_pcm_oss
battery                21000  0 
ac                     13320  0 
snd_timer              30352  2 snd_seq,snd_pcm
snd_page_alloc         16656  2 snd_hda_intel,snd_pcm
snd_hwdep              16392  1 snd_hda_intel
rfkill                 17316  2 iwl3945
snd                    68984  19 snd_hda_intel,snd_seq_dummy,snd_pcsp,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer,snd_hwdep
mac80211              213872  1 iwl3945
soundcore              14992  1 snd
cfg80211               32400  2 iwl3945,mac80211
Comment 7 Charles R. Anderson 2008-11-20 03:25:26 EST
I'm seeing the same WARNING on iwl4965 hardware with Fedora 10 kernel-2.6.27.5-117.fc10.x86_64.  See:

http://kerneloops.org/raw.php?rawid=103470&msgid=

and this for the complete list:

http://kerneloops.org/guilty.php?guilty=rs_get_rate&version=2.6.27-release&start=1802240&end=1835007&class=warn

Interestingly, I am *not* seeing this problem with Fedora 9 kernel-2.6.27.5-37.fc9.x86_64 on the same hardware.
Comment 8 Kris 2008-11-20 04:25:59 EST
More of the same warnings:

http://fpaste.org/paste/578


Also, I've extracted the parts of my /var/log/messages that were logged just before the panics happened. I don't know if they're useful:

http://fpaste.org/paste/579
Comment 9 John W. Linville 2008-11-20 10:58:31 EST
*** Bug 472372 has been marked as a duplicate of this bug. ***
Comment 10 John W. Linville 2008-11-20 11:20:12 EST
Other than the annoying message, is there a functional problem?  That is, are you still able to get to Google or whatnot?
Comment 11 Charles R. Anderson 2008-11-20 11:49:29 EST
Yes, there is a functional problem.  It seems to disconnect from the wireless network when the kernel prints this message.  I've noticed WPA2 Enterprise disconnections with this kernel vs. no problems with the F9 one on the same hardware.
Comment 12 Bug Zapper 2008-11-25 23:52:36 EST
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 13 Charles R. Anderson 2008-12-12 22:41:18 EST
Issue remains in -132 and -134 F10 kernels.
Comment 14 Charles R. Anderson 2008-12-16 11:21:02 EST
Issue happened again in 2.6.27.9-152.rc2.fc10.x86_64.  I had been running older firmware, and I've now updated to:

iwl4965-firmware-228.57.2.23-2.noarch

as suggested in bug #457154

and

kernel-2.6.27.9-157.fc10.x86_64

which has this fix mentioned:

* Mon Dec 15 2008 John W. Linville <linville@redhat.com> 2.6.27.9-157
- iwlagn: fix RX skb alignment

Hopefully between these two updates, we'll have some improvement.
Comment 15 Charles R. Anderson 2008-12-18 15:24:45 EST
iwl4965-firmware-228.57.2.23-2.noarch
kernel-2.6.27.9-157.fc10.x86_64

No improvement with the above packages.  I still see WARNING: at include/../net/mac80211/rate.h:152 rs_get_rate+0x254/0x28x [iwlagn]() (Not tainted), along with loss of wireless connectivity.
Comment 16 Kevin R. Page 2009-01-07 10:52:29 EST
I'm also seeing this on a ThinkPad X61s with iwl3945, though I'm not using PPTP, and I'm on i686, not x86_64. I installed F10 before Christmas, and the warnings seem more frequent since I got a large batch of updates this week.

kernel-2.6.27.9-159.fc10.i686
iwl3945-firmware-15.28.2.8-2.noarch

The warning is accompanied by loss of network, and happens frequently though not at consistent intervals - sometimes after 2-3 minutes, sometimes it'll go ok for 20 minutes or so.

Some examples sent by kerneloops:
http://www.kerneloops.org/submitresult.php?number=169101
http://www.kerneloops.org/submitresult.php?number=170421
etc.

I note that all of the versions reported at kerneloops seem to be for .fc10 or .fc9:
http://kerneloops.org/search.php?filter=2.6.27.9&search=rs_get_rate
(though do other distros ship/enable kerneloops?)
Comment 17 Kevin R. Page 2009-01-07 13:28:50 EST
Having just returned to my laptop after a few minutes, I found it locked hard (flashing Caps Lock). It could be a coincidence, but it's usually very stable.

Nothing obviously useful in /var/log/messages, the last entries were:
Jan  7 18:13:09 mopp NetworkManager: <info>  (wlan0): supplicant connection state:  completed -> associating
Jan  7 18:13:09 mopp NetworkManager: <info>  (wlan0): supplicant connection state:  associating -> associated
Jan  7 18:13:09 mopp NetworkManager: <info>  (wlan0): supplicant connection state:  associated -> complete

(and before that a log full of kernel oops, obviously, but nothing immediately before the lockup)
Comment 18 Kevin R. Page 2009-01-15 00:39:22 EST
I've tried running with kernel-2.6.27.5-117.fc10.i686 and still experience problems, so I guess this is an F10 problem, not just limited to the latest kernels. I can't see an earlier iwl3945-firmware to revert to.

I hesitate to mention this as it's rather circumstantial evidence, but it seems that the problem occurs more when the network is "busy". e.g. this afternoon in the meeting room in an office with many laptops on wireless, looking back at /var/log/messages over a single hour I count 9 oopses.

This evening, in my room, very likely much lower density of wireless users, and I've been connected for 2 hours without issue.

Now this isn't a "native" network for me, and while it's part of the same system (same SSID etc.) I don't know that the AP hardware is the same. (But if it's not, perhaps the problem is only exhibited with some AP?)

This also aligns with my experiences over and after Christmas: at home (quiet network) I didn't see any problems, but on return to work (busy network) lots of oopses. I *know* the AP hardware isn't the same in this case, mind.

If there's anything I can do, or any debug information I can collect, to help get to the bottom of this, please let me know. Obviously, 10 oopses an hour, losing the network each time, makes F10 borderline unusable on wireless.

Similarly any hints, even if they're "known problem, but won't be fixed any time soon", would be appreciated so I can plan to work around it, somehow.

(FWIW, F8 and F9 ran fine on this hardware without oopsing)
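For anyone wanting to repeat the per-hour count above on their own log, here is a minimal Python sketch. It is a hypothetical helper, not part of any Fedora tooling, and assumes the classic syslog prefix used in the excerpts in this bug (`Jan  7 18:13:09 mopp ...`) and that each warning appears as a kernel line containing both `WARNING: at` and `rs_get_rate`, as quoted in comment 15; adjust the match if your log differs.

```python
import re
from collections import Counter

# Classic syslog prefix, e.g. "Jan  7 18:13:09 mopp kernel: ...".
# The day of the month may be space-padded, hence \s+.
SYSLOG_PREFIX = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}):\d{2}:\d{2} ")


def warnings_per_hour(lines, needle="rs_get_rate"):
    """Count kernel WARNING lines mentioning `needle`, bucketed by hour.

    Returns a Counter keyed by "Mon DD HH". Counter keeps insertion order,
    so for a chronological log the buckets come out in time order.
    """
    counts = Counter()
    for line in lines:
        if "WARNING: at" not in line or needle not in line:
            continue
        m = SYSLOG_PREFIX.match(line)
        if m:
            month, day, hour = m.groups()
            counts[f"{month} {int(day):2d} {hour}"] += 1
    return counts


def print_report(path="/var/log/messages"):
    """Print one "Mon DD HH:00  count" line per hour that had warnings."""
    with open(path, errors="replace") as f:
        for bucket, n in warnings_per_hour(f).items():
            print(f"{bucket}:00  {n}")
```

Calling `print_report()` (usually as root, since /var/log/messages is not world-readable) prints only the hours that had at least one matching warning, which makes "busy network" periods easy to compare against quiet ones.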
Comment 19 Charles R. Anderson 2009-01-15 11:17:50 EST
See this upstream bug report with patch that appears to fix the issue:

http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1822
Comment 20 John W. Linville 2009-01-15 13:00:14 EST
Thanks for the info...FWIW, Intel hasn't pushed that upstream yet.
Comment 21 John W. Linville 2009-01-15 13:54:09 EST
Created attachment 329127
iwlwifi-intel-bug-1822.patch

Patch modified for current F10 kernels...
Comment 22 Kevin R. Page 2009-01-16 00:29:10 EST
Current = rawhide/devel?

Trying to apply to kernel-2.6.27.9-159.fc10 fails:

$ patch --verbose -F1 -p1 --dry-run < ../../../SOURCES/iwlwifi-intel-bug-1822.patch 
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff -up linux-2.6.28.noarch/drivers/net/wireless/iwlwifi/iwl-3945-rs.c.orig linux-2.6.28.noarch/drivers/net/wireless/iwlwifi/iwl-3945-rs.c
|--- linux-2.6.28.noarch/drivers/net/wireless/iwlwifi/iwl-3945-rs.c.orig	2009-01-15 13:30:23.000000000 -0500
|+++ linux-2.6.28.noarch/drivers/net/wireless/iwlwifi/iwl-3945-rs.c	2009-01-15 13:39:16.000000000 -0500
--------------------------
Patching file drivers/net/wireless/iwlwifi/iwl-3945-rs.c using Plan A...
Hunk #1 FAILED at 665.
1 out of 1 hunk FAILED -- saving rejects to file drivers/net/wireless/iwlwifi/iwl-3945-rs.c.rej
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff -up linux-2.6.28.noarch/drivers/net/wireless/iwlwifi/iwl-agn-rs.c.orig linux-2.6.28.noarch/drivers/net/wireless/iwlwifi/iwl-agn-rs.c
|--- linux-2.6.28.noarch/drivers/net/wireless/iwlwifi/iwl-agn-rs.c.orig	2009-01-15 13:30:23.000000000 -0500
|+++ linux-2.6.28.noarch/drivers/net/wireless/iwlwifi/iwl-agn-rs.c	2009-01-15 13:37:37.000000000 -0500
--------------------------
Patching file drivers/net/wireless/iwlwifi/iwl-agn-rs.c using Plan A...
Hunk #1 FAILED at 775.
Hunk #2 succeeded at 798 (offset 6 lines).
Hunk #3 FAILED at 817.
Hunk #4 FAILED at 2123.
Hunk #5 FAILED at 2143.
4 out of 5 hunks FAILED -- saving rejects to file drivers/net/wireless/iwlwifi/iwl-agn-rs.c.rej
done

(applying to the rpmbuild prepared source tree here to get patch output, but previously following http://fedoraproject.org/wiki/Docs/CustomKernel )
Comment 23 John W. Linville 2009-01-16 10:02:23 EST
It is for 2.6.28-based kernels.  Unfortunately, the build was broken yesterday.  Hopefully this one will succeed:

   http://koji.fedoraproject.org/koji/buildinfo?buildID=78849

When/if it does, please try to recreate the issue with those kernels?
Comment 24 John W. Linville 2009-01-16 17:02:55 EST
That one seems to have built -- please test...thanks!
Comment 25 Kevin R. Page 2009-01-17 19:19:15 EST
Is there something I need to change (mkinitrd? I couldn't see any newer versions) to get kernel-2.6.28.1-9.rc2.fc10.i686 to boot? I enter my LUKS passphrase, but booting then halts with "error mounting /dev/root on /sysroot as ext3: no such file or directory".

Or if not, and it's a bug with that kernel, let me know and I'll file something.

I'll test the iwlwifi fix when I can boot with the kernel ;)
Comment 26 John W. Linville 2009-01-19 10:11:22 EST
Hmmm...beats me.  That sounds more like the initrd didn't get recreated properly.  That happens from time to time.  Could I persuade you to try installing that kernel once more?  Sometimes that resolves such issues.
Comment 27 Kevin R. Page 2009-01-20 07:13:51 EST
No luck, I'm afraid.

Tried:
mkinitrd -v -f initrd-2.6.28.1-9.rc2.fc10.i686.img 2.6.28.1-9.rc2.fc10.i686

also (via bug #471093, bug #466607):
mkinitrd -v --with=scsi_wait_scan -f initrd-2.6.28.1-9.rc2.fc10.i686.img 2.6.28.1-9.rc2.fc10.i686

and tried scsi_mod.scan=sync in grub.conf, though the above bugs suggest this shouldn't be expected to work (scsi_mod no longer a module?).

I've submitted bug #480761, but if you've any other ideas so I can test this kernel, let me know.
Comment 28 Charles R. Anderson 2009-01-21 17:33:02 EST
(In reply to comment #23)
> It is for 2.6.28-based kernels.  Unfortunately, the build was broken yesterday.
>  Hopefully this one will succeed:
> 
>    http://koji.fedoraproject.org/koji/buildinfo?buildID=78849
> 
> When/if it does, please try to recreate the issue with those kernels?


This kernel fixes the issue for me completely.  I've seen no oopses or panics since using 2.6.28.1-9.rc2.fc10.x86_64.
Comment 29 John W. Linville 2009-01-22 14:22:37 EST
I'm going to go ahead and close on the basis of comment 28.  Please reopen if the problem continues with 2.6.28.1-9.rc2.fc10 or later kernels.
Comment 30 Kevin R. Page 2009-01-23 14:07:37 EST
Back on one of the previously problematic networks and 2.6.28.1-9.rc2.fc10.i686 fixes the issue for me as well - thanks!
Comment 31 Tom Davidson 2009-01-23 22:10:28 EST
When will this fix be pushed through the update system? I am running a fully updated stock F10 system and am experiencing frequent oopses and occasional hard lockups in the iwl3945 driver.

bugzilla.redhat.com/page.cgi?id=fields.html says that "CLOSED, CURRENTRELEASE" means "the problem described has been fixed and only ever appeared in unsupported or unreleased products" which doesn't seem to be correct.
Comment 32 John W. Linville 2009-01-26 09:48:10 EST
Thanks for the link -- I'm not sure how accurate it is w.r.t. actual usage, but whatever. :-)

As for when it will come through the update system, you may have to wait a bit -- there is some ongoing issue with 2.6.28 and later kernels on F10.  Hopefully that will get resolved before too long.
Comment 33 Tom Davidson 2009-01-27 13:44:52 EST
Can this fix be applied any sooner than 2.6.28, since it is a crasher for fairly common hardware? I notice that there are still 2.6.27 builds in the update pipeline on Koji.

Also, sorry to be a bug status nerd, but I've since come across http://fedoraproject.org/wiki/BugZappers/BugStatusWorkFlow. The kind folks in #fedora-bugzappers tell me that means this one should actually be OPEN,MODIFIED (since it's not in the testing repo, yet).
Comment 34 John W. Linville 2009-01-27 13:56:53 EST
OK, bug status nerds...who is going to make sure this disappears from my sight when the update is actually pushed?
Comment 35 John W. Linville 2009-01-27 14:02:13 EST
Never mind, looks like it is already off My Bugzilla Front Page...

The continuing 2.6.27 stream seems to be forked from the normal process.  Maybe we can persuade cebbert or kylem to ask for that patch to be included.
Comment 36 Charles R. Anderson 2009-01-27 14:11:49 EST
This bug affects 3945 and 4965. The patch fixes the problem completely for me.
Comment 37 Tom Davidson 2009-01-27 14:20:27 EST
The patched kernel also fixes the problem for me (iwl3945).

This is still the #1 issue on the front page at kerneloops.org right now (~1300 reported warnings this week, mostly from the latest fc10 kernel), so pushing a timely update is probably a good idea, especially since this appears to be a crasher.
Comment 38 Nicholas LaRoche 2009-01-27 21:08:54 EST
(In reply to comment #17)
> Having just returned to my laptop after a few minutes I found it locked hard
> (flashing Caps Lock). Could be a co-incidence, but it's usually very stable.
> 
> Nothing obviously useful in /var/log/messages, the last entries were:
> Jan  7 18:13:09 mopp NetworkManager: <info>  (wlan0): supplicant connection
> state:  completed -> associating
> Jan  7 18:13:09 mopp NetworkManager: <info>  (wlan0): supplicant connection
> state:  associating -> associated
> Jan  7 18:13:09 mopp NetworkManager: <info>  (wlan0): supplicant connection
> state:  associated -> complete
> 
> (and before that a log full of kernel oops, obviously, but nothing immediately
> before the lockup)

I have the same issue with identical log entries.

On kernel 2.6.27.12-170.2.5.fc10.x86_64 the problem disappears when I disable wireless with the hardware switch, which leads me to believe it's related to iwl3945.
Comment 39 Kevin R. Page 2009-01-28 09:33:30 EST
Created attachment 330229
/var/log/messages showing disconnects

Possibly should be a new bug; possibly a re-occurrence of bug #235247 ? Or an artefact of this fix?

Since starting to use 2.6.28.1-9.rc2.fc10.i686, I get frequent disconnects from the network a few minutes after booting and connecting.

Forcing NM to reconnect doesn't help, however the problem seems to be cured if I use the kill switch to turn wireless off, then turn it back on.

/var/log/messages attached.
Comment 40 Tom Davidson 2009-01-29 02:54:04 EST
I don't believe that the patch mentioned above is the fix for this problem. 

kylem backported that patch to the current 2.6.27 kernel (rpms available at http://koji.fedoraproject.org/koji/taskinfo?taskID=1089515 ), but this does not fix the problem for me.

I also tried out the 2.6.28-3.fc10.i686 kernel, which I verified does *not* contain this patch (rpms at http://koji.fedoraproject.org/koji/buildinfo?buildID=78354 ), and it *does* fix the problem. 

So the success with 2.6.28.1-9 is apparently not due to this fix. 

One solution is just to wait for 2.6.28, but recent traffic on fedora-kernel-devel suggests that this is held up for a while ( http://koji.fedoraproject.org/koji/buildinfo?buildID=78354 ).
Comment 41 zhoujingmiller 2009-02-06 18:12:31 EST
I have tried kernel 2.6.28-3, and the iwl3945 oops resurfaced after several suspend and hibernate cycles (the system ran for 10 hours before the iwl3945 oops message appeared). 2.6.28.1-19 seems to have solved the issue (no oops reported so far), but it still prints the "Disabling IRQ#16" message; even so, my system seems able to live with it for long enough. The system can shut down with 2.6.28.1-19, but not with 2.6.27.12-170.2.5, where the "Disabling IRQ#16" message keeps recurring.
Comment 42 zhoujingmiller 2009-02-06 19:57:11 EST
(feel free to delete this comment)
Does Linville's comment #35 ("looks like it is already off My Bugzilla Front Page...") imply that this bug is not being worked on?
Comment 43 Tom Davidson 2009-02-06 21:35:55 EST
#41: The reports at http://kerneloops.org/searchweek.php?search=rs_get_rate agree with your experience: they show that this is still being hit in 2.6.28-1 and 2.6.28-3. 

I'm not too sure how to read these oopses, but there is a difference in the oopses generated by 2.6.27 and .28:

All the 2.6.27 reports show:
include/../net/mac80211/rate.h:152

While all the 2.6.28 reports show:
include/net/mac80211.h:1863 

Is this the calling function? If so, maybe the newer issue was either occluded by (or caused by) the patch. Just some very amateurish debugging--hopefully a useful clue to someone.


[For the record, this is still the number one issue on the front page of www.kerneloops.org (and is only reported by fc10 kernels--is this because it's a Fedora-specific issue, or because only Fedora is filing oops reports at kerneloops.org?)]
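To check which of the two warning sites a particular log contains, a short Python sketch along these lines could tally the `WARNING: at <file>:<line> <function>+` lines. It is a hypothetical helper, not part of any Fedora or kerneloops tooling, and assumes the WARNING line format quoted in comment 15 and above. It only counts sites, so it would need to be run separately on logs from each kernel to compare them.

```python
import re
from collections import Counter

# Hypothetical pattern for lines such as
# "WARNING: at include/net/mac80211.h:1863 rs_get_rate+0x1b/0x90 [iwl3945]()".
# Groups: source file, line number, function name (text before the '+offset').
WARN_AT = re.compile(r"WARNING: at (\S+):(\d+) (\w+)\+")


def warning_sites(lines):
    """Tally (file, line, function) triples from kernel WARNING lines."""
    sites = Counter()
    for line in lines:
        m = WARN_AT.search(line)
        if m:
            path, lineno, func = m.groups()
            sites[(path, int(lineno), func)] += 1
    return sites


def print_sites(path="/var/log/messages"):
    """Print each warning site with its count, most frequent first."""
    with open(path, errors="replace") as f:
        for (src, lineno, func), n in warning_sites(f).most_common():
            print(f"{n:5d}  {src}:{lineno}  {func}")
```

A log that only ever shows `include/net/mac80211.h:1863 rs_get_rate` would match the 2.6.28 reports described above, while `include/../net/mac80211/rate.h:152` would match the 2.6.27 ones.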
Comment 44 zhoujingmiller 2009-02-07 02:25:20 EST
Further testing here (around 3 hours) shows that 2.6.27.5-117.fc10.x86_64 with the wireless switch turned off has neither the wpa_supplicant/iwl3945 oops nor the "Disabling IRQ#16" warning, whereas 2.6.28-3, 2.6.28.1-19, and 2.6.27.12-170.2.5 do (see comment #41). In addition, "su -c 'rmmod iwl3945'" does not help with 2.6.27.12-170.2.5, which eventually freezes completely whether the wireless switch is on or off, contradicting comment #38.
Comment 45 Kevin R. Page 2009-02-07 10:44:24 EST
(In reply to comment #43)
> #41: The reports at http://kerneloops.org/searchweek.php?search=rs_get_rate
> agree with your experience: they show that this is still being hit in 2.6.28-1
> and 2.6.28-3. 

Confirmed: I had an oops yesterday with 2.6.28.1-9.rc2.fc10.i686. This took ~2 weeks to show up, rather than the 20 minutes I was seeing with 2.6.27 (same WLAN), so perhaps the fix has only mitigated whatever *triggers* the underlying oops problem (perhaps just on my hardware and network).

I've also been seeing occasional (but more frequent than on F9) disconnects from the WLAN. This is what I was also seeing in 2.6.27 just before every oops; with 2.6.28 it survived the reconnect without an oops each time until yesterday. This looks to be the error log on each drop:
Feb  5 16:59:00 mopp kernel: iwl3945: Error sending REPLY_SCAN_CMD: time out after 500ms.
Feb  5 16:59:01 mopp kernel: iwl3945: Error sending REPLY_RXON: time out after 500ms.
Feb  5 16:59:01 mopp kernel: iwl3945: Error setting new configuration (-110).
Feb  5 16:59:01 mopp kernel: iwl3945: Error sending REPLY_TX_PWR_TABLE_CMD: time out after 500ms.
Comment 46 zhoujingmiller 2009-02-22 19:42:46 EST
I tried the recent vanilla release as follows,

$ uname -r
2.6.28.7

It has now run for almost 2 hours with neither Xorg nor iwl3945 oopses, with the following installed:

$ rpm -qv kerneloops
kerneloops-0.12-1.fc10.x86_64

So I would tentatively conclude that upstream has already fixed the issue. I guess this bug could be closed, although we still do not know what the cause is.
Comment 47 John W. Linville 2009-02-23 10:00:47 EST
Have you tried any of the 2.6.29-based F10 kernels available in Koji?

   http://koji.fedoraproject.org/koji/buildinfo?buildID=82744
Comment 48 zhoujingmiller 2009-02-23 10:58:20 EST
Well, not yet, but thanks for pointing that out; I would prefer to keep my machine stable (i.e. frozen) for a week. I will probably restore my settings and try the 2.6.29 kernels this weekend. The vanilla 2.6.28.7 kernel has now run for almost 24 hours with nothing bad happening so far. Wireless and Xorg are working well, and there are no more oops messages, in particular none about tainted or not-tainted modules. I also need to make sure the issue does not resurface as it did with the previous 2.6.28.x kernel releases in Koji.
Comment 49 Nicholas LaRoche 2009-02-23 17:49:01 EST
Are there any known good kernels from 2.6.28.x with regard to iwl3945 and the kernel oops?
Comment 50 Nicholas LaRoche 2009-02-23 17:49:42 EST
(In reply to comment #49)
> Are there any known good kernels from 2.6.28.x with regard to iwl3945 and the
> kernel oops?

From koji
Comment 51 John W. Linville 2009-02-24 09:36:20 EST
The powers that be in Fedora decided to skip 2.6.28. The last 2.6.28-based kernel build I see in Koji was based on 2.6.28.1 and is the kernel referenced in comment 45 as still showing signs of the problem.
Comment 52 Kevin R. Page 2009-02-27 06:02:21 EST
(In reply to comment #47)
> Have you tried any of the 2.6.29-based F10 kernels available in Koji?

kernel-2.6.29-0.43.rc6.fc10.i686 wireless stalls after 30 secs or so, and doesn't seem to put anything useful in /var/log/messages either (i.e. nothing in messages, but my wireless ground to a halt). I'd say this is worse, not better. Incidentally, my bluetooth and UMTS modem haven't been detected either.

I note there's now a newer 2.6.29 kernel in koji. Given the intermittent nature of the problem (seems to only happen on "busy" networks?) it's tricky to track, install, and test the latest kernel - I'm not on a known problem network all the time, and installing a new kernel is tricky without network ;) . If you could point out specific kernels you'd like us to try - perhaps where fixes are known to have been committed - that would be helpful.
Comment 53 Kevin R. Page 2009-02-27 06:13:08 EST
(In reply to comment #52)
> I note there's now a newer 2.6.29 kernel in koji.

Apologies, this still seems to be the latest .29 F10 kernel after all. Let me know if there's anything else you'd like me to try with it.
Comment 54 Andrew Overholt 2009-03-26 15:10:49 EDT
A few of us at EclipseCon here this week have been experiencing this with our x61 ThinkPads.  At least we think this is the same bug :)
Comment 55 zhoujingmiller 2009-03-26 16:23:12 EDT
The issue seems to go away with kernel-2.6.29-3.fc10.x86_64; it's been running for 2 days and I have not had any issues with wireless.
Comment 56 Chuck Ebbert 2009-03-27 12:51:13 EDT
Marking as tentatively fixed. Some more confirmations would be helpful...
Comment 57 Nicholas LaRoche 2009-03-27 13:30:26 EDT
I've had no problems with freezing/crashing with 2.6.29-0.43.rc6.fc10.x86_64 for several weeks now.
Comment 58 Kevin R. Page 2009-03-28 13:00:59 EDT
(In reply to comment #56)
> Marking as tentatively fixed. Some more confirmations would be helpful...  

I've installed kernel-2.6.29-3.fc10.x86_64 and will test next week on the known busy problem network to see if it solves the stalls I had in comment #52 with 2.6.29-0.43.rc6.

Related to NM: 2.6.29-3.fc10 certainly doesn't detect my wwan card (or bluetooth) as mentioned previously in comment #52. I've reported this against the kernel as bug #492709, but if you think it's more likely to be hal or NM, please say/re-assign.

This is hard to report on bugzilla as it spans multiple components, but F10 has been *terrible* for hardware regressions on this laptop. I've not yet had a set of packages for F10 that gives me more than two out of: working X, working wlan, working wwan/bluetooth. Worse still, testing kernels etc. for any one of the problems has exacerbated the problem for the others. I realise it's just an "unlucky" combination of hardware, but it's a combination many users will have - and that worked with F9. What's the best way to report this?
Comment 59 ls 2009-03-30 11:47:26 EDT
Random disconnects with 3956ABG on Thinkpad X60 (i386) here too.
Installed 2.6.29-10 from koji today but still random disconnects... Maybe i'll try 2.6.29-3 as mentioned later if it will fix the problem.
Comment 60 zhoujingmiller 2009-03-30 17:45:24 EDT
re: Comment #59
Then it's hardware-dependent, as I am on a ThinkPad T61 with x86_64 and was on 2.6.29-3 and am now on 2.6.29-10.
Comment 61 Nicholas LaRoche 2009-03-31 09:16:51 EDT
(In reply to comment #60)
> re: Comment #59
> Then it's hardward-dependent as I am on a Thinkpad T61 with x86_64 and was with
> 2.6.29-3 and now with 2.6.29-10.  

I'm using a Thinkpad T61p (Intel 3945ABG, iwl3945) and I haven't had any freezing issues with 2.6.29-0.43.rc6.fc10.x86_64.

Are you using the Intel 3945 wireless card or the newer one that was offered with the T61? (4965?)
Comment 62 zhoujingmiller 2009-04-01 01:19:12 EDT
re: Comment #61

I'm with the 3945 one.
Comment 63 ls 2009-04-01 02:41:53 EDT
(In reply to comment #59)
> Random disconnects with 3956ABG on Thinkpad X60 (i386) here too.
> Installed 2.6.29-10 from koji today but still random disconnects... Maybe i'll
> try 2.6.29-3 as mentioned later if it will fix the problem.  

I made a mistake, it's a 3945ABG of course.
Comment 64 Chuck Ebbert 2009-04-15 16:41:32 EDT
Please try kernel-2.6.29.1-30 from the updates-testing repository.
Comment 65 Andrew Overholt 2009-04-16 10:04:09 EDT
I haven't been experiencing any issues with 2.6.29-3.fc10.x86_64 but I'm also rarely on a high-traffic wifi network like I was at EclipseCon when I experienced this.
Comment 66 Kevin R. Page 2009-04-23 10:59:48 EDT
kernel-2.6.29.1-30.fc10.i686 and now kernel-2.6.29.1-42.fc10.i686 seem ok to me...

BUT I think we may have had another AP added to the previously problematic network here (due to high-traffic), so I wouldn't want to guarantee I'm on a high-traffic net anymore (if that was indeed the trigger for the bug).
Comment 67 Kevin R. Page 2009-05-20 12:50:39 EDT
With 2.6.29.1-42.fc10.i686, and back on the original "problem" (busy) network, I'm getting fairly regular disconnects. It doesn't always fully reconnect to the network without turning wireless off and on again (which I guess unloads and reloads the driver module).

I have installed 2.6.29.3-60.fc10 from updates-testing and will boot into it tomorrow.

I also got a couple of oopses when I used the kill switch, but I think these come from sierra (which the kill switch also bounces, and which probably triggered them):
May 20 17:02:43 mopp kernel: BUG: sleeping function called from invalid context at kernel/mutex.c:88
May 20 17:02:43 mopp kernel: in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper
May 20 17:02:43 mopp kernel: Pid: 0, comm: swapper Not tainted 2.6.29.1-42.fc10.i686 #1
May 20 17:02:43 mopp kernel: Call Trace:
May 20 17:02:43 mopp kernel: [<c042843e>] __might_sleep+0xdf/0xe4
May 20 17:02:43 mopp kernel: [<c06d6957>] mutex_lock+0x18/0x2d
May 20 17:02:43 mopp kernel: [<c06d78a6>] ? _spin_unlock_irqrestore+0x22/0x38
May 20 17:02:43 mopp kernel: [<c058ed9e>] reset_buffer_flags+0x4f/0xd5
May 20 17:02:43 mopp kernel: [<c058ee31>] n_tty_flush_buffer+0xd/0x60
May 20 17:02:43 mopp kernel: [<c058f33a>] n_tty_receive_buf+0x4b6/0xff9
May 20 17:02:43 mopp kernel: [<c0428786>] ? default_wake_function+0xb/0xd
May 20 17:02:43 mopp kernel: [<c0424f25>] ? enqueue_entity+0x295/0x29d
May 20 17:02:43 mopp kernel: [<c041b421>] ? default_spin_lock_flags+0x8/0xb
May 20 17:02:43 mopp kernel: [<c06d78a6>] ? _spin_unlock_irqrestore+0x22/0x38
May 20 17:02:43 mopp kernel: [<c041b421>] ? default_spin_lock_flags+0x8/0xb
May 20 17:02:43 mopp kernel: [<c05918f5>] flush_to_ldisc+0xf1/0x17f
May 20 17:02:43 mopp kernel: [<c05919c4>] tty_flip_buffer_push+0x41/0x51
May 20 17:02:43 mopp kernel: [<f87088b9>] sierra_indat_callback+0x68/0xb0 [sierra]
May 20 17:02:43 mopp kernel: [<c05f129e>] usb_hcd_giveback_urb+0x63/0x97
May 20 17:02:43 mopp kernel: [<c0606f8a>] uhci_giveback_urb+0xe5/0x15f
May 20 17:02:43 mopp kernel: [<c06077bf>] uhci_scan_schedule+0x533/0x776
May 20 17:02:43 mopp kernel: [<c0447987>] ? clockevents_program_event+0xdb/0xea
May 20 17:02:43 mopp kernel: [<c06092fa>] uhci_irq+0x107/0x11c
May 20 17:02:43 mopp kernel: [<c05f0ee5>] usb_hcd_irq+0x40/0xa3
May 20 17:02:43 mopp kernel: [<c0466ee4>] handle_IRQ_event+0x2f/0x64
May 20 17:02:43 mopp kernel: [<c0468061>] handle_fasteoi_irq+0x7b/0xb5
May 20 17:02:43 mopp kernel: [<c0467fe6>] ? handle_fasteoi_irq+0x0/0xb5
May 20 17:02:43 mopp kernel: <IRQ>  [<c04043ac>] ? common_interrupt+0x2c/0x34
May 20 17:02:43 mopp kernel: [<c057e932>] ? acpi_idle_enter_bm+0x259/0x29a
May 20 17:02:43 mopp kernel: [<c0635464>] ? cpuidle_idle_call+0x60/0x94
May 20 17:02:43 mopp kernel: [<c0402dfc>] ? cpu_idle+0x6b/0x8b
May 20 17:02:43 mopp kernel: [<c06d3053>] ? start_secondary+0x1c9/0x1d1
May 20 17:02:44 mopp kerneloops: Submitted 1 kernel oopses to www.kerneloops.org
Comment 68 Kevin R. Page 2009-07-13 07:44:49 EDT
Upgraded to F11, and I'm still having some disconnect issues on the "problem"/busy network. My feeling is that the problem occurs less often, but this could equally be due to a change in the number of hosts on the network. Once a disconnect has occurred, it seems much harder to get a stable connection (but again, this could just be a busy day).

The problems seem to start with:
Jul 13 12:36:49 mopp NetworkManager: <info>  (wlan0): supplicant connection state:  completed -> disconnected
Jul 13 12:36:49 mopp NetworkManager: <info>  (wlan0): supplicant connection state:  disconnected -> scanning
Jul 13 12:36:49 mopp NetworkManager: <info>  (wlan0): supplicant connection state:  scanning -> associating
Jul 13 12:36:49 mopp kernel: iwl3945: Microcode SW error detected.  Restarting 0x82000008.
Jul 13 12:36:49 mopp kernel: iwl3945: Error Reply type 0x000002F0 cmd REPLY_TX_PWR_TABLE_CMD (0x97) seq 0x0443 ser 0x00000078
Jul 13 12:36:49 mopp kernel: iwl3945: Can't stop Rx DMA.
Jul 13 12:36:49 mopp kernel: Registered led device: iwl-phy0:radio
Jul 13 12:36:49 mopp kernel: Registered led device: iwl-phy0:assoc
Jul 13 12:36:49 mopp kernel: Registered led device: iwl-phy0:RX
Jul 13 12:36:49 mopp kernel: Registered led device: iwl-phy0:TX
Jul 13 12:36:49 mopp NetworkManager: <info>  (wlan0): supplicant connection state:  associating -> disconnected
Jul 13 12:37:04 mopp NetworkManager: <info>  (wlan0): device state change: 8 -> 3
Jul 13 12:37:04 mopp NetworkManager: <info>  (wlan0): deactivating device (reason: 11).
Jul 13 12:37:04 mopp NetworkManager: <info>  wlan0: canceled DHCP transaction, dhcp client pid 2120
Let me know if you need more/different logs.

On this occasion NM timed out; I could then force a reconnect by manually re-selecting the network.
Comment 69 Bug Zapper 2009-11-18 02:57:55 EST
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 70 Stanislaw Gruszka 2009-11-18 04:38:16 EST
According to comment #68, the problem still occurs on F11.
Comment 71 Charles R. Anderson 2009-11-18 10:10:47 EST
I've not been having this problem on F11 or F12 on my ThinkPad T61/iwl4965.  I'm on a fairly large edu wireless network with lots of traffic.
Comment 72 Stanislaw Gruszka 2009-11-18 12:20:21 EST
Kevin, 

Your problem on F11 seems to be a different issue. Could you please open a new bug report for it?
Comment 73 Kevin R. Page 2009-11-18 13:56:16 EST
I haven't seen this issue in the last month or so with recent F11 kernels (though obviously it was always pretty intermittent). If it shows up again I'll file a new bug.
Comment 74 Stanislaw Gruszka 2009-11-19 03:20:20 EST
Reverting the version back to F10; the issue is not reproducible on F11.
Comment 75 Stanislaw Gruszka 2009-11-19 03:26:24 EST
I do not see any chance of fixing this bug in F10. Since the bug is fixed on F11, I'm closing it as NEXTRELEASE.
