Bug 730653

Summary: kernel panic
Product: [Fedora] Fedora
Reporter: Jan Teichmann <jan.teichmann>
Component: kernel
Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 15
CC: antonio.montagnani, gansalmon, itamar, jonathan, kernel-maint, linville, madhu.chinakonda
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 2.6.40.6-0.fc15
Doc Type: Bug Fix
Last Closed: 2011-10-13 13:30:25 UTC
Attachments:
Description                                               | Flags
screenshot                                                | none
iwl3945-rs-debug.patch                                    | none
compat-wireless-compile-fix.patch                         | none
output of dmesg after applying the patch                  | none
iwl3945-rs-debug_v2.patch                                 | none
log of the kernel panic                                   | none
001-iwlegacy-fix-BUG_ON-info-control.rates-0-.idx-0.patch | none
My Dmesg file after applying patches to wireless-compat   | none

Description Jan Teichmann 2011-08-15 09:11:13 UTC
Description of problem:
frequent kernel panics 

Version-Release number of selected component (if applicable):
kernel: 2.6.40-4.fc15.i686.PAE

03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)

How reproducible:
frequently

Steps to Reproduce:
Unfortunately, the kernel doesn't get a chance to log the panic. All I have is a photo of the screen that I took with my mobile phone.
What I noticed is that it is definitely related to the Wifi. If I switch the Wifi off with the hardware switch, the kernel doesn't panic. If it is switched on, I frequently get a panic within 30 minutes.
I also noticed that the panic only happens when using the Wifi at work, which is PEAP protected. I don't get the panic on an ordinary WPA2 network at home.

Comment 1 Dave Jones 2011-08-15 21:56:04 UTC
Even a photo of the panic would be helpful, as there's not much else to go on.

Comment 2 Jan Teichmann 2011-08-15 22:39:15 UTC
Created attachment 518364 [details]
screenshot

Comment 3 Stanislaw Gruszka 2011-08-17 15:34:46 UTC
Created attachment 518709 [details]
iwl3945-rs-debug.patch

I'm testing PEAP v1 with GTC inner authentication but have not hit this bug yet; perhaps it happens only under some special radio conditions.

The attached patch should prevent the kernel panic and instead print a warning and some additional data when the bad condition occurs. Please apply it to the compat-wireless package and provide the dmesg output when it generates a call trace. Instructions:

$ wget http://wireless.kernel.org/download/compat-wireless-2.6/compat-wireless-2011-08-08.tar.bz2
$ tar xfj compat-wireless-2011-08-08.tar.bz2 
$ cd compat-wireless-2011-08-08
$ patch -p1 < ~/iwl3945-rs-debug.patch

and install the modules according to the README file.

Comment 4 Jan Teichmann 2011-08-18 20:16:57 UTC
doesn't compile:

./scripts/gen-compat-autoconf.sh config.mk > include/linux/compat_autoconf.h
make -C /lib/modules/2.6.40-4.fc15.i686.PAE/build M=/home/jan/Downloads/compat-wireless-2011-08-08 modules
make[1]: Entering directory `/usr/src/kernels/2.6.40-4.fc15.i686.PAE'
  CC [M]  /home/jan/Downloads/compat-wireless-2011-08-08/compat/main.o
  LD [M]  /home/jan/Downloads/compat-wireless-2011-08-08/compat/compat.o
  CC [M]  /home/jan/Downloads/compat-wireless-2011-08-08/drivers/bcma/main.o
In file included from include/linux/pci.h:43:0,
                 from /home/jan/Downloads/compat-wireless-2011-08-08/include/linux/bcma/bcma.h:4,
                 from /home/jan/Downloads/compat-wireless-2011-08-08/drivers/bcma/bcma_private.h:8,
                 from /home/jan/Downloads/compat-wireless-2011-08-08/drivers/bcma/main.c:8:
include/linux/mod_devicetable.h:386:8: error: redefinition of ‘struct bcma_device_id’
/home/jan/Downloads/compat-wireless-2011-08-08/include/linux/compat-3.0.h:32:8: note: originally defined here
make[3]: *** [/home/jan/Downloads/compat-wireless-2011-08-08/drivers/bcma/main.o] Error 1
make[2]: *** [/home/jan/Downloads/compat-wireless-2011-08-08/drivers/bcma] Error 2
make[1]: *** [_module_/home/jan/Downloads/compat-wireless-2011-08-08] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.40-4.fc15.i686.PAE'
make: *** [modules] Error 2

Comment 5 Stanislaw Gruszka 2011-08-18 21:02:08 UTC
Hmm, I'm sorry. I will look at this error in more detail on Monday. In the meantime, could you try to compile on an older F-15 kernel?

Comment 6 Jan Teichmann 2011-08-19 15:17:50 UTC
I used the compat-wireless 3.0 stable release and no kernel panic occurs. I can connect successfully to the PEAP wifi and it has been stable for hours. I will try to compile on an older F-15 kernel over the weekend. I'll keep you posted.

Comment 7 Stanislaw Gruszka 2011-08-22 12:12:29 UTC
Created attachment 519273 [details]
compat-wireless-compile-fix.patch

Compilation fix for compat-wireless-2011-08-08 on 2.6.40

Comment 8 Stanislaw Gruszka 2011-08-22 12:18:00 UTC
(In reply to comment #6)
> I used the compat-wireless 3.0 stable release and no kernel panic occurs.

Did you use an unpatched compat-wireless-3.0? If so, that would be confusing, since 2.6.40 and 3.0 use the same code (Fedora renamed 3.0 to 2.6.40 so as not to break applications that expect a kernel version of the form 2.x.y). If you patched compat-wireless-3.0, it should generate warnings and some other messages in dmesg; please attach them here.
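
To illustrate the renaming described above, here is a minimal shell sketch. It assumes only what this comment states, namely that Fedora's 2.6.40 corresponds to upstream 3.0; applying the same rule to point releases and package suffixes is an assumption, and the function name fedora_to_upstream is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a Fedora 15 "2.6.40..." version string to the
# upstream "3.0..." form. Strings that do not start with 2.6.40 pass through.
fedora_to_upstream() {
    case "$1" in
        2.6.40|2.6.40[.-]*) printf '3.0%s\n' "${1#2.6.40}" ;;
        *)                  printf '%s\n' "$1" ;;
    esac
}

fedora_to_upstream "2.6.40-4.fc15.i686.PAE"   # prints 3.0-4.fc15.i686.PAE
```

Seen this way, a module built from compat-wireless 3.0 and the in-tree driver of a 2.6.40 Fedora kernel would be expected to behave the same.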

Comment 9 Jan Teichmann 2011-08-22 14:05:31 UTC
Created attachment 519288 [details]
output of dmesg after applying the patch

I applied the provided patch to the compat-wireless 3.0-2 stable release:
http://www.orbit-lab.org/kernel/compat-wireless-3.0-stable/v3.0/compat-wireless-3.0-2.tar.bz2

Comment 10 Stanislaw Gruszka 2011-08-23 13:57:53 UTC
I don't see a direct reason for the oops yet. I'm not sure whether the mac layer is passing an invalid band to the iwl3945 rate scaling code. I think I will prepare a modified patch which prints more data. But first, could you show the output of "iwconfig wlan0" while associated? It will print the frequency in use.

Comment 11 Jan Teichmann 2011-08-24 08:46:47 UTC
wlan0     IEEE 802.11abg  ESSID:"eduroam"  
          Mode:Managed  Frequency:2.462 GHz  Access Point: 00:0B:85:84:42:DD   
          Bit Rate=48 Mb/s   Tx-Power=15 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=59/70  Signal level=-51 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0
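
For reference, the reported frequency can be mapped to a band and channel with a few lines of shell. This is a sketch that assumes the usual 802.11 channel numbering (2.4 GHz: channel = (MHz - 2407) / 5 for channels 1 to 13; 5 GHz: channel = (MHz - 5000) / 5); the function name freq_to_channel is hypothetical and not part of any tool mentioned in this report.

```shell
#!/bin/sh
# Hypothetical helper: map a centre frequency in MHz to a band and channel.
freq_to_channel() {
    mhz=$1
    if [ "$mhz" -ge 2412 ] && [ "$mhz" -le 2472 ]; then
        echo "2.4GHz channel $(( (mhz - 2407) / 5 ))"
    elif [ "$mhz" -eq 2484 ]; then
        echo "2.4GHz channel 14"
    elif [ "$mhz" -ge 5180 ] && [ "$mhz" -le 5825 ]; then
        echo "5GHz channel $(( (mhz - 5000) / 5 ))"
    else
        echo "unknown"
    fi
}

freq_to_channel 2462   # 2.462 GHz from the output above; prints "2.4GHz channel 11"
```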

Comment 12 Jan Teichmann 2011-08-24 08:53:16 UTC
Dear Stanislaw,
I won't be able to get you more data after Tuesday 30.08., as I'm changing jobs. I won't have access to this computer and network any longer. Please provide any patches for collecting more data before then.
Best regards

Comment 13 Stanislaw Gruszka 2011-08-24 13:01:32 UTC
Created attachment 519631 [details]
iwl3945-rs-debug_v2.patch

Thanks for the info, Jan. I hope we will find a proper fix before the end of the month. If not, we will just apply the workaround we already have.

This patch prints some more data; hopefully that will be enough to find the true reason for the oops.

Comment 14 Jan Teichmann 2011-08-24 14:17:04 UTC
Created attachment 519648 [details]
log of the kernel panic

after applying the second patch to compat-wireless-3.0-2

Comment 15 Stanislaw Gruszka 2011-08-26 10:31:04 UTC
Created attachment 520059 [details]
001-iwlegacy-fix-BUG_ON-info-control.rates-0-.idx-0.patch

I'm not sure what the detailed reason for the iwl3945 rate scaling code malfunction is. To figure this out I need more detailed knowledge of the 3945 rate scaling algorithm. Let's fix the kernel panic for now. This patch, like the previous one, should fix the panic and print a warning instead when the bad condition occurs. After your confirmation, I'll post it officially and it will be applied to the kernel. Thanks.

Comment 16 Jan Teichmann 2011-08-26 13:01:34 UTC
I used the latest patch. The system is stable and reports warnings.
You can close this bug.
Thanks for the fast fix!
Best regards

Comment 17 Stanislaw Gruszka 2011-08-26 15:29:44 UTC
http://marc.info/?l=linux-wireless&m=131437236131348&w=2
Posted for now; it will be applied to the Fedora kernel through the upstream -stable tree.

Comment 18 antonio montagnani 2011-08-31 12:33:36 UTC
I have been suffering from this bug since this morning. Everything was working fine until last night, even when using the PAE kernel.

It seems that only the PAE kernel is affected.

It is not clear to me which kernel will include the fix...

Comment 19 Stanislaw Gruszka 2011-08-31 13:37:15 UTC
The patch is currently in the wireless-testing tree. It should be available in the Fedora kernel in a week or two.

Comment 20 antonio montagnani 2011-08-31 14:17:02 UTC
I also hit the same bug with the standard kernel.

Comment 21 Stanislaw Gruszka 2011-09-01 07:46:56 UTC
It will be applied to the upstream kernel even before the patch lands in Fedora. Patches propagate through different kernel trees: wireless-testing -> net -> linux -> stable -> fedora.

Comment 22 Stanislaw Gruszka 2011-09-01 07:51:40 UTC
That's the process; I wish it could be faster for kernel panic fixes. But you have the patch, so you can always apply it yourself to a vanilla kernel or to compat-wireless.

Comment 23 antonio montagnani 2011-09-05 11:33:02 UTC
Just for information, it happened again this morning after four days of operation.

Can someone explain to this newbie why this bug appears randomly, and generally a short time after start-up?

Tnx for help

Comment 24 Stanislaw Gruszka 2011-09-05 11:49:22 UTC
I don't know why; bugs differ in nature and generally there are no rules with them. Antonio, are you sure you are hitting this particular bug, i.e. is there rate_control_get_rate in the call trace?
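
One way to check a saved log for that symbol is a simple grep. The helper below is a minimal sketch; the function name has_rate_control_trace and the idea of saving the log to a file first are assumptions for illustration, and a match only suggests, rather than proves, that it is the same bug.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the saved log file given as $1 mentions
# rate_control_get_rate, the function asked about in this comment.
has_rate_control_trace() {
    grep -q 'rate_control_get_rate' "$1"
}
```

For example, after saving a log with "dmesg > dmesg.txt", running has_rate_control_trace dmesg.txt && echo "rate_control_get_rate found" prints the message only when the symbol appears in the file.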

Comment 25 antonio montagnani 2011-09-05 12:21:02 UTC
Stan

I am quite sure I am hitting this bug; in particular, I get messages similar to those shown in the panic screenshot attached by Jan.

To fully confirm it, I am waiting for a new kernel panic to compare against it.

I hope to stress my system and cause such a panic ;-) but now it seems to be stable

Comment 26 Stanislaw Gruszka 2011-09-05 12:40:58 UTC
Perhaps you should rather try to compile compat-wireless with the patch from comment 15, install it and see if the kernel still panics. See comment 3 for how to get compat-wireless; you will also need the patch from comment 7.

Commands:

yum groupinstall "Development tools"
yum install kernel-devel

should be enough to install all tools needed for compilation.
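
Putting the steps from comments 3, 7, 15 and this comment together, the build could look roughly like the sketch below. It rests on several assumptions that this report does not confirm: that both patch attachments were saved under their attachment names in one directory, that they apply with -p1 to the 2011-08-08 snapshot, and that the function name build_patched_compat_wireless is only illustrative. Installing and loading the modules is left to the README, as in comment 3.

```shell
#!/bin/sh
# Sketch only: download, patch and build compat-wireless in one step.
# $1 is the directory holding the two patch files (default: $HOME).
# The body runs in a subshell so the caller's working directory is unchanged.
build_patched_compat_wireless() (
    set -e   # stop at the first failing step
    patch_dir=${1:-$HOME}
    tarball=compat-wireless-2011-08-08.tar.bz2

    wget "http://wireless.kernel.org/download/compat-wireless-2.6/$tarball"
    tar xfj "$tarball"
    cd compat-wireless-2011-08-08

    # Compilation fix for 2.6.40 (comment 7), then the panic fix (comment 15).
    patch -p1 < "$patch_dir/compat-wireless-compile-fix.patch"
    patch -p1 < "$patch_dir/001-iwlegacy-fix-BUG_ON-info-control.rates-0-.idx-0.patch"

    # Build against the running kernel; install as described in the README.
    make
)
```

The function is only defined here, not called; running build_patched_compat_wireless ~/Downloads would perform the steps with the patches taken from ~/Downloads.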

Comment 27 antonio montagnani 2011-09-15 16:40:34 UTC
I installed the updated drivers according to the previous comments, and I have to report a crash during shut-down of my laptop.

I had no camera to capture the screen as I was at work, but I remember some lines mentioning iwl3945, legacy or similar.

I hope a new crash can be reported shortly.

Do you have any news about the updated kernel?

Comment 28 antonio montagnani 2011-09-15 16:47:29 UTC
Created attachment 523411 [details]
My Dmesg file after applying patches to wireless-compat

Comment 29 Stanislaw Gruszka 2011-09-16 07:37:42 UTC
(In reply to comment #27)
> Hope a new crash can be reported shortly
If you cannot find another way to capture the logs, you may try kdump.

> Do you have any news about updated kernel???
kernel.org is down, so I don't know whether the fix was applied or not.

Comment 30 antonio montagnani 2011-10-10 15:14:47 UTC
Using 2.6.40.6-0.fc15.i686.PAE: I have not experienced any oops since then (no patches applied).

waiting for final confirmation

Tnx