BUG: unable to handle kernel NULL pointer dereference at 00000000000002c0
IP: [<ffffffffa031aae2>] rtl92ce_get_desc+0x12/0x50 [rtl8192ce]
PGD 1e4943067 PUD 1ebdbd067 PMD 0
Oops: 0000 [#1] SMP
CPU 1
Modules linked in: nls_utf8 udf crc_itu_t fuse lockd sunrpc rfcomm bnep tpm_bios ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_conexant arc4 coretemp kvm_intel kvm microcode snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device i2c_i801 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev btusb bluetooth media rtl8192ce(-) rtlwifi rtl8192c_common mac80211 lpc_ich snd_hda_intel mfd_core snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer cfg80211 e1000e mei thinkpad_acpi snd soundcore rfkill uinput crc32c_intel ghash_clmulni_intel sdhci_pci sdhci mmc_core wmi i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
Pid: 5659, comm: rmmod Not tainted 3.5.2-3.fc17.x86_64 #1 LENOVO 4177CTO/4177CTO
RIP: 0010:[<ffffffffa031aae2>] [<ffffffffa031aae2>] rtl92ce_get_desc+0x12/0x50 [rtl8192ce]
RSP: 0018:ffff880126105b78 EFLAGS: 00010046
RAX: ffffffffa031c2a0 RBX: 00000000000002c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000002c0
RBP: ffff880126105b78 R08: 0000000000000040 R09: ffff880215400000
R10: 000000000db55f01 R11: 0000000000000008 R12: ffff88021111bc00
R13: 0000000000000016 R14: ffff88020e9c9f20 R15: 0000000000000016
FS: 00007f0bc6a85740(0000) GS:ffff88021e240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000002c0 CR3: 00000001efc3a000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 5659, threadinfo ffff880126104000, task ffff88020e79c530)
Stack:
 ffff880126105ca8 ffffffffa0300bbd ffff880126105ca8 ffff88020e9ca200
 ffff88020e9ccdd8 ffff88020db553c0 ffff88020e9c8560 000000400836d540
 ffff880215802300 ffff880126105c20 ffff880126105c18 0000000000000000
Call Trace:
 [<ffffffffa0300bbd>] _rtl_pci_rx_interrupt+0x19d/0x640 [rtlwifi]
 [<ffffffffa0301c12>] _rtl_pci_interrupt+0x2d2/0x2f0 [rtlwifi]
 [<ffffffff810e3e09>] __free_irq+0x189/0x220
 [<ffffffff810e3ef4>] free_irq+0x54/0xc0
 [<ffffffffa0301f86>] rtl_pci_disconnect+0x196/0x1c0 [rtlwifi]
 [<ffffffff812f7c1f>] pci_device_remove+0x3f/0x110
 [<ffffffff813b510c>] __device_release_driver+0x7c/0xe0
 [<ffffffff813b59d8>] driver_detach+0xb8/0xc0
 [<ffffffff813b4c32>] bus_remove_driver+0x92/0x110
 [<ffffffff813b5ed2>] driver_unregister+0x62/0xa0
 [<ffffffff812f73b4>] pci_unregister_driver+0x44/0xa0
 [<ffffffffa031ab8c>] rtl92ce_driver_exit+0x10/0x484 [rtl8192ce]
 [<ffffffff810b8c6e>] sys_delete_module+0x16e/0x2d0
 [<ffffffff81185d56>] ? filp_close+0x66/0xa0
 [<ffffffff81614969>] system_call_fastpath+0x16/0x1b
Code: 3f 00 00 81 e2 00 c0 ff ff 09 d0 89 07 c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 40 84 f6 74 12 84 d2 75 1e <8b> 07 5d c1 e8 1f c3 0f 1f 80 00 00 00 00 84 d2 74 ee 80 fa 05
RIP [<ffffffffa031aae2>] rtl92ce_get_desc+0x12/0x50 [rtl8192ce]
 RSP <ffff880126105b78>
CR2: 00000000000002c0
Created attachment 607948: Trial patch for oops on unload
This oops appears to be from some kind of race condition where the interrupts are disabled too late in some instances. This patch should fix the condition. Please test.
Well, I'm sorry that this has sat for so long... Test kernels with the patch from comment 1 are building here: http://koji.fedoraproject.org/koji/taskinfo?taskID=4537582 When they finish, please try them and report the results back here...thanks!
Sorry it's taken so long for me to try this build. I went to the link in comment 2 and tried to download the RPMs to install, but I couldn't find any download links anywhere. Am I missing something?
Test builds expire after some passage of time. Please try to test these soon after the build completes: http://koji.fedoraproject.org/koji/taskinfo?taskID=4635944
I tried the test kernel and it still oopses. Stack trace looks exactly the same.
Are you still seeing this oops with the 3.6.10 or newer kernel updates?
Yes.
Sorry, but I am unable to duplicate this problem. Debugging will be difficult with Fedora needing to build the trials. That introduces such a long delay that it is difficult to keep my train of thought. The traceback looks as if there was an interrupt after the pci_device_remove. I'm really surprised that the trial patch did not work.
Jonathan, can you please post the full oops text you see with 3.6.10 or newer?
Created attachment 675195: 3.6.10 Oops
Created attachment 675741: Second trial patch for oops on unload
This patch not only moves the interrupt disable as was done in the first one, but it also does a check for the dereference of a NULL pointer from the location where the oops actually happens.
Test kernels with the patch from comment 11 are building here: http://koji.fedoraproject.org/koji/taskinfo?taskID=4852875
Created attachment 676454: 3.6.11-5.bz852761.2.fc17.x86_64 oops
Still happening. Oops log attached.
Created attachment 676477: Third trial patch for oops on unload
Thanks for the quick testing of the 2nd patch. It seems that I misread the line that caused the oops. This time, the traceback pointed at a useless debug message that is failing because the device is stopping. This patch removes the offending debug output.
Test kernels with the patch from comment 14 are building here: http://koji.fedoraproject.org/koji/taskinfo?taskID=4860328
Created attachment 677691: Still Oops'ing
I see the same with F18. The reason that I did rmmod rtl8192ce is that I couldn't connect to the wireless network (in the past this sometimes helped).
Feb 12 20:40:19 x220 NetworkManager[830]: <info> (wlan0): supplicant interface state: authenticating -> disconnected
Feb 12 20:40:19 x220 NetworkManager[830]: <info> (wlan0): supplicant interface state: disconnected -> scanning
uname: Linux x220 3.7.4-204.fc18.x86_64 #1 SMP Wed Jan 23 16:44:29 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
lspci: 03:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8188CE 802.11b/g/n WiFi Adapter (rev 01)
(In reply to comment #17)
> The reason that I did rmmod rtl8192ce is that I couldn't connect to the
> wireless network (in the past this sometimes helped).
Me too. See bug 852737, which I filed last August and which has had absolutely no activity since then.
Anything new on this? Is it still an issue? I will be working on this over the next few days. Can someone try the latest 3.8.11? I will check it out myself once I can get upstream working on my system. Until then, where do things stand?
Still a problem with most recently released F18 kernel. Can't test anything newer than that right now.
Tag, I'm it. Reproduced this on the RHEL experimental driver yesterday with:
modprobe -r rtl8192ce
Looking into it now. I have found nothing upstream as yet, so I will be debugging it.
Ah, will start with Larry's fix in C14.. Missed that earlier.
John, I have never duplicated this oops using openSUSE/KDE/NetworkManager. I just ran about 50 loops of the following command:
while [ 1 ] ; do sudo modprobe -rv rtl8192ce ; sleep 10 ; sudo modprobe -v rtl8192ce ; sleep 10 ; done
In nearly every case, the wireless connection completed during the 10 second sleep after module loading, and it never generated any kernel fault messages.
Larry
Hmm. I have to finish something first; later today I will check whether my crash has the same signature as above, but it is quite reproducible here. Being able to see it is a good first step. With the code from C14 in place, it still produces an issue.
I have duplicated this issue in my version of this driver, very repeatable. It contains patch C14, looking into it myself as well.
Sorry that I am unable to help you. Are you using kernel 3.5 as the OP did? I was testing with 3.10-rc1. Perhaps something changed at the mac80211 level in the interim, or there is a fundamental difference between Fedora and openSUSE 12.3 user code.
Larry: Just got new hardware to be able to test upstream stuff now. My work involves 3.5 version of mac80211, but 3.9+ driver (on RHEL but still very close). It may well be a difference in mac80211, lemme kick it around and see.
Jonathan:
> Still a problem with most recently released F18 kernel. Can't test anything
> newer than that right now.
Ok, can you at least give the output of uname -r of your system here? Thanks.
jik-thinkpad:~!999$ uname -a Linux jik-thinkpad 3.9.3-201.fc18.x86_64 #1 SMP Tue May 21 17:02:24 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Thanks Jonathan. Hmm, it looks like this may still be an issue with 3.9.3 and the uplevel mac80211. If so, that would go some way toward validating my reproduction setup (uplevel driver and v3.5 mac80211). Larry, I'll take a look at this on F18 soon and see if I can reproduce it there. I get this exact signature. Any other ideas at this time?
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Still broken in F18. Updating version.
(In reply to Jonathan Kamens from comment #31)
> Still broken in F18. Updating version.
Jonathan, thanks for updating to F18. I took a quick look today and see a few updates out there, but nothing strikes me as applicable right away. Can you post the output of uname -r for the kernel you tested? I see that 3.9 does have an update to the vendor driver, so I would like to know what you are testing on F18: just the stock version, or 3.9.x? This gives me what I need:
uname -r
I'm on Fedora 19 now. kernel-3.9.9-302.fc19.x86_64. The problem is still there.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs. Fedora 18 has now been rebased to 3.11.4-101.fc18. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19. If you experience different issues, please open a new bug report for those.
Still happening in F19 with 3.11.4-201.fc19.x86_64.
Meanwhile I got another wifi card off eBay, but this bug might well be hardware specific, so please include your exact hardware when reporting.
I haven't the bandwidth at the moment. I believe the issue is as follows (it has been a while):
- Driver exit is called.
- The interrupt is disabled and released, or so it seems.
- During the release/disable, the ISR gets called and dies in the Rx processing.
It should be a straightforward fix: ensure pending interrupts are cleared and/or disabled, and have the ISR check for an exit in progress and back away if it is called. NACKing on capacity for a bit. Sorry for the delay here.
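The guard described above can be sketched in user-space C. This is only an illustration under assumptions: fake_dev, fake_isr, and fake_disconnect are hypothetical names, not rtlwifi code, and the sketch is single-threaded, so it ignores the locking and IRQ synchronization a real driver would need. The late call inside fake_disconnect stands in for the handler invocation that the original call trace shows being reached from __free_irq inside rtl_pci_disconnect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state; not the rtlwifi structures. */
struct fake_dev {
    bool exiting;   /* set by the teardown path before resources go away */
    int *rx_ring;   /* NULL once the RX ring has been released */
    int handled;    /* number of interrupts that did real work */
};

enum fake_irq_result { FAKE_IRQ_NONE, FAKE_IRQ_HANDLED };

/* Handler that backs away if teardown has started or the ring is gone. */
static enum fake_irq_result fake_isr(struct fake_dev *dev)
{
    assert(dev != NULL);
    if (dev->exiting || dev->rx_ring == NULL)
        return FAKE_IRQ_NONE;
    dev->handled += dev->rx_ring[0];   /* pretend to process one RX entry */
    return FAKE_IRQ_HANDLED;
}

/* Teardown: flag the exit first, then release the ring. */
static void fake_disconnect(struct fake_dev *dev)
{
    assert(dev != NULL);
    dev->exiting = true;
    dev->rx_ring = NULL;
    /* Simulate a late handler call arriving while the IRQ is being freed. */
    (void)fake_isr(dev);
}
```

With this ordering, a call to fake_isr before fake_disconnect does its work, and any call during or after teardown returns FAKE_IRQ_NONE without touching the ring.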
I think the analysis is correct. What I do not understand is why it happens on the OPs system and not mine. I ran a test of 2000 unload/load cycles on the module without a single failure. The shutdown routine disables interrupts as soon as it can, then does some other cleanups before freeing the irq. I suppose I could add in a delay in the middle, but that just seems like a band-aid. What would cause a pending interrupt to be delayed longer on one system than another?
Created attachment 834953: Patch to fix problem
The problem in rtl92ce_get_desc() is fixed by checking for a NULL pointer to the descriptor. I still have no idea why this problem only happens with Fedora installations, and not for any others. There may be similar patches needed for the other PCI adapters in the rtlwifi tree. Now that I have an f19 setup, I can test.
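A user-space sketch of this kind of check follows. It is not the attached patch: fake_desc, fake_ring, fake_ring_entry, and fake_get_top_bit are hypothetical names, and the 32-byte descriptor size is an assumption. One detail from the original report motivates checking the ring base rather than only the entry pointer: the faulting address is 0x2c0, not 0, and with a NULL base and 32-byte entries, entry 22 (0x16, the value in R13 and R15) would sit at exactly 0x2c0. The thread does not confirm that reading.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte descriptor; the real rtl8192ce layout is not shown here. */
struct fake_desc {
    uint32_t dword[8];
};

struct fake_ring {
    struct fake_desc *desc;   /* NULL after the ring has been freed */
    unsigned int entries;
};

/* Return entry idx, or NULL if the ring is gone or idx is out of range.
 * Checking the base matters: &base[idx] with a NULL base is a small
 * non-NULL address, which a plain NULL test on the entry would miss. */
static struct fake_desc *fake_ring_entry(const struct fake_ring *ring,
                                         unsigned int idx)
{
    if (ring == NULL || ring->desc == NULL || idx >= ring->entries)
        return NULL;
    return &ring->desc[idx];
}

/* Read the top bit of the first dword; report 0 when there is no descriptor. */
static uint32_t fake_get_top_bit(const struct fake_desc *d)
{
    if (d == NULL)
        return 0;
    return d->dword[0] >> 31;
}
```

A caller would fetch the entry with fake_ring_entry and pass the result straight to fake_get_top_bit, so a freed ring yields 0 instead of a fault.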
Created attachment 835515: A better patch
The previous patch sometimes failed. This one is more robust.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs. Fedora 19 has now been rebased to 3.12.6-200.fc19. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 20, and are still experiencing this issue, please change the version to Fedora 20. If you experience different issues, please open a new bug report for those.
Still broken in 3.12.6 in F20.
This bug is fixed by commit 9278db6279e28d4d433bc8a848e10b4ece8793ed in the wireless-testing tree. It has been pushed to the linux-net tree and should be in mainline before 3.13 is released. Once there, it will be added to all stable kernels. The patch is the same as the one listed in the attachments.
The patch Larry mentions is in 3.13-rc7, so once DaveM bundles things up it should hit stable shortly. We'll put this in POST for now, and hopefully it makes 3.12.7. If not, we'll apply ourselves. (Thanks Larry!)
3.12.7 is in updates-testing now.
Kernel in updates-testing appears to fix the issue.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs. Fedora 20 has now been rebased to 3.13.4-200.fc20. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
Still a problem.
Does your kernel contain the patch mentioned in comment 43?
I have no idea. I am using the stock Fedora 20 kernel that I just got with "yum update" this morning, 3.13.3-201.fc20.x86_64.
Patch in comment 43 appears in kernel 3.12 upstream, appears to be in the kernel in C50, at least at a quick look.
The patch was merged between 3.13-rc4 and 3.13-rc5. It has to be in Jonathan's 3.13.3-201.fc20.x86_64. I need to see the kernel splat that is output.
Created attachment 867126: 3.13.3-201.fc20.x86_64 kernel splat
What are the exact steps you are doing? The reason I ask is that I installed F20 and 3.13.3-201.fc20.x86_64, and configured a wireless network on an RTL8188CE using rtl8192ce. When I used 'sudo modprobe -rv rtl8192ce', it unloaded just the way I would expect - no kernel oops.
"rmmod rtl8192ce". That's it.
Whenever a module has dependent modules, "modprobe -r" will remove everything just as modprobe without the -r will load everything. As rtl8192ce has rtl8192c-common, rtlwifi, and pci as dependent modules, modprobe is definitely preferable. I don't have the device that uses rtl8192ce in a computer right now, so I cannot tell if rmmod will error here.
I thought rmmod would refuse to remove modules with other modules dependent on them? I thought I'd seen errors in the past where I tried to remove a module and it refused to let me because other modules were dependent on it. I don't get the Oops when I use modprobe -r, but regardless of whether that's the case, is it really ok for rmmod of a module to cause an Oops?
No, it should not, and the interrupt should have been disabled *before* rtl8192ce was removed. No matter which command you use, the dependent module cannot be removed until the one using it is removed. In other words, 'modprobe -r rtl8192ce' is the same as:
rmmod rtl8192ce.ko
rmmod rtl8192c-common.ko
rmmod rtlwifi.ko
rmmod rtl_pci.ko
They do have to be removed in that order.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs. Fedora 20 has now been rebased to 3.14.4-200.fc20. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE ************** This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.