Bug 561762
Summary: | [abrt] crash in kernel (actually a WARNING) | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Jay Turner <jturner> |
Component: | kernel | Assignee: | John W. Linville <linville> |
Status: | CLOSED DUPLICATE | QA Contact: | desktop-bugs <desktop-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | low | | |
Version: | 6.0 | CC: | arozansk, benl, cmeadors, emcnabb, h.stilmack, louisjohnread, mishu, reinette.chatre, srevivo, tburke, woodard |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | abrt_hash:489771501 | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2010-08-13 10:16:39 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 534148 | | |
Description
Jay Turner
2010-02-04 08:50:43 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.

*** Bug 572371 has been marked as a duplicate of this bug. ***

Reinette, we are hitting this sometimes with our (2.6.32-based) iwlagn drivers in RHEL6. I think this relates to bringing the device down while a scan is pending? Do you have any suggestions for avoiding this WARNING?

Please try:

commit 2ef6e4440926668cfa9eac4b79e63528ebcbe0c1
Author: Johannes Berg <johannes>
Date: Tue Oct 20 15:08:12 2009 +0900

mac80211: keep auth state when assoc fails

When association fails, we should stay authenticated, which in mac80211 is represented by the existence of the mlme work struct, so we cannot free that, instead we need to just set it to idle.

(Brought to you by the hacking session at Kernel Summit 2009 in Tokyo, Japan. -- JWL)

Signed-off-by: Johannes Berg <johannes>
Signed-off-by: John W. Linville <linville>

commit 7400f42e9d765fa0656b432f3ab1245f9710f190
Author: Johannes Berg <johannes>
Date: Sat Oct 31 07:40:37 2009 +0100

cfg80211: fix NULL ptr deref

commit 211a4d12abf86fe0df4cd68fc6327cbb58f56f81
Author: Johannes Berg <johannes>
Date: Tue Oct 20 15:08:53 2009 +0900

cfg80211: sme: deauthenticate on assoc failure

introduced a potential NULL pointer dereference that some people have been hitting for some reason -- the params.bssid pointer is not guaranteed to be non-NULL for what seems to be a race between various ways of reaching the same thing.

While I'm trying to analyse the problem more let's first fix the crash. I think the real fix may be to avoid doing _anything_ if it ended up being NULL, but right now I'm not sure yet.

I think http://bugzilla.kernel.org/show_bug.cgi?id=14342 might also be this issue.
Reported-by: Parag Warudkar <parag.lkml>
Tested-by: Parag Warudkar <parag.lkml>
Signed-off-by: Johannes Berg <johannes>
Signed-off-by: John W. Linville <linville>

Do you only get this single warning? Please see http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2134 where this same warning appeared together with a warning in ieee80211_scan_completed. The fix for this was:

commit 6d3560d4fc9c5b9fe1a07a63926ea70512c69c32
Author: Johannes Berg <johannes>
Date: Sat Oct 31 07:44:08 2009 +0100

mac80211: fix scan abort sanity checks

Since sometimes mac80211 queues up a scan request to only act on it later, it must be allowed to (internally) cancel a not-yet-running scan, e.g. when the interface is taken down. This condition was missing since we always checked only the local->scanning variable which isn't yet set in that situation.

Reported-by: Luis R. Rodriguez <mcgrof>
Signed-off-by: Johannes Berg <johannes>
Signed-off-by: John W. Linville <linville>

Reinette, thanks for the suggestions but we already have those commits in the RHEL6 kernels. Jay and Tim, can you reliably trigger the WARNING in the kernel logs?

Sadly, I can't. Just happened to get it that once while attempting to suspend. I'll play around with suspend/resume a bit today and see if I can come up with a reliable reproducer.

I have just started seeing this warning upon resuming from suspend. I can reliably reproduce if this is any assistance. Thanks, John

Thanks, John -- it might be. Reinette, I would welcome any further suggestions you might have to help pinpoint the issue.

*** Bug 575486 has been marked as a duplicate of this bug. ***

(In reply to comment #10)
> Thanks, John -- it might be. Reinette, I would welcome any further suggestions
> you might have to help pinpoint the issue.

I do not have any other ideas to try out. Is it possible to gather more information?
Since you can see this when you resume from suspend, could you please do the following:
- please add a "dump_stack()" to the beginning of ieee80211_scan_completed() so that we can know exactly who is calling it.
- run iwlwifi with debugging of 0x43fff, which includes scanning debugging. Since you only see this when you resume from suspend you can enable the debugging before you suspend like so:
# echo 0x43fff > /sys/class/net/wlanX/device/debug_level

Reinette, I've built a kernel for Jay and asked him to test and provide the feedback you requested -- thanks!

*** Bug 577199 has been marked as a duplicate of this bug. ***

John Read, can you try the test kernels available here?

http://people.redhat.com/linville/kernels/rhel6/

Bonus points if you try out the yum repo for the installation... :-) In any case, do those kernels address the issue?

I can perhaps try Sunday night or early next week. However, I am somewhat new to this, so are there instructions somewhere? Thanks, John

Mostly just click on the link for the jwltest-release rpm to install it, then issue the command provided right below it. :-)

Sorry I have not yet tested this... I was uncertain of the utility of doing so, as I can no longer reproduce the crash reliably. Let me know if you still want me to try, though it may be inconclusive since I cannot reproduce the error.

Any testing would be welcome. :-)

John Read, I'm terribly sorry but due to an internal policy decision I've been required to remove my test kernels from people.redhat.com. If you have not already installed the test kernel referenced above you will not be able to test it. I sincerely apologize for the confusion and for whatever inconvenience this might cause. I hope that we will find a way to address the issue you are experiencing but at this point I'm not sure how that will happen. :-(

OK, the situation in comment 24 has been resolved. I again have test kernels available at the location from comment 19.
The ones there now are equivalent to the -19.el6 kernels but w/ the Intel wireless drivers back-ported from 2.6.33. If anyone can reliably recreate the problem reported here then please give those kernels a try and report the results below -- thanks!

John Linville, I will install the kernel this weekend. However, it may be some time before I can determine if the warning issue is resolved as I only experience it occasionally. One point of clarification -- do I install jwltest-release-6-2.noarch.rpm and then run the yum command as outlined on the link in comment 19? Regards, John

Yes, precisely -- thanks!

So as I read the code that creates the warning as shown in the original report here, this happens when a device is going down while a scan is pending. After the warning, the scan is aborted. My impression from the code is that life should go on after that with only the log SPAM as a consequence. In other words, the device should be able to continue operation afterwards. Jay/Tim/John, is this not the case?

jwltest.8 has "mac80211: fix deferred hardware scan requests", which I think specifically addresses the issue causing this warning:

http://people.redhat.com/linville/kernels/rhel6/

Please give it a try, especially anyone that can reliably recreate this issue!

*** Bug 582594 has been marked as a duplicate of this bug. ***

*** Bug 585983 has been marked as a duplicate of this bug. ***

*** Bug 589615 has been marked as a duplicate of this bug. ***

*** Bug 589752 has been marked as a duplicate of this bug. ***

Seeing this with 2.6.32-23.el6.x86_64. Shouldn't that patch that you posted on 4/21 be in that kernel? I didn't remember seeing this with kernel-2.6.32-20.el6.sg11y_revert_drm.x86_64 but kernel-2.6.32-22.el6.x86_64 didn't work with my WiFi AP. 23 was the one that was supposed to fix the problems that I saw with 20.

Define "should" -- that patch does not appear to be in -23 or -24 either.

Patch 20100421180543.GB5557 isn't committed. No ACKs so far.
And unless it gets rhel-6.0.0+, it won't be in beta2.

http://patchwork.usersys.redhat.com/patch/24285

QA ping?

OK, now what does it take to get rhel-6.0.0 set?

Patch(es) available on kernel-2.6.32-25.el6

*** Bug 589494 has been marked as a duplicate of this bug. ***

*** Bug 591678 has been marked as a duplicate of this bug. ***

*** Bug 595516 has been marked as a duplicate of this bug. ***

Just reproduced with 2.6.32-28.el6.

Was that during a suspend/resume? Or some other activity?

Every time that I have seen this was during a suspend/resume cycle. I tried reproducing this morning here at the office and the machine survived a series of 5 cycles. I will continue poking around, but still cannot nail it down to a specific reproducer.
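
For anyone still chasing a reproducer, the debugging step requested earlier in this report (raising the iwlwifi debug level before suspending) and a before/after check of the kernel log could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names set_debug_level and count_warnings are hypothetical and not part of any tool mentioned in this bug, and matching on the string "WARNING: at" is an assumption about how the warning appears in the kernel log on these kernels.

```shell
#!/bin/sh
# Hypothetical helpers; a sketch, not something taken from this report.

# set_debug_level FILE LEVEL
# Write LEVEL (e.g. 0x43fff) to the driver's debug_level file.
# FILE would normally be /sys/class/net/wlanX/device/debug_level.
set_debug_level() {
    file=$1
    level=$2
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    echo "$level" > "$file"
}

# count_warnings
# Read kernel log text on stdin and print how many lines contain
# "WARNING: at" (assumed marker; adjust to match the real log).
# grep -c exits non-zero when nothing matches, so mask that status.
count_warnings() {
    grep -c 'WARNING: at' || true
}
```

A possible use, run as root with wlanX replaced by the real interface name: call `set_debug_level /sys/class/net/wlanX/device/debug_level 0x43fff`, record `dmesg | count_warnings`, go through one suspend/resume cycle, and compare the count afterwards. A higher count after resume would suggest the cycle triggered the warning, at which point the full dmesg output is what is worth attaching here.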