Bug 808095

Summary: kernel crash at ieee80211_mgd_probe_ap_send
Product: Red Hat Enterprise Linux 6
Reporter: Stanislaw Gruszka <sgruszka>
Component: kernel
Assignee: Stanislaw Gruszka <sgruszka>
Status: CLOSED DUPLICATE
QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.2
CC: dhoward, fhrbata, hesemeyt, linville, plyons, pvine, tpelka, walicki
Target Milestone: rc
Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
While doing wireless roaming, under stressed conditions, an error could occur in the ieee80211_mgd_probe_ap_send() function and cause a kernel panic. With this update, the mac80211 MLME (MAC Layer Management Entity) code has been rewritten, thus fixing this bug.
Story Points: ---
Clone Of:
: 814674
Environment:
Last Closed: 2012-04-20 13:43:40 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 814657, 814674, 820167    
Attachments:
Description Flags
20120326_kernel_panic_1523.jpg none
mac80211-del-timers-on-disassocs.patch none
kernel.panic.jpg none

Description Stanislaw Gruszka 2012-03-29 15:01:29 UTC

Comment 2 Stanislaw Gruszka 2012-03-29 15:04:30 UTC
Created attachment 573698
20120326_kernel_panic_1523.jpg

Kernel crash photo taken by customer.

Comment 3 Stanislaw Gruszka 2012-04-05 14:33:07 UTC
Created attachment 575452
mac80211-del-timers-on-disassocs.patch

Proposed fix.
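
Judging only by the attachment's filename (the patch itself is not quoted in this report), the idea is to delete pending timers when the station disassociates, so that a late timer cannot try to probe an AP that is no longer there. The toy model below illustrates that pattern in Python. It is a sketch under that assumption: `Station`, `probe_ap`, `pending_timers`, and `cancel_timers` are hypothetical names, not mac80211 structures or the contents of the patch.

```python
class Station:
    """Toy model of a managed-mode interface (hypothetical, not mac80211 code)."""

    def __init__(self):
        self.associated_bss = None
        self.pending_timers = []  # callbacks queued to run later

    def associate(self, bssid):
        self.associated_bss = {"bssid": bssid}
        # Arm a connection-monitor timer that will probe the AP later.
        self.pending_timers.append(self.probe_ap)

    def probe_ap(self):
        # Analogue of sending a probe: it needs the current BSS.
        return self.associated_bss["bssid"]

    def disassociate(self, cancel_timers):
        self.associated_bss = None
        if cancel_timers:
            # The assumed fix: drop timers together with the association.
            self.pending_timers.clear()

    def run_timers(self):
        # Fire every queued timer once and return the results.
        timers, self.pending_timers = self.pending_timers, []
        return [timer() for timer in timers]
```

With `cancel_timers=False`, calling `run_timers()` after `disassociate()` raises a `TypeError`, because the stale timer touches a BSS that is gone; this stands in for the kernel panic. With `cancel_timers=True`, no timer is left to fire.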

Comment 5 RHEL Program Management 2012-04-13 09:00:17 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 Stanislaw Gruszka 2012-04-13 10:06:44 UTC
This panic can be reproduced by configuring one ESS wireless network on two or more APs, each on a different channel (a simple description for Linksys routers is here: http://homecommunity.cisco.com/t5/Access-Points/WRT610N-setup-as-access-point/td-p/321732 ). Then use scripts to roam between the APs: http://bombadil.infradead.org/~mcgrof/test-roam
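
The roaming step can be sketched roughly as follows. This is not the linked test-roam script, whose contents are not reproduced here; it is an independent minimal sketch that assumes wpa_supplicant is already managing the interface and that the `wpa_cli roam <bssid>` command is available. The interface name, the BSSIDs, and the helper names `roam_command` and `roam_loop` are placeholders.

```python
import itertools
import subprocess
import time


def roam_command(interface, bssid):
    """Build the wpa_cli invocation that asks for a roam to one BSSID."""
    return ["wpa_cli", "-i", interface, "roam", bssid]


def roam_loop(interface, bssids, rounds, delay_s=5.0, run=subprocess.run):
    """Cycle through the given BSSIDs, issuing one roam request per round.

    `run` is injectable so the loop can be exercised without a wireless card.
    Returns the list of commands that were issued.
    """
    issued = []
    for bssid in itertools.islice(itertools.cycle(bssids), rounds):
        cmd = roam_command(interface, bssid)
        run(cmd, check=False)  # keep going even if one roam request fails
        issued.append(cmd)
        time.sleep(delay_s)
    return issued
```

For example, `roam_loop("wlan0", ["00:11:22:33:44:55", "66:77:88:99:aa:bb"], rounds=1000, delay_s=2.0)` would bounce between two APs of the same ESS; a shorter delay puts more stress on the association and disassociation paths.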

Comment 10 John W. Linville 2012-04-18 17:24:54 UTC
So the fix in comment 3 is similar to code in upstream commit b9dcf712, which should be covered by the other wireless updates in RHEL 6.3. Should either commit b9dcf712 or the fix from comment 3 be proposed as a 6.2.z fix as well?

Comment 11 Stanislaw Gruszka 2012-04-19 09:40:21 UTC
Yes, the patch should be backported to 6.2.z.

Unfortunately, I'm still seeing that crash, although it is harder to reproduce with the patch applied. Some more fixes are needed to fully address this bug.

Comment 12 Stanislaw Gruszka 2012-04-20 10:21:58 UTC
The remaining crashes I'm observing while roaming (as described in comment 6) are technically a different issue. I'll open a new bug report for them. Let's proceed with the fix from this bug report; it makes roaming issues less frequent in real-world environments.

Comment 15 Stanislaw Gruszka 2012-04-20 13:43:40 UTC
Since the fix only mitigates the problem, it does not make sense to do QE for it. I'm closing this as a duplicate of the RHEL 6.3 wireless update. A complete fix for the wireless roaming crashes is expected to be done as a patch for bug 814674.

We opened bug 814657 for applying this particular fix in RHEL-6.2.z.

*** This bug has been marked as a duplicate of bug 766952 ***

Comment 16 Stanislaw Gruszka 2012-04-25 11:57:39 UTC
Created attachment 580142
kernel.panic.jpg

The patch makes the problem harder to reproduce, but the issue is not fixed. Here is another photo of the crash. Hopefully the fix for bug 814674 will make it go away for good.

Comment 17 Tomas Capek 2012-06-14 10:25:52 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
While doing wireless roaming, under stressed conditions, an error could occur in the ieee80211_mgd_probe_ap_send() function and cause a kernel panic. With this update, the mac80211 MLME (MAC Layer Management Entity) code has been rewritten, thus fixing this bug.