Bug 188296 - tlb_clear_slave races with tlb_choose_channel
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Kimball Murray
QA Contact: Brian Brock
Depends On:
Blocks: 181409
Reported: 2006-04-07 14:03 EDT by Kimball Murray
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 19:06:32 EDT

Attachments
patch for 2.6.9-34.14 (623 bytes, patch)
2006-04-07 14:03 EDT, Kimball Murray

Description Kimball Murray 2006-04-07 14:03:01 EDT
Description of problem:
tlb_clear_slave() drops the lock protecting the TLB hash table before calling
tlb_init_slave().  This allows a race with tlb_choose_channel(), which may be
setting "head" at the same moment that tlb_init_slave() is clearing it.  The
result is a corrupted hash table.
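
The locking order at issue can be sketched in user space.  This is only a
simplified model, not the attached patch or the real bond_alb.c code: a
pthread mutex stands in for the kernel lock, there is a single slave, and the
names tx_hashtbl_lock, tx_hashtbl, slave_head, tlb_init_table and
tlb_list_length are assumptions made for illustration.  The point it shows is
that tlb_init_slave() runs while the lock is still held, so a concurrent
tlb_choose_channel() cannot set "head" in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define TLB_HASH_SIZE  8            /* tiny table, for illustration only */
#define TLB_NULL_INDEX 0xffffffffu  /* "no entry" marker */

/* One hash-table slot; slots owned by the slave form a doubly linked list. */
struct tlb_entry {
    uint32_t next;
    uint32_t prev;
    int assigned;   /* nonzero while the slot is linked into the slave's list */
};

static pthread_mutex_t tx_hashtbl_lock = PTHREAD_MUTEX_INITIALIZER;
static struct tlb_entry tx_hashtbl[TLB_HASH_SIZE];
static uint32_t slave_head = TLB_NULL_INDEX;   /* the "head" from the report */

/* Return one slot to its empty state.  Caller must hold tx_hashtbl_lock. */
void tlb_init_table_entry(struct tlb_entry *e)
{
    e->next = TLB_NULL_INDEX;
    e->prev = TLB_NULL_INDEX;
    e->assigned = 0;
}

/* Reset the slave's list head.  Caller must hold tx_hashtbl_lock. */
void tlb_init_slave(void)
{
    slave_head = TLB_NULL_INDEX;
}

/* Put every slot and the slave into the empty state (call once before use). */
void tlb_init_table(void)
{
    pthread_mutex_lock(&tx_hashtbl_lock);
    for (uint32_t i = 0; i < TLB_HASH_SIZE; i++)
        tlb_init_table_entry(&tx_hashtbl[i]);
    tlb_init_slave();
    pthread_mutex_unlock(&tx_hashtbl_lock);
}

/* Assign a free slot to the slave by pushing it onto the front of its list. */
void tlb_choose_channel(uint32_t hash_index)
{
    assert(hash_index < TLB_HASH_SIZE);
    pthread_mutex_lock(&tx_hashtbl_lock);
    if (!tx_hashtbl[hash_index].assigned) {
        tx_hashtbl[hash_index].next = slave_head;
        tx_hashtbl[hash_index].prev = TLB_NULL_INDEX;
        if (slave_head != TLB_NULL_INDEX)
            tx_hashtbl[slave_head].prev = hash_index;
        slave_head = hash_index;            /* this is the write to "head" */
        tx_hashtbl[hash_index].assigned = 1;
    }
    pthread_mutex_unlock(&tx_hashtbl_lock);
}

/* Detach the slave from every slot it owns. */
void tlb_clear_slave(void)
{
    pthread_mutex_lock(&tx_hashtbl_lock);
    uint32_t index = slave_head;
    while (index != TLB_NULL_INDEX) {
        uint32_t next_index = tx_hashtbl[index].next;
        tlb_init_table_entry(&tx_hashtbl[index]);
        index = next_index;
    }
    /* Reset "head" while the lock is still held.  Unlocking before this
     * call is the ordering the report describes as racy. */
    tlb_init_slave();
    pthread_mutex_unlock(&tx_hashtbl_lock);
}

/* Count slots reachable from head; returns -1 if the list looks corrupted. */
int tlb_list_length(void)
{
    int n = 0;
    pthread_mutex_lock(&tx_hashtbl_lock);
    for (uint32_t index = slave_head; index != TLB_NULL_INDEX;
         index = tx_hashtbl[index].next) {
        if (index >= TLB_HASH_SIZE || !tx_hashtbl[index].assigned ||
            n >= TLB_HASH_SIZE) {
            n = -1;     /* out-of-range link, unowned slot, or a cycle */
            break;
        }
        n++;
    }
    pthread_mutex_unlock(&tx_hashtbl_lock);
    return n;
}
```

In this model, calling tlb_init_table(), then tlb_choose_channel() for a few
slots, then tlb_clear_slave() should leave tlb_list_length() at 0.  A list
head that still points at a cleared slot would make it return -1 instead.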

Version-Release number of selected component (if applicable):
Appears in U3 and earlier.

How reproducible:
Not too easy.  Need a PCI hotplug system that supports bonded network devices.


Steps to Reproduce:

I'm trying to dig this up.  The fellow who discovered the bug is no longer
reachable.


Actual results:
Oops resulting from hash table corruption.

Expected results:


Additional info:
This patch was submitted upstream, and first appears in kernel 2.6.16.  The git
details follow:

commit 5af47b2ff124fdad9ba84baeb9f7eeebeb227b43
tree 1085c636295cd3f9ade5611f9519d83731e27cdc
parent 9a6301c114aaab1df6de6fad9899bb89852a7592
author Jay Vosburgh <fubar@us.ibm.com> Mon, 09 Jan 2006 12:14:00 -0800
committer Jeff Garzik <jgarzik@pobox.com> Thu, 12 Jan 2006 16:35:39 -0500

    [PATCH] bonding: UPDATED hash-table corruption in bond_alb.c
    
    I believe I see the race Michael refers to (tlb_choose_channel
    may set head, which tlb_init_slave clears), although I was not able to
    reproduce it.  I have updated his patch for the current netdev-2.6.git
    tree and added a version update.  His original comment follows:
    
    Our systems have been crashing during testing of PCI HotPlug
    support in the various networking components.  We've faulted in
    the bonding driver due to a bug in bond_alb.c:tlb_clear_slave()
    
    In that routine, the last modification to the TLB hash table is
    made without protection of the lock, allowing a race that can lead
    tlb_choose_channel() to select an invalid table element.
    
    -J
    
    Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Comment 1 Kimball Murray 2006-04-07 14:03:01 EDT
Created attachment 127472
patch for 2.6.9-34.14
Comment 2 Jason Baron 2006-04-18 15:51:22 EDT
Committed in stream U4 build 34.18. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/. However, there is a *serious* slab
corruption issue with this kernel, so it should not be released to customers
under any circumstances. I'll update this bug when the kernel is stable again.
Comment 3 Jason Baron 2006-04-19 15:53:17 EDT
We've identified the corruption as specific to x86-64 SMP kernel builds 34.16
and 34.17. All other builds are safe for consumption.
Comment 6 Red Hat Bugzilla 2006-08-10 19:06:33 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html
