Bug 188296 - tlb_clear_slave races with tlb_choose_channel
Summary: tlb_clear_slave races with tlb_choose_channel
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Kimball Murray
QA Contact: Brian Brock
Depends On:
Blocks: 181409
Reported: 2006-04-07 18:03 UTC by Kimball Murray
Modified: 2007-11-30 22:07 UTC
CC List: 1 user

Clone Of:
Last Closed: 2006-08-10 23:06:32 UTC

Attachments
patch for 2.6.9-34.14 (623 bytes, patch)
2006-04-07 18:03 UTC, Kimball Murray

External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2006:0575
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 4
Last Updated: 2006-08-10 04:00:00 UTC

Description Kimball Murray 2006-04-07 18:03:01 UTC
Description of problem:
tlb_clear_slave drops a lock before calling tlb_init_slave.  This allows a race
with tlb_choose_channel, which might be trying to set "head" at the same moment
tlb_init_slave is clearing it.  The result is a corrupted hash table.
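To make the window concrete, here is a minimal single-threaded sketch that replays the two orderings by hand. It is a simplified model, not the bond_alb.c code: the struct layouts, NULL_INDEX (standing in for the driver's "no entry" index), and every *_model function name are hypothetical, and only the fields the race touches are kept. The "lock" is implied by the order of calls rather than taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the TLB hash table and a slave. */
#define NULL_INDEX (-1)
#define TABLE_SIZE 8

struct slave_model { int head; };                               /* per-slave list head */
struct entry_model { struct slave_model *tx_slave; int next; }; /* hash table entry */

static struct entry_model table[TABLE_SIZE];

static void reset_table(void)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        table[i].tx_slave = NULL;
        table[i].next = NULL_INDEX;
    }
}

/* Assignment half of choosing a channel: link entry idx at the front of s's list. */
static void choose_channel_model(struct slave_model *s, int idx)
{
    assert(idx >= 0 && idx < TABLE_SIZE);
    table[idx].tx_slave = s;
    table[idx].next = s->head;
    s->head = idx;
}

/* The part of the clear that runs under the lock: detach every listed entry. */
static void clear_entries_model(struct slave_model *s)
{
    int idx = s->head;
    while (idx != NULL_INDEX) {
        int next = table[idx].next;
        table[idx].tx_slave = NULL;
        table[idx].next = NULL_INDEX;
        idx = next;
    }
}

/* The init_slave step: forget the list head. */
static void init_slave_model(struct slave_model *s)
{
    s->head = NULL_INDEX;
}

/* Entries that still name s as their slave but are unreachable from s->head. */
static int orphaned_entries(const struct slave_model *s)
{
    int reachable = 0, referencing = 0;
    for (int idx = s->head; idx != NULL_INDEX; idx = table[idx].next)
        reachable++;
    for (int i = 0; i < TABLE_SIZE; i++)
        if (table[i].tx_slave == s)
            referencing++;
    return referencing - reachable;
}

/* Unpatched ordering: a chooser on another CPU runs after the unlock but
   before init_slave, so the head it just set is wiped out. */
int replay_buggy_interleaving(void)
{
    struct slave_model s = { NULL_INDEX };
    reset_table();
    choose_channel_model(&s, 2);
    clear_entries_model(&s);     /* under the lock */
    /* lock dropped here */
    choose_channel_model(&s, 5); /* racing chooser sets head = 5 */
    init_slave_model(&s);        /* head reset: entry 5 is orphaned */
    return orphaned_entries(&s);
}

/* Patched ordering: clear and init form one critical section, so the
   chooser can only run before or after both. */
int replay_fixed_ordering(void)
{
    struct slave_model s = { NULL_INDEX };
    reset_table();
    choose_channel_model(&s, 2);
    clear_entries_model(&s);
    init_slave_model(&s);        /* still under the lock */
    choose_channel_model(&s, 5); /* chooser runs afterwards */
    return orphaned_entries(&s);
}
```

In this model replay_buggy_interleaving() returns 1: entry 5 still points at the slave but can no longer be found from its head, which is the kind of stale table entry that later leads to an oops. replay_fixed_ordering() returns 0.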

Version-Release number of selected component (if applicable):
Appears in U3 and earlier.

How reproducible:
Not easily.  Requires a PCI hotplug system that supports bonded network devices.

Steps to Reproduce:

I'm trying to dig this up.  The fellow who discovered the BUG is no longer

Actual results:
Oops resulting from hash table corruption.

Expected results:

Additional info:
This patch was submitted upstream, and first appears in kernel 2.6.16.  The git
details follow:

commit 5af47b2ff124fdad9ba84baeb9f7eeebeb227b43
tree 1085c636295cd3f9ade5611f9519d83731e27cdc
parent 9a6301c114aaab1df6de6fad9899bb89852a7592
author Jay Vosburgh <fubar@us.ibm.com> Mon, 09 Jan 2006 12:14:00 -0800
committer Jeff Garzik <jgarzik@pobox.com> Thu, 12 Jan 2006 16:35:39 -0500

    [PATCH] bonding: UPDATED hash-table corruption in bond_alb.c

    I believe I see the race Michael refers to (tlb_choose_channel
    may set head, which tlb_init_slave clears), although I was not able to
    reproduce it.  I have updated his patch for the current netdev-2.6.git
    tree and added a version update.  His original comment follows:

    Our systems have been crashing during testing of PCI HotPlug
    support in the various networking components.  We've faulted in
    the bonding driver due to a bug in bond_alb.c:tlb_clear_slave().
    In that routine, the last modification to the TLB hash table is
    made without protection of the lock, allowing a race that can lead
    tlb_choose_channel() to select an invalid table element.

    Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
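The commit description says the last modification to the TLB hash table in tlb_clear_slave() is made outside the lock. Below is a hedged userspace sketch of the locking pattern the fix implies: the list walk and the head reset share one critical section. It uses a pthread mutex in place of the driver's TX hash table lock, and all names (tbl, tbl_lock, clear_slave_fixed, choose_channel, run_fixed_stress) are hypothetical; it is not the actual patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model; NULL_INDEX stands in for the driver's "no entry" index. */
#define NULL_INDEX (-1)
#define TABLE_SIZE 64

struct slave_model { int head; };
struct entry_model { struct slave_model *tx_slave; int next; };

static struct entry_model tbl[TABLE_SIZE];
static struct slave_model slave = { NULL_INDEX };
static pthread_mutex_t tbl_lock = PTHREAD_MUTEX_INITIALIZER; /* models the TX hash table lock */

/* Patched clear: walk, detach, and reset head all in one critical section. */
static void clear_slave_fixed(struct slave_model *s)
{
    pthread_mutex_lock(&tbl_lock);
    for (int idx = s->head; idx != NULL_INDEX; ) {
        int next = tbl[idx].next;
        tbl[idx].tx_slave = NULL;
        tbl[idx].next = NULL_INDEX;
        idx = next;
    }
    s->head = NULL_INDEX;  /* the init_slave step, now done before unlocking */
    pthread_mutex_unlock(&tbl_lock);
}

/* If entry idx is unassigned, link it at the front of the slave's list. */
static void choose_channel(struct slave_model *s, int idx)
{
    assert(idx >= 0 && idx < TABLE_SIZE);
    pthread_mutex_lock(&tbl_lock);
    if (tbl[idx].tx_slave == NULL) {
        tbl[idx].tx_slave = s;
        tbl[idx].next = s->head;
        s->head = idx;
    }
    pthread_mutex_unlock(&tbl_lock);
}

static void *chooser(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        choose_channel(&slave, i % TABLE_SIZE);
    return NULL;
}

static void *clearer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        clear_slave_fixed(&slave);
    return NULL;
}

/* Runs both threads, then counts entries that reference the slave but are
   unreachable from its head (0 means the table is consistent; -1 on error). */
int run_fixed_stress(void)
{
    pthread_t a, b;
    if (pthread_create(&a, NULL, chooser, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, clearer, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    int reachable = 0, referencing = 0;
    for (int idx = slave.head; idx != NULL_INDEX; idx = tbl[idx].next)
        reachable++;
    for (int i = 0; i < TABLE_SIZE; i++)
        if (tbl[i].tx_slave == &slave)
            referencing++;
    return referencing - reachable;
}
```

Build with -pthread. Because every update to the list and its head happens under the same mutex, run_fixed_stress() should return 0 for any thread interleaving; moving the `s->head = NULL_INDEX` assignment after the unlock reopens the window described in this report.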

Comment 1 Kimball Murray 2006-04-07 18:03:01 UTC
Created attachment 127472
patch for 2.6.9-34.14

Comment 2 Jason Baron 2006-04-18 19:51:22 UTC
Committed in stream U4 build 34.18. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/ However, there is a *serious* slab
corruption issue with this kernel, and thus it should not be released to
customers under any circumstances. I'll update this bug when the kernel is
stable again.

Comment 3 Jason Baron 2006-04-19 19:53:17 UTC
We've identified the corruption as specific to x86-64 SMP kernel builds 34.16 and
34.17. All other builds are safe for consumption.

Comment 6 Red Hat Bugzilla 2006-08-10 23:06:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
