Red Hat Bugzilla – Bug 188296
tlb_clear_slave races with tlb_choose_channel
Last modified: 2007-11-30 17:07:24 EST
Description of problem:
tlb_clear_slave drops a lock before calling tlb_init_slave. This allows a race
with tlb_choose_channel, which may be setting "head" at the same moment that
tlb_init_slave is clearing it. The result is a corrupted hash table.
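The race can be illustrated with a minimal userspace sketch. This is not the bonding driver code: the names table_lock, assign_entry, clear_slave and list_is_consistent are hypothetical, and a pthread mutex stands in for the kernel lock. The only point it shows is that the reset of "head" stays inside the locked region, so a concurrent writer can never observe a half-cleared list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define TABLE_SIZE 8
#define NULL_INDEX 0xffffffffu

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct entry {
    uint32_t next;
    uint32_t prev;
    int in_use;
};

struct slave_info {
    uint32_t head;   /* first table entry assigned to this slave */
};

static struct entry table[TABLE_SIZE];
static struct slave_info slave = { NULL_INDEX };
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side (the role tlb_choose_channel plays): link an entry at head. */
static void assign_entry(uint32_t idx)
{
    assert(idx < TABLE_SIZE);
    pthread_mutex_lock(&table_lock);
    table[idx].in_use = 1;
    table[idx].prev = NULL_INDEX;
    table[idx].next = slave.head;
    if (slave.head != NULL_INDEX)
        table[slave.head].prev = idx;
    slave.head = idx;
    pthread_mutex_unlock(&table_lock);
}

/* Clearing side (the role tlb_clear_slave plays). */
static void clear_slave(void)
{
    pthread_mutex_lock(&table_lock);
    uint32_t idx = slave.head;
    while (idx != NULL_INDEX) {
        uint32_t next = table[idx].next;
        table[idx].in_use = 0;
        table[idx].next = NULL_INDEX;
        table[idx].prev = NULL_INDEX;
        idx = next;
    }
    /* Reset head BEFORE unlocking. Doing this after the unlock is the
     * racy pattern: a writer could set head in between, and the late
     * reset would then orphan or corrupt the new list. */
    slave.head = NULL_INDEX;
    pthread_mutex_unlock(&table_lock);
}

/* Walk the list from head and check that every entry is valid. */
static int list_is_consistent(void)
{
    int ok = 1;
    unsigned steps = 0;

    pthread_mutex_lock(&table_lock);
    for (uint32_t idx = slave.head; idx != NULL_INDEX; idx = table[idx].next) {
        if (idx >= TABLE_SIZE || !table[idx].in_use || ++steps > TABLE_SIZE) {
            ok = 0;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return ok;
}
```

With the reset inside the lock, any interleaving of assign_entry and clear_slave leaves the list either fully populated or fully empty, which is the property the description says is violated.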
Version-Release number of selected component (if applicable):
Appears in U3 and earlier.
How reproducible:
Not too easy. Need a PCI hotplug system that supports bonded network devices.
Steps to Reproduce:
I'm trying to dig this up. The fellow who discovered the bug is no longer
Actual results:
Oops resulting from hash table corruption.
This patch was submitted upstream, and first appears in kernel 2.6.16. The git
author Jay Vosburgh <firstname.lastname@example.org> Mon, 09 Jan 2006 12:14:00 -0800
committer Jeff Garzik <email@example.com> Thu, 12 Jan 2006 16:35:39 -0500
[PATCH] bonding: UPDATED hash-table corruption in bond_alb.c
I believe I see the race Michael refers to (tlb_choose_channel
may set head, which tlb_init_slave clears), although I was not able to
reproduce it. I have updated his patch for the current netdev-2.6.git
tree and added a version update. His original comment follows:
Our systems have been crashing during testing of PCI HotPlug
support in the various networking components. We've faulted in
the bonding driver due to a bug in bond_alb.c:tlb_clear_slave().
In that routine, the last modification to the TLB hash table is
made without protection of the lock, allowing a race that can lead
tlb_choose_channel() to select an invalid table element.
Signed-off-by: Jeff Garzik <firstname.lastname@example.org>
Created attachment 127472 [details]
patch for 2.6.9-34.14
Committed in stream U4 build 34.18. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/. However, there is a *serious* slab
corruption issue with this kernel, and thus it should not be released to
customers under any circumstances. I'll update this bug when the kernel is
We've identified the corruption as specific to x86-64 SMP kernel builds 34.16 and
34.17. All other builds are safe for consumption.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.