Bug 59248 - 2.4.9-any vs. md drivers triggers BUG()
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: All Linux
Priority: medium Severity: high
Assigned To: Stephen Tweedie
QA Contact: Brian Brock
Reported: 2002-02-03 17:07 EST by Tim Pepper
Modified: 2007-04-18 12:39 EDT

Doc Type: Bug Fix
Last Closed: 2003-06-07 21:05:49 EDT

Attachments: None
Description Tim Pepper 2002-02-03 17:07:34 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20020104

Description of problem:
Back in June 2001 there were some threads on linux-kernel about adding a
BH_Async flag to the kernel.  The code went into 2.4.10 and I got Alan to pick
it up in 2.4.10-ac4.  It needs to be backported to your 2.4.9 kernels.

The problem is that in 2.4.9 and earlier, end_buffer_io_async() (fs/buffer.c)
decides whether a page is still in use partly by checking whether the b_end_io
function of the page's buffer heads (or of the first bh in the page?) is
end_buffer_io_async().  This assumption does not hold for various md-type
drivers, which modify the bh's b_end_io function.  The result after a period of
async IO is that the BUG() in UnlockPage is triggered because page usage counts
have become corrupted.
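
To illustrate the failure mode, here is a minimal userspace sketch, not kernel
code.  The struct is a cut-down stand-in for struct buffer_head, the `locked`
and `async` fields are simplified stand-ins for the BH_Lock and BH_Async state
bits, and stacked_end_io(), link_page() and the two page_busy_* helpers are
hypothetical names invented for this example.  It assumes the completion path
walks the page's buffers via b_this_page, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct buffer_head;
typedef void (*bh_end_io_t)(struct buffer_head *bh, int uptodate);

/* Cut-down model of a buffer head; only the fields this sketch needs. */
struct buffer_head {
    struct buffer_head *b_this_page; /* circular list of buffers on one page */
    bh_end_io_t b_end_io;            /* completion handler */
    bool locked;                     /* stand-in for BH_Lock: IO in flight */
    bool async;                      /* stand-in for the BH_Async flag */
};

/* Stand-in for the generic async completion handler. */
void end_buffer_io_async(struct buffer_head *bh, int uptodate)
{
    (void)uptodate;
    bh->locked = false;
}

/* Hypothetical handler that a stacking (md-type) driver installs instead. */
void stacked_end_io(struct buffer_head *bh, int uptodate)
{
    end_buffer_io_async(bh, uptodate);
}

/* Link n buffers into the circular per-page list. */
void link_page(struct buffer_head *bhs, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        bhs[i].b_this_page = &bhs[(i + 1) % n];
}

/* Old-style test: a sibling counts as busy only if its handler pointer
 * is still end_buffer_io_async().  A driver that swapped the pointer
 * makes an in-flight buffer invisible to this check. */
bool page_busy_by_handler(struct buffer_head *done)
{
    for (struct buffer_head *tmp = done->b_this_page; tmp != done;
         tmp = tmp->b_this_page)
        if (tmp->b_end_io == end_buffer_io_async && tmp->locked)
            return true;
    return false;
}

/* Flag-based test: independent of whoever owns b_end_io at the moment. */
bool page_busy_by_flag(struct buffer_head *done)
{
    for (struct buffer_head *tmp = done->b_this_page; tmp != done;
         tmp = tmp->b_this_page)
        if (tmp->async && tmp->locked)
            return true;
    return false;
}
```

With two buffers on a page, both submitted for async IO and one of them
re-pointed at stacked_end_io(), the pointer-based test reports the page as idle
when the first buffer completes, even though the second is still locked.  The
flag-based test still sees it as busy, which is the idea behind a dedicated
async flag.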

This patch:
http://mirror.csit.fsu.edu/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.6pre5aa1/00_bh-async-1
fixes this problem.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1.  Load an md driver that, for the buffer heads it processes, nests the system
b_end_io function pointer (end_buffer_io_async()) behind its own end IO handler.
2.  Run an IO load that triggers page (async) IO.
3.  Given enough time the kernel will eventually corrupt itself; in my testing
this was always caught in end_buffer_io_async()'s call to UnlockPage().
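
For reference, the kind of handler nesting meant in step 1 might look roughly
like the following userspace sketch.  It is not taken from any particular
driver: struct nested_io, stacked_submit() and stacked_end_io() are
hypothetical names, and the buffer head is reduced to the fields needed here
(the `completed` field exists only so the example can be observed).

```c
#include <assert.h>
#include <stddef.h>

struct buffer_head;
typedef void (*bh_end_io_t)(struct buffer_head *bh, int uptodate);

/* Cut-down model of a buffer head. */
struct buffer_head {
    bh_end_io_t b_end_io; /* completion handler */
    void *b_private;      /* owner-private data */
    int completed;        /* demo only: 1 = ok, -1 = error, 0 = pending */
};

/* Hypothetical per-request state a stacking driver keeps while it owns
 * the buffer's completion handler. */
struct nested_io {
    bh_end_io_t orig_end_io;
    void *orig_private;
};

/* Stand-in for the generic async completion handler. */
void end_buffer_io_async(struct buffer_head *bh, int uptodate)
{
    bh->completed = uptodate ? 1 : -1;
}

/* Driver's own completion: restore the saved handler, then chain to it. */
void stacked_end_io(struct buffer_head *bh, int uptodate)
{
    struct nested_io *n = bh->b_private;

    assert(n != NULL);
    bh->b_end_io = n->orig_end_io;
    bh->b_private = n->orig_private;
    bh->b_end_io(bh, uptodate);
}

/* On submission the driver hides the original handler behind its own.
 * From here until completion, bh->b_end_io != end_buffer_io_async. */
void stacked_submit(struct buffer_head *bh, struct nested_io *n)
{
    n->orig_end_io = bh->b_end_io;
    n->orig_private = bh->b_private;
    bh->b_private = n;
    bh->b_end_io = stacked_end_io;
}
```

Between stacked_submit() and completion, any code that identifies async
buffers by comparing b_end_io against end_buffer_io_async() will not recognize
this buffer, which is the window the steps above rely on.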
	

Actual Results:  BUG() in UnlockPage() as called from end_buffer_io_async()
triggers and the kernel panics.

Expected Results:  IO should succeed and kernel should not panic.

Additional info:
Comment 1 Stephen Tweedie 2002-02-08 17:12:01 EST
The attached patch had a slight bug.  I've checked in a fixed version to our
local tree.

Do you want a test build to try it out with?
Comment 2 Tim Pepper 2002-02-08 21:27:27 EST
I can try a test build for an i686/SMP.
Comment 3 Tim Pepper 2002-03-06 16:50:07 EST
This appears to have shipped in 2.4.9-31's linux-2.4.9-assorted-bits.patch (if
not earlier?).  I'll test that kernel some, but the patch should've fixed the
issue as it did in the official kernel tree.
Comment 4 Tim Pepper 2002-03-13 11:55:54 EST
FYI: The 2.4.9-31 kernel survives my testing now.  Looks like the bug is closable.
