Bug 59248 - 2.4.9-any vs. md drivers triggers BUG()
Summary: 2.4.9-any vs. md drivers triggers BUG()
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
Depends On:
Reported: 2002-02-03 22:07 UTC by Tim Pepper
Modified: 2007-04-18 16:39 UTC (History)
0 users

Clone Of:
Last Closed: 2003-06-08 01:05:49 UTC


Description Tim Pepper 2002-02-03 22:07:34 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20020104

Description of problem:
Back in June 2001 there were some threads on linux-kernel about adding a
BH_Async flag to the kernel.  The code went into 2.4.10, and I got Alan to pick
it up in 2.4.10-ac4.  It needs to be backported to your 2.4.9 kernels.

The problem is that in 2.4.9 and earlier, end_buffer_io_async() (fs/buffer.c)
checks whether a page is still in use partly based on whether the b_end_io
function of the associated page's buffer heads (or the first bh in the page?)
is end_buffer_io_async().  This assumption does not hold for various md-type
drivers, which modify the bh's b_end_io function.  The result, after a period
of async IO, is that the BUG() in UnlockPage is triggered because page usage
counts have become corrupted.
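
To illustrate the failure mode, here is a minimal user-space sketch, not the
actual kernel code.  The struct is a simplified stand-in for struct buffer_head,
the "locked" and "async" fields stand in for the BH_Lock and BH_Async bits, and
md_end_io(), page_busy_by_pointer(), page_busy_by_flag(), build_page() and
demo_premature_unlock() are hypothetical names made up for this example.  It
compares a pointer-based "is this page still busy?" test with a flag-based one
when a stacking driver has replaced b_end_io on a buffer that is still in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct buffer_head;
typedef void (*bh_end_io_t)(struct buffer_head *bh, int uptodate);

/* Simplified stand-in for the kernel's struct buffer_head (assumption). */
struct buffer_head {
    struct buffer_head *b_this_page; /* circular list of buffers sharing a page */
    bh_end_io_t b_end_io;            /* completion callback */
    bool locked;                     /* IO still in flight */
    bool async;                      /* stand-in for the BH_Async flag */
};

static void end_buffer_io_async(struct buffer_head *bh, int uptodate)
{
    (void)uptodate;
    bh->locked = false;
}

/* Hypothetical handler installed by a stacking (md-type) driver. */
static void md_end_io(struct buffer_head *bh, int uptodate)
{
    end_buffer_io_async(bh, uptodate);
}

/* Pointer-based test: async buffers are recognised by their callback. */
static bool page_busy_by_pointer(const struct buffer_head *done)
{
    const struct buffer_head *tmp;

    for (tmp = done->b_this_page; tmp != done; tmp = tmp->b_this_page)
        if (tmp->locked && tmp->b_end_io == end_buffer_io_async)
            return true;
    return false;
}

/* Flag-based test: async buffers are recognised by an explicit flag. */
static bool page_busy_by_flag(const struct buffer_head *done)
{
    const struct buffer_head *tmp;

    for (tmp = done->b_this_page; tmp != done; tmp = tmp->b_this_page)
        if (tmp->locked && tmp->async)
            return true;
    return false;
}

/* Two buffers on one page; bh[1] is still in flight with a replaced handler. */
static void build_page(struct buffer_head bh[2])
{
    bh[0] = (struct buffer_head){ &bh[1], end_buffer_io_async, false, true };
    bh[1] = (struct buffer_head){ &bh[0], md_end_io, true, true };
}

/* Returns 1 when only the flag-based test notices the pending buffer. */
int demo_premature_unlock(void)
{
    struct buffer_head bh[2];

    build_page(bh);
    return !page_busy_by_pointer(&bh[0]) && page_busy_by_flag(&bh[0]);
}
```

In this model the pointer-based test reports the page as idle while bh[1] is
still locked, which is the kind of premature unlock that would later trip the
check in UnlockPage; the flag-based test does not depend on who owns b_end_io.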

This patch:
fixes this problem.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.  Load an md driver that, for the buffer heads it processes, nests the system
b_end_io function pointer (end_buffer_io_async()) behind its own end IO handler.
2.  Run an IO load that triggers page (async) IO.
3.  Given enough time, the kernel will eventually corrupt itself; in my
testing, this was always caught in end_buffer_io_async()'s call to UnlockPage().
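
For step 1, the kind of handler nesting meant here can be sketched in user
space as follows.  This is only an illustration under assumptions: the struct is
a cut-down stand-in for struct buffer_head, and struct nested_io,
driver_submit(), driver_end_io() and demo_nested_completion() are hypothetical
names, not taken from any real md driver.

```c
#include <assert.h>
#include <stddef.h>

struct buffer_head;
typedef void (*bh_end_io_t)(struct buffer_head *bh, int uptodate);

/* Cut-down stand-in for the kernel's struct buffer_head (assumption). */
struct buffer_head {
    bh_end_io_t b_end_io; /* completion callback */
    void *b_private;      /* owner-private data */
    int completed;        /* set by the system handler in this model */
};

/* Hypothetical per-request state kept by the stacking driver. */
struct nested_io {
    bh_end_io_t saved_end_io;
    void *saved_private;
    int driver_completions;
};

static void end_buffer_io_async(struct buffer_head *bh, int uptodate)
{
    bh->completed = uptodate ? 1 : -1;
}

/* Driver's handler: do its own accounting, then restore and chain. */
static void driver_end_io(struct buffer_head *bh, int uptodate)
{
    struct nested_io *n = bh->b_private;

    n->driver_completions++;
    bh->b_end_io = n->saved_end_io;
    bh->b_private = n->saved_private;
    bh->b_end_io(bh, uptodate);
}

/* Driver takes over the buffer: save the system handler, install its own. */
static void driver_submit(struct buffer_head *bh, struct nested_io *n)
{
    n->saved_end_io = bh->b_end_io;
    n->saved_private = bh->b_private;
    bh->b_end_io = driver_end_io;
    bh->b_private = n;
}

/* Returns 1 if the handler was hidden while in flight and chained correctly. */
int demo_nested_completion(void)
{
    struct nested_io n = { NULL, NULL, 0 };
    struct buffer_head bh = { end_buffer_io_async, NULL, 0 };
    int hidden;

    driver_submit(&bh, &n);
    /* While the IO is in flight, b_end_io no longer points at the system handler. */
    hidden = (bh.b_end_io != end_buffer_io_async);
    bh.b_end_io(&bh, 1);

    return hidden && bh.completed == 1 && n.driver_completions == 1 &&
           bh.b_end_io == end_buffer_io_async;
}
```

The window between driver_submit() and the completion is where any code that
identifies async buffers by comparing b_end_io would misjudge the buffer.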

Actual Results:  The BUG() in UnlockPage(), as called from
end_buffer_io_async(), triggers and the kernel panics.

Expected Results:  IO should succeed and kernel should not panic.

Additional info:

Comment 1 Stephen Tweedie 2002-02-08 22:12:01 UTC
The attached patch had a slight bug.  I've checked in a fixed version to our
local tree.

Do you want a test build to try it out with?

Comment 2 Tim Pepper 2002-02-09 02:27:27 UTC
I can try a test build for an i686/SMP.

Comment 3 Tim Pepper 2002-03-06 21:50:07 UTC
This appears to have shipped in 2.4.9-31's linux-2.4.9-assorted-bits.patch (if
not earlier?).  I'll test that kernel some, but the patch should've fixed the
issue as it did in the official kernel tree.

Comment 4 Tim Pepper 2002-03-13 16:55:54 UTC
FYI: The 2.4.9-31 kernel survives my testing now.  Looks like the bug is closable.
