Red Hat Bugzilla – Bug 59248
2.4.9-any vs. md drivers triggers BUG()
Last modified: 2007-04-18 12:39:42 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20020104
Description of problem:
Back in June 2001 there were some threads on linux-kernel about adding a
BH_Async flag to the kernel. The code went into 2.4.10 and I got Alan to pick
it up in 2.4.10-ac4. It needs to be backported to your 2.4.9 kernels.
The problem is that in 2.4.9 and earlier, end_buffer_io_async() (fs/buffer.c)
decides whether a page is still in use partly by checking whether the b_end_io
function of the page's buffer heads (or of the first bh in the page?) is
end_buffer_io_async(). That assumption does not hold for the various md-type
drivers, which modify the bh's b_end_io function. After a period of async IO
the page usage counts become corrupted and the BUG() in UnlockPage is triggered.
fixes this problem.
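
To make the failure mode concrete, here is a minimal userspace sketch in C of
the two ways of deciding "does this page still have async IO in flight". It is
not kernel code: struct buffer_head is cut down to four fields, the locked and
async booleans stand in for the real buffer state bits (async modelling
BH_Async), and md_end_io(), setup_page() and the two page_busy_* helpers are
hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct buffer_head;
typedef void (*end_io_fn)(struct buffer_head *bh, int uptodate);

/* Heavily simplified stand-in for the kernel's struct buffer_head. */
struct buffer_head {
    struct buffer_head *b_this_page; /* circular list of buffers in one page */
    end_io_fn b_end_io;              /* completion callback */
    bool locked;                     /* IO still in flight (assumption: models the lock bit) */
    bool async;                      /* assumption: models the BH_Async flag */
};

/* Stand-in for end_buffer_io_async(); here it only marks the IO as done. */
static void end_buffer_io_async(struct buffer_head *bh, int uptodate)
{
    (void)uptodate;
    bh->locked = false;
}

/* Hypothetical md-style handler that a stacking driver installs in b_end_io.
 * A real driver would do its own bookkeeping before chaining. */
static void md_end_io(struct buffer_head *bh, int uptodate)
{
    end_buffer_io_async(bh, uptodate);
}

/* Pointer-based check: a sibling buffer counts as pending async IO only if
 * its b_end_io is still end_buffer_io_async. */
static bool page_busy_by_pointer(const struct buffer_head *bh)
{
    const struct buffer_head *tmp;

    for (tmp = bh->b_this_page; tmp != bh; tmp = tmp->b_this_page)
        if (tmp->b_end_io == end_buffer_io_async && tmp->locked)
            return true;
    return false;
}

/* Flag-based check: the async marker survives a driver replacing b_end_io. */
static bool page_busy_by_flag(const struct buffer_head *bh)
{
    const struct buffer_head *tmp;

    for (tmp = bh->b_this_page; tmp != bh; tmp = tmp->b_this_page)
        if (tmp->async && tmp->locked)
            return true;
    return false;
}

/* Build a two-buffer page with async IO submitted on both buffers, let the
 * driver override the handler on the second, and complete the first. */
static void setup_page(struct buffer_head bh[2])
{
    int i;

    for (i = 0; i < 2; i++) {
        bh[i].b_this_page = &bh[(i + 1) % 2];
        bh[i].b_end_io = end_buffer_io_async;
        bh[i].locked = true;
        bh[i].async = true;
    }
    bh[1].b_end_io = md_end_io; /* driver stacks its own handler */
    bh[0].locked = false;       /* first buffer's IO has completed */
}
```

After setup_page(), page_busy_by_pointer(&bh[0]) reports the page as idle even
though bh[1] is still locked, which is the premature unlock described above,
while page_busy_by_flag(&bh[0]) still reports it as busy.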
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Load an md driver that, for the buffer heads it processes, nests the system
b_end_io function pointer (end_buffer_io_async()) behind its own end IO handler.
2. Run an IO load that triggers page (async) IO.
3. Given enough time the kernel will eventually corrupt itself. In my testing
the corruption was always caught in end_buffer_io_async()'s call to UnlockPage().
Actual Results: BUG() in UnlockPage() as called from end_buffer_io_async()
triggers and the kernel panics.
Expected Results: IO should succeed and kernel should not panic.
The attached patch had a slight bug. I've checked in a fixed version to our
Do you want a test build to try it out with?
I can try a test build for an i686/SMP.
This appears to have shipped in 2.4.9-31's linux-2.4.9-assorted-bits.patch (if
not earlier?). I'll test that kernel some, but the patch should've fixed the
issue as it did in the official kernel tree.
FYI: The 2.4.9-31 kernel survives my testing now. Looks like the bug is closable.