Bug 472752 - BUG() in end_buffer_async_write()
BUG() in end_buffer_async_write()
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel (Show other bugs)
4.7
All Linux
medium Severity medium
: rc
: ---
Assigned To: Jeff Moyer
Eryu Guan
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2008-11-24 07:00 EST by Bryn M. Reeves
Modified: 2011-02-16 10:34 EST (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-02-16 10:34:20 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Bryn M. Reeves 2008-11-24 07:00:03 EST
Description of problem:
There is a race in 2.6.9's handling of bh submission in __block_write_full_page() that will allow more than one request to be submitted for a given bh at the same time. This causes end_buffer_async_write() to blow up on the following bug:

 685         struct buffer_head *tmp;
 686         struct page *page;
 687 
 688         BUG_ON(!buffer_async_write(bh));
 689 

This happens in the 2nd invocation of end_buffer_async_write() as the 1st caller has already cleared the bit later on in the same function:

 705         spin_lock_irqsave(&page_uptodate_lock, flags);
 706         clear_buffer_async_write(bh);
 707         unlock_buffer(bh);
 708         tmp = bh->b_this_page;

Version-Release number of selected component (if applicable):
Originally reported on 2.6.9-34.EL but confirmed to also affect later kernels.

How reproducible:
Difficult. Seen in production at a couple of sites every few months.

Steps to Reproduce:
Nick Piggin posted a reproducer to LKML but we've been unable to trigger the problem reliably with this test case:


    "When running fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
     on an ext2 filesystem with 1024 byte block size, on SMP i386
     with 4096 byte page size over loopback to an image file on a tmpfs 
     filesystem, I would very quickly hit

         BUG_ON(!buffer_async_write(bh));

     in fs/buffer.c:end_buffer_async_write"
  
Actual results:
BUG

Expected results:
No BUG.

Additional info:
Discussed on LKML here:

http://lkml.org/lkml/2005/4/24/145 [[patch] __block_write_full_page bug]
http://lkml.org/lkml/2005/4/27/85  [[patch] fix the 2nd buffer race properly]
http://lkml.org/lkml/2005/4/30/181 [[patch] alternative fix for VFS race]
Comment 1 Bryn M. Reeves 2008-11-24 07:02:25 EST
This was fixed upstream with the following three commits:

commit f0fbd5fc09b20f7ba7bc8c80be33e39925bb38e1
Author: Andrew Morton <akpm@osdl.org>
Date:   Thu May 5 16:15:48 2005 -0700

    [PATCH] __block_write_full_page() simplification
    
    The `last_bh' logic probably isn't worth much.  In those situations where only
    the front part of the page is being written out we will save some looping but
    in the vastly more common case of an all-page writeout if just adds more code.
    
    Nick Piggin <nickpiggin@yahoo.com.au>
    
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit 05937baae9fc27b64bcd4378da7d2b14edf7931c
Author: Andrew Morton <akpm@osdl.org>
Date:   Thu May 5 16:15:47 2005 -0700

    [PATCH] __block_write_full_page speedup
    
    Remove all those get_bh()'s and put_bh()'s by extending lock_page() to cover
    the troublesome regions.
    
    (get_bh() and put_bh() happen every time whereas contention on a page's lock
    in there happens basically never).
    
    Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

commit ad576e63e0c8b274a8558b8e05a6d0526e804dc0
Author: Nick Piggin <nickpiggin@yahoo.com.au>
Date:   Thu May 5 16:15:46 2005 -0700

    [PATCH] __block_write_full_page race fix
    
    When running
        fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
    on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte
    page size over loopback to an image file on a tmpfs filesystem, I would
    very quickly hit
        BUG_ON(!buffer_async_write(bh));
    in fs/buffer.c:end_buffer_async_write
    
    It seems that more than one request would be submitted for a given bh
    at a time.
    
    What would happen is the following:
    2 threads doing __mpage_writepages on the same page.
    Thread 1 - lock the page first, and enter __block_write_full_page.
    Thread 1 - (eg.) mark_buffer_async_write on the first 2 buffers.
    Thread 1 - set page writeback, unlock page.
    Thread 2 - lock page, wait on page writeback
    Thread 1 - submit_bh on the first 2 buffers.
    => both requests complete, none of the page buffers are async_write,
       end_page_writeback is called.
    Thread 2 - wakes up. enters __block_write_full_page.
    Thread 2 - mark_buffer_async_write on (eg.) the last buffer
    Thread 1 - finds the last buffer has async_write set, submit_bh on that.
    Thread 2 - submit_bh on the last buffer.
    => oops.
    
    So change __block_write_full_page to explicitly keep track of the last bh
    we need to issue, so we don't touch anything after issuing the last
    request.
    
    Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

ad576e63e0c8b274a8558b8e05a6d0526e804dc0 is the fix, the remaining two are cleanups of the surrounding code.
Comment 6 Jeff Moyer 2009-01-15 16:07:54 EST
I grabbed a test machine to try to reproduce this, and I hit a different BUG:

------------[ cut here ]------------
kernel BUG at fs/mpage.c:417!
invalid operand: 0000 [#1]
SMP 
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc cpufreq_powersave loop joydev button battery ac ehci_hcd uhci_hcd i5000_edac edac_mc hw_random bnx2 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata cciss sd_mod scsi_mod
CPU:    4
EIP:    0060:[<c017aed1>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9-78.23.ELsmp) 
EIP is at mpage_writepage+0xd0/0x56f
eax: 0000001c   ebx: c140d560   ecx: 0000000c   edx: 00000000
esi: d7a4ba70   edi: d7a4ba70   ebp: 00000000   esp: c8d8fdb4
ds: 007b   es: 007b   ss: 0068
Process fsstress (pid: 10153, threadinfo=c8d8f000 task=e1d72df0)
Stack: 00152afe 00000000 00000000 00000000 00000000 00000000 00000000 00000004 
       00000001 00000001 00000004 0000000a e341ae50 e341af00 c0196e79 c140d560 
       00000000 00000082 c203ad80 c011d31c 0000001b 00000089 0000022b 00000006 
Call Trace:
 [<c0196e79>] ext2_get_block+0x0/0x2cb
 [<c011d31c>] try_to_wake_up+0x28e/0x299
 [<c01c5356>] radix_tree_gang_lookup_tag+0x3d/0x53
 [<c014187b>] find_get_pages_tag+0x28/0x64
 [<c01418b0>] find_get_pages_tag+0x5d/0x64
 [<c017b569>] mpage_writepages+0x1f9/0x314
 [<c0196e79>] ext2_get_block+0x0/0x2cb
 [<c0197ebb>] ext2_update_inode+0x30a/0x315
 [<c01463fe>] do_writepages+0x19/0x27
 [<c0179e70>] __sync_single_inode+0x5f/0x1bb
 [<c017a22d>] sync_sb_inodes+0x1a7/0x274
 [<c017a44a>] sync_inodes_sb+0x72/0x83
 [<c02df172>] __cond_resched+0x14/0x39
 [<c017a51c>] sync_inodes+0x16/0x6c
 [<c015dd3a>] do_sync+0x11/0x55
 [<c015dd88>] sys_sync+0xa/0xd
 [<c02e0c43>] syscall_call+0x7/0xb
 [<c02e007b>] __lock_text_end+0x60c/0x1071
Code: 8b 03 f6 c4 10 0f 84 f6 00 00 00 8b 03 f6 c4 10 75 08 0f 0b 9b 01 ec 60 2f c0 8b 44 24 3c 31 ed 8b 70 0c 89 f7 8b 07 a8 04 74 08 <0f> 0b a1 01 ec 60 2f c0 8b 07 a8 10 75 21 8b 07 a8 02 0f 85 1e 
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

That bug is in mpage_writepage:

        if (page_has_buffers(page)) {
                struct buffer_head *head = page_buffers(page);
                struct buffer_head *bh = head;

                /* If they're all mapped and dirty, do it */
                page_block = 0;
                do {
                        BUG_ON(buffer_locked(bh));    <==================

This could be related, but I'll have to think about it a bit more.  I'll retry with a patched kernel to make sure the issue has been resolved.
Comment 7 Jeff Moyer 2009-01-15 16:54:32 EST
I re-ran the test with a kernel that was patched with the above 3 commits and I still get this failure.  I'll continue investigating.
Comment 8 Jeff Moyer 2009-01-16 15:52:19 EST
I opened up a separate bugzilla (480404) for the problem I ran into.  Once that's fixed, hopefully I can reproduce the bug reported here.
Comment 15 RHEL Product and Program Management 2010-10-12 13:10:38 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 16 Vivek Goyal 2010-10-15 09:26:51 EDT
Committed in 89.44.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Comment 20 Eryu Guan 2011-01-23 22:29:09 EST
This is hard to reproduce, I followed steps in comment 0 and let it run for 8 hours and on different hosts, I can hit bug 480404 but not this one.

Ran reproducer on 2.6.9-96 kernel over the weekend(about 60 hours) and no issue was found.

linux-2.6.9-fs-buffer-__block_write_full_page-race-fix.patch
linux-2.6.9-fs-buffer-__block_write_full_page-speedup-by-removin.pattch
linux-2.6.9-fs-buffer-__block_write_full_page-simplification-by-.pattch
are applied in 2.6.9-96 kernel correctly.

Set it to SanityOnly
Comment 21 errata-xmlrpc 2011-02-16 10:34:20 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html

Note You need to log in before you can comment on or make changes to this bug.