Bug 480404 - kernel BUG at fs/mpage.c:417!
Summary: kernel BUG at fs/mpage.c:417!
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
Target Milestone: rc
: ---
Assignee: Jeff Moyer
QA Contact: Boris Ranto
Depends On:
Blocks: 633968
TreeView+ depends on / blocked
Reported: 2009-01-16 20:51 UTC by Jeff Moyer
Modified: 2018-10-27 13:30 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-02-16 15:52:37 UTC
Target Upstream Version:

Attachments (Terms of Use)

System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0263 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update 2011-02-16 15:14:55 UTC

Description Jeff Moyer 2009-01-16 20:51:41 UTC
Description of problem:
I ran the reproducer for bug 472752:

    "When running fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2
     on an ext2 filesystem with 1024 byte block size, on SMP i386
     with 4096 byte page size over loopback to an image file on a tmpfs 
     filesystem, ..."

The system in question was an hp dl380g5 with 2GB of memory and 8 cpus.  The tmpfs file system was created with a size of 1200M, and the ext2 image file was created by dd-ing /dev/zero into the tmpfs file system until all space was utilized.

------------[ cut here ]------------
kernel BUG at fs/mpage.c:417!
invalid operand: 0000 [#1]
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc
cpufreq_powersave loop joydev button battery ac ehci_hcd uhci_hcd i5000_edac
edac_mc hw_random bnx2 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix
libata cciss sd_mod scsi_mod
CPU:    4
EIP:    0060:[<c017aed1>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9-78.23.ELsmp) 
EIP is at mpage_writepage+0xd0/0x56f
eax: 0000001c   ebx: c140d560   ecx: 0000000c   edx: 00000000
esi: d7a4ba70   edi: d7a4ba70   ebp: 00000000   esp: c8d8fdb4
ds: 007b   es: 007b   ss: 0068
Process fsstress (pid: 10153, threadinfo=c8d8f000 task=e1d72df0)
Stack: 00152afe 00000000 00000000 00000000 00000000 00000000 00000000 00000004 
       00000001 00000001 00000004 0000000a e341ae50 e341af00 c0196e79 c140d560 
       00000000 00000082 c203ad80 c011d31c 0000001b 00000089 0000022b 00000006 
Call Trace:
 [<c0196e79>] ext2_get_block+0x0/0x2cb
 [<c011d31c>] try_to_wake_up+0x28e/0x299
 [<c01c5356>] radix_tree_gang_lookup_tag+0x3d/0x53
 [<c014187b>] find_get_pages_tag+0x28/0x64
 [<c01418b0>] find_get_pages_tag+0x5d/0x64
 [<c017b569>] mpage_writepages+0x1f9/0x314
 [<c0196e79>] ext2_get_block+0x0/0x2cb
 [<c0197ebb>] ext2_update_inode+0x30a/0x315
 [<c01463fe>] do_writepages+0x19/0x27
 [<c0179e70>] __sync_single_inode+0x5f/0x1bb
 [<c017a22d>] sync_sb_inodes+0x1a7/0x274
 [<c017a44a>] sync_inodes_sb+0x72/0x83
 [<c02df172>] __cond_resched+0x14/0x39
 [<c017a51c>] sync_inodes+0x16/0x6c
 [<c015dd3a>] do_sync+0x11/0x55
 [<c015dd88>] sys_sync+0xa/0xd
 [<c02e0c43>] syscall_call+0x7/0xb
 [<c02e007b>] __lock_text_end+0x60c/0x1071
Code: 8b 03 f6 c4 10 0f 84 f6 00 00 00 8b 03 f6 c4 10 75 08 0f 0b 9b 01 ec 60
2f c0 8b 44 24 3c 31 ed 8b 70 0c 89 f7 8b 07 a8 04 74 08 <0f> 0b a1 01 ec 60 2f
c0 8b 07 a8 10 75 21 8b 07 a8 02 0f 85 1e 
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

That bug is in mpage_writepage:

        if (page_has_buffers(page)) {
                struct buffer_head *head = page_buffers(page);
                struct buffer_head *bh = head;

                /* If they're all mapped and dirty, do it */
                page_block = 0;
                do {
                        BUG_ON(buffer_locked(bh));    <==================

Version-Release number of selected component (if applicable):

How reproducible:

Comment 1 Jeff Moyer 2009-01-16 20:54:01 UTC
This may be the same bug as reported upstream:

Comment 2 Sumeet Gandhare 2010-08-26 05:07:10 UTC
Created attachment 441110 [details]
Proposed fix for the fs/mpage.c BUG

The patch works because __block_prepare_write can start read requests in the first loop (the call to ll_rw_blk()) which will lock the buffer.

the second loop then needs to use wait_on_buffer() to wait for the reads to complete.  Without the patch, __block_prepare_write() could exit without running the second loop.  With the reads still in progress this is a violation of the design and would cause a BUG since the buffers shouldn't be locked when __block_prepare_write() completes.

Comment 3 Sumeet Gandhare 2010-08-26 05:31:03 UTC
Customer Contact Name:
Toshiyuki Okajima

Description of Problem:

On executing the fsstress utility the system panics at fs/mpage.c 417. Seems
like a locked buffer_head is being written. The problem can be easily
reproduced following the steps as given below on a X86_64 system running
Originally the problem was reported on an IA 64, but it appears that problem is
not restricted to IA64 and can happen on X86 too.

To reproduce this issue, create an ext2 1GB filesystem with 1KiB block size and
run the reproducer(fsstress utility from LTP project). 

This is easy to reproduce quickly.  I have a VM with a 1GB disk backed by a
ramdisk on the host machine (to make it fast).  I formatted the disk with ext2
(although I'm not sure that matters) using a block size of 1K and ran fsstress
with 1000 threads.  It panicked in about 10 minutes.

     KERNEL: /root/vmlinux
   DUMPFILE: vmcore
       CPUS: 2
       DATE: Wed Aug 25 04:22:14 2010
     UPTIME: 00:20:38
LOAD AVERAGE: 1001.22, 886.50, 568.50
      TASKS: 1057
   NODENAME: localhost.localdomain
    RELEASE: 2.6.9-89.0.28.ELsmp
    VERSION: #1 SMP Thu Jul 22 18:14:41 EDT 2010
    MACHINE: x86_64  (2660 Mhz)
     MEMORY: 1 GB
      PANIC: ""

Kernel BUG at mpage:417
invalid operand: 0000 [1] SMP
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core
sunrpc ds yenta_socket pcmcia_core cpufreq_powersave ide_dump scsi_dump
diskdump zlib_deflate button battery ac uhci_hcd virtio_pci virtio_ring virtio
snd_ens1370 snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd_page_alloc snd_ak4531_codec snd soundcore 8139cp mii floppy
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sym53c8xx scsi_transport_spi
sd_mod scsi_mod
Pid: 9018, comm: fsstress Not tainted 2.6.9-89.0.28.ELsmp
RIP: 0010:[<ffffffff8019d36d>] <ffffffff8019d36d>{mpage_writepages+767}
RSP: 0018:00000100278cfa18  EFLAGS: 00010202
RAX: 000000000000001c RBX: 0000000000000010 RCX: 000001001ed0dce8
RDX: 00000100265dd6a0 RSI: 00000000ffffffff RDI: 000001001ed0dce8
RBP: 000001003f5c2630 R08: 0000000000000001 R09: 0000000000000040
R10: 0000000000000246 R11: 0000000000001000 R12: 0000000000000000
R13: 0000000000000004 R14: 00000100278cfc68 R15: 0000000000000000
FS:  00000000006a3850(005b) GS:ffffffff80504f00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000007fbfffcff8 CR3: 0000000000101000 CR4: 00000000000006e0
Process fsstress (pid: 9018, threadinfo 00000100278ce000, task
Stack: 00000100265dd5f8 0000000000129a57 0000000000000000 0000000000000000
      000000001ed0dce8 0000000000000000 0000000a00000004 00000100265dd6a0
      00000100265dd7c0 00000000265dd6a0
Call Trace:<ffffffff801bd29b>{ext2_get_block+0}
      <ffffffff8018ac9b>{may_open+95} <ffffffff80184fea>{cp_new_stat+234}
      <ffffffff8017c36e>{vfs_write+207} <ffffffff8017c456>{sys_write+69}

Code: 0f 0b 4a 1f 33 80 ff ff ff ff a1 01 8b 01 a8 10 75 1a 8b 01
RIP <ffffffff8019d36d>{mpage_writepages+767} RSP <00000100278cfa18>

Those symbols above looked messed up but this backtrace looks okay:

PID: 9018   TASK: 100184a8030       CPU: 0   COMMAND: "fsstress"
#0 [100278cf840] start_disk_dump at ffffffffa01583ff
#1 [100278cf870] try_crashdump at ffffffff8014cf15
#2 [100278cf880] die at ffffffff80111c7c
#3 [100278cf8a0] do_invalid_op at ffffffff80112044
#4 [100278cf960] error_exit at ffffffff80110e09
   [exception RIP: mpage_writepages+767]
   RIP: ffffffff8019d36d  RSP: 00000100278cfa18  RFLAGS: 00010202
   RAX: 000000000000001c  RBX: 0000000000000010  RCX: 000001001ed0dce8
   RDX: 00000100265dd6a0  RSI: 00000000ffffffff  RDI: 000001001ed0dce8
   RBP: 000001003f5c2630   R8: 0000000000000001   R9: 0000000000000040
   R10: 0000000000000246  R11: 0000000000001000  R12: 0000000000000000
   R13: 0000000000000004  R14: 00000100278cfc68  R15: 0000000000000000
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#5 [100278cfa10] mpage_writepages at ffffffff8019d27d
#6 [100278cfc20] __filemap_fdatawrite_range at ffffffff8015b9a4
#7 [100278cfcb0] filemap_write_and_wait at ffffffff8015bd67
#8 [100278cfcc0] generic_file_direct_IO at ffffffff8015d998
#9 [100278cfd00] generic_file_direct_write at ffffffff8015da3d
#10 [100278cfd50] __generic_file_aio_write_nolock at ffffffff8015dd79
#11 [100278cfdd0] __generic_file_write_nolock at ffffffff8015de90
#12 [100278cfec0] generic_file_write at ffffffff8015dfd3
#13 [100278cff10] vfs_write at ffffffff8017c36e
#14 [100278cff40] sys_write at ffffffff8017c456
#15 [100278cff80] system_call at ffffffff801102de
   RIP: 000000000041c310  RSP: 0000007fbffff548  RFLAGS: 00010202
   RAX: 0000000000000001  RBX: ffffffff801102de  RCX: 000000000041d7c0
   RDX: 0000000000006000  RSI: 0000002a98001000  RDI: 0000000000000003
   RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000200000
   R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000405260
   R13: 0000007fbffff6b0  R14: 0000000058ee6f6f  R15: 0000000098000090
   ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b 

388 static struct bio *
389 mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
390         sector_t *last_block_in_bio, int *ret, struct writeback_control
391 {

410         if (page_has_buffers(page)) {
411                 struct buffer_head *head = page_buffers(page);
412                 struct buffer_head *bh = head;
414                 /* If they're all mapped and dirty, do it */
415                 page_block = 0;
416                 do {
417                         BUG_ON(buffer_locked(bh));
418                         if (!buffer_mapped(bh)) {

Version-Release number of selected component:
Red Hat Enterprise Linux Version Number:  RHEL4
Release Number:  4.8 errata

Kernel Version: 2.6.9-89.0.26.EL

Drivers or hardware or architecture dependency:


How reproducible:

Step to Reproduce:

1. On X86_64 system, create a 1GB ext2 filesystem with 1KiB block size.
mkfs.ext2 -b 1024 /dev/sdb

Mount the above filesytem 

2.  fsstress -d /mnt/test/fsstress -n 1000000 -p 1000

Actual Results:

Expected Results:
Not panic.

Target Release:4.9

Errata Request: async errata for 4.8

Hotfix Request: No

Comment 12 Moritoshi Oshiro 2010-10-01 02:14:28 UTC
=== In Red Hat Customer Portal Case 00319222 ===
--- Comment by Oshiro, Moritoshi on 10/1/2010 11:14 AM ---

We have opened 4.8.z async errata: bug #633968 .

Comment 13 Vivek Goyal 2010-10-05 15:48:28 UTC
Committed in 89.39.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 14 Moritoshi Oshiro 2010-10-08 07:00:55 UTC
Hi Jeffery,

(In reply to comment #13)
> Committed in 89.39.EL . RPMS are available at
> http://people.redhat.com/vgoyal/rhel4/

Fujitsu has confirmed that the test kernel resolves the issue. Thanks!
By executing my reproducer with 89.39.EL, I confirmed that any problems don't happen in 3 days. 

Best Rgards,

Comment 15 Jeff Moyer 2010-10-08 13:07:19 UTC

Please send along our thanks to Fujitsu for testing.  It is greatly appreciated!

Comment 21 errata-xmlrpc 2011-02-16 15:52:37 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.