Description of problem:

I ran the reproducer for bug 472752: "When running fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2 on an ext2 filesystem with 1024 byte block size, on SMP i386 with 4096 byte page size over loopback to an image file on a tmpfs filesystem, ..."

The system in question was an HP DL380 G5 with 2GB of memory and 8 CPUs. The tmpfs file system was created with a size of 1200M, and the ext2 image file was created by dd-ing /dev/zero into the tmpfs file system until all space was utilized.

------------[ cut here ]------------
kernel BUG at fs/mpage.c:417!
invalid operand: 0000 [#1]
SMP
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 sunrpc cpufreq_powersave loop joydev button battery ac ehci_hcd uhci_hcd i5000_edac edac_mc hw_random bnx2 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod ata_piix libata cciss sd_mod scsi_mod
CPU:    4
EIP:    0060:[<c017aed1>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9-78.23.ELsmp)
EIP is at mpage_writepage+0xd0/0x56f
eax: 0000001c   ebx: c140d560   ecx: 0000000c   edx: 00000000
esi: d7a4ba70   edi: d7a4ba70   ebp: 00000000   esp: c8d8fdb4
ds: 007b   es: 007b   ss: 0068
Process fsstress (pid: 10153, threadinfo=c8d8f000 task=e1d72df0)
Stack: 00152afe 00000000 00000000 00000000 00000000 00000000 00000000 00000004
       00000001 00000001 00000004 0000000a e341ae50 e341af00 c0196e79 c140d560
       00000000 00000082 c203ad80 c011d31c 0000001b 00000089 0000022b 00000006
Call Trace:
 [<c0196e79>] ext2_get_block+0x0/0x2cb
 [<c011d31c>] try_to_wake_up+0x28e/0x299
 [<c01c5356>] radix_tree_gang_lookup_tag+0x3d/0x53
 [<c014187b>] find_get_pages_tag+0x28/0x64
 [<c01418b0>] find_get_pages_tag+0x5d/0x64
 [<c017b569>] mpage_writepages+0x1f9/0x314
 [<c0196e79>] ext2_get_block+0x0/0x2cb
 [<c0197ebb>] ext2_update_inode+0x30a/0x315
 [<c01463fe>] do_writepages+0x19/0x27
 [<c0179e70>] __sync_single_inode+0x5f/0x1bb
 [<c017a22d>] sync_sb_inodes+0x1a7/0x274
 [<c017a44a>] sync_inodes_sb+0x72/0x83
 [<c02df172>] __cond_resched+0x14/0x39
 [<c017a51c>] sync_inodes+0x16/0x6c
 [<c015dd3a>] do_sync+0x11/0x55
 [<c015dd88>] sys_sync+0xa/0xd
 [<c02e0c43>] syscall_call+0x7/0xb
 [<c02e007b>] __lock_text_end+0x60c/0x1071
Code: 8b 03 f6 c4 10 0f 84 f6 00 00 00 8b 03 f6 c4 10 75 08 0f 0b 9b 01 ec 60 2f c0 8b 44 24 3c 31 ed 8b 70 0c 89 f7 8b 07 a8 04 74 08 <0f> 0b a1 01 ec 60 2f c0 8b 07 a8 10 75 21 8b 07 a8 02 0f 85 1e
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

That BUG is in mpage_writepage():

	if (page_has_buffers(page)) {
		struct buffer_head *head = page_buffers(page);
		struct buffer_head *bh = head;

		/* If they're all mapped and dirty, do it */
		page_block = 0;
		do {
			BUG_ON(buffer_locked(bh));   <==================

Version-Release number of selected component (if applicable):
2.6.9-78.23.ELsmp

How reproducible:
Fairly.
This may be the same bug as reported upstream: http://bugzilla.kernel.org/show_bug.cgi?id=10290
Created attachment 441110 [details]
Proposed fix for the fs/mpage.c BUG

The patch works because __block_prepare_write() can start read requests in its first loop (via the call to ll_rw_block()), which locks the buffers until the reads complete. The second loop then needs to use wait_on_buffer() to wait for those reads to finish. Without the patch, __block_prepare_write() could exit without running the second loop. With the reads still in progress, that violates the design and triggers the BUG, since no buffer should be left locked when __block_prepare_write() returns.
Customer Contact Name: Toshiyuki Okajima

Description of Problem:
On executing the fsstress utility, the system panics at fs/mpage.c:417. It appears that a locked buffer_head is being written. The problem can easily be reproduced by following the steps below on an x86_64 system running RHEL4. The problem was originally reported on IA64, but it is not restricted to IA64 and can happen on x86 too.

To reproduce this issue, create a 1GB ext2 filesystem with a 1KiB block size and run the reproducer (the fsstress utility from the LTP project). It reproduces quickly. I have a VM with a 1GB disk backed by a ramdisk on the host machine (to make it fast). I formatted the disk with ext2 (although I'm not sure that matters) using a block size of 1K and ran fsstress with 1000 threads. It panicked in about 10 minutes.

      KERNEL: /root/vmlinux
    DUMPFILE: vmcore
        CPUS: 2
        DATE: Wed Aug 25 04:22:14 2010
      UPTIME: 00:20:38
LOAD AVERAGE: 1001.22, 886.50, 568.50
       TASKS: 1057
    NODENAME: localhost.localdomain
     RELEASE: 2.6.9-89.0.28.ELsmp
     VERSION: #1 SMP Thu Jul 22 18:14:41 EDT 2010
     MACHINE: x86_64  (2660 Mhz)
      MEMORY: 1 GB
       PANIC: ""

Kernel BUG at mpage:417
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core cpufreq_powersave ide_dump scsi_dump diskdump zlib_deflate button battery ac uhci_hcd virtio_pci virtio_ring virtio snd_ens1370 snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ak4531_codec snd soundcore 8139cp mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sym53c8xx scsi_transport_spi sd_mod scsi_mod
Pid: 9018, comm: fsstress Not tainted 2.6.9-89.0.28.ELsmp
RIP: 0010:[<ffffffff8019d36d>] <ffffffff8019d36d>{mpage_writepages+767}
RSP: 0018:00000100278cfa18  EFLAGS: 00010202
RAX: 000000000000001c RBX: 0000000000000010 RCX: 000001001ed0dce8
RDX: 00000100265dd6a0 RSI: 00000000ffffffff RDI: 000001001ed0dce8
RBP: 000001003f5c2630 R08: 0000000000000001 R09: 0000000000000040
R10: 0000000000000246 R11: 0000000000001000 R12: 0000000000000000
R13: 0000000000000004 R14: 00000100278cfc68 R15: 0000000000000000
FS:  00000000006a3850(005b) GS:ffffffff80504f00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000
CR0: 0000000080050033 CR2: 0000007fbfffcff8 CR3: 0000000000101000 CR4: 00000000000006e0
Process fsstress (pid: 9018, threadinfo 00000100278ce000, task 00000100184a8030)
Stack: 00000100265dd5f8 0000000000129a57 0000000000000000 0000000000000000
       000000001ed0dce8 0000000000000000 0000000a00000004 00000100265dd6a0
       00000100265dd7c0 00000000265dd6a0
Call Trace:
 <ffffffff801bd29b>{ext2_get_block+0}
 <ffffffff8017db18>{__find_get_block_slow+255}
 <ffffffff801618b0>{__set_page_dirty_nobuffers+305}
 <ffffffff80180007>{__block_prepare_write+988}
 <ffffffff801bd29b>{ext2_get_block+0}
 <ffffffff80165c41>{__pagevec_lru_add+202}
 <ffffffff8015b9a4>{__filemap_fdatawrite_range+95}
 <ffffffff8015bd67>{filemap_write_and_wait+18}
 <ffffffff8015d998>{generic_file_direct_IO+47}
 <ffffffff8015da3d>{generic_file_direct_write+96}
 <ffffffff8015dd79>{__generic_file_aio_write_nolock+662}
 <ffffffff8015de90>{__generic_file_write_nolock+158}
 <ffffffff8018ac9b>{may_open+95}
 <ffffffff80184fea>{cp_new_stat+234}
 <ffffffff801361fe>{autoremove_wake_function+0}
 <ffffffff8015dfd3>{generic_file_write+103}
 <ffffffff8017c36e>{vfs_write+207}
 <ffffffff8017c456>{sys_write+69}
 <ffffffff801102de>{system_call+126}
Code: 0f 0b 4a 1f 33 80 ff ff ff ff a1 01 8b 01 a8 10 75 1a 8b 01
RIP <ffffffff8019d36d>{mpage_writepages+767} RSP <00000100278cfa18>

The symbols above look messed up, but this backtrace looks okay:

PID: 9018  TASK: 100184a8030  CPU: 0  COMMAND: "fsstress"
 #0 [100278cf840] start_disk_dump at ffffffffa01583ff
 #1 [100278cf870] try_crashdump at ffffffff8014cf15
 #2 [100278cf880] die at ffffffff80111c7c
 #3 [100278cf8a0] do_invalid_op at ffffffff80112044
 #4 [100278cf960] error_exit at ffffffff80110e09
    [exception RIP: mpage_writepages+767]
    RIP: ffffffff8019d36d  RSP: 00000100278cfa18  RFLAGS: 00010202
    RAX: 000000000000001c  RBX: 0000000000000010  RCX: 000001001ed0dce8
    RDX: 00000100265dd6a0  RSI: 00000000ffffffff  RDI: 000001001ed0dce8
    RBP: 000001003f5c2630   R8: 0000000000000001   R9: 0000000000000040
    R10: 0000000000000246  R11: 0000000000001000  R12: 0000000000000000
    R13: 0000000000000004  R14: 00000100278cfc68  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [100278cfa10] mpage_writepages at ffffffff8019d27d
 #6 [100278cfc20] __filemap_fdatawrite_range at ffffffff8015b9a4
 #7 [100278cfcb0] filemap_write_and_wait at ffffffff8015bd67
 #8 [100278cfcc0] generic_file_direct_IO at ffffffff8015d998
 #9 [100278cfd00] generic_file_direct_write at ffffffff8015da3d
#10 [100278cfd50] __generic_file_aio_write_nolock at ffffffff8015dd79
#11 [100278cfdd0] __generic_file_write_nolock at ffffffff8015de90
#12 [100278cfec0] generic_file_write at ffffffff8015dfd3
#13 [100278cff10] vfs_write at ffffffff8017c36e
#14 [100278cff40] sys_write at ffffffff8017c456
#15 [100278cff80] system_call at ffffffff801102de
    RIP: 000000000041c310  RSP: 0000007fbffff548  RFLAGS: 00010202
    RAX: 0000000000000001  RBX: ffffffff801102de  RCX: 000000000041d7c0
    RDX: 0000000000006000  RSI: 0000002a98001000  RDI: 0000000000000003
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000200000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000405260
    R13: 0000007fbffff6b0  R14: 0000000058ee6f6f  R15: 0000000098000090
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

----------- fs/mpage.c -----------
388 static struct bio *
389 mpage_writepage(struct bio *bio, struct page *page, get_block_t get_block,
390 	sector_t *last_block_in_bio, int *ret, struct writeback_control *wbc)
391 {
...
409
410 	if (page_has_buffers(page)) {
411 		struct buffer_head *head = page_buffers(page);
412 		struct buffer_head *bh = head;
413
414 		/* If they're all mapped and dirty, do it */
415 		page_block = 0;
416 		do {
417 			BUG_ON(buffer_locked(bh));
    			^^^^^^^^^^^^^^^^^^^^^^^^^^
418 			if (!buffer_mapped(bh)) {

Version-Release number of selected component:
Red Hat Enterprise Linux Version Number: RHEL4
Release Number: 4.8 errata
Kernel Version: 2.6.9-89.0.26.EL
Drivers or hardware or architecture dependency: None.

How reproducible: frequently

Steps to Reproduce:
1. On an x86_64 system, create a 1GB ext2 filesystem with a 1KiB block size and mount it:
   mkfs.ext2 -b 1024 /dev/sdb
2. fsstress -d /mnt/test/fsstress -n 1000000 -p 1000

Actual Results: Panic.

Expected Results: No panic.

Target Release: 4.9
Errata Request: async errata for 4.8
Hotfix Request: No
Upstream patch for a similar issue. http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.12.y.git;a=commit;h=f3ddbdc6267c32223035ea9bb8456a2d86f65ba1
=== In Red Hat Customer Portal Case 00319222 ===
--- Comment by Oshiro, Moritoshi on 10/1/2010 11:14 AM ---
We have opened 4.8.z async errata: bug #633968.
Committed in 89.39.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
Hi Jeffery,

(In reply to comment #13)
> Committed in 89.39.EL . RPMS are available at
> http://people.redhat.com/vgoyal/rhel4/

Fujitsu has confirmed that the test kernel resolves the issue. Thanks!

--- By executing my reproducer with 89.39.EL, I confirmed that no problems occurred over 3 days. ---

Best Regards,
Moritoshi
Moritoshi, Please send along our thanks to Fujitsu for testing. It is greatly appreciated!
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html