Bug 469582

Summary: Kernel 2.6.27.4-19.fc9.x86_64 crash ext4 filesystem
Product: [Fedora] Fedora
Reporter: Mihai Harpau <mishu>
Component: kernel
Assignee: Eric Sandeen <esandeen>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: medium
Version: 9
CC: esandeen, kernel-maint, quintela, tytso
Hardware: All   
OS: Linux   
Fixed In Version: kernel-2.6.27.5-37.fc9
Doc Type: Bug Fix
Last Closed: 2008-11-14 16:57:49 UTC
Attachments:
Description                                      Flags
full crash log                                   none
Proposed patch for the problem reported here.    none

Description Mihai Harpau 2008-11-02 21:40:24 UTC
Created attachment 322238
full crash log

Description of problem:

After a few hours of uptime, I see this crash in the log:
Nov  2 22:20:53 taz kernel: __jbd2_log_wait_for_space: no transactions
Nov  2 22:20:53 taz kernel: Aborting journal on device dm-0:8.
Nov  2 22:20:57 taz kernel: ext4_abort called.
Nov  2 22:20:57 taz kernel: EXT4-fs error (device dm-0): ext4_journal_start_sb: Detected aborted journal
Nov  2 22:20:57 taz kernel: Remounting filesystem read-only
Nov  2 22:20:57 taz kernel: ext4_da_writepages: jbd2_start: 1024 pages, ino 7113326; err -30
Nov  2 22:20:57 taz kernel: Pid: 224, comm: pdflush Not tainted 2.6.27.4-19.fc9.x86_64 #1
Nov  2 22:20:57 taz kernel:
Nov  2 22:20:57 taz kernel: Call Trace:
Nov  2 22:20:57 taz kernel: [<ffffffffa0041bae>] ext4_da_writepages+0x189/0x322 [ext4]
Nov  2 22:20:57 taz kernel: [<ffffffff8114b6b1>] ? __next_cpu+0x19/0x26
Nov  2 22:20:57 taz kernel: [<ffffffffa0042d26>] ? ext4_da_get_block_write+0x0/0x11c [ext4]
Nov  2 22:20:57 taz kernel: [<ffffffff81094355>] do_writepages+0x28/0x38
Nov  2 22:20:57 taz kernel: [<ffffffff810dbc9c>] __writeback_single_inode+0x185/0x2f9
Nov  2 22:20:57 taz kernel: [<ffffffff810334b1>] ? __dequeue_entity+0x61/0x6a
Nov  2 22:20:57 taz kernel: [<ffffffff810dc1f5>] generic_sync_sb_inodes+0x229/0x309
Nov  2 22:20:57 taz kernel: [<ffffffff810dc55e>] writeback_inodes+0xa4/0xfd
Nov  2 22:20:57 taz kernel: [<ffffffff810944ab>] wb_kupdate+0xa3/0x119
Nov  2 22:20:57 taz kernel: [<ffffffff81094ebf>] pdflush+0x16e/0x231
Nov  2 22:20:57 taz kernel: [<ffffffff81094408>] ? wb_kupdate+0x0/0x119
Nov  2 22:20:57 taz kernel: [<ffffffff81094d51>] ? pdflush+0x0/0x231
Nov  2 22:20:57 taz kernel: [<ffffffff81094d51>] ? pdflush+0x0/0x231
Nov  2 22:20:57 taz kernel: [<ffffffff8105338b>] kthread+0x49/0x76
Nov  2 22:20:57 taz kernel: [<ffffffff810116e9>] child_rip+0xa/0x11
Nov  2 22:20:57 taz kernel: [<ffffffff81010a07>] ? restore_args+0x0/0x30
Nov  2 22:20:57 taz kernel: [<ffffffff81053342>] ? kthread+0x0/0x76
Nov  2 22:20:57 taz kernel: [<ffffffff810116df>] ? child_rip+0x0/0x11
Nov  2 22:20:57 taz kernel:
Nov  2 22:21:27 taz kernel: ext4_da_writepages: jbd2_start: 1024 pages, ino 7113325; err -30
Nov  2 22:21:27 taz kernel: Pid: 224, comm: pdflush Not tainted 2.6.27.4-19.fc9.x86_64 #1
Nov  2 22:21:27 taz kernel:
Nov  2 22:21:27 taz kernel: Call Trace:
Nov  2 22:21:27 taz kernel: [<ffffffffa0041bae>] ext4_da_writepages+0x189/0x322 [ext4]
Nov  2 22:21:27 taz kernel: [<ffffffff8114b6b1>] ? __next_cpu+0x19/0x26
Nov  2 22:21:27 taz kernel: [<ffffffffa0042d26>] ? ext4_da_get_block_write+0x0/0x11c [ext4]
Nov  2 22:21:27 taz kernel: [<ffffffff81094355>] do_writepages+0x28/0x38
Nov  2 22:21:27 taz kernel: [<ffffffff810dbc9c>] __writeback_single_inode+0x185/0x2f9
Nov  2 22:21:27 taz kernel: [<ffffffff810334b1>] ? __dequeue_entity+0x61/0x6a
Nov  2 22:21:27 taz kernel: [<ffffffff810dc1f5>] generic_sync_sb_inodes+0x229/0x309
Nov  2 22:21:27 taz kernel: [<ffffffff810dc55e>] writeback_inodes+0xa4/0xfd
Nov  2 22:21:27 taz kernel: [<ffffffff810944ab>] wb_kupdate+0xa3/0x119
Nov  2 22:21:27 taz kernel: [<ffffffff81094ebf>] pdflush+0x16e/0x231
Nov  2 22:21:27 taz kernel: [<ffffffff81094408>] ? wb_kupdate+0x0/0x119
Nov  2 22:21:27 taz kernel: [<ffffffff81094d51>] ? pdflush+0x0/0x231
Nov  2 22:21:27 taz kernel: [<ffffffff81094d51>] ? pdflush+0x0/0x231
Nov  2 22:21:27 taz kernel: [<ffffffff8105338b>] kthread+0x49/0x76
Nov  2 22:21:27 taz kernel: [<ffffffff810116e9>] child_rip+0xa/0x11
Nov  2 22:21:27 taz kernel: [<ffffffff81010a07>] ? restore_args+0x0/0x30
Nov  2 22:21:27 taz kernel: [<ffffffff81053342>] ? kthread+0x0/0x76
Nov  2 22:21:27 taz kernel: [<ffffffff810116df>] ? child_rip+0x0/0x11

This repeats over and over until I reboot the computer and run e2fsck from single-user mode.
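The `err -30` in the repeated `ext4_da_writepages` lines is a negated errno value. On Linux, errno 30 is `EROFS` ("Read-only file system"), which appears consistent with the earlier "Remounting filesystem read-only" message: pdflush keeps trying to write back the dirty pages, and each attempt is refused. A minimal sketch for pulling the inode number and error out of such lines, assuming a POSIX `sh` and `sed`; the function name `decode_writepage_errors` is hypothetical, and its errno table decodes only `EROFS`:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and report, for each
# failed ext4_da_writepages call, the inode number and the decoded errno.
decode_writepage_errors() {
    # Keep only lines of the form "... ino <N>; err -<E>" and print "<N> <E>".
    sed -n 's/.*ext4_da_writepages:.* ino \([0-9]*\); err -\([0-9]*\)$/\1 \2/p' |
    while read -r ino err; do
        case "$err" in
            30) name="EROFS (Read-only file system)" ;;  # Linux errno 30
            *)  name="errno $err" ;;                      # not decoded here
        esac
        echo "inode $ino: writeback failed with $name"
    done
}

# Usage with the two failure lines quoted in this report.
printf '%s\n' \
  'Nov  2 22:20:57 taz kernel: ext4_da_writepages: jbd2_start: 1024 pages, ino 7113326; err -30' \
  'Nov  2 22:21:27 taz kernel: ext4_da_writepages: jbd2_start: 1024 pages, ino 7113325; err -30' |
decode_writepage_errors
```

Lines that do not match the pattern (call-trace frames, the "Pid:" line) are silently skipped, so the whole log can be piped through it.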

Version-Release number of selected component (if applicable):

F9 up-to-date
kernel-2.6.27.4-19.fc9.x86_64

Comment 1 Eric Sandeen 2008-11-03 15:48:16 UTC
What does e2fsck find, if anything?  I wouldn't expect this to be a disk corruption issue, but if e2fsck found problems please attach that info.

Thanks,
-Eric

Comment 2 Eric Sandeen 2008-11-03 15:54:36 UTC
Also, can you let me know what the filesystem geometry is (dumpe2fs -h), as well as which mount options you're using?

Thanks,
-Eric

Comment 3 Theodore Tso 2008-11-03 16:09:05 UTC
Looks like this bug is also being tracked at http://bugzilla.kernel.org/show_bug.cgi?id=11937 and there is a proposed fix. I'm just waiting for feedback on the patch.

Comment 4 Mihai Harpau 2008-11-03 18:13:48 UTC
Re: comment #1, #2
No, e2fsck does not find anything.

[mihai@taz ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/taz-root   28G  4.5G   22G  17% /
/dev/mapper/taz-home  109G   95G  9.2G  92% /home
/dev/sda1              99M   27M   68M  29% /boot
tmpfs                 994M   48K  994M   1% /dev/shm


[root@taz ~]# dumpe2fs -h /dev/mapper/taz-home
dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          d06f7797-270a-45c7-8f5b-7c85f7db0698
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent sparse_super large_file
Filesystem flags:         signed_directory_hash test_filesystem 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              29491200
Block count:              29483008
Reserved block count:     1474150
Free blocks:              3656225
Free inodes:              29241170
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         32768
Inode blocks per group:   1024
Filesystem created:       Mon Nov 28 23:32:34 2005
Last mount time:          Mon Nov  3 15:29:34 2008
Last write time:          Mon Nov  3 15:29:34 2008
Mount count:              1
Maximum mount count:      -1
Last checked:             Mon Nov  3 15:27:31 2008
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Default directory hash:   tea
Directory Hash Seed:      2a02e5ed-d63c-4f57-b50e-5a135bdf95dd
Journal backup:           inode blocks
Journal size:             32M


[root@taz ~]# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext4dev rw,relatime,barrier=1,data=ordered 0 0
/dev /dev tmpfs rw,relatime,mode=755 0 0
/proc /proc proc rw,relatime 0 0
/sys /sys sysfs rw,relatime 0 0
none /selinux selinuxfs rw,relatime 0 0
/proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620 0 0
/dev/mapper/taz-home /home ext4dev rw,relatime,barrier=1,data=ordered 0 0
/dev/sda1 /boot ext3 rw,relatime,errors=continue,user_xattr,acl,data=ordered 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
/proc /var/named/chroot/proc proc rw,relatime 0 0
gvfs-fuse-daemon /home/mihai/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,relatime,user_id=500,group_id=500 0 0

Comment 5 Mihai Harpau 2008-11-03 18:24:31 UTC
In reply to comment #3:

Do you need my feedback or are you referring to feedback from bugzilla.kernel.org?
Also, the patch that fixes my problem seems to be against kernel 2.6.28-rc2, isn't it?

Comment 6 Theodore Tso 2008-11-03 19:07:41 UTC
In reply to comment #5, yes, that patch is against 2.6.28-rc2, and against the ext3 filesystem. A similar patch in the ext4 tree caused the same regression there as in ext3, and a corresponding fix is in the ext4 patch queue.

I'll attach it here for your convenience, but Eric knows where to find it.  :-)

Comment 7 Theodore Tso 2008-11-03 19:08:36 UTC
Created attachment 322354
Proposed patch for the problem reported here.

Comment 8 Chuck Ebbert 2008-11-06 02:28:27 UTC
(In reply to comment #5)
> In reply to comment #3:
> 
> Do you need my feedback or are you referring to feedback from
> bugzilla.kernel.org?
> Also, the patch that fixes my problem seems to be against kernel 2.6.28-rc2, isn't it?

2.6.27.4 fc9 kernels include the ext4 updates from 2.6.28-rc2

Comment 9 Mihai Harpau 2008-11-06 19:11:00 UTC
OK, I'm now running the test kernel 2.6.27.4-26.mh.bz469582.fc9.x86_64, which is kernel 2.6.27.4-26 plus the patch from comment #7. I'll report back with any results from this test kernel.

Comment 10 Mihai Harpau 2008-11-08 13:01:23 UTC
1. After running the test kernel 2.6.27.4-26.mh.bz469582.fc9.x86_64 for 24 hours, I no longer see the crash from the original description.
2. After that testing period, I also increased the journal size of /dev/taz/home from 32M to 128M, following Theodore Tso's advice at http://lkml.org/lkml/2008/11/1/61, and I'm running the same test kernel to see whether the filesystem performs better.
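Growing an existing journal generally means removing it and creating a new one with `tune2fs` while the filesystem is unmounted and clean. A hedged sketch of that procedure with e2fsprogs (`mke2fs`, `e2fsck`, `tune2fs`, `dumpe2fs`), run against a throwaway sparse 1 GiB image file rather than /dev/taz/home. The image name `journal-test.img` is hypothetical, and the feature set (ext3 base plus `extent`, 4 KiB blocks, 32M journal) is an assumption chosen to resemble the `dumpe2fs` output in this report:

```shell
#!/bin/sh
# Sketch: grow a journal from 32M to 128M on a scratch image file.
# Assumes e2fsprogs is installed; "journal-test.img" is a hypothetical path.
set -e

img="${1:-journal-test.img}"

# Hypothetical stand-in for the real volume: a sparse 1 GiB image with
# 4 KiB blocks, a 32M journal and extents, similar to a converted ext3 fs.
rm -f "$img"
dd if=/dev/zero of="$img" bs=1M count=0 seek=1024 2>/dev/null
mke2fs -F -q -t ext3 -b 4096 -O extent -J size=32 "$img"

# The filesystem must be unmounted and clean before touching the journal.
e2fsck -f -y "$img" >/dev/null 2>&1

# Remove the old journal, then create a new one of the desired size (in MB).
tune2fs -O ^has_journal "$img" >/dev/null 2>&1
tune2fs -J size=128 "$img" >/dev/null 2>&1

# Newer dumpe2fs prints "Total journal size:", older ones "Journal size:".
dumpe2fs -h "$img" 2>/dev/null | grep -i 'journal size'
```

On a real volume the same `e2fsck -f` and two `tune2fs` steps would be run against the unmounted device instead of the image file.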

Comment 11 Chuck Ebbert 2008-11-08 22:13:55 UTC
The fix from upstream went into 2.6.27.5-30.

Comment 12 Fedora Update System 2008-11-10 13:15:32 UTC
kernel-2.6.27.5-32.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.5-32.fc9

Comment 13 Fedora Update System 2008-11-12 02:57:33 UTC
kernel-2.6.27.5-32.fc9 has been pushed to the Fedora 9 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F9/FEDORA-2008-9583

Comment 14 Fedora Update System 2008-11-13 07:42:41 UTC
kernel-2.6.27.5-37.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.5-37.fc9

Comment 15 Fedora Update System 2008-11-14 11:53:57 UTC
kernel-2.6.27.5-41.fc9 has been submitted as an update for Fedora 9.
http://admin.fedoraproject.org/updates/kernel-2.6.27.5-41.fc9

Comment 16 Fedora Update System 2008-11-19 14:54:22 UTC
kernel-2.6.27.5-41.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.