Bug 770767 - ext4: file writes become 20-100x slower when partition is more than 97% full
Summary: ext4: file writes become 20-100x slower when partition is more than 97% full
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.7
Hardware: i686
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-12-29 02:37 UTC by Andrey
Modified: 2013-02-15 13:16 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-15 13:16:07 UTC
Target Upstream Version:


Attachments: none

Description Andrey 2011-12-29 02:37:38 UTC
Description of problem:
I/O operations become 20-100x slower when a large ext4 filesystem is more than 97% full. The dmesg output contains call traces involving ext4 calls:
Call Trace:
 [<f906850d>] ext4_mark_iloc_dirty+0x42d/0x4e1 [ext4]
 [<f9068539>] ext4_mark_iloc_dirty+0x459/0x4e1 [ext4]
 [<f9068d1f>] ext4_mark_inode_dirty+0x167/0x19b [ext4]
 [<f900aa8b>] start_this_handle+0x224/0x33b [jbd2]
 [<c043731b>] autoremove_wake_function+0x0/0x2d
 [<f900ac2f>] jbd2_journal_start+0x8d/0xbc [jbd2]
 [<f906dcfd>] ext4_da_write_begin+0x18e/0x28b [ext4]
 [<c045952b>] generic_file_buffered_write+0x101/0x58b
 [<f900a4b3>] jbd2_journal_stop+0x177/0x181 [jbd2]
 [<c0459e5b>] __generic_file_aio_write_nolock+0x4a6/0x52a
 [<c04c8bf3>] avc_has_perm_noaudit+0x5e/0x336
 [<c040597a>] common_interrupt+0x1a/0x20
 [<c04c9927>] avc_has_perm+0x3c/0x46
 [<c0459f38>] generic_file_aio_write+0x59/0xac
 [<f9065d4c>] ext4_file_write+0xf3/0x1ef [ext4]
 [<c047628a>] do_sync_write+0xb6/0xf1
 [<c043731b>] autoremove_wake_function+0x0/0x2d
 [<c04761d4>] do_sync_write+0x0/0xf1
 [<c0476b13>] vfs_write+0xa1/0x143
 [<c047713d>] sys_write+0x3c/0x63
 [<c0404f4b>] syscall_call+0x7/0xb


Version-Release number of selected component (if applicable):
e4fsprogs-1.41.12-2.el5
kernel-PAE-2.6.18-274.12.1.el5

How reproducible:
I was able to reproduce the scenario on two different servers with two different underlying physical storage setups:
* md3000i with a single virtual drive and multipathd
* Dell H700 with 6x2TB drives and software RAID 0

Steps to Reproduce:
1. I created an 11TB partition, formatted as an ext4 filesystem with default parameters: 'mkfs.ext4 -L /opt /dev/md0'
2. Ran a benchmark that copied 6 million small (100KB) files and 6 million 100MB files from a local drive to the storage, using 6 cp processes in parallel to simulate our workload; a minimal sketch follows.
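
A minimal sketch of that benchmark, assuming illustrative paths and scaled-down counts (the original script was not attached, so /src, the file names, and the loop bounds below are hypothetical):

# Hypothetical reproduction sketch; /src and the loop bounds are
# illustrative, scaled down from the reported 6 million files of each size.
mkfs.ext4 -L /opt /dev/md0
mount /dev/md0 /opt/opt

# Build a local source tree of sample files at the two reported sizes.
mkdir -p /src/small /src/large
for i in $(seq 1 1000); do
    dd if=/dev/zero of=/src/small/f$i bs=100K count=1 2>/dev/null
    dd if=/dev/zero of=/src/large/f$i bs=1M count=100 2>/dev/null
done

# 6 parallel cp processes, matching the reported workload.
for n in $(seq 1 6); do
    cp -r /src "/opt/opt/copy$n" &
done
wait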

  
Actual results:
File writes become 20-100x slower once the ext4 partition is 97% full.
The load average climbs as high as 800 and the server crashes.
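
One way to watch the slowdown develop while the benchmark runs (these monitoring commands are my suggestion, not from the original report):

# Poll fill level, load average, and writeback backlog once a minute.
while sleep 60; do
    df -h /opt/opt                           # watch usage cross ~97%
    uptime                                   # load average; reported to reach ~800
    grep -E 'Dirty|Writeback' /proc/meminfo  # pending dirty/writeback pages
done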

Expected results:
No significant slowdown as the filesystem fills up.

Additional info:


[root@localhost ~]# dumpe4fs /dev/md0 
dumpe4fs 1.41.12 (17-May-2010)
Filesystem volume name:   test
Last mounted on:          /opt/opt
Filesystem UUID:          6c9b3fe9-9198-440d-976c-cc269228590a
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              724697088
Block count:              2898784512
Reserved block count:     144939225
Free blocks:              2853256642
Free inodes:              724697077
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      332
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Tue Dec 27 15:38:47 2011
Last mount time:          Tue Dec 27 15:56:34 2011
Last write time:          Tue Dec 27 15:56:34 2011
Mount count:              1
Maximum mount count:      23
Last checked:             Tue Dec 27 15:38:47 2011
Check interval:           15552000 (6 months)
Next check after:         Sun Jun 24 16:38:47 2012
Lifetime writes:          173 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      fe943c16-2765-42cf-a23c-99cd30c65772
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x000012b5
Journal start:            27176
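
Back-of-the-envelope arithmetic from the dump above (my own sanity check using bc, not part of the report). At 97% full, the remaining ~330 GiB of free space is scattered across tens of thousands of block groups, which is one plausible reason each write becomes much more expensive:

echo '2898784512 * 4096' | bc                # 11873421361152 bytes, ~11.9 TB, matching the "11TB partition"
echo 'scale=3; 144939225 / 2898784512' | bc  # .050 -> the default 5% root reserve
echo '2898784512 * 3 / 100' | bc             # ~87M blocks (~330 GiB) still free at 97% full
echo '2898784512 / 32768' | bc               # ~88463 block groups for the allocator to search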

Comment 1 Ric Wheeler 2012-01-03 15:57:48 UTC
Please open this through Red Hat support so we can have them help grab information. If you don't have a Red Hat support contract, we should take this discussion out to the upstream lists.

Thanks!

Comment 2 Andrey 2012-01-03 17:13:03 UTC
We don't have a Red Hat support contract. Could you please specify which upstream list I should forward this issue to?

Thank you, 
Regards, Andrey

Comment 3 Eric Sandeen 2012-01-03 18:27:07 UTC
linux-ext4@vger.kernel.org, please.  Although ideally it should be tested on an upstream kernel prior to reporting there.

Thanks,
-Eric

Comment 4 Jes Sorensen 2013-02-15 13:16:07 UTC
Closing, since this was moved to the upstream lists.

