Bug 167186

Summary: ext2 and ext3 file systems are extremely slow when deleting large files on a busy file system
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Status: CLOSED DEFERRED
Severity: medium
Priority: medium
Keywords: FutureFeature
Doc Type: Enhancement
Reporter: Jos VanWezel <jvw>
Assignee: Stephen Tweedie <sct>
QA Contact: Brian Brock <bbrock>
CC: petrides
Last Closed: 2005-09-05 16:19:44 UTC

Description Jos VanWezel 2005-08-31 13:28:32 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
Our application writes large (> 1 GB) files to ext3-formatted SATA disks. After a while the disk fills up, and deleting the files takes a very long time (10 to hundreds of seconds). I have tried several journalling options for ext3 without success. Without a journal (ext2) the problem remains.

You can reproduce the problem easily by filling a disk partition with large files and, while writing new files, trying to delete some old ones.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Fill a disk partition with large files.
2. While continuing to write new files, delete some of the old ones.
3. Observe that each delete takes from 10 up to hundreds of seconds.

Additional info:

We moved the application to XFS and the problem disappeared. For support reasons we would like to keep the ext3 file systems. Is there an option we can use to improve throughput?

Comment 1 Stephen Tweedie 2005-09-05 16:19:44 UTC
Deleting large files requires ext2/3 to walk a fairly large amount of scattered
metadata.  The indirect blocks needed to map files to disk blocks are not laid
out sequentially, so a delete has to read all of them in, in effectively random
order.

Worse, these are synchronous read operations: each indirect block only gets
queued once the previous one has been processed.  On a very busy disk, each
read then has to compete for the disk's attention separately, so the total
time builds up rapidly.
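To put rough numbers on this (my arithmetic, assuming the common 4 KiB block size and 4-byte block pointers, not figures from the comment), we can count the indirect-block reads needed to truncate a file of a given size:

```python
# Assumed ext2/3 layout parameters: 4 KiB blocks, 4-byte block pointers.
BLOCK_SIZE = 4096
PTRS_PER_BLOCK = BLOCK_SIZE // 4   # 1024 block pointers per indirect block
DIRECT = 12                        # direct pointers stored in the inode


def indirect_reads(file_bytes):
    """Count the indirect blocks that must be read to free a file's blocks."""
    blocks = -(-file_bytes // BLOCK_SIZE)   # ceil: data blocks in the file
    reads = 0
    blocks -= min(blocks, DIRECT)           # direct pointers: no extra read
    if blocks > 0:                          # single-indirect block
        reads += 1
        blocks -= min(blocks, PTRS_PER_BLOCK)
    if blocks > 0:                          # double-indirect tree
        reads += 1                          # the double-indirect block itself
        n = min(blocks, PTRS_PER_BLOCK ** 2)
        reads += -(-n // PTRS_PER_BLOCK)    # one read per leaf indirect block
        blocks -= n
    # (triple indirect omitted; a 1 GiB file never reaches it at 4 KiB blocks)
    return reads


print(indirect_reads(1 << 30))  # 257 indirect-block reads for a 1 GiB file
```

At, say, 10 ms of seek-and-service time per synchronous read on a contended disk, those 257 reads alone account for several seconds per deleted file, matching the order of magnitude the reporter sees.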

Because this is metadata read IO that's taking the time, ext2 is affected just
as much; the journal in ext3 only has an impact on write performance, it doesn't
come into the picture for reads at all.

There are several possible ways to improve the situation.  In RHEL-4, two new
disk I/O schedulers (CFQ, the "completely fair queuing" scheduler that is the
new default, and especially the "anticipatory" scheduler, optimised for
interactive performance) will both improve this by allowing a task to submit
multiple reads without being preempted too much.  There are prototype patches to
let ext3 maintain its disk-mapping information more compactly via "extent maps",
which would greatly reduce the amount of metadata to be read to complete a
delete (this is what XFS already does).  And we might asynchronously read ahead
some of the metadata in question.
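As a back-of-the-envelope illustration of why extent maps help (my arithmetic under assumed ext2/3 defaults of 4 KiB blocks and 4-byte pointers, not figures from the comment): mapping a fully contiguous 1 GiB file takes hundreds of indirect blocks under the block-pointer scheme, but a single record under an extent map.

```python
# Assumed parameters: 4 KiB blocks, 4-byte block pointers (ext2/3 defaults).
BLOCK = 4096
PTRS = BLOCK // 4                 # 1024 pointers per indirect block
DIRECT = 12                       # direct pointers held in the inode

blocks = (1 << 30) // BLOCK       # 262144 data blocks in a 1 GiB file

# ext2/3: every data block beyond the 12 direct pointers needs a slot in an
# indirect block, and each indirect block is a separate random read on delete.
leaf_indirects = -(-(blocks - DIRECT) // PTRS)   # ceil division
double_indirect = 1                              # tree node above the leaves
pointer_metadata = leaf_indirects + double_indirect

# Extent map: one (logical start, physical start, length) record describes
# any contiguous run, however large.
extent_metadata = 1

print(pointer_metadata, extent_metadata)         # 257 vs 1
```

The delete cost then scales with the number of extents (often a handful) instead of with file size, which is the behaviour the reporter observed after moving to XFS.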

However, all of these changes are far too invasive to consider for an
established, stable release like RHEL-3.  The scheduler improvements in RHEL-4
are likely to help a lot; other than that, it will be up to future enhancements
(extents in particular) to really address this particular performance property
of ext3.