Bug 167186 - ext2 and ext3 file systems are extremely slow when deleting large files on a busy file system
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-08-31 13:28 UTC by Jos VanWezel
Modified: 2007-11-30 22:07 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-05 16:19:44 UTC
Target Upstream Version:
Embargoed:



Description Jos VanWezel 2005-08-31 13:28:32 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
Our application writes large files (> 1 GB) to ext3-formatted SATA disks. After a while the disk fills up, and deleting files takes a very long time (10 to hundreds of seconds). I have tried several journaling options for ext3 without success; without a journal (ext2) the problem remains.

You can reproduce the problem easily by filling a disk partition with large files and, while writing new files, deleting some old ones.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Fill a disk partition with large files.
2. While writing new files, delete some of the old ones.
3. Observe that the deletes take from 10 to hundreds of seconds.
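
The steps above can be sketched as a small shell script. Sizes and paths here are illustrative and deliberately scaled down; raise FILE_MB to 1024 or more and point DIR at a nearly full ext3 partition to match the conditions in the report:

```shell
#!/bin/sh
# Reproduction sketch: fill a directory with "old" files, then time a
# delete while a concurrent writer keeps the disk busy.  Defaults are
# scaled down for a quick run; the bug shows up with 1 GB+ files on a
# busy, nearly full ext3 partition.
DIR=${DIR:-./fill-test}
FILE_MB=${FILE_MB:-8}
COUNT=${COUNT:-4}

mkdir -p "$DIR"

# Phase 1: fill the partition with large files.
i=0
while [ "$i" -lt "$COUNT" ]; do
    dd if=/dev/zero of="$DIR/old.$i" bs=1M count="$FILE_MB" 2>/dev/null
    i=$((i + 1))
done

# Phase 2: start writing a new file, then time the delete of an old one.
dd if=/dev/zero of="$DIR/new.0" bs=1M count="$FILE_MB" 2>/dev/null &
writer=$!
time rm "$DIR/old.0"
wait "$writer"
```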

Additional info:

We moved the application to xfs and the problem disappeared. For support reasons we would like to keep the ext3 file systems. Is there an option we can use to improve throughput?

Comment 1 Stephen Tweedie 2005-09-05 16:19:44 UTC
Deleting large files requires ext2/3 to walk a fairly large amount of scattered
metadata.  The indirect blocks needed to map files to disk blocks are not laid
out sequentially, so delete requires all of these to be read in randomly.

And each such block only gets read once the previous one has been processed:
it's a chain of synchronous read operations.  On a very busy disk, each read
has to compete for the disk's attention separately, so the time taken builds
up rapidly.

Because this is metadata read IO that's taking the time, ext2 is affected just
as much; the journal in ext3 only has an impact on write performance, it doesn't
come into the picture for reads at all.

There are several possible ways to improve the situation.  In RHEL-4, two new
disk IO schedulers (the CFQ "completely fair queuing" scheduler --- the new
default --- and especially the "anticipatory" scheduler, optimised for
interactive performance) will both improve this by allowing a task to submit
multiple reads without being preempted too much.  There are prototype patches to
let ext3 maintain its disk mapping information more compactly via "extent maps",
which would greatly reduce the amount of metadata to be read to complete a
delete (this is what XFS already does).  And we might asynchronously read
ahead some of the metadata in question.
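
For reference, on the 2.6 kernels in RHEL-4 the default IO scheduler can be selected with the `elevator=` boot parameter. The grub.conf entry below is illustrative only (kernel version, device paths, and title are placeholders; adapt them to your own configuration):

```
# /boot/grub/grub.conf (illustrative entry; elevator=cfq selects the
# CFQ scheduler, elevator=as the anticipatory scheduler)
title Red Hat Enterprise Linux 4
        root (hd0,0)
        kernel /vmlinuz-2.6.9-5.EL ro root=/dev/VolGroup00/LogVol00 elevator=cfq
        initrd /initrd-2.6.9-5.EL.img
```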

However, all of these changes are far too invasive to consider for an
established, stable release like RHEL-3.  The scheduler improvements in RHEL-4
are likely to help a lot; other than that, it will be up to future enhancements
(extents in particular) to really address this particular performance property
of ext3.

