49578 – ext3 kernel messages when writing large files / file size irregularity

Summary: ext3 kernel messages when writing large files / file size irregularity

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.3
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Stephen Tweedie
QA Contact:	Aaron Brown
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2001-07-20 20:55 UTC by Don Smith
Modified:	2007-04-18 16:34 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2001-08-03 14:56:55 UTC
Embargoed:

Attachments
source for Bonnie drive benchmarking utility (17.40 KB, text/plain) 2001-07-20 20:56 UTC, Don Smith

Description Don Smith 2001-07-20 20:55:56 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

Description of problem:
While writing large files (512MB - 1024MB) with a utility called Bonnie 
(www.textuality.com/bonnie, source attached) on ext3 partitions located on 
a RAID 1 drive array, the following kernel messages were observed:

(hostname) kernel: JBD: out of memory for journal head.

and

(hostname) kernel: ext3_write_inode: inside transaction!

The machine is a P3-667MHz with 256MB RAM and an IBM ServeRAID 4M controller 
(this also happens with a ServeRAID 4L). 

It looks like the file size ends up being too small. Bonnie writes a file 
out using the size you give it, and then reads it back in. It reports an 
error because it receives an EOF before it has read the number of bytes it 
was supposed to.
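
The write-then-read-back check described above can be sketched outside of Bonnie. This is not Bonnie's code (Bonnie is a C program, attached to this bug); it is a minimal Python illustration, and the function name write_then_verify and the 1 MiB chunk size are assumptions made for the example. It writes a file of a requested size, reads it back until EOF, and returns the three numbers that should agree on a healthy filesystem.

```python
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB per write; an arbitrary choice for this sketch


def write_then_verify(path, size_bytes):
    """Write size_bytes of zeros to path, then read back until EOF.

    Returns (bytes_written, bytes_read, size_reported_by_stat).
    All three should be equal; a short read-back count is the
    symptom described in this report.
    """
    block = b"\0" * CHUNK
    written = 0
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(CHUNK, size_bytes - written)
            written += f.write(block[:n])
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to report deferred write errors

    read_back = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:  # empty result means EOF
                break
            read_back += len(data)

    return written, read_back, os.stat(path).st_size


if __name__ == "__main__":
    # Hypothetical usage: a small file in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        print(write_then_verify(os.path.join(d, "probe.bin"), 5 * CHUNK))
```

Pointing the path at a directory on the affected array and using a size close to 1024MB would approximate the failing case, although this sketch does not reproduce Bonnie's per-character getc() access pattern.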

How reproducible:
Sometimes

Steps to Reproduce:
1. Use bonnie to benchmark a RAID 1 drive array using a 1024MB file - 
bonnie -s 1024 -d <dir on RAID 1>
2. The kernel messages described above should appear, and bonnie should 
fail while "Reading with getc()..."

Additional info:

Comment 1 Don Smith 2001-07-20 20:56:45 UTC

Created attachment 24367
source for Bonnie drive benchmarking utility

Comment 2 Arjan van de Ven 2001-07-20 21:01:08 UTC

The first message is a warning only, but the file should be ok, so this is 
indeed a bug.

Comment 3 Stephen Tweedie 2001-07-21 00:12:05 UTC

Both messages are warnings.  The second message is benign: it is simply a result
of a debugging message which escaped into the wild and has since been
eliminated.  The out-of-memory error means that the kernel ran out of memory at
a critical point for the filesystem.  The filesystem will retry in that case
until the allocation succeeds, so it is not the root cause of any file
corruption, but it is entirely probable that, once we got into such a low-memory
state, the kernel would start failing other filesystem write operations with
-ENOMEM for other reasons which might not have been logged.
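
Because such failures might not show up in the kernel log, an application can only notice them by checking the result of every write itself. The following is a minimal sketch of that idea, not code from ext3 or Bonnie; the helper names write_all and write_and_sync are hypothetical. It treats short writes as something to retry and reports ENOMEM explicitly instead of letting it pass unnoticed.

```python
import errno
import os
import tempfile


def write_all(fd, data):
    """Write every byte of data to fd, looping over short writes.

    os.write raises OSError if the kernel returns an error such as ENOMEM.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        total += os.write(fd, view[total:])
    return total


def write_and_sync(path, data):
    """Write data to path and fsync it, surfacing ENOMEM explicitly."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        try:
            written = write_all(fd, data)
            os.fsync(fd)  # errors deferred by the page cache can appear here
        except OSError as exc:
            if exc.errno == errno.ENOMEM:
                raise RuntimeError(
                    "write to %s failed: kernel out of memory" % path
                ) from exc
            raise
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical usage against a temporary file.
    with tempfile.TemporaryDirectory() as d:
        print(write_and_sync(os.path.join(d, "out.bin"), b"x" * 4096))
```

A program that ignores the return value of its writes, or never calls fsync, could end up with a file shorter than intended and no error reported, which would match the symptom in the original description.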

There is a new version of ext3 being pushed out without the second debugging
message, and with more informative output about out-of-memory situations, but
the underlying low-memory problems may remain.  We will let you know when that
build is available for you; it has been built locally already and will be on
ftp soon.

I really suspect that the remaining part of the problem is VM-related, not
filesystem-related (although we know for certain that ext3's pattern of VM use
does cause problems for the VM that ext2 does not provoke.)  The results from
the newer build will be useful in determining this.

Comment 4 Glen Foster 2001-07-23 21:17:43 UTC

This defect is considered SHOULD-FIX for Fairfax.

Comment 5 Stephen Tweedie 2001-08-03 14:19:53 UTC

Could you try the kernel from the Roswell beta or from rawhide?  There is still
VM tuning being done, but the ext3 debug logging has been cleaned up enormously.

Comment 6 Don Smith 2001-08-03 14:56:50 UTC

The Roswell kernel does seem to have cleared this up - we no longer get those 
messages.
