From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

Description of problem:
While writing large files (512MB - 1024MB) with a utility called Bonnie (www.textuality.com/bonnie, source attached) on ext3 partitions located on a RAID 1 drive array, the following kernel messages were observed:

(hostname) kernel: JBD: out of memory for journal head.

and

(hostname) kernel: ext3_write_inode: inside transaction!

The machine is a P3-667Mhz with 256MB RAM, IBM ServeRAID 4M controller (also happens with ServeRAID 4L). It looks like the file size ends up being too small - Bonnie writes a file out using the size you give it, and then reads it back in. It reports an error because it receives an EOF before it reads in the number of bytes it was supposed to.

How reproducible:
Sometimes

Steps to Reproduce:
1. Use bonnie to benchmark a RAID 1 drive array using a 1024MB file - bonnie -s 1024 -d <dir on RAID 1>
2. The kernel messages described above should appear, and bonnie should fail while "Reading with getc()..."

Additional info:
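For anyone who wants to check for the short-file symptom without building Bonnie, the write-then-read-back check described above can be approximated with a small script. This is only a rough sketch and not Bonnie's actual code: the function name `write_then_read_back`, the 64KB chunk size and the zero-filled data are all assumptions made for illustration.

```python
import os
import tempfile

CHUNK = 64 * 1024  # bytes per write/read call (arbitrary choice, not Bonnie's)


def write_then_read_back(directory, size_bytes):
    """Write size_bytes to a scratch file in directory, then read it back.

    Returns (bytes_written, bytes_read). If bytes_read is smaller than
    bytes_written, EOF arrived early, which is the symptom reported here.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        written = 0
        block = b"\0" * CHUNK
        with os.fdopen(fd, "wb") as f:
            while written < size_bytes:
                n = min(CHUNK, size_bytes - written)
                f.write(block[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to disk

        read = 0
        with open(path, "rb") as f:
            while True:
                data = f.read(CHUNK)
                if not data:  # empty read means EOF
                    break
                read += len(data)
        return written, read
    finally:
        os.remove(path)  # always clean up the scratch file
```

Calling `write_then_read_back("<dir on RAID 1>", 1024 * 1024 * 1024)` would mirror the 1024MB run. Note that the read may be served from the page cache, so a clean result from this sketch does not rule out the problem.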
Created attachment 24367 [details] source for Bonnie drive benchmarking utility
The first message is only a warning, but the file should still be intact, so this is indeed a bug.
Both messages are warnings. The second message is benign: it is simply the result of a debugging message which escaped into the wild and has since been eliminated.

The out-of-memory error means that the kernel ran out of memory at a critical point for the filesystem. The filesystem will retry in that case until the allocation succeeds, so it is not the root cause of any file corruption. However, it is entirely probable that in such a low-memory state the kernel would start failing other filesystem write operations with -ENOMEM for other reasons which might not have been logged.

There is a new version of ext3 being pushed out without the second debugging message, and with more informative output about out-of-memory situations, but the underlying low-memory problems may remain. We will let you know when that build is available for you; it has been built locally already and will be on ftp soon.

I really suspect that the remaining part of the problem is VM-related, not filesystem-related (although we know for certain that ext3's pattern of VM use does cause problems for the VM that ext2 does not provoke). The results from the newer build will be useful in determining this.
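To illustrate why the out-of-memory message on its own should not lose data, here is a rough userspace sketch of the retry-until-success pattern described above. It is not the JBD code: `alloc_with_retry`, the `try_alloc` callback and the warn-once behaviour are hypothetical stand-ins, and the real kernel path differs in detail.

```python
import time


def alloc_with_retry(try_alloc, pause=0.0, warn=print):
    """Call try_alloc() until it returns something other than None.

    try_alloc is a hypothetical stand-in for an allocator that can fail.
    Returns (object, number_of_attempts).
    """
    warned = False
    attempts = 0
    while True:
        attempts += 1
        obj = try_alloc()
        if obj is not None:
            return obj, attempts
        if not warned:
            # Log once, then keep retrying instead of failing the caller.
            warn("out of memory, retrying")
            warned = True
        time.sleep(pause)  # give other work a chance to free memory
```

The caller never sees the failure, only a delay and a log line, which is why the message is a warning. Other allocations that do not retry can still fail with -ENOMEM in the same low-memory state.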
This defect is considered SHOULD-FIX for Fairfax.
Could you try the kernel from the Roswell beta or from rawhide? There is still VM tuning being done, but the ext3 debug logging has been cleaned up enormously.
The Roswell kernel does seem to have cleared this up - we no longer get those messages.