From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4)
Description of problem:
Bordeaux w/ 2.4.9-13.1smp, bios a02, 32GB ram, 2GB swap.
Running copy and compare on a qlogic2200 8GB Lun with one primary partition
and ext3 filesystem mounted to /sdc directory.
Copy and compare is started with 1 stream and a count of 50 using a 650MB
iso image. The source file and both destination directories are placed in
the qlogic mounted filesystem. After about 35 successful passes, cpcmp
errors out and states "no space left on device," but df reports utilization
of filesystem at only 25%.
If one successful pass is made, then any number of passes should complete,
but this is not happening.
This same test has passed on a megaraid filesystem and an onboard qlogic
filesystem. Only a qlogic2200 add-in card with direct attached PV650F
storage is failing.
I have tried recreating partition with parted, reformatting filesystem with
mke2fs -j, and remounting filesystem.
I have not tried ext2, and have not recreated the lun with the qlogic bios
utility. I will do this next.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install bordeaux with 2.4.9-13.1 or earlier, qlogic2200 w/ PV650F
2. Create a GPT primary partition with ext3 filesystem and mount
3. Run copy and compare to use at least 1GB of space per pass, and run
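As a rough sketch, step 2 might look like the following. The device name /dev/sdc is an assumption based on the /sdc mount point mentioned above; adjust to the actual qlogic lun.

```shell
# Hypothetical commands for step 2; /dev/sdc is assumed from the /sdc mount point.
parted -s /dev/sdc mklabel gpt                  # GPT disk label
parted -s /dev/sdc mkpart primary ext2 0% 100%  # one partition spanning the disk
mke2fs -j /dev/sdc1                             # -j adds the ext3 journal
mkdir -p /sdc
mount -t ext3 /dev/sdc1 /sdc
```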
Actual Results: Failure of copy and compare due to lack of disk space,
despite df reporting only 25% utilization of filesystem.
Expected Results: No failure
Please try with anything *other* than 2.4.9-13.1. 2.4.9-13.1 is 'known bad'.
It should be noted that the 2.4.9-13.1 that Clay refers to was built by me,
based on -13 plus megaraid 1.18 patch applied and e1000 (unsigned long->u32)
fix. The same problem occurred with 2.4.9-9.x also.
small request: please add something like "dell" to the version in the future in
order to avoid confusion with Red Hat build numbers
Unfortunately, the qlogic2x00 driver is of such poor quality that it's not
realistic to find, let alone fix, bugs in it. If Dell has contacts at Qlogic,
please take it up with Qlogic.
Reproduced in RC2 and RC1. Also a useful piece of information. I tried ext2
and not only did I not get the error, but the performance of the copy and
compare was significantly improved. I'd say at least 5x faster with ext2.
Please try the 2.4.9-13.3 we sent to Dell yesterday; the error seems to be an
out-of-memory condition that could be fixed/avoided in that kernel.
Reproduced in qa1108 (2.4.9-13.3smp)
2.4.9-13.4 has another possible fix and will be available for you shortly
your copy/compare is one of the kinds of loads that ext3 is expected
to be slower on. You can test ext3 with "data=writeback" and you should
find that the performance is pretty close to what you get with ext2.
and in particular http://www.redhat.com/support/wpapers/redhat/ext3/tuning.html
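For reference, switching the journaling mode is just a mount option. A sketch, assuming the filesystem from this report (/dev/sdc1 mounted on /sdc):

```shell
# Remount the ext3 filesystem with writeback data ordering.
# /dev/sdc1 and /sdc are assumptions from the report's setup.
umount /sdc
mount -t ext3 -o data=writeback /dev/sdc1 /sdc
# or make it persistent via /etc/fstab:
#   /dev/sdc1  /sdc  ext3  defaults,data=writeback  0 2
```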
2.4.9-13.4 is now available at
Please give it a whirl!
Reproduced with qa1108 updated to 2.4.9-13.4smp kernel.
Reproduced with qa1129 (2.4.9-17.3smp kernel), though it did take twice as long
Where can we pick up a copy of cpcmp to run?
that was quick, thanks!
To make sure that we are duplicating the problem, what is the
*exact* command line that you are running to see this?
./cpcmp.pl 9 testbig.iso ./a,./b,50,1
Testbig.iso is a single 670MB iso image file, made from a distro CD I think.
./a and ./b are placed in the directory the qlogic device is mounted to.
And I start one stream, for 50 iterations.
My lun is 8GB, so this scenario should never get above approximately 25% disk utilization.
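For readers without access to cpcmp.pl, a minimal stand-in for one stream of this workload might look like the sketch below. This is not the real script; file sizes and pass counts are scaled down here so it runs anywhere.

```shell
#!/bin/sh
# Minimal copy-and-compare sketch (hypothetical stand-in for cpcmp.pl).
# The report uses a ~670MB iso and 50 passes; this uses a small file and 5 passes.
set -e
SRC=src.img
PASSES=5
dd if=/dev/urandom of="$SRC" bs=1024 count=64 2>/dev/null
mkdir -p a b
for pass in $(seq 1 "$PASSES"); do
    cp "$SRC" a/copy.img         # write into first destination
    cp a/copy.img b/copy.img     # write into second destination
    cmp "$SRC" b/copy.img        # compare; a mismatch aborts via set -e
    rm -f a/copy.img b/copy.img  # delete, so each pass reallocates blocks
done
echo "all $PASSES passes OK"
```

Each pass allocates, compares, and then frees roughly 2x the source size, which is exactly the delete-then-rewrite pattern discussed later in this report.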
I've had one successful run so far on our bordeaux with 32GB RAM, 2GB
swap, writing on an empty 8GB ext3 partition on a drive on a PV660F
using the qla2200 driver (not qla2x00 driver) with the 2.4.9-17.3
smp kernel. I've started another one.
I wonder, is there any chance that we didn't loudly enough communicate
to you that qla2x00 is the old driver left in for comparison with the
new driver, qla2200?
Yes, there is a pretty good chance of that :) I will try the qla2200 now.
I had another run complete successfully and have started a third, so I'll
bet this is fixed for you as well.
FWIW, Kudzu should have this set up correctly, so new installs should
select the 2200 driver for ISP2200 cards. We're leaving the 2x00 driver
for the ISP2100 cards until we can confirm that Arjan's driver works for
the ISP2100 cards.
FYI, kudzu has been updated because my modules.conf had a qla2200 entry after
install. However the initrd did not contain the driver because the installer
has not been updated with qla2200 entries.
OK, folks here confirm that after that tree went out, a new fix was put in
the installer for this case, so it should work right from the installer
I've seen two successful 50-count runs. I am satisfied that this is fixed.
"I am satisfied that this is fixed." -> closing
Since this seemed fixed, we decided to step up the disk IO intensity to really
test the 2.4.9-17.4 qlogic driver, and we are getting this error again.
Started cpcmp on 5 luns simultaneously with 4 streams on each lun with:
./cpcmp.pl 9 testbig.iso ./a,./b,50,4
This gets each lun's disk usage to roughly 50-60% at peak.
After a couple of hours, 3 of the luns are reporting out of disk space even
though they are only at 50-60% used.
Same setup as previously mentioned, 4 of 5 luns failed within an hour with
This is the same testing level that you've been doing with other
controllers like the adaptec SCSI controller, the megaraid controller,
Stephen says that this could be a design effect of ext3. Instead of
having just "in use" and "free" blocks, you have "in use", "free",
and "free but not available because transactions have not yet been
committed". When your scripts get in sync and delete several iso
images at the same time, it is possible that you get so many of that
third state that is neither properly "in use" nor "free" that with
apparently plenty of space on the filesystem, you temporarily do not
have free blocks to use. It is also possible that this only happens
with qlogic and not with the other drivers because of differences in
If you try this with ext2 and it still happens, that would rule this
out, of course. Could you do that?
If you want the gory detail, ext3 has to avoid reusing disk blocks which have
been freed but where the delete has not yet been committed to disk. After a
crash, subsequent recovery can end up rolling back any uncommitted transactions,
and if we reused deleted blocks before their commit, we might overwrite them on
disk while there is still a chance that the old version might be needed after a
recovery and rollback.
I've never seen a scenario where this made an observable difference, but doing
mass deletes and rewrites on a sufficiently fast disk array might cause the
behaviour you describe simply because the deletes have not been committed. If
we can determine that this is definitely the cause of your problem, then it
should be possible to teach ext3 to recognise this situation and to force an
early commit rather than returning ENOSPC immediately.
Per conversation with Clay, ext2 doesn't fail, just ext3. Also, I've often
seen the cpcmp scripts (which simply kick off all processes at once) running
pretty synchronized, so that leads me to believe that Stephen's thoughts are on target.
Clay, please try mounting with "data=writeback" and see if this goes away.
I've also asked to have the tests run with more small files
(say, /usr/share/doc instead of a 650MB file) and see if anything changes.
"data=writeback" will not cure the problem. All that "writeback" mode does is
to remove any synchronisation of data writes with transaction commits, so that
newly-written data may not be seen on disk after a crash, and you can therefore
find stale data blocks in recently-allocated files.
Even in writeback mode, ext3 still refuses to allow such data writes to
overwrite metadata which is still valid.
However, in "data=journal" mode, the problem _will_ probably disappear, simply
because the writing of data to the journal will force transaction commits much
more frequently (of course, you'll see much worse performance in that mode for
the sort of workload we're talking about here).
Running the five-lun setup mounted with the data=journal option. It is running
fine (if a little slow) and has made it past the point where I would expect an
error. I'm letting it run over night and will update you in the morning whether
pass or fail.
One other thing to try is to simply test with a larger partition -- try, say,
a 12GB instead of 8GB disk with the same cpcmp invocation.
Also, is your lun a single disk, or is it a switched raid volume that would
be faster than a single disk?
Well, still running this morning. Still hasn't finished due to the slower
performance of running with data=journal. This is definitely past the point
where I would expect a failure, though.
Each lun is on a single 8GB disk.
I would need to reconfigure the enclosure to span across multiple disks to get a
bigger lun, which I could do but haven't tried.
One thing that still puzzles me...what happened in the qlogic driver between
17.3 and 17.4 that made the problem disappear with only one stream running?
And due to ext3's design, will there always be some threshold where the uncommitted
but deleted blocks reduce the usable disk space? To what extent can and will
ext3 be modified to account for this?
Use software RAID 0 to put two luns together to test on a larger FS.
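A sketch of that suggestion using mdadm; the device names /dev/sdc and /dev/sdd are assumptions, so substitute two of the actual 8GB luns. (Installs of this era shipped raidtools, i.e. /etc/raidtab and mkraid, rather than mdadm; the idea is the same.)

```shell
# Stripe two 8GB luns into one ~16GB md device, then format ext3 and mount.
# /dev/sdc and /dev/sdd are assumed device names.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
mke2fs -j /dev/md0          # ext3 on the striped device
mkdir -p /bigfs
mount -t ext3 /dev/md0 /bigfs
```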
17.3 and earlier: The older driver was slower. With enough physical
ram to cache several successive allocations without needing to go to
disk, and a big enough journal, it's possible that you could get enough
block deletes pending but not committed because of the slow driver.
Part of it might have been serialization due to more abuse of the
io_request_lock, I don't know.
And yes, quite fundamentally with a journaling file system, the two
phase commit will always mean that there will, for some window of time
after a file has been deleted, be blocks that are neither allocated for
existing files nor available for use for other files. That's just one
of the fundamental tradeoffs of a journaling file system. You are
trading off space for consistency. The journal size is only part of the story.
This is a degenerate case and so far Matt hasn't taken my challenge to
step up to the plate with a real-world case where this could be a problem.
In normal cases, this pressure doesn't show up very much.
Lowering to Sev 2 here, "enhancement" in Bugzilla. Workarounds include:
1) data=journal (working so far)
2) increase disk space so you've got >=2X the space the workload consumes
3) 'sync; sync; sync; sleep 10;' between massive deletes and refills to give
the journal a fighting chance to run.
4) using a larger journal
4) correction: use a *smaller* journal, which fills and commits sooner, so
freed-but-uncommitted blocks become reusable earlier
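Workaround 3 wired into the test loop might look like this; /sdc and the file names are assumptions taken from the report's setup.

```shell
# Sketch of workaround 3: force commits between mass deletes and refills.
cd /sdc                       # assumed mount point from the report
rm -f a/*.iso b/*.iso         # mass delete
sync; sync; sync; sleep 10    # give ext3 a chance to commit the deletes,
                              # turning "free but uncommitted" blocks into free ones
cp testbig.iso a/             # refill only after freed blocks are reusable
```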