From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4)
Description of problem:
Bordeaux w/ 2.4.9-13.1smp, bios a02, 32GB ram, 2GB swap.
Running copy and compare on a qlogic2200 8GB Lun with one primary partition
and ext3 filesystem mounted to /sdc directory.
Copy and compare is started with 1 stream and a count of 50 using a 650MB
iso image. The source file and both destination directories are placed in
the qlogic mounted filesystem. After about 35 successful passes, cpcmp
errors out and states "no space left on device," but df reports utilization
of filesystem at only 25%.
If one successful pass is made, then any number of passes should complete,
but this is not happening.
This same test has passed on a megaraid filesystem and an onboard qlogic
filesystem. Only a qlogic2200 add-in card with direct attached PV650F
storage is failing.
I have tried recreating partition with parted, reformatting filesystem with
mke2fs -j, and remounting filesystem.
I have not tried ext2, and have not recreated the lun with the qlogic bios
utility. I will do this next.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install bordeaux with 2.4.9-13.1 or earlier, qlogic2200 w/ PV650F
2. Create a GPT primary partition with ext3 filesystem and mount
3. Run copy and compare to use at least 1GB of space per pass, and run
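As a rough sketch, step 2 might look like the following. The device name /dev/sdc is an assumption based on the /sdc mount point mentioned above; adjust to the actual qlogic lun.

```shell
# Hypothetical commands for step 2; /dev/sdc is assumed from the /sdc mount point.
parted -s /dev/sdc mklabel gpt                  # GPT disk label
parted -s /dev/sdc mkpart primary ext2 0% 100%  # one partition spanning the disk
mke2fs -j /dev/sdc1                             # -j adds the ext3 journal
mkdir -p /sdc
mount -t ext3 /dev/sdc1 /sdc
```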
Actual Results: Failure of copy and compare due to lack of disk space,
despite df reporting only 25% utilization of filesystem.
Expected Results: No failure
Please try with anything *other* than 2.4.9-13.1. 2.4.9-13.1 is 'known bad'.
It should be noted that the 2.4.9-13.1 that Clay refers to was built by me,
based on -13 plus megaraid 1.18 patch applied and e1000 (unsigned long->u32)
fix. The same problem occurred with 2.4.9-9.x also.
small request: please add something like "dell" to the version in the future in
order to avoid confusion with Red Hat build numbers
Unfortunately, the qlogic2x00 driver is of such poor quality that it's not
realistic to find, let alone fix, bugs in it. If Dell has contacts at Qlogic,
please take it up with Qlogic.
Reproduced in RC2 and RC1. Also a useful piece of information. I tried ext2
and not only did I not get the error, but the performance of the copy and
compare was significantly improved. I'd say at least 5x faster with ext2.
Please try the 2.4.9-13.3 we sent to Dell yesterday; the error seems to be an
out-of-memory condition that could be fixed/avoided in that kernel.
Reproduced in qa1108 (2.4.9-13.3smp)
2.4.9-13.4 has another possible fix and will be available for you shortly
your copy/compare is one of the kinds of loads that ext3 is expected
to be slower on. You can test ext3 with "data=writeback" and you should
find that the performance is pretty close to what you get with ext2.
and in particular http://www.redhat.com/support/wpapers/redhat/ext3/tuning.html
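For reference, switching the journaling mode is just a mount option. A sketch, assuming the filesystem from this report (/dev/sdc1 mounted on /sdc):

```shell
# Remount the ext3 filesystem with writeback data ordering.
# /dev/sdc1 and /sdc are assumptions from the report's setup.
umount /sdc
mount -t ext3 -o data=writeback /dev/sdc1 /sdc
# or make it persistent via /etc/fstab:
#   /dev/sdc1  /sdc  ext3  defaults,data=writeback  0 2
```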
2.4.9-13.4 is now available at
Please give it a whirl!
Reproduced with qa1108 updated to 2.4.9-13.4smp kernel.
Reproduced with qa1129 (2.4.9-17.3smp kernel), though it did take twice as long
Where can we pick up a copy of cpcmp to run?
that was quick, thanks!
To make sure that we are duplicating the problem, what is the
*exact* command line that you are running to see this?
./cpcmp.pl 9 testbig.iso ./a,./b,50,1
Testbig.iso is a single 670MB iso image file, made from a distro CD I think.
./a and ./b are placed in the directory the qlogic device is mounted to.
And I start one stream, for 50 iterations.
My lun is 8GB, so this scenario should never get above approximately 25% disk utilization.
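For readers without access to cpcmp.pl, a minimal stand-in for one stream of this workload might look like the sketch below. This is not the real script; file sizes and pass counts are scaled down here so it runs anywhere.

```shell
#!/bin/sh
# Minimal copy-and-compare sketch (hypothetical stand-in for cpcmp.pl).
# The report uses a ~670MB iso and 50 passes; this uses a small file and 5 passes.
set -e
SRC=src.img
PASSES=5
dd if=/dev/urandom of="$SRC" bs=1024 count=64 2>/dev/null
mkdir -p a b
for pass in $(seq 1 "$PASSES"); do
    cp "$SRC" a/copy.img         # write into first destination
    cp a/copy.img b/copy.img     # write into second destination
    cmp "$SRC" b/copy.img        # compare; a mismatch aborts via set -e
    rm -f a/copy.img b/copy.img  # delete, so each pass reallocates blocks
done
echo "all $PASSES passes OK"
```

Each pass allocates, compares, and then frees roughly 2x the source size, which is exactly the delete-then-rewrite pattern discussed later in this report.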
I've had one successful run so far on our bordeaux with 32GB RAM, 2GB
swap, writing on an empty 8GB ext3 partition on a drive on a PV660F
using the qla2200 driver (not qla2x00 driver) with the 2.4.9-17.3
smp kernel. I've started another one.
I wonder, is there any chance that we didn't loudly enough communicate
to you that qla2x00 is the old driver left in for comparison with the
new driver, qla2200?
Yes, there is a pretty good chance of that :) I will try the qla2200 now.
I had another run complete successfully and have started a third, so I'll
bet this is fixed for you as well.
FWIW, Kudzu should have this set up correctly, so new installs should
select the 2200 driver for ISP2200 cards. We're leaving the 2x00 driver
for the ISP2100 cards until we can confirm that Arjan's driver works for
the ISP2100 cards.
FYI, kudzu has been updated because my modules.conf had a qla2200 entry after
install. However the initrd did not contain the driver because the installer
has not been updated with qla2200 entries.
OK, folks here confirm that after that tree went out, a new fix was put in
the installer for this case, so it should work right from the installer
I've seen two successful 50-count runs. I am satisfied that this is fixed.
"I am satisfied that this is fixed." -> closing
Since this seemed fixed, we decided to step up the disk IO intensity to really
test the 2.4.9-17.4 qlogic driver, and we are getting this error again.
Started cpcmp on 5 luns simultaneously with 4 streams on each lun with:
./cpcmp.pl 9 testbig.iso ./a,./b,50,4
This gets each lun's disk usage to roughly 50-60% at peak.
After a couple of hours, 3 of the luns are reporting out of disk space even
though they are only at 50-60% used.
Same setup as previously mentioned, 4 of 5 luns failed within an hour with
This is the same testing level that you've been doing with other
controllers like the adaptec SCSI controller, the megaraid controller,
Stephen says that this could be a design effect of ext3. Instead of
having just "in use" and "free" blocks, you have "in use", "free",
and "free but not available because transactions have not yet been
committed". When your scripts get in sync and delete several iso
images at the same time, it is possible that you get so many of that
third state that is neither properly "in use" nor "free" that with
apparently plenty of space on the filesystem, you temporarily do not
have free blocks to use. It is also possible that this only happens
with qlogic and not with the other drivers because of differences in
If you try this with ext2 and it still happens, that would rule this
out, of course. Could you do that?
If you want the gory detail, ext3 has to avoid reusing disk blocks which have
been freed but where the delete has not yet been committed to disk. After a
crash, subsequent recovery can end up rolling back any uncommitted transactions,
and if we reused deleted blocks before their commit, we might overwrite them on
disk while there is still a chance that the old version might be needed after a
recovery and rollback.
I've never seen a scenario where this made an observable difference, but doing
mass deletes and rewrites on a sufficiently fast disk array might cause the
behaviour you describe simply because the deletes have not been committed. If
we can determine that this is definitely the cause of your problem, then it
should be possible to teach ext3 to recognise this situation and to force an
early commit rather than returning ENOSPC immediately.
Per conversation with Clay, ext2 doesn't fail, just ext3. Also, I've often
seen the cpcmp scripts (which simply kick off all processes at once) running
pretty synchronized, so that leads me to believe that Stephen's thoughts are on target.
Clay, please try mounting with "data=writeback" and see if this goes away.
I've also asked to have the tests run with more small files
(say, /usr/share/doc instead of a 650MB file) and see if anything changes.
"data=writeback" will not cure the problem. All that "writeback" mode does is
to remove any synchronisation of data writes with transaction commits, so that
newly-written data may not be seen on disk after a crash, and you can therefore
find stale data blocks in recently-allocated files.
Even in writeback mode, ext3 still refuses to allow such data writes to
overwrite metadata which is still valid.
However, in "data=journal" mode, the problem _will_ probably disappear, simply
because the writing of data to the journal will force transaction commits much
more frequently (of course, you'll see much worse performance in that mode for
the sort of workload we're talking about here).
Running the five-lun setup mounted with the data=journal option. It is running
fine (if a little slow) and has made it past the point where I would expect an
error. I'm letting it run over night and will update you in the morning whether
pass or fail.
One other thing to try is to simply test with a larger partition -- try, say,
a 12GB instead of 8GB disk with the same cpcmp invocation.
Also, is your lun a single disk, or is it a switched raid volume that would
be faster than a single disk?
Well, still running this morning. Still hasn't finished due to the slower
performance of running with data=journal. This is definitely past the point
where I would expect a failure, though.
Each lun is on a single 8GB disk.
I would need to reconfigure the enclosure to span across multiple disks to get a
bigger lun, which I could do but haven't tried.
One thing that still puzzles me...what happened in the qlogic driver between
17.3 and 17.4 that made the problem disappear with only one stream running?
And due to ext3's design, will there always be some threshold where the uncommitted
but deleted blocks reduce the usable disk space? To what extent can and will
ext3 be modified to account for this?
Use software RAID 0 to put two luns together to test on a larger FS.
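A sketch of that suggestion using mdadm; the device names /dev/sdc and /dev/sdd are assumptions, so substitute two of the actual 8GB luns. (Installs of this era shipped raidtools, i.e. /etc/raidtab and mkraid, rather than mdadm; the idea is the same.)

```shell
# Stripe two 8GB luns into one ~16GB md device, then format ext3 and mount.
# /dev/sdc and /dev/sdd are assumed device names.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
mke2fs -j /dev/md0          # ext3 on the striped device
mkdir -p /bigfs
mount -t ext3 /dev/md0 /bigfs
```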
17.3 and earlier: The older driver was slower. With enough physical
ram to cache several successive allocations without needing to go to
disk, and a big enough journal, it's possible that you could get enough
block deletes pending but not committed because of the slow driver.
Part of it might have been serialization due to more abuse of the
io_request_lock, I don't know.
And yes, quite fundamentally with a journaling file system, the two
phase commit will always mean that there will, for some window of time
after a file has been deleted, be blocks that are neither allocated for
existing files nor available for use for other files. That's just one
of the fundamental tradeoffs of a journaling file system. You are
trading off space for consistency. The journal size is only part of the story.
This is a degenerate case and so far Matt hasn't taken my challenge to
step up to the plate with a real-world case where this could be a problem.
In normal cases, this pressure doesn't show up very much.
Lowering to Sev 2 here, "enhancement" in Bugzilla. Workarounds include:
1) data=journal (working so far)
2) increase disk space so you've got >=2X the space the workload consumes
3) 'sync; sync; sync; sleep 10;' between massive deletes and refills to give
the journal a fighting chance to run.
4) using a larger journal
4) correction: use a *smaller* journal, which fills and commits sooner, so
freed-but-uncommitted blocks become reusable earlier
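Workaround 3 wired into the test loop might look like this; /sdc and the file names are assumptions taken from the report's setup.

```shell
# Sketch of workaround 3: force commits between mass deletes and refills.
cd /sdc                       # assumed mount point from the report
rm -f a/*.iso b/*.iso         # mass delete
sync; sync; sync; sleep 10    # give ext3 a chance to commit the deletes,
                              # turning "free but uncommitted" blocks into free ones
cp testbig.iso a/             # refill only after freed blocks are reusable
```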