Bug 737069

Summary:

livecd compose: /sbin/resize2fs: Attempt to write block from filesystem resulted in short write

Product:

[Fedora] Fedora

Reporter:

Mads Kiilerich <mads>

Component:

e2fsprogs

Assignee:

Eric Sandeen <esandeen>

Status:

CLOSED NOTABUG

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Priority:

unspecified

CC:

esandeen, josef, kzak, oliver

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

Bug Fix

Last Closed:

2011-09-09 21:13:49 UTC

Attachments:

Description	Flags
patchlet	none

Description Mads Kiilerich 2011-09-09 13:58:50 UTC

livecd composition frequently fails for me:

...
Checking filesystem /var/tmp/imgcreate-neqeMN/tmp-if7W12/ext3fs.img
e2fsck 1.41.14 (22-Dec-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
_gon_os: 41357/262144 files (0.1% non-contiguous), 330712/1048576 blocks

e2image 1.41.14 (22-Dec-2010)

resize2fs 1.41.14 (22-Dec-2010)
/sbin/resize2fs: Attempt to write block from filesystem resulted in short write while trying to resize /var/tmp/imgcreate-neqeMN/tmp-if7W12/ext3fs.img
Please run 'e2fsck -fy /var/tmp/imgcreate-neqeMN/tmp-if7W12/ext3fs.img' to fix the filesystem
after the aborted resize operation.
Resizing the filesystem on /var/tmp/imgcreate-neqeMN/tmp-if7W12/ext3fs.img to 325479 (4k) blocks.

Error creating Live CD : resize2fs returned an error (1)!  image to debug at /tmp/resize-image-RAoLfi


There is no problem resizing /tmp/resize-image "manually".


I know this is a vague report ... but it is a problem I do see. Any idea what can cause this?


e2fsprogs-1.41.14-2.fc15.i686
livecd-tools-16.5-1.fc16.i686

Comment 1 Mads Kiilerich 2011-09-09 14:11:43 UTC

It seems strange to me that the image contains holes that resize2fs doesn't eliminate:

The image left in /tmp was 4G (but with holes).

# resize2fs -M /tmp/resize-image-RAoLfi
resize2fs 1.41.14 (22-Dec-2010)
Resizing the filesystem on /tmp/resize-image-RAoLfi to 325479 (4k) blocks.
The filesystem on /tmp/resize-image-RAoLfi is now 325479 blocks long.

# ll /tmp/resize-image-RAoLfi
-rw-------.  1 root root 1333161984 Sep  9 15:42 resize-image-RAoLfi

(which is 325479 4k blocks)

But:

# du -ms resize-image-RAoLfi 
753	resize-image-RAoLfi

# mount resize-image-RAoLfi m
# du -ms m
1092	m
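
The difference between the size shown by ll and the figure from du is the usual sign of a sparse file: ll reports the apparent size, du the space actually allocated. A minimal sketch of how the two numbers can be read from stat(), assuming Linux, where st_blocks counts 512-byte units. The helper names make_hole_file and file_sizes are made up for this illustration and are not part of e2fsprogs or livecd-tools.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file of `size` bytes without
 * writing any data, so most filesystems store it as one large hole.
 * `tmpl` must end in "XXXXXX" as required by mkstemp().
 * Returns 0 on success, -1 on failure. */
int make_hole_file(char *tmpl, off_t size)
{
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, size);   /* extends the file; no blocks written */
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: report the apparent size (what ls -l shows) and
 * the allocated size (roughly what du shows) of `path`, in bytes.
 * Assumption: st_blocks is in 512-byte units, as on Linux. */
int file_sizes(const char *path, long long *apparent, long long *allocated)
{
    struct stat st;

    assert(apparent != NULL && allocated != NULL);
    if (stat(path, &st) != 0)
        return -1;
    *apparent = (long long)st.st_size;
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

For a file created with make_hole_file, the allocated size is normally far below the apparent size. How much space a hole takes depends on the underlying filesystem.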

Comment 2 Mads Kiilerich 2011-09-09 14:58:35 UTC

Looking at /tmp/resize-image- was apparently a red herring. That file is created by livecd-tools with e2image before every resizing. That takes quite a bit of time and disk space (even though I don't understand why) and is only useful for debugging some resize2fs bugs and then immediately removed again. The disk space consumption was thus temporarily considerably higher than I expected.

Is it possible that this 'short write' was caused by running out of disk space? Can it be caused by anything else? Would it make sense to improve the error message so it communicated that more clearly?

Comment 3 Eric Sandeen 2011-09-09 15:18:31 UTC

(In reply to comment #1)
> It seems strange to me that the image contains holes that resize2fs doesn't
> eliminate:

That's expected, e2image only contains metadata, and no data blocks.

> Is it possible that this 'short write' was caused by running out of disk space?

Yes, I think that's possible.

All resize2fs knows is that it tried to write and failed ... and shrinking an -image- file is probably not the original expected use case, so "ENOSPC" isn't really an expected error.  I suppose it would be good to print the error when a "short write" is encountered, though.

Can you try freeing up more space, and try again?

The whole e2image dance is left over from when resize2fs was reliably corrupting filesystems.  Livecd-creator was a great stress test.  :)

Comment 4 Mads Kiilerich 2011-09-09 15:41:27 UTC

(In reply to comment #3)
> That's expected, e2image only contains metadata, and no data blocks.

That is why I wonder why du reports that it takes up so much space. But ok - that might be because the original 4G image really was sparse and very small, but my resize2fs -M made it copy the sparse sections around so they ended up no longer being sparse. I'm satisfied with that explanation ... if it sounds reasonable ;-)

> > Is it possible that this 'short write' was caused by running out of disk space?
> 
> Yes, I think that's possible.
> 
> All resize2fs knows is that it tried to write and failed ... and shrinking an
> -image- file is probably not the original expected use case, so "ENOSPC" isn't
> really an expected error.  I suppose it would be good to print the error when a
> "short write" is encountered, though.

Thanks!

> Can you try freeing up more space, and try again?

Yes, once I realized how much more space it needed it works fine.

> The whole e2image dance is left over from when resize2fs was reliably
> corrupting filesystems.  Livecd-creator was a great stress test.  :)

Have you seen such issues recently? Is it in your opinion worth it to keep making this copy every time?

Comment 5 Eric Sandeen 2011-09-09 16:08:02 UTC

> Have you seen such issues recently? Is it in your opinion worth it to keep
> making this copy every time?

It's a case of "if you don't have it and something goes wrong, you're out of luck."

But no, I haven't seen issues for a while.

I'd have no problem with it being turned into an option; maybe official builds could turn it on, but have it off by default for mere mortal users?

Comment 6 Eric Sandeen 2011-09-09 21:13:49 UTC

I don't think there's a whole lot to do here within the framework of e2fsprogs error reporting, I'm afraid.

The write() call actually did get a short write, it wrote part of what was requested and returned that many bytes... no error was generated; write is allowed to write less than requested.

So I think I have to close this NOTABUG ... you ran out of space and resize2fs did the right thing, however cryptic it may have been in telling you....

-Eric

Comment 7 Mads Kiilerich 2011-09-09 23:43:44 UTC

Created attachment 522448 [details]
patchlet

(In reply to comment #6)
> The write() call actually did get a short write, it wrote part of what was
> requested and returned that many bytes... no error was generated; write is
> allowed to write less than requested.

Ok. In that case I would expect it to retry the write and either succeed or get a real error - for example something like the attached draft naive patch. But I must assume there is a reason you don't do that.
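
A minimal sketch of the retry idea described above. This is not the attached patchlet and not e2fsprogs code. The function name write_all is made up for this illustration. It calls write() repeatedly until everything has been written or write() reports a real error, so that a full disk would surface as ENOSPC instead of a bare short write.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of e2fsprogs: write all `len` bytes of
 * `buf` to `fd`, retrying after short writes.
 * Returns 0 on success, or -1 with errno set by the failing write(). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: try again */
            return -1;          /* real error, e.g. ENOSPC or EBADF */
        }
        if (n == 0) {
            errno = EIO;        /* no progress; give up rather than spin */
            return -1;
        }
        p += n;                 /* short write: advance and retry the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

With a loop like this, a write that stops short because the disk is full would be followed by another write() call, which would then be expected to fail with ENOSPC and give the caller an errno to report.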

Comment 8 Mads Kiilerich 2011-09-10 10:38:45 UTC

By the way: It seems strange that the disk space usage temporarily increases while resizing. Couldn't/shouldn't it mark the moved and unused sections as sparse on-the-fly? Ok, resizing of file images is not the primary use case for resize2fs, but I assume SSDs and SANs could benefit a bit from something similar.

Comment 9 Eric Sandeen 2011-09-12 15:31:31 UTC

Ok, fair enough, you are right - it probably should retry until it gets an error.

I have to admit that this is somewhat low on my list of things to do - would you like to send the patch upstream & start a discussion about this issue?

As far as why the disk space is temporarily increased, I'd need to look more closely at resize2fs.  I'm sure there is some temporary increase simply due to transactional requirements.  It could possibly be optimized to lop off the end of the filesystem incrementally, as it moves blocks (if it doesn't already).  But I'm afraid that is even lower on the list of things to do ;)

I'm sorry to be kind of pawning this off, but it is a bit of a corner case, and I'm more concerned with things such as making ext4 work above 16T, right now.

If you want to send your patch to the list, though, I'll be happy to review it and chime in on the thread.  Note that I think there is at least one place where "actual == -1" isn't tested and the check is simply "written != length" - so it can't tell the difference between a short write and an error.
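
To illustrate the distinction mentioned above, here is a small sketch that classifies the return value of write(). The enum and the function classify_write are made up for this example and do not come from e2fsprogs. The point is only that -1 has to be tested before comparing against the requested length.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical result codes for one write() call. */
enum write_result { WRITE_OK, WRITE_SHORT, WRITE_ERROR };

/* Hypothetical helper: interpret the return value `actual` of a
 * write() call that was asked to write `requested` bytes. */
enum write_result classify_write(ssize_t actual, size_t requested)
{
    assert(actual >= -1);       /* write() returns -1 or a byte count */

    if (actual < 0)
        return WRITE_ERROR;     /* errno says why, e.g. ENOSPC */
    if ((size_t)actual < requested)
        return WRITE_SHORT;     /* partial write, no error reported yet */
    return WRITE_OK;
}
```

A check that only compares the byte count with the requested length would put WRITE_SHORT and WRITE_ERROR in the same bucket, which matches the ambiguity described in this comment.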

Thanks,
-Eric