745576 – libguestfs (or qemu?) hangs if sparse file runs out of disk space

Status:	NEW
Alias:	None
Product:	Virtualization Tools
Classification:	Community
Component:	libguestfs
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Richard W.M. Jones

Reported:	2011-10-12 18:32 UTC by Richard W.M. Jones
Modified:	2021-04-19 10:35 UTC (History)

Description Richard W.M. Jones 2011-10-12 18:32:07 UTC

Description of problem:

On this machine, /tmp has 6.6G of space free.

If I create a larger sparse file, then fill it up using
libguestfs, instead of getting an error when I run out of
space I get a hang.

Here is how to reproduce this:

$ cd /tmp
$ df -h /tmp
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_pin-lv_root   45G   38G  6.6G  86% /
                                        ^^^^
  # 6.6G free, so create a sparse file bigger than this.
  # In the example below, I'm using a 10G disk:

$ rm -f test1.img
$ truncate -s 10G test1.img
$ guestfish -a test1.img -x <<EOF
run
part-disk /dev/sda mbr
mkfs ext2 /dev/sda1
mount-options "" /dev/sda1 /
upload /dev/zero /zero
EOF

Eventually this uses up all available space, but instead
of getting an error, it just hangs.
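
To confirm that the image really is overcommitted before running the reproducer, the apparent size, the allocated size, and the free space can be compared directly. The following is a minimal sketch, assuming GNU coreutils (stat -c) and a POSIX df; the function name sparse_report is hypothetical and not part of libguestfs.

```shell
# Hypothetical helper (not part of libguestfs): report how far a sparse
# file is overcommitted relative to the filesystem that holds it.
# Assumes GNU stat (-c format strings) and a POSIX-compatible df.
sparse_report() {
    file=$1
    apparent=$(stat -c %s "$file")   # logical size in bytes
    blocks=$(stat -c %b "$file")     # number of blocks actually allocated
    blksize=$(stat -c %B "$file")    # size in bytes of each reported block
    # Column 4 of "df -Pk" is the available space in 1024-byte units.
    avail_kb=$(df -Pk "$file" | awk 'NR==2 {print $4}')
    echo "apparent_bytes=$apparent"
    echo "allocated_bytes=$((blocks * blksize))"
    echo "avail_bytes=$((avail_kb * 1024))"
}
```

Running sparse_report test1.img right after the truncate step should show an apparent size larger than the available space, which is the precondition for the hang described here.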

Version-Release number of selected component (if applicable):

1.13.20

How reproducible:

100%

Comment 1 Richard W.M. Jones 2011-10-21 14:21:17 UTC

This is not at all trivial to fix.

We can pass the -drive ...,werror=report option to qemu.
However this doesn't do anything useful.

ENOSPC errors on the host are passed up to the guest as
I/O errors.

When writing, ext4 simply does not pass I/O errors up to
userspace.  The write(2) and close(2) system calls return
OK as if nothing was happening, while the kernel message
log fills up with "Buffer I/O error on device vda" errors.

Adding the -o errors=panic mount option also does precisely
nothing.  No panic, behaves same as above.

Comment 2 Eric Sandeen 2013-07-26 16:13:05 UTC

Just saw this one.

ext4 errors=XXX only handles metadata errors; data I/O errors just look like, e.g., a bad block, and there's no reason to abort the fs.  (Although I'm not sure why we don't hit some metadata errors in this case; detected inconsistencies trip it, but now that I think of it, I'm not sure whether metadata I/O errors do.  I need to look.)

Anyway, as far as the inner fs is concerned we're just pushing data to the buffer cache, which succeeds just fine:

# dd if=/dev/zero of=file3 bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.147922 s, 70.9 MB/s

If we tried direct IO it'd fail, though, because block allocation fails:

# dd if=/dev/zero of=file3 bs=1M count=10 oflag=direct
dd: writing `file3': Input/output error

As far as the inner fs is concerned, there's still space left in its 4G.  Neither write nor close would be expected to return an error on a buffered write.  fsync, OTOH, should and does return an error:

# xfs_io -f -c "pwrite 0 16m" -c "fsync" mytestfile
wrote 16777216/16777216 bytes at offset 0
16 MiB, 4096 ops; 0.0000 sec (63.332 MiB/sec and 16213.0496 ops/sec)
fsync: Input/output error
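
The same point can be made with dd alone: asking dd to fsync before it exits lets a deferred write error show up in the exit status. This is a minimal sketch, assuming a dd that supports conv=fsync and bs=1M (GNU coreutils does); the function name write_synced is hypothetical.

```shell
# Hypothetical helper: write N MiB of zeros with buffered I/O, then have
# dd call fsync on the output before exiting (conv=fsync). A deferred
# I/O error reported by fsync makes dd exit non-zero.
write_synced() {
    out=$1
    mib=$2
    if dd if=/dev/zero of="$out" bs=1M count="$mib" conv=fsync 2>/dev/null; then
        echo "ok"
    else
        echo "write or fsync failed"
        return 1
    fi
}
```

For example, write_synced file3 10 inside the guest would be expected to fail in the situation above, whereas the plain buffered dd reports success.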


ext4 has a mount option to treat data errors more severely:

data_err=ignore(*)      Just print an error message if an error occurs
                        in a file data buffer in ordered mode.
data_err=abort          Abort the journal if an error occurs in a file
                        data buffer in ordered mode.
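
To try this against the original reproducer, the mount options in the guestfish script can be made a parameter. This is a sketch only: the function name guestfish_script is hypothetical, the filesystem defaults to ext4 here (an assumption, since the option is documented for ordered-mode journalling while the original script used ext2), and whether data_err=abort changes the hang has not been tested.

```shell
# Hypothetical helper: print the reproducer's guestfish script with a
# caller-supplied mount option string and filesystem type.
# $1 = mount options (may be empty), $2 = filesystem type (default: ext4,
# which is an assumption; the original report used ext2).
guestfish_script() {
    opts=$1
    fstype=${2:-ext4}
    cat <<EOF
run
part-disk /dev/sda mbr
mkfs $fstype /dev/sda1
mount-options "$opts" /dev/sda1 /
upload /dev/zero /zero
EOF
}
```

Usage would be along the lines of guestfish_script data_err=abort | guestfish -a test1.img -x, since guestfish reads commands from standard input just as it does with the here-document in the description.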


Anyway, handling ENOSPC errors on thinly provisioned storage is definitely something that still needs work...
