Red Hat Bugzilla – Bug 745576
libguestfs (or qemu?) hangs if sparse file runs out of disk space
Last modified: 2013-08-09 03:30:17 EDT
Description of problem:
On this machine, /tmp has 6.6G of space free.
If I create a sparse file larger than that, then fill it up
using libguestfs, instead of getting an error when the space
runs out, I get a hang.
Here is how to reproduce this:
$ cd /tmp
$ df -h /tmp
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_pin-lv_root 45G 38G 6.6G 86% /
# 6.6G free, so create a sparse file bigger than this.
# In the example below, I'm using a 10G disk:
$ rm -f test1.img
$ truncate -s 10G test1.img
$ guestfish -a test1.img -x <<EOF
part-disk /dev/sda mbr
mkfs ext2 /dev/sda1
mount-options "" /dev/sda1 /
upload /dev/zero /zero
EOF
Eventually this uses up all available space, but instead
of getting an error, it just hangs.
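For context, this is exactly what a sparse file makes possible: `truncate` changes only the apparent size, and blocks are allocated lazily as they are written, which is why a 10G image can be created on a filesystem with only 6.6G free:

```shell
# A sparse file's apparent size and its actual allocation differ:
truncate -s 10G test1.img
ls -l test1.img   # apparent size: 10G
du -h test1.img   # allocated blocks: 0 (nothing written yet)
```

The hang only shows up later, once the lazy allocation on the host finally fails with ENOSPC.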
This is not at all trivial to fix.
We can pass the -drive ...,werror=report option to qemu.
However this doesn't do anything useful.
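For reference, the option goes on the drive definition; a sketch of the invocation (other flags elided, and the exact drive arguments here are illustrative):

```shell
# werror=report: forward host write errors (including ENOSPC) to the
# guest as I/O errors instead of pausing the VM.
qemu-kvm -drive file=test1.img,format=raw,werror=report ...
```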
ENOSPC errors on the host are passed up to the guest as
I/O errors, but the guest filesystem swallows them.
When writing, ext4 simply does not pass I/O errors up to
userspace. The write(2) and close(2) system calls return
OK as if nothing was happening, while the kernel message
log fills up with "Buffer I/O error on device vda" errors.
Adding the -o errors=panic mount option also does precisely
nothing: no panic, and it behaves the same as above.
Just saw this one.
ext4's errors=XXX mount option only handles metadata errors; a data I/O error just looks like, say, a bad block, and there's no reason to abort the fs over it. (Although I'm not sure why we don't hit some metadata errors in this case... Detected inconsistencies trip it, but now that I think of it, I'm not sure whether metadata I/O errors do; I need to look.)
Anyway, as far as the inner fs is concerned we're just pushing data to the buffer cache, which succeeds just fine:
# dd if=/dev/zero of=file3 bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.147922 s, 70.9 MB/s
If we tried direct IO it'd fail, though, because block allocation fails:
# dd if=/dev/zero of=file3 bs=1M count=10 oflag=direct
dd: writing `file3': Input/output error
As far as the inner fs is concerned, there's still space left in its 4G. Neither write nor close would be expected to return an error on a buffered write. fsync, OTOH, should and does return an error:
# xfs_io -f -c "pwrite 0 16m" -c "fsync" mytestfile
wrote 16777216/16777216 bytes at offset 0
16 MiB, 4096 ops; 0.0000 sec (63.332 MiB/sec and 16213.0496 ops/sec)
fsync: Input/output error
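The same fsync behaviour can be seen with plain dd: conv=fsync makes dd call fsync(2) before exiting, so the error that the buffered run above absorbed would be reported at exit (on a healthy filesystem the command simply succeeds):

```shell
# conv=fsync forces an fsync(2) after the writes; any I/O error the
# page cache absorbed is then reported and dd exits non-zero.
dd if=/dev/zero of=file3 bs=1M count=10 conv=fsync
```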
ext4 has a mount option to treat data errors more severely:
data_err=ignore(*)  Just print an error message if an error occurs in a file data buffer in ordered mode.
data_err=abort      Abort the journal if an error occurs in a file data buffer in ordered mode.
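A sketch of how that would be applied to the reproducer's guest filesystem (whether it actually fires on this particular failure is untested here):

```shell
# Inside the guest: abort the journal on a data buffer I/O error in
# ordered mode, instead of just logging a message.
mount -o remount,data_err=abort /dev/sda1 /
```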
Anyway, handling ENOSPC errors on thinly provisioned storage is definitely something that still needs work...