Bug 589669 - Failed ext4 delayed allocation corrupts disk usage counters (statfs)
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: medium
Assignee: Josef Bacik
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-05-06 16:12 UTC by Petr Pisar
Modified: 2010-11-04 15:40 UTC
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2010-11-04 15:40:02 UTC



Description Petr Pisar 2010-05-06 16:12:20 UTC
Description of problem:

When writing to a sparse file on ext4 beyond the file system's capacity, the kernel correctly reports the failed allocation; however, the file system usage counters then report bad values.

Version-Release number of selected component (if applicable):

kernel-2.6.32.12-115.fc12.x86_64
strace-4.5.19-1.fc12.x86_64
coreutils-7.6-11.fc12.x86_64

How reproducible:

Create a small (1 GB) block device, create an ext4 file system on it, and mount it:

# df -h /mnt/12TBlv
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_dhcp0122-12TB
                     1008M   34M  924M   4% /mnt/12TBlv

Create a big (12 TB) sparse file in the file system:

# dd if=/dev/zero of=12tb.image bs=1MB seek=$((12*1024*1024)) count=1
1+0 records in
1+0 records out
1000000 bytes (1.0 MB) copied, 0.0016588 s, 603 MB/s

# ls -l
total 996
-rw-r--r--. 1 root root 12582913000000 May  6 17:44 12tb.image
drwx------. 2 root root          16384 May  6 17:42 lost+found

# df -h /mnt/12TBlv/
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_dhcp0122-12TB
                     1008M   35M  923M   4% /mnt/12TBlv
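The apparent size in the ls -l output follows directly from dd's arguments: coreutils dd treats the "MB" suffix as 10**6 bytes (plain "M" would be 2**20), so seek=12582912 blocks plus count=1 block gives 12582913 * 10**6 bytes. Only the 1 MB actually written occupies space, which is consistent with "total 996" (245 blocks of 4 KiB for the data plus 16 KiB for lost+found) and with df's Used rising by only about 1M. The Python sketch below reproduces the size arithmetic and a scaled-down version of the experiment; the helper names are hypothetical.

```python
import os
import tempfile

DD_BS = 1000 * 1000          # coreutils dd: "MB" means 10**6 bytes ("M" would be 2**20)
DD_SEEK = 12 * 1024 * 1024   # seek=$((12*1024*1024)) output blocks skipped
DD_COUNT = 1                 # count=1 block actually written


def dd_output_size(bs, seek, count):
    """Apparent size of a fresh output file after dd skips `seek` blocks and writes `count`."""
    return (seek + count) * bs


def make_sparse_file(path, offset, data):
    """Write `data` at `offset` without writing the bytes before it.

    Returns (apparent size, bytes actually allocated). On file systems with
    sparse-file support, such as ext4, the skipped range is a hole.
    """
    with open(path, "wb") as f:
        f.seek(offset)
        f.write(data)
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # on Linux st_blocks counts 512-byte units


if __name__ == "__main__":
    print(dd_output_size(DD_BS, DD_SEEK, DD_COUNT))  # 12582913000000, as in ls -l
    with tempfile.TemporaryDirectory() as tmp:
        # Scaled-down experiment: a 64 MiB hole followed by 1 MB of zeros.
        size, allocated = make_sparse_file(os.path.join(tmp, "sparse.img"), 64 << 20, b"\0" * DD_BS)
        print(size, allocated)
```

On a file system that supports holes, the allocated figure should be close to 1 MB rather than the 65 MB apparent size.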

Create an ext3 file system inside the big sparse file (12tb.image):

# mkfs.ext3 -N 1024 -F 12tb.image
mke2fs 1.41.9 (22-Aug-2009)   
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
1500016 inodes, 3072000244 blocks
153600012 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
93751 block groups
32768 blocks per group, 32768 fragments per group
16 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
        102400000, 214990848, 512000000, 550731776, 644972544, 1934917632, 
        2560000000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information:
Warning, had trouble writing out superblocks.done

This filesystem will be automatically checked every 35 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
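The geometry mke2fs printed can be cross-checked against the image size. The Python sketch below copies its constants from the output above and assumes the default sparse_super layout, in which backup superblocks live in group 1 and in groups that are powers of 3, 5 and 7. It shows that mke2fs laid out a full 12 TB file system, so its metadata writes (inode tables, bitmaps, journal, superblock backups) land all over the sparse image, and every such write needs real blocks on the 1 GB ext4 underneath, which is presumably where the space ran out.

```python
IMAGE_SIZE = 12582913000000   # bytes, from the ls -l output above
BLOCK_SIZE = 4096             # "Block size=4096"
BLOCKS_PER_GROUP = 32768      # "32768 blocks per group"
INODES_PER_GROUP = 16         # "16 inodes per group"
FIRST_DATA_BLOCK = 0          # "First data block=0"
RESERVED_PERCENT = 5          # "(5.00%) reserved for the super user"


def geometry(image_size):
    """Re-derive the block, group, inode and reserved-block counts."""
    blocks = image_size // BLOCK_SIZE  # a partial last block is dropped
    # Ceiling division: the last, partial group still counts as a group.
    groups = -(-(blocks - FIRST_DATA_BLOCK) // BLOCKS_PER_GROUP)
    return {
        "blocks": blocks,
        "groups": groups,
        "inodes": groups * INODES_PER_GROUP,
        "reserved": blocks * RESERVED_PERCENT // 100,
    }


def sparse_super_backups(groups):
    """Backup superblock locations: group 1 plus powers of 3, 5 and 7 (sparse_super)."""
    backup_groups = {1} if groups > 1 else set()
    for base in (3, 5, 7):
        g = base
        while g < groups:
            backup_groups.add(g)
            g *= base
    return [FIRST_DATA_BLOCK + g * BLOCKS_PER_GROUP for g in sorted(backup_groups)]


if __name__ == "__main__":
    g = geometry(IMAGE_SIZE)
    print(g)
    print(sparse_super_backups(g["groups"]))
```

The first line should match "1500016 inodes, 3072000244 blocks", "153600012 blocks" and "93751 block groups", and the second the "Superblock backups stored on blocks" list.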

As you can see, the mkfs.ext3 command failed:

# echo $?
1

And, of course, the kernel complains:

EXT4-fs (dm-6): delayed block allocation failed for inode 12 at logical offset 3070328834 with max blocks 1 with error -28

This should not happen!!  Data will be lost
Total free blocks count 0
Free/Dirty block details
free_blocks=0
dirty_blocks=52
Block reservation details
i_reserved_data_blocks=51
i_reserved_meta_blocks=2

This ext4 error message repeats several times.
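The kernel prints negated errno values, so "error -28" is ENOSPC. Inode 12 is presumably 12tb.image itself, the first inode handed out after lost+found (inode 11); fsck later reports 12 inodes in use. The Python sketch below decodes the error and, assuming the "logical offset" is a block number in 4 KiB file system blocks, checks that the failing block lies inside the image file, near its end. The helper names are hypothetical.

```python
import errno
import os

FS_BLOCK_SIZE = 4096          # f_bsize of the 1 GB ext4 file system
IMAGE_SIZE = 12582913000000   # size of 12tb.image from ls -l


def decode_kernel_error(code):
    """ext4 reports negated errno values; return the symbolic name and the message."""
    return errno.errorcode.get(-code, "unknown"), os.strerror(-code)


def logical_block_to_byte_offset(lblk, block_size=FS_BLOCK_SIZE):
    """Assumption: the 'logical offset' in the message is a block index within the file."""
    return lblk * block_size


name, message = decode_kernel_error(-28)
offset = logical_block_to_byte_offset(3070328834)

if __name__ == "__main__":
    print(name, message)
    print(offset, offset / IMAGE_SIZE)  # fraction of the way into the image
```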

Now look at the ext4 usage:

# df -h /mnt/12TBlv/
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_dhcp0122-12TB
                     1008M  -64Z  -52M 100% /mnt/12TBlv

It might look like a coreutils bug, so check df's system calls with strace:

lstat("/mnt/12TBlv", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/mnt/12TBlv", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
statfs("/mnt/12TBlv/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=258022, f_bfree=18446744073709551564, f_bavail=18446744073709538457, f_files=65536, f_ffree=65524, f_fsid={-1284699015, -540083809}, f_namelen=255, f_frsize=4096}) = 0

As you can see, statfs(2) returns 0 and f_bfree is _much bigger_ than f_blocks.
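The odd df figures can be reproduced by reinterpreting the statfs fields by hand. The Python sketch below copies the values from the strace output; the column formulas in the comments are a simplification of what df does, not coreutils' actual code. Read as signed 64-bit integers, f_bfree and f_bavail are small negative numbers, and "Used" (total minus free) comes out near 2**64 blocks of 4 KiB, that is 2**76 bytes or 64 ZiB.

```python
U64 = 1 << 64


def as_signed64(value):
    """Reinterpret an unsigned 64-bit counter as a two's-complement signed value."""
    return value - U64 if value >= U64 // 2 else value


# Values copied from the strace output above.
f_blocks = 258022
f_bfree = 18446744073709551564
f_bavail = 18446744073709538457
f_frsize = 4096

bfree = as_signed64(f_bfree)    # free blocks, read as signed
bavail = as_signed64(f_bavail)  # available blocks, read as signed

# Size column: total blocks in MiB (df -h rounds it to 1008M).
size_mib = f_blocks * f_frsize / 2**20
# Used column: total minus free; with f_bfree wrapped to nearly 2**64 the
# magnitude is about 2**64 blocks, i.e. roughly 64 ZiB, shown by df as -64Z.
used_zib = (f_bfree - f_blocks) * f_frsize / 2**70
# Avail column: about -51.4 MiB, which df -h printed as -52M.
avail_mib = bavail * f_frsize / 2**20

if __name__ == "__main__":
    print(hex(f_bfree), bfree)
    print(hex(f_bavail), bavail)
    print(round(size_mib), round(used_zib), round(avail_mib, 1))
```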

If I unmount the ext4 file system and mount it again read-only, no error occurs and df reports correct results:

# strace -estatfs df -h /mnt/12TBlv/
statfs("/mnt/12TBlv/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=258022, f_bfree=0, f_bavail=0, f_files=65536, f_ffree=65524, f_fsid={-1284699015, -540083809}, f_namelen=255, f_frsize=4096}) = 0

If I unmount the file system and run fsck, no file system corruption is reported:

# fsck.ext4 -f /dev/mapper/vg_dhcp0122-12TB 
e2fsck 1.41.9 (22-Aug-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/vg_dhcp0122-12TB: 12/65536 files (0.0% non-contiguous), 262144/262144 blocks

Thus I guess there is a bug in the ext4 kernel driver.
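Because fsck finds nothing wrong on disk and a fresh read-only mount reports sane values, the bad numbers appear to exist only in what the mounted file system reports. A test script could flag this state with a simple heuristic on statvfs results; the Python sketch below is one such check (the function names and the StatfsSample type are hypothetical), fed with the two statfs results quoted in this report.

```python
import os
from collections import namedtuple


def statfs_counters_look_sane(st):
    """Heuristic check on a statvfs-style result (attributes f_blocks, f_bfree, f_bavail).

    On ext4, free blocks cannot exceed the total, and available blocks (free
    minus the root reserve) cannot exceed free blocks. A wrapped unsigned
    counter such as 2**64 - 52 fails the first test.
    """
    return st.f_bfree <= st.f_blocks and st.f_bavail <= st.f_bfree


def check_mount(path):
    """Apply the heuristic to a mounted path via os.statvfs (Unix only)."""
    return statfs_counters_look_sane(os.statvfs(path))


# Stand-in type holding the fields quoted from strace in this report.
StatfsSample = namedtuple("StatfsSample", "f_blocks f_bfree f_bavail")
broken = StatfsSample(258022, 18446744073709551564, 18446744073709538457)
after_ro_remount = StatfsSample(258022, 0, 0)
```

On an affected system, check_mount("/mnt/12TBlv") would be expected to return False until the file system is remounted.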

Comment 1 Eric Sandeen 2010-05-07 20:37:41 UTC
(In reply to comment #0)

...

> # df -h /mnt/12TBlv/
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/mapper/vg_dhcp0122-12TB
>                      1008M  -64Z  -52M 100% /mnt/12TBlv
> 
> It seems like coreutils bug, thus check df' syscalls with strace:
> 
> lstat("/mnt/12TBlv", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> stat("/mnt/12TBlv", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> statfs("/mnt/12TBlv/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096,
> f_blocks=258022, f_bfree=18446744073709551564, f_bavail=18446744073709538457,
> f_files=65536, f_ffree=65524, f_fsid={-1284699015, -540083809}, f_namelen=255,
> f_frsize=4096}) = 0
> 
> As you can see statfs(2) exits with 0 and f_bfree is _much bigger_ then
> f_blocks.

f_bfree is 0xFFFFFFFFFFFFFFCC and f_bavail is 0xFFFFFFFFFFFFCC99 ... they went negative.

So yes, looks like we didn't properly clean up some counters when we ran out of space.

-Eric
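The negative counters from Comment 1 line up with the kernel message in the original report: it shows free_blocks=0 and dirty_blocks=52, and f_bfree decodes to exactly -52. The Python sketch below is an inference from those numbers, not a reading of the ext4 source. It assumes statfs computes free blocks as free minus dirty (reserved but not yet allocated) blocks in unsigned 64-bit arithmetic, and that f_bavail then subtracts the root reserve, taken here as mke2fs's default 5% of the 262144 blocks fsck reports.

```python
MASK64 = (1 << 64) - 1


def u64_sub(a, b):
    """Subtract like a C unsigned 64-bit counter: the result wraps modulo 2**64."""
    return (a - b) & MASK64


# From the "delayed block allocation failed" message in the original report.
free_blocks = 0
dirty_blocks = 52
# Assumption: default 5% root reservation of the 262144-block file system.
reserved_blocks = 262144 * 5 // 100

f_bfree = u64_sub(free_blocks, dirty_blocks)
f_bavail = u64_sub(f_bfree, reserved_blocks)

if __name__ == "__main__":
    print(reserved_blocks, f_bfree, f_bavail)
```

If the assumption holds, 52 blocks that stayed reserved after the allocation failed are enough to produce both strace values, which fits the suggestion that some counters were not cleaned up on the out-of-space path.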

Comment 2 Eric Sandeen 2010-05-14 17:11:12 UTC
Dmitry's patch on linux-ext4 -might- fix this:

ext4: Do not dec quota for reserved blocks on error paths v2

(quota manipulation can be tangled up with block reservation even when quotas are off... just a guess)

-Eric

Comment 3 Bug Zapper 2010-11-03 15:29:21 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 4 Petr Pisar 2010-11-04 15:40:02 UTC
Seems fixed in 2.6.34.7-61.fc13.x86_64 on Fedora 13: ext4 disk usage shows correct data now.

