Created attachment 565849: Log of df during cp
When copying directories/files on an ext4 filesystem, the amount of free space reported by "df" fluctuates wildly. I'll attach a log where I was running "df" every second while doing a "cp -Rpd" of a user's home directory on the filesystem. The directory contained nothing special, just a mix of regular directories, files, and symlinks. As the end of the log shows, "df" reports correct information again only after I run "sync". My system is RHEL5.8 with all post-5.8 updates as of this writing. Kernel is x86_64, 2.6.18-308.el5. The filesystem was created as follows: mke4fs -E stride=128,stripe-width=1024 -j /dev/md0 I then mounted it with no special options, copied a user's directory into it, and tried to make a duplicate of that directory with "cp -Rpd", which is what the attached log covers. The problem does not occur if I mount the filesystem with the "nodelalloc" option.
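For anyone who wants to collect a similar log without parsing "df" output, here is a minimal sketch, assuming Python is available on the test machine. It samples the same statfs counters that "df" reads, at a fixed interval. The names `free_space_sample` and `log_free_space` are made up for this illustration and are not part of any tool mentioned in this report.

```python
import os
import time


def free_space_sample(path):
    """Return (free_bytes, avail_bytes) for the filesystem containing path."""
    st = os.statvfs(path)
    # f_bfree counts all free blocks, f_bavail only those usable by
    # unprivileged users; both are expressed in units of f_frsize.
    return st.f_bfree * st.f_frsize, st.f_bavail * st.f_frsize


def log_free_space(path, interval=1.0, samples=5):
    """Collect `samples` readings, sleeping `interval` seconds after each."""
    rows = []
    for _ in range(samples):
        free_b, avail_b = free_space_sample(path)
        rows.append((time.time(), free_b, avail_b))
        time.sleep(interval)
    return rows
```

Running something like `log_free_space("/mnt/test", interval=1.0, samples=60)` in one terminal while the copy runs in another should give data comparable to the attached log; the mount point here is only a placeholder.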
Hi, this looks like a large block reservation caused by how the delalloc feature estimates how many metadata blocks will be needed to store the new data. Once the blocks are actually allocated, the filesystem releases the reservation, leaving only the blocks that were really allocated. I'm trying to reproduce it here, but with no success so far. I can trigger some larger reservations, but nothing outside the expected behavior, whereas in your case the amount of reserved blocks was really huge. Can you reproduce it at any time? Do you have any idea how many files, blocks, symlinks, directories, and subdirectories there are in the directory you're trying to duplicate? The amount of reserved blocks increases mainly when there are very large or fragmented directories, which may use more metadata blocks, so I'm wondering whether you have a very fragmented or very large directory. Output from commands such as `df -i` and `tune4fs -l <bdev>` may help to get a better idea of what's going on. Also, are there any non-default settings on this system with regard to virtual memory (mainly changes in /proc/sys/vm)?
I moved my systems to RHEL6, so I can't easily reproduce the problem right now or give you the tune4fs info. At the time I filed the bug, it was fairly easy to reproduce, and I reproduced it on 3 different systems with vastly different configurations. The ext4 filesystems were probably at least 6TB on all of them, though. We do have a few large directories, but it seemed to happen whenever any copying was going on, even if it didn't involve one of the large directories. There were no non-default settings on the system; at least one of the systems where I reproduced it was a completely fresh install of RHEL5 for testing purposes.
Also maybe worth mentioning: the same problem does not occur on RHEL6, using the exact same system, disk, and filesystem configuration.
Hi, after doing some code review and tests, I confirmed this is not a bug but normal behaviour of the filesystem, mainly when using the delayed allocation feature, as I explained before. It also happens on RHEL6; I was able to see the same behaviour you noticed on both RHEL5 and RHEL6. When you allocate a new inode and are writing to it, the filesystem is free to reserve space for its own use and report it through statfs. What you are seeing here is exactly this situation. When allocating a file with delayed allocation, the filesystem reserves the metadata blocks needed for the worst allocation case, for example enough blocks to hold the metadata of a file on a very fragmented filesystem. This is done to avoid data loss, since the filesystem would not be able to notify the user if it later could not allocate a new block for a file because there was no space left for metadata. After the blocks are really allocated and the file is completely written (for example after an fsync() call), the unused reserved blocks are freed, so statfs shows fewer blocks allocated than it did while the files were being written. This is exactly what happens when you're copying files to a filesystem.
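The effect described above can be observed with a small sketch like the following, assuming Python is available. It reads the free block count before a write, after the data has been handed to the kernel but not yet synced, and again after fsync(). The function names are hypothetical, and whether the middle reading actually dips depends on the filesystem, the mount options (for example delalloc versus nodelalloc), and writeback timing, so treat it as an illustration rather than a guaranteed reproducer.

```python
import os
import tempfile


def free_blocks(path):
    """Free block count as reported by statfs for the filesystem at path."""
    return os.statvfs(path).f_bfree


def observe_write(directory, size_bytes=8 * 1024 * 1024):
    """Write a temporary file and sample free blocks around fsync()."""
    before = free_blocks(directory)
    fd, name = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * size_bytes)
            f.flush()  # data is now in the page cache, not necessarily on disk
            buffered = free_blocks(directory)
            os.fsync(f.fileno())  # force block allocation and writeback
            synced = free_blocks(directory)
    finally:
        os.unlink(name)  # clean up the temporary file
    return {"before": before, "buffered": buffered, "synced": synced}
```

On a filesystem using delayed allocation, one would expect `buffered` to be noticeably lower than `synced` plus the file's own blocks while the worst-case metadata reservation is held, with the difference disappearing after the sync.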