Bug 1724754
Summary: | fallocate of a file larger than brick size leads to increased brick usage despite failure | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Raghavendra Bhat <rabhat> |
Component: | posix | Assignee: | bugs <bugs> |
Status: | CLOSED UPSTREAM | QA Contact: | |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | mainline | CC: | bugs |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-03-12 13:22:45 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Raghavendra Bhat
2019-06-27 17:35:24 UTC
The `fallocate -l <size> <file>` command fails when the requested size is larger than the brick the allocation is directed to. However, the file ends up with a non-zero block count even though fallocate fails, so brick usage increases despite the failure. It would be better to ensure that the file used in fallocate is truncated back if fallocate fails.

Description of problem:
======================
When fallocate is used to create a file whose size is at or above the disk capacity, the CLI reports the error "fallocate: fallocate failed: No space left on device". However, the file still gets created; its size on the mount shows as zero, but checking the volume space on the client (`df -h`) shows the file occupying significant space. That is because on the backend bricks the file is allocated up to about 90% of the disk size (possibly because of the storage reserve space).

How reproducible:
=================
Always

Steps to Reproduce:
1. Create a 1x3 volume and FUSE mount it.
2. Use fallocate to create a file whose size is >= the size of the brick.
3. The command fails with "fallocate: fallocate failed: No space left on device".

Actual results:
===============
The file is created and shows as a zero-size file from the mount point, but it occupies about 90% of the brick size on the backend, and the same is reflected in `df -h` of the mount point.

From the client:

```
[root@hostname2]# pwd
/mnt/nfnas/falloc-test
[root@hostname2]# df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp42--60-root   44G  1.6G   43G   4% /
devtmpfs                          3.9G     0  3.9G   0% /dev
tmpfs                             3.9G     0  3.9G   0% /dev/shm
tmpfs                             3.9G  8.5M  3.9G   1% /run
tmpfs                             3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                        1014M  188M  827M  19% /boot
tmpfs                             783M     0  783M   0% /run/user/0
hostname1:nfnas                   2.2T  453G  1.8T  21% /mnt/nfnas    ====> NOTICE THE USED SIZE OF Storage space
[root@hostname2]# fallocate test -l 600GB
fallocate: fallocate failed: No space left on device
[root@hostname2]# df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp42--60-root   44G  1.6G   43G   4% /
devtmpfs                          3.9G     0  3.9G   0% /dev
tmpfs                             3.9G     0  3.9G   0% /dev/shm
tmpfs                             3.9G  8.5M  3.9G   1% /run
tmpfs                             3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                        1014M  188M  827M  19% /boot
tmpfs                             783M     0  783M   0% /run/user/0
hostname1:nfnas                   2.2T  925G  1.3T  43% /mnt/nfnas    ===> NOTICE THE INCREASE IN USED SPACE
[root@hostname2]# ls
test
[root@dhcp42-60 falloc-test]# du -sh test
0	test
[root@hostname2]# stat test
  File: ‘test’
  Size: 0          Blocks: 0          IO Block: 131072 regular empty file
Device: 26h/38d    Inode: 13717993992350864287  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:fusefs_t:s0
Access: 2019-05-08 18:24:52.342403239 +0530
Modify: 2019-05-08 18:24:52.342403239 +0530
Change: 2019-05-08 18:24:52.342403239 +0530
 Birth: -
[root@hostname2]#
```

From the server:

```
[root@hostname1]# ls /gluster/brick1
nfnas
[root@hostname1]# ls /gluster/brick1
brick1/  brick10/  brick11/
[root@hostname1]# ls /gluster/brick1/nfnas/
falloc-test  IOs  logs
[root@hostname1]# ls /gluster/brick1/nfnas/falloc-test/
test
[root@hostname1]# ls /gluster/brick1/nfnas/falloc-test/test
/gluster/brick1/nfnas/falloc-test/test
[root@hostname1]# du -sh /gluster/brick1/nfnas/falloc-test/test
473G	/gluster/brick1/nfnas/falloc-test/test
[root@hostname1]# stat /gluster/brick1/nfnas/falloc-test/test
  File: ‘/gluster/brick1/nfnas/falloc-test/test’
  Size: 0          Blocks: 990030216  IO Block: 4096   regular empty file
Device: fd17h/64791d    Inode: 1749722171  Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2019-05-08 18:24:52.343774310 +0530
Modify: 2019-05-08 18:24:52.343774310 +0530
Change: 2019-05-08 18:24:52.366773892 +0530
 Birth: -
[root@hostname1]# df -h /gluster/brick1/
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/GLUSTER_vg1-GLUSTER_lv1  547G  541G  6.9G  99% /gluster/brick1
```
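For reference, `du` and the Blocks field of `stat` report allocation from `st_blocks` (512-byte units on Linux) rather than from `st_size`, so a zero-length file can still account for large (or zero) on-disk usage. Below is a minimal sketch, assuming a Linux client, that prints both values for a path; it is illustrative only and not part of the original report:

```c
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    struct stat st;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }
    if (lstat(argv[1], &st) != 0) {
        perror("lstat");
        return 1;
    }

    /* st_size is the logical file length; st_blocks counts the 512-byte
     * units actually allocated on disk. du-style tools report the latter. */
    printf("st_size  : %lld bytes\n", (long long)st.st_size);
    printf("st_blocks: %lld (%lld bytes on disk)\n",
           (long long)st.st_blocks, (long long)st.st_blocks * 512LL);
    return 0;
}
```

Run against the file on the FUSE mount versus directly on the brick path, this shows the mismatch seen above: zero blocks through GlusterFS, but 990030216 blocks (about 473G) on the brick.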
I think the problem is with `du -sh` (or `stat`) on the fallocated file reporting zero usage on a GlusterFS client.

1) Volume info: a 1x3 replicate volume

```
Volume Name: mirror
Type: Replicate
Volume ID: 68535a1f-48c3-4e7b-86fc-ecc0143c2cfe
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: server1:/export1/tmp/mirror
Brick2: server2:/export1/tmp/mirror
Brick3: server3:/export1/tmp/mirror
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
```

2) Bricks

```
df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                    7.8G     0  7.8G   0% /dev
tmpfs                       7.8G     0  7.8G   0% /dev/shm
tmpfs                       7.8G  9.1M  7.8G   1% /run
tmpfs                       7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server1-root     50G   22G   29G  44% /
/dev/sda1                  1014M  149M  866M  15% /boot
/dev/mapper/server1-root    500G   33M  500G   1% /home
tmpfs                       1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol  9.0G   34M  9.0G   1% /export1/tmp    =======> Used as brick for the volume
/dev/mapper/new-thin_vol    9.0G   33M  9.0G   1% /export2/tmp
```

i.e. /export1/tmp is used as the brick on all 3 nodes (same size as seen in the above df command).

3) Mounted the client

```
df -h
Filesystem                                     Size  Used Avail Use% Mounted on
devtmpfs                                       7.8G     0  7.8G   0% /dev
tmpfs                                          7.8G     0  7.8G   0% /dev/shm
tmpfs                                          7.8G  9.1M  7.8G   1% /run
tmpfs                                          7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server3-root                        50G   22G   29G  44% /
/dev/mapper/server3-root                       1.8T   33M  1.8T   1% /home
/dev/sda1                                     1014M  157M  858M  16% /boot
tmpfs                                          1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol                     9.0G   34M  9.0G   1% /export1/tmp
/dev/mapper/new-thin_vol                       9.0G   33M  9.0G   1% /export2/tmp
dell-per320-12.gsslab.rdu2.redhat.com:/mirror  9.0G  126M  8.9G   2% /mnt/glusterfs    ======> freshly mounted client
```

4) Ran the test

```
[root@server3 glusterfs]# fallocate -l 22GB repro
fallocate: fallocate failed: No space left on device
[root@server3 glusterfs]# du -sh repro
0	repro                                            ====> du -sh says 0 file size
[root@server3 glusterfs]# stat repro
  File: ‘repro’
  Size: 0          Blocks: 0          IO Block: 131072 regular empty file    =====> stat showing 0 size and 0 blocks
Device: 28h/40d    Inode: 12956667450403493410  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:fusefs_t:s0
Access: 2019-06-19 15:29:25.712546158 -0400
Modify: 2019-06-19 15:29:25.712546158 -0400
Change: 2019-06-19 15:29:25.712546158 -0400
 Birth: -
[root@server3 glusterfs]# df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                    7.8G     0  7.8G   0% /dev
tmpfs                       7.8G     0  7.8G   0% /dev/shm
tmpfs                       7.8G  9.1M  7.8G   1% /run
tmpfs                       7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server3-root     50G   22G   29G  44% /
/dev/mapper/server3-home    1.8T   33M  1.8T   1% /home
/dev/sda1                  1014M  157M  858M  16% /boot
tmpfs                       1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol  9.0G  1.2G  7.9G  13% /export1/tmp
/dev/mapper/new-thin_vol    9.0G   33M  9.0G   1% /export2/tmp
server1:/mirror             9.0G  1.3G  7.8G  14% /mnt/glusterfs    =========> Increased consumption
```

5) Ran a similar test on an XFS filesystem (i.e. no GlusterFS, plain XFS)

```
df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                    7.8G     0  7.8G   0% /dev
tmpfs                       7.8G     0  7.8G   0% /dev/shm
tmpfs                       7.8G  9.1M  7.8G   1% /run
tmpfs                       7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server1-root     50G   22G   29G  44% /
/dev/sda1                  1014M  149M  866M  15% /boot
/dev/mapper/server1-home    500G   33M  500G   1% /home
tmpfs                       1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol  9.0G  1.2G  7.9G  13% /export1/tmp
/dev/mapper/new-thin_vol    9.0G   33M  9.0G   1% /export2/tmp    ===========> a separate XFS filesystem used in this test
[root@server1 dir]# pwd
/export2/tmp/dir
[root@server1 dir]# fallocate -l 22GB repro
fallocate: fallocate failed: No space left on device
[root@server1 dir]# du -sh repro
1.2G	repro
df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                    7.8G     0  7.8G   0% /dev
tmpfs                       7.8G     0  7.8G   0% /dev/shm
tmpfs                       7.8G  9.1M  7.8G   1% /run
tmpfs                       7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/mapper/server1-root     50G   22G   29G  44% /
/dev/sda1                  1014M  149M  866M  15% /boot
/dev/mapper/server1-home    500G   33M  500G   1% /home
tmpfs                       1.6G     0  1.6G   0% /run/user/0
/dev/mapper/group-thin_vol  9.0G  1.2G  7.9G  13% /export1/tmp
/dev/mapper/new-thin_vol    9.0G  1.2G  7.9G  13% /export2/tmp    ==================> Increased usage after the fallocate test
stat repro
  File: ‘repro’
  Size: 0          Blocks: 2359088    IO Block: 4096   regular empty file    ======> zero size but non-zero blocks
Device: fd0ch/64780d    Inode: 260  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2019-06-19 16:15:57.072885431 -0400
Modify: 2019-06-19 16:15:57.072885431 -0400
Change: 2019-06-19 16:15:57.072885431 -0400
 Birth: -
```

CONCLUSION:
============
* From the above tests, the XFS filesystem keeping a non-zero block count for the file is not the problem, IIUC. The problem is GlusterFS reporting zero for `du -sh <file>` and zero blocks in the `stat` output.

* What happens is: as part of the operation (stat, du etc. issue a stat() system call), the backend disk receives the request, the on-disk stat() is performed, and the response is returned to the gluster brick process. This is the stat response received just after the gluster brick does the on-disk stat() (obtained from a gdb attachment):

```
p lstatbuf
$31 = {st_dev = 64775, st_ino = 260, st_nlink = 2, st_mode = 33188, st_uid = 0, st_gid = 0, __pad0 = 0,
  st_rdev = 0, st_size = 0, st_blksize = 4096, st_blocks = 2358824,
  st_atim = {tv_sec = 1560972565, tv_nsec = 713592657},
  st_mtim = {tv_sec = 1560972565, tv_nsec = 713592657},
  st_ctim = {tv_sec = 1560972565, tv_nsec = 716592631}, __unused = {0, 0, 0}}
```

Note the non-zero st_blocks in the response just received.

* The gluster brick process then converts the 'struct stat' (where the stat information is present) into its own internal 'struct iatt' structure by calling iatt_from_stat().

* In iatt_from_stat(), the number of blocks is handled differently for sparse files:

```c
    iatt->ia_size = stat->st_size;
    iatt->ia_blksize = stat->st_blksize;
    iatt->ia_blocks = stat->st_blocks;

    /* There is a possibility that the backend FS (like XFS) can
       allocate blocks beyond EOF for better performance reasons, which
       results in 'st_blocks' with higher values than what is consumed by
       the file descriptor. This would break few logic inside GlusterFS,
       like quota behavior etc, thus we need the exact number of blocks
       which are consumed by the file to the higher layers inside
       GlusterFS. Currently, this logic won't work for sparse files
       (ie, file with holes) */
    {
        uint64_t maxblocks;

        maxblocks = (iatt->ia_size + 511) / 512;

        if (iatt->ia_blocks > maxblocks)
            iatt->ia_blocks = maxblocks;
    }
```

For the fallocated file, stat->st_size (and hence iatt->ia_size) is zero, so maxblocks = (0 + 511) / 512 = 0 and the 990030216 blocks reported by XFS are clamped down to zero.

* That same block count is what the du command uses to compute the file's disk usage, which is why du and stat report zero on the client.
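To make the clamping concrete, here is a minimal standalone sketch that applies the same maxblocks computation to the values seen in this report; the helper name is illustrative and this is not the actual GlusterFS source:

```c
#include <stdio.h>
#include <stdint.h>

/* Same clamp as in the iatt_from_stat() snippet above: any blocks beyond
 * what the reported file size can account for are discarded. */
static uint64_t clamp_blocks(uint64_t size, uint64_t blocks)
{
    uint64_t maxblocks = (size + 511) / 512;

    return (blocks > maxblocks) ? maxblocks : blocks;
}

int main(void)
{
    /* Failed fallocate: st_size stays 0 although XFS allocated blocks. */
    printf("size=0,    st_blocks=990030216 -> ia_blocks=%llu\n",
           (unsigned long long)clamp_blocks(0, 990030216ULL));

    /* Ordinary fully written file: the clamp leaves the count alone. */
    printf("size=4096, st_blocks=8         -> ia_blocks=%llu\n",
           (unsigned long long)clamp_blocks(4096, 8));
    return 0;
}
```

The first case prints 0, matching the zero blocks seen by the client, while the second shows that a normally written file is unaffected.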
As mentioned in the first comment of this bug, one way to handle this would be to ensure that, in the posix translator, the file is truncated back to its last known size if fallocate fails.
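A minimal sketch of that approach at the plain syscall level is shown below; the helper name is hypothetical and the use of posix_fallocate() here is only illustrative, since the actual change lives inside the storage/posix translator (see the review link that follows):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper illustrating the proposed behaviour: record the size
 * before the allocation and, if the allocation fails (e.g. ENOSPC), truncate
 * back to that size so no partially allocated blocks keep consuming space. */
static int fallocate_or_rollback(int fd, off_t offset, off_t len)
{
    struct stat st;
    int err;

    if (fstat(fd, &st) != 0)
        return -1;

    err = posix_fallocate(fd, offset, len);   /* returns 0 or an errno value */
    if (err == 0)
        return 0;

    /* Allocation failed: roll back to the pre-call size. */
    if (ftruncate(fd, st.st_size) != 0)
        perror("ftruncate");

    errno = err;
    return -1;
}

int main(void)
{
    int fd = open("testfile", O_CREAT | O_RDWR, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Attempt an allocation that is likely larger than the free space,
     * mirroring the reproduction steps in this report. */
    if (fallocate_or_rollback(fd, 0, (off_t)600 * 1024 * 1024 * 1024) != 0)
        perror("fallocate_or_rollback");
    close(fd);
    return 0;
}
```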
REVIEW: https://review.gluster.org/22969 (storage/posix: truncate the file to zero if fallocate fails) posted (#1) for review on master by Raghavendra Bhat

This bug is moved to https://github.com/gluster/glusterfs/issues/1003, and will be tracked there from now on. Visit the GitHub issues URL for further details.