Description of problem:
On an EC (disperse) volume, self-heal started and completed after a brick was replaced, but the size of the healed brick differs from the other bricks in the same set. Comparing files between a good brick and the healed brick, I found a few files whose size differs on the healed disk.

Version-Release number of selected component (if applicable):
3.10.1

Below is a file that shows a size difference after the brick heal. There is also a difference between ls -l and du -h on the healed brick.

===========================
File info from healed brick
===========================
du -h /media/disk11/brick11/file1
2.2G /media/disk11/brick11/file1

ls -lh /media/disk11/brick11/file1
-rw-r--r-- 2 root root 3.5G Nov 10 00:03 /media/disk11/brick11/file1

stat /media/disk11/brick11/file1
File: ‘/media/disk11/brick11/file1’
Size: 3661745152 Blocks: 4565608 IO Block: 4096 regular file
Device: 8c1h/2241d Inode: 5931163503 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2016-11-09 23:58:07.083459000 +0530
Modify: 2016-11-10 00:03:15.955455000 +0530
Change: 2017-04-23 05:56:33.570068918 +0530
Birth: -

getfattr -m. -e hex -d /media/disk11/brick11/file1
getfattr: Removing leading '/' from absolute path names
# file: media/disk11/brick11/file1
trusted.bit-rot.signature=0x010500000000000000574ef2ff2bba2798a0451de3d9bca857380c1c36a8ca39fc7fd4e8c85dd4e559
trusted.bit-rot.version=0x050000000000000058ef4cad000c2af5
trusted.ec.config=0x0000080a02000200
trusted.ec.size=0x00000006d20e5937
trusted.ec.version=0x00000000000369080000000000036909
trusted.gfid=0xc1fadd2e84c34e5d825d6431cfb17e48

==========================
File info from good brick
==========================
ls -lh /media/disk11/brick11/file1
-rw-r--r-- 2 root root 3.5G Nov 10 00:03 /media/disk11/brick11/file1

du -h /media/disk11/brick11/file1
3.5G /media/disk11/brick11/file1

getfattr -m. -e hex -d /media/disk11/brick11/file1
getfattr: Removing leading '/' from absolute path names
# file: media/disk11/brick11/file1
trusted.bit-rot.signature=0x010500000000000000b87cccce67fe51c0c2c224459d3987fe6beb2d674264048bf508d793443a6837
trusted.bit-rot.version=0x050000000000000058e10e9d00056438
trusted.ec.config=0x0000080a02000200
trusted.ec.dirty=0x00000000000000000000000000000000
trusted.ec.size=0x00000006d20e5937
trusted.ec.version=0x00000000000369080000000000036909
trusted.gfid=0xc1fadd2e84c34e5d825d6431cfb17e48

How reproducible:
This is the first time I have seen this behaviour in a production environment. Here are a few things I was doing during the heal process:
1. During the heal, I was reading the file that was about to be healed.
2. Reading the file from the healing brick was slow, so I killed the healing brick's process (by PID) so that users could download the file. I did this twice, a day apart.
3. To speed up the heal, I tried running "getfattr -h -n trusted.ec.heal 'filename'", but that also took a long time to heal the file, so I stopped it.
4. Apart from the brick heal, rebalance fix-layout and the bitrot signer process were also running in the cluster.
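The EC xattrs above can be cross-checked against the on-brick file size. What follows is a minimal bash sketch built on three assumptions. First, trusted.ec.config packs version, algorithm, word size, brick count, redundancy and chunk size into one 64-bit big-endian value, mirroring ec_config_t in the disperse translator. Second, trusted.ec.size is the logical file size. Third, trusted.ec.version holds a data counter followed by a metadata counter. The function names are hypothetical helpers and are not part of Gluster.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for decoding hex xattrs printed by `getfattr -e hex`.
# ASSUMPTION: the field layout mirrors ec_config_t in the GlusterFS disperse xlator.

# decode_ec_config HEX -> brick count, redundancy, data bricks, chunk size
decode_ec_config() {
  local v=$(( $1 ))                     # bash arithmetic accepts 0x-prefixed 64-bit values
  local bricks=$(( (v >> 32) & 0xff ))
  local redundancy=$(( (v >> 24) & 0xff ))
  local chunk=$(( v & 0xffffff ))       # bytes each brick stores per stripe
  echo "bricks=$bricks redundancy=$redundancy data_bricks=$(( bricks - redundancy )) chunk_size=$chunk"
}

# expected_fragment_size EC_SIZE_HEX DATA_BRICKS CHUNK_SIZE
# The logical size is rounded up to whole stripes (DATA_BRICKS * CHUNK_SIZE),
# and each brick then holds CHUNK_SIZE bytes per stripe.
expected_fragment_size() {
  local size=$(( $1 )) data_bricks=$2 chunk=$3
  local stripe=$(( data_bricks * chunk ))
  echo $(( (size + stripe - 1) / stripe * chunk ))
}

# decode_ec_version HEX -> the two 64-bit counters (ASSUMPTION: data first, then metadata)
decode_ec_version() {
  local h=${1#0x}
  echo "data=$(( 16#${h:0:16} )) metadata=$(( 16#${h:16:16} ))"
}
```

If those assumptions hold, trusted.ec.config decodes to an 8+2 layout (10 bricks, redundancy 2) with 512-byte chunks. trusted.ec.size is 29293959479 bytes (about 27.3 GiB), which gives an expected fragment size of 3661745152 bytes. That is exactly the Size that stat reports on the healed brick. trusted.ec.version decodes to data=223496 and metadata=223497 on both bricks. Together this suggests the apparent size is what the layout expects, and the discrepancy lies only in allocated blocks.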
Hi Amudhan,

While a heal is in progress, a difference between "du -h" and "ls -lh" is expected. The reason is that when heal starts, it truncates the file on the brick being healed to size 0. If I/O is going on, writes start at some specific offset, and the file size becomes offset + length. ls -l reports this size, while du -h reports only the blocks actually written to disk. The range before that offset, left behind by the truncate, reads back as zeros but has no blocks allocated, so du does not count it.
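The effect described above can be reproduced with an ordinary sparse file. This is a minimal sketch, assuming bash and GNU coreutils (truncate, dd, stat) on a filesystem that supports holes, such as ext4 or XFS. make_sparse and apparent_vs_allocated are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; ASSUMPTION: GNU coreutils and a hole-capable filesystem.

# make_sparse FILE OFFSET_BLOCKS LEN_BLOCKS  (units of 4 KiB)
# Mimics the pattern described above: truncate to 0, then write at an offset.
# Everything before the offset becomes a hole: it reads as zeros but has no blocks.
make_sparse() {
  truncate -s 0 "$1"
  dd if=/dev/zero of="$1" bs=4096 seek="$2" count="$3" conv=notrunc status=none
}

# apparent_vs_allocated FILE -> "<size ls -l shows> <bytes du counts>"
apparent_vs_allocated() {
  local size blocks unit
  # %s = st_size, %b = allocated blocks, %B = bytes per block reported by %b
  read -r size blocks unit < <(stat -c '%s %b %B' "$1")
  echo "$size $(( blocks * unit ))"
}

f=$(mktemp)
make_sparse "$f" 256 1          # 4 KiB written at a 1 MiB offset
apparent_vs_allocated "$f"      # size is 1052672; allocated is typically only a few KiB
rm -f "$f"
```

The same arithmetic applies to the healed brick. stat reports Blocks: 4565608 in 512-byte units, so 4565608 × 512 = 2337591296 bytes (about 2.2 GiB, matching du -h). The apparent Size is 3661745152 bytes (about 3.4 GiB, which ls -lh rounds up to 3.5G). Roughly 1.2 GiB of that fragment is therefore not backed by allocated blocks.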
Hi Ashish, the heal has completed, but it is still showing the same difference.
You also mentioned that, because reading the file from the healing brick was slow, you killed the healing brick's process so that users could download the file, and that you did this twice, a day apart. That is why the heal did not complete. However, the heal should have restarted once all the bricks were UP again.

I would suggest making sure that all the bricks are UP and then starting the heal:
- Check whether this file is listed in heal info. If it is, just run an index heal and the file will be healed.
- If it is not, run a client-side heal using getfattr.
- If in doubt, or if the file is still not being healed even with all the bricks UP, try a full heal.

If possible, perform the above steps while no I/O is going on to that file. If you are still not able to heal the files, please send us the xattrs of the file from all the bricks, the volume info, and the glustershd and mount logs.
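The triage steps above could be scripted roughly as follows. This is a minimal sketch, not an official procedure. The volume name, mount point and file path are placeholders, and heal_file, needs_heal and gfid_from_hex are hypothetical helpers. The parsing assumes heal info lists each pending entry on its own line, either as a path from the volume root or as <gfid:UUID>. The gluster and getfattr invocations are the standard CLI forms. Run them only after every brick shows Online "Y" in `gluster volume status VOLNAME`; a brick whose process was killed can usually be restarted with `gluster volume start VOLNAME force`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the heal triage described above.
# ASSUMPTION: heal info prints one pending entry per line (path or <gfid:UUID>).

# gfid_from_hex 0x<32 hex digits> -> dashed UUID form used in heal info entries
gfid_from_hex() {
  local h=${1#0x}
  echo "${h:0:8}-${h:8:4}-${h:12:4}-${h:16:4}-${h:20:12}"
}

# needs_heal PATH GFID < heal-info-output
# Succeeds if the file is listed by its path or by <gfid:GFID> (whole-line match).
needs_heal() {
  grep -qxF -e "$1" -e "<gfid:$2>"
}

# heal_file VOLNAME MOUNT PATH GFID_HEX
heal_file() {
  local vol=$1 mount=$2 path=$3 gfid
  gfid=$(gfid_from_hex "$4")
  if gluster volume heal "$vol" info | needs_heal "$path" "$gfid"; then
    gluster volume heal "$vol"                      # index heal
  else
    getfattr -n trusted.ec.heal "$mount$path"       # client-side heal through the mount
  fi
  # If the file still is not healed with all bricks up, fall back to:
  #   gluster volume heal "$vol" full
}
```

For the file in this report, the gfid 0xc1fadd2e84c34e5d825d6431cfb17e48 converts to c1fadd2e-84c3-4e5d-825d-6431cfb17e48. If heal info cannot resolve the path, the entry would appear in that form inside <gfid:...>.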
This bug is reported against a version of Gluster that is no longer maintained (or has been EOL'd). See https://www.gluster.org/release-schedule/ for the versions currently maintained. As a result, this bug is being closed. If the bug persists on a maintained version of Gluster or against the mainline Gluster repository, please request that it be reopened and that the Version field be marked appropriately.