Description of problem:
======================
When a non-zero-size file has data heals pending and the source data brick is down, stat on the file from the mount point succeeds but reports the wrong size. Applications that consume these details can end up with wrong information.

Stat for a zero-byte file makes sense, but for a file with contents, stat must either return the correct metadata (if a metadata heal can fix it) or fail with EIO.

In the EIO case, I would also suggest logging the likely reason, for example that a data heal is pending (in the fuse mount and shd logs).

Also note that when we bring down both data brick 1 and data brick 2 (with only the arbiter brick up), a stat returns EIO; the reason given is that we don't want to display wrong info. Hence this bug is being raised, as we are displaying spurious details.

Version-Release number of selected component (if applicable):
===================
glusterfs 3.9dev built on Jul 11 2016 10:04:54

How reproducible:
==================
always

Steps to Reproduce:
====================
1. Create a 1x(2+1) replicate arbiter volume.
2. Mount the volume over fuse.
3. Create a directory, say dir1.
4. Bring down the first data brick.
5. Create a file, say f1, under dir1 with some contents.
6. Bring down the other data brick too.
7. Bring up the first data brick, which was down.
8. Check heal info and trigger a manual heal.
9. Do an ls -l on the mount; the file f1 is listed.
10. Do a stat of the file; the output is as below:

  File: ‘f1’
  Size: 0               Blocks: 0          IO Block: 131072 regular empty file
Device: 2ah/42d Inode: 13754313043819253517  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:fusefs_t:s0
Access: 2016-07-15 17:17:29.467779000 +0530
Modify: 2016-07-15 17:17:29.467779000 +0530
Change: 2016-07-15 17:17:29.471878457 +0530
 Birth: -

Actual results:
================
The file shows as zero size.

Expected results:
===================
Return EIO until the data heal happens, or collect and display the right file size.

Additional info:
After some code reading and debugging, I found that this bug is not specific to arbiter. AFR does have checks that fail afr_stat() (or any read transaction) when the only good copy is down. The problem here is that the stat never reaches AFR: it is served from the kernel cache, which was populated by the lookup response sent by AFR. For example, if we fuse mount the volume with attribute-timeout=0 and entry-timeout=0, we get EIO for the reproducer given in the description, because there is no kernel caching, so afr_stat() is hit and fails the fop with EIO.
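The caching behaviour described above can be illustrated with a small toy model. This is only a sketch, not GlusterFS or kernel code: the class names ToyAfr and ToyKernelCache, the helper stat_after_lookup, and the 1.0 second timeout are all hypothetical, chosen only to show why a nonzero attribute timeout hides the EIO while a timeout of 0 exposes it.

```python
import errno


class ToyAfr:
    """Hypothetical stand-in for the replication layer (not GlusterFS code)."""

    def __init__(self, good_copy_up, stale_size, real_size):
        self.good_copy_up = good_copy_up
        self.stale_size = stale_size
        self.real_size = real_size

    def lookup(self):
        # Lookup succeeds and returns attributes from whichever copy is up,
        # even if that copy still needs a data heal.
        size = self.real_size if self.good_copy_up else self.stale_size
        return {"size": size}

    def stat(self):
        # A read-type operation fails when the only good copy is down.
        if not self.good_copy_up:
            raise OSError(errno.EIO, "only good copy is down")
        return {"size": self.real_size}


class ToyKernelCache:
    """Hypothetical attribute cache sitting in front of the backend."""

    def __init__(self, backend, attribute_timeout):
        self.backend = backend
        self.attribute_timeout = attribute_timeout
        self.cached = None
        self.cached_at = None

    def lookup(self, now):
        # The lookup reply also populates the attribute cache.
        self.cached = self.backend.lookup()
        self.cached_at = now
        return self.cached

    def stat(self, now):
        # Serve from the cache while the entry is younger than the timeout;
        # with a timeout of 0 this is never true, so the backend is asked.
        if self.cached is not None and (now - self.cached_at) < self.attribute_timeout:
            return self.cached
        return self.backend.stat()


def stat_after_lookup(attribute_timeout):
    """Return the size seen by stat right after a lookup, or the errno name."""
    backend = ToyAfr(good_copy_up=False, stale_size=0, real_size=4096)
    cache = ToyKernelCache(backend, attribute_timeout)
    cache.lookup(now=0.0)
    try:
        return cache.stat(now=0.0)["size"]
    except OSError as exc:
        return errno.errorcode[exc.errno]


if __name__ == "__main__":
    print("timeout=1.0 ->", stat_after_lookup(1.0))
    print("timeout=0   ->", stat_after_lookup(0))
```

In this model, the stat that follows a lookup reports the stale size when the cache entry is still considered fresh, and only reaches the backend check (and its EIO) when the timeout is 0, which mirrors the observation with attribute-timeout=0 and entry-timeout=0.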
This bug is being closed because GlusterFS-3.7 has reached its end-of-life. Note: This bug is being closed using a script. No verification has been performed to check whether it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen it against the newer release.