Bug 1531457

Summary: hard Link file A to B error if A is just created
Product: [Community] GlusterFS Reporter: George <george.lian>
Component: fuseAssignee: bugs <bugs>
Status: POST --- QA Contact:
Severity: urgent Docs Contact:
Priority: medium    
Version: mainlineCC: bugs, olim, pkarampu
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description George 2018-01-05 07:39:23 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.create a volume name "test" with replicated 
2.set volume option cluster.consistent-metadata with on:  
  gluster v set test cluster.consistent-metadata on
3. mount volume test on client on /mnt/test
4. create a file aaa size >1
   echo "1234567890" >/mnt/test/aaa
5. shutdown a replicat node, let's say sn-1, only let sn-0 worked
6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc

Actual results:
# cp /mnt/test/aaa /mnt/test/bbb;link /mnt/test/bbb /mnt/test/ccc
link: cannot create link '/mnt/test/ccc' to '/mnt/test/bbb': No such file or directory
[root@mn-0:/mnt/test]
# ls -l /mnt/test/bbb
-rw-r--r-- 1 root root 25 Jan  5 09:34 /mnt/test/bbb
[root@mn-0:/mnt/test]
# ls -l /mnt/test/ccc
ls: cannot access '/mnt/test/ccc': No such file or directory



Expected results:
should be no error for link


Additional info:
after investigate on kernel with ftrace, it should be error in kernel in fuction vfs_link :
	if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
		error =  -ENOENT;

Comment 1 George 2018-01-10 05:26:39 UTC
I found a bug fix in 
////////////////////////////////////////////////////
SHA-1: 4c4624c9bad2edf27128cb122c64f15d7d63bbc8

* cluster/afr: Don't let NFS cache stat after writes

Problem:
Afr does post-ops after write but the stat buffer it unwinds is at the
time of write, so if nfs client caches this, it will see different
ctime when it does stat on it after post-op is done. From NFS client's
perspective it thinks the file is changed. Tar which depends on this
to be correct keeps giving 'file changed as we read it' warning.
If Afr instead has to choose to unwind after post-op, eager-lock,
delayed-post-op will have to be disabled which will lead to bad
performance for all write usecases.

Fix:
Don't let client cache stat after write.

Change-Id: Ic6062acc6e5cdd97a9c83c56bd529ec83cee8a23
BUG: 1302948
Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
///////////////////////////////////////////////////////////

from my understand, the link issue maybe introduced by the above fix.
in this fix, it set the ia_nlink to in function gf_zero_fill_stat, and it leadto kernel treat it as the non-exist file with the code script I shared in issue report.
@Pranith Kumar K, could you please have a check and give your comments?

Comment 2 Pranith Kumar K 2018-01-17 11:27:07 UTC
Hi,
       I tried recreating this on my laptop and on both master and 3.12 and I am not able to recreate the issue :-(.
Here is the execution log: https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ
Since I was doing this on my laptop, I changed shutting down of the replica to killing the brick process to simulate this test.
Let me know if I missed something.

Pranith

Comment 3 George 2018-01-18 01:19:36 UTC
I suppose your has create 6 bricks, but only let three of 6 shutting down.
the following is my test process output FYI.

root@ubuntu:~# gluster peer probe ubuntu
peer probe: success. Probe on localhost not needed
root@ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 ubuntu:/home/gfs/b2 force
volume create: test: success: please start the volume to access data
root@ubuntu:~# gluster v start test
volume start: test: success
root@ubuntu:~# gluster v info test

Volume Name: test
Type: Replicate
Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ubuntu:/home/gfs/b1
Brick2: ubuntu:/home/gfs/b2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
root@ubuntu:~# gluster v status
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ubuntu:/home/gfs/b1                   49152     0          Y       7798
Brick ubuntu:/home/gfs/b2                   49153     0          Y       7818
Self-heal Daemon on localhost               N/A       N/A        Y       7839

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks


root@ubuntu:~# gluster v set test cluster.consistent-metadata on
volume set: success

root@ubuntu:~# ls /mnt/test
ls: cannot access '/mnt/test': No such file or directory
root@ubuntu:~# mkdir -p /mnt/test
root@ubuntu:~# mount -t glusterfs ubuntu:/test /mnt/test

root@ubuntu:~# cd /mnt/test
root@ubuntu:/mnt/test# echo "abc">aaa
root@ubuntu:/mnt/test# cp aaa bbb;link bbb ccc

root@ubuntu:/mnt/test# kill -9 7818
root@ubuntu:/mnt/test# cp aaa ddd;link ddd eee
link: cannot create link 'eee' to 'ddd': No such file or directory

Comment 4 George 2018-01-24 07:34:35 UTC
Who can tell me while need set buf->ia_nlink to “0”in function gf_zero_fill_stat(), which API or Application will care it?
If I remove this line and also update corresponding in function gf_is_zero_filled_stat, 
The issue seems gone, but I can’t confirm will lead to other issues.

So could anyone please double check it and give your comments here?

My change is as the below:

gf_boolean_t
gf_is_zero_filled_stat (struct iatt *buf)
{
        if (!buf)
                return 1;

        /* Do not use st_dev because it is transformed to store the xlator id
         * in place of the device number. Do not use st_ino because by this time
         * we've already mapped the root ino to 1 so it is not guaranteed to be
         * 0.
         */
//        if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
        if (buf->ia_ctime == )
                return 1;

        return 0;
}

void
gf_zero_fill_stat (struct iatt *buf)
{
//       buf->ia_nlink = 0;
        buf->ia_ctime = 0;
}

Comment 5 George 2018-01-24 07:35:42 UTC
Who can tell me while need set buf->ia_nlink to “0”in function gf_zero_fill_stat(), which API or Application will care it?
If I remove this line and also update corresponding in function gf_is_zero_filled_stat, 
The issue seems gone, but I can’t confirm will lead to other issues.

So could anyone please double check it and give your comments here?

My change is as the below:

gf_boolean_t
gf_is_zero_filled_stat (struct iatt *buf)
{
        if (!buf)
                return 1;

        /* Do not use st_dev because it is transformed to store the xlator id
         * in place of the device number. Do not use st_ino because by this time
         * we've already mapped the root ino to 1 so it is not guaranteed to be
         * 0.
         */
//        if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
        if (buf->ia_ctime == 0 )
                return 1;

        return 0;
}

void
gf_zero_fill_stat (struct iatt *buf)
{
//       buf->ia_nlink = 0;
        buf->ia_ctime = 0;
}

Comment 6 Amar Tumballi 2018-10-11 10:57:14 UTC
Best thing in this scenario is to send a patch to review.gluster.org, and see how the reviewers react!

Also, you can run './run-tests.sh' to see if no other components are bothered!

Comment 7 Shyamsundar 2018-10-23 14:55:25 UTC
Release 3.12 has been EOLd and this bug was still found to be in the NEW state, hence moving the version to mainline, to triage the same and take appropriate actions.

Comment 8 Worker Ant 2019-06-18 11:08:20 UTC
REVIEW: https://review.gluster.org/22891 (zero_fill_stat: use only ctime to determine the zero'd stat) posted (#1) for review on master by Amar Tumballi