Bug 1531457
Summary: | hard Link file A to B error if A is just created | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | George <george.lian> |
Component: | fuse | Assignee: | bugs <bugs> |
Status: | CLOSED UPSTREAM | QA Contact: | |
Severity: | urgent | Docs Contact: | |
Priority: | medium | ||
Version: | mainline | CC: | bugs, olim, pkarampu |
Target Milestone: | --- | Keywords: | Triaged |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-03-13 06:10:55 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
George
2018-01-05 07:39:23 UTC
I found a bug fix in //////////////////////////////////////////////////// SHA-1: 4c4624c9bad2edf27128cb122c64f15d7d63bbc8 * cluster/afr: Don't let NFS cache stat after writes Problem: Afr does post-ops after write but the stat buffer it unwinds is at the time of write, so if nfs client caches this, it will see different ctime when it does stat on it after post-op is done. From NFS client's perspective it thinks the file is changed. Tar which depends on this to be correct keeps giving 'file changed as we read it' warning. If Afr instead has to choose to unwind after post-op, eager-lock, delayed-post-op will have to be disabled which will lead to bad performance for all write usecases. Fix: Don't let client cache stat after write. Change-Id: Ic6062acc6e5cdd97a9c83c56bd529ec83cee8a23 BUG: 1302948 Signed-off-by: Pranith Kumar K <pkarampu> /////////////////////////////////////////////////////////// from my understand, the link issue maybe introduced by the above fix. in this fix, it set the ia_nlink to in function gf_zero_fill_stat, and it leadto kernel treat it as the non-exist file with the code script I shared in issue report. @Pranith Kumar K, could you please have a check and give your comments? Hi, I tried recreating this on my laptop and on both master and 3.12 and I am not able to recreate the issue :-(. Here is the execution log: https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ Since I was doing this on my laptop, I changed shutting down of the replica to killing the brick process to simulate this test. Let me know if I missed something. Pranith I suppose your has create 6 bricks, but only let three of 6 shutting down. the following is my test process output FYI. root@ubuntu:~# gluster peer probe ubuntu peer probe: success. Probe on localhost not needed root@ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 ubuntu:/home/gfs/b2 force volume create: test: success: please start the volume to access data root@ubuntu:~# gluster v start test volume start: test: success root@ubuntu:~# gluster v info test Volume Name: test Type: Replicate Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: ubuntu:/home/gfs/b1 Brick2: ubuntu:/home/gfs/b2 Options Reconfigured: transport.address-family: inet nfs.disable: on performance.client-io-threads: off root@ubuntu:~# gluster v status Status of volume: test Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick ubuntu:/home/gfs/b1 49152 0 Y 7798 Brick ubuntu:/home/gfs/b2 49153 0 Y 7818 Self-heal Daemon on localhost N/A N/A Y 7839 Task Status of Volume test ------------------------------------------------------------------------------ There are no active volume tasks root@ubuntu:~# gluster v set test cluster.consistent-metadata on volume set: success root@ubuntu:~# ls /mnt/test ls: cannot access '/mnt/test': No such file or directory root@ubuntu:~# mkdir -p /mnt/test root@ubuntu:~# mount -t glusterfs ubuntu:/test /mnt/test root@ubuntu:~# cd /mnt/test root@ubuntu:/mnt/test# echo "abc">aaa root@ubuntu:/mnt/test# cp aaa bbb;link bbb ccc root@ubuntu:/mnt/test# kill -9 7818 root@ubuntu:/mnt/test# cp aaa ddd;link ddd eee link: cannot create link 'eee' to 'ddd': No such file or directory Who can tell me while need set buf->ia_nlink to “0”in function gf_zero_fill_stat(), which API or Application will care it? If I remove this line and also update corresponding in function gf_is_zero_filled_stat, The issue seems gone, but I can’t confirm will lead to other issues. So could anyone please double check it and give your comments here? My change is as the below: gf_boolean_t gf_is_zero_filled_stat (struct iatt *buf) { if (!buf) return 1; /* Do not use st_dev because it is transformed to store the xlator id * in place of the device number. Do not use st_ino because by this time * we've already mapped the root ino to 1 so it is not guaranteed to be * 0. */ // if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0)) if (buf->ia_ctime == ) return 1; return 0; } void gf_zero_fill_stat (struct iatt *buf) { // buf->ia_nlink = 0; buf->ia_ctime = 0; } Who can tell me while need set buf->ia_nlink to “0”in function gf_zero_fill_stat(), which API or Application will care it? If I remove this line and also update corresponding in function gf_is_zero_filled_stat, The issue seems gone, but I can’t confirm will lead to other issues. So could anyone please double check it and give your comments here? My change is as the below: gf_boolean_t gf_is_zero_filled_stat (struct iatt *buf) { if (!buf) return 1; /* Do not use st_dev because it is transformed to store the xlator id * in place of the device number. Do not use st_ino because by this time * we've already mapped the root ino to 1 so it is not guaranteed to be * 0. */ // if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0)) if (buf->ia_ctime == 0 ) return 1; return 0; } void gf_zero_fill_stat (struct iatt *buf) { // buf->ia_nlink = 0; buf->ia_ctime = 0; } Best thing in this scenario is to send a patch to review.gluster.org, and see how the reviewers react! Also, you can run './run-tests.sh' to see if no other components are bothered! Release 3.12 has been EOLd and this bug was still found to be in the NEW state, hence moving the version to mainline, to triage the same and take appropriate actions. REVIEW: https://review.gluster.org/22891 (zero_fill_stat: use only ctime to determine the zero'd stat) posted (#1) for review on master by Amar Tumballi This bug is moved to https://github.com/gluster/glusterfs/issues/1100, and will be tracked there from now on. Visit GitHub issues URL for further details |