Pasting the discussion we had on mail below: > Me, Aravinda and Kotresh had a discussion on this. > > RCA: non-linked inodes being used in dht-selfheal codepath. Issue got > uncovered in geo-rep setup because of http://review.gluster.org/12539, as we > now heal in discover codepath too. > > Proposed solution: Link inode in dht-selfheal and use the linked inode for > all selfheal functionality. One question I ran into with this solution is > about lookup not being complete for xlators between dht and xlator managing > itable (fuse-bridge here). So, if there is a non-lookup fop it might use an > inode which might be invalid for these xlators. However, there are two > aspects to this question, when dht is trying to link inode object - inode1 - > before it attempts healing. > > 1. A valid inode is already linked with gfid in itable by the time dht does > inode_link. In this case dht gets this inode (instead of the one it got > during lookup) and somebody else has completed the lookup already, the above > said problem wont be there. > 2. There is no valid inode linked with gfid in itable. In this case, it might > so happen that a racing lookup (with a different inode object - say inode2) > might complete and post _that_ there can be non-lookup fops on inode1 on > which lookup is still in progress. So, the inode1 is still invalid for > xlators b/w dht and fuse-bridge. We can solve this problem easily if dht > always unwinds lookup call with "linked_inode" instead of the inode object > passed down as part of request. So, the racing lookup here which got inode2 > as argument in request path will unwind with inode1. This means dht should > do inode_find for the associated gfid if healing is not required and unwind > lookup with result of inode_find if it finds a valid inode object. Note that > this algo works if all translators use inode passed in the lookup_cbk, > instead of using the one stored in their locals. This can be enforced for > atleast xlators b/w dht and fuse-bridge. This is the "ugly" aspect of this design - introducing restrictions on how other xlators should behave. So, I am suggesting a variant of the "clean" solution proposed below. > > To summarize the algorithm: > 1. If dht decides a directory needs self-heal, > a. it links the inode. > b. use the return value of inode_link for healing. > 2. When no healing is required, dht does an inode_find (gfid, itable) and > unwind with the result (instead of inode passed to it as argument). > > I am yet to think through whether this algo has any caveats with NFS. This > bug has a fuse-mount. > > There is one "clean" solution of separating lookup in to two fops - one does > plain lookup and the other does healing if required. But, that's a huge > change and going to take time. A variant of this solution can be implemented without requiring too much of code-change. 1. Introduce methods in itable that can be used by xlators to guide inode/dentry management. In this case, lets have a method - say itable_set_need_lookup. These methods _should_ be implemented by xlators doing resolution/itable-management (like fuse-bridge, gfapi, protocol/server, nfs etc). Note that the xlator doing itable management and the xlator associated with itable (table->xl) need not be same. So, this helps for other xlators like dht to have a say in how inodes are managed. 2. dht, when it decides that a heal has to be done a. links the inode into itable b. instructs the resolver code that a "resolution" (essentially a lookup in this case, as we have not broken lookup yet) has to be done before resuming any non-lookup fops on this inode. 3. The xlator managing inodes (fuse-bridge in this specific case) sends a lookup on the inode if need lookup is set. This behavior is very much similar to fuse_readdirp_cbk setting need-lookup and related behavior. There are some implementation details - like dht treating revalidates differently from fresh lookup etc. But, these can be worked out while coding.
REVIEW: http://review.gluster.org/14295 (cluster/distribute: use a linked inode in directory heal codepath) posted (#1) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/14295 (cluster/distribute: use a linked inode in directory heal codepath) posted (#2) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/14319 (cluster/distribute: heal layout in discover codepath too) posted (#3) for review on master by Raghavendra G (rgowdapp)
COMMIT: http://review.gluster.org/14295 committed in master by Raghavendra G (rgowdapp) ------ commit 06f92634d9ad8aa5c56d786e5248016c283e5c5b Author: Raghavendra G <rgowdapp> Date: Wed May 11 17:49:10 2016 +0530 cluster/distribute: use a linked inode in directory heal codepath This is needed for following reasons: * healing is done in lookup and mkdir codepath where inode is not linked _yet_ as normally linking is done in interface layers (fuse-bridge, gfapi, nfsv3 etc). * healing consists of non-lookup fops like inodelk, setattr, setxattr etc. All non-lookup fops expect a linked inode. Change-Id: I1bd8157abbae58431b7f6f6fffee0abfe5225342 BUG: 1334164 Signed-off-by: Raghavendra G <rgowdapp> Reviewed-on: http://review.gluster.org/14295 NetBSD-regression: NetBSD Build System <jenkins.org> CentOS-regression: Gluster Build System <jenkins.com> Smoke: Gluster Build System <jenkins.com> Reviewed-by: Susant Palai <spalai> Reviewed-by: mohammed rafi kc <rkavunga>
REVIEW: http://review.gluster.org/14364 (cluster/distribute: heal layout in discover codepath too) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/14365 (cluster/distribute: heal layout in discover codepath too) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/14366 (cluster/distribute: heal layout in discover codepath too) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/14365 committed in master by Raghavendra G (rgowdapp) ------ commit a74f8cf4e7edc2ce9f045317a18dacddf25adb8a Author: Raghavendra G <rgowdapp> Date: Fri May 13 11:40:57 2016 +0530 cluster/distribute: heal layout in discover codepath too BUG: 1334164 Change-Id: I4259d88f2b6e4f9d4ad689bc4e438f1db9cfd177 Signed-off-by: Raghavendra G <rgowdapp> Reviewed-on: http://review.gluster.org/14365 Tested-by: Pranith Kumar Karampuri <pkarampu> Smoke: Gluster Build System <jenkins.com> NetBSD-regression: NetBSD Build System <jenkins.org> CentOS-regression: Gluster Build System <jenkins.com>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.0, please open a new bug report. glusterfs-3.9.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution. [1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html [2] https://www.gluster.org/pipermail/gluster-users/