+++ This bug was initially created as a clone of Bug #1116150 +++

Description of problem:
Adding a brick and running rebalance leads to migration failures. The log message says "remote operation failed: File exists".

How reproducible:
Tried once

Steps to Reproduce:
1. Created a 2-brick distribute volume.
2. Created some data with:
   for i in {1..10}; do mkdir $i; cd $i; cp -R /etc/* .; done
3. Added one more brick and ran rebalance.

Actual results:
Rebalance failures are seen.

Additional info:
---------------

Rebalance logs
================
[2014-06-18 06:44:50.298216] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-new-client-2: remote operation failed: File exists. Path: /1/2/3/4/5/6/7/8/9/10/xdg
[2014-06-18 06:44:50.298252] D [dht-selfheal.c:419:dht_selfheal_dir_mkdir_cbk] 0-new-dht: selfhealing directory /1/2/3/4/5/6/7/8/9/10/xdg failed: File exists

Volume Name: new
Type: Distribute
Volume ID: 5d5b5cdf-6c77-4200-a7d0-9f4ac5828a0c
Status: Started
Snap Volume: no
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhs-client4.lab.eng.blr.redhat.com:/home/n0
Brick2: rhs-client39.lab.eng.blr.redhat.com:/home/n1
Brick3: rhs-client4.lab.eng.blr.redhat.com:/home/n2
Options Reconfigured:
diagnostics.client-log-level: DEBUG
diagnostics.client-log-buf-size: 20
diagnostics.brick-log-buf-size: 10
diagnostics.brick-log-flush-timeout: 200
diagnostics.client-log-flush-timeout: 300
diagnostics.client-log-format: with-msg-id

From the logs, it seems to be a race condition between two rebalance processes.

STATE 1:                  BRICK-1
only one brick            Cached File
in the system

STATE 2:
Add brick-2               BRICK-1        BRICK-2

STATE 3: Lookup of the file on brick-2 by this node's rebalance will fail because the hashed file is not created yet. So dht_lookup_everywhere is about to get called.

STATE 4: As part of the lookup, a link file will be created at brick-2.

STATE 5: getxattr is done to check that the cached file belongs to this node.

STATE 6: dht_lookup_everywhere_cbk detects the link created by rebalance-1. It will unlink it.

STATE 7: getxattr on the link file with the "pathinfo" key will be called and will fail, as the link file has been deleted by the rebalance on node-2.
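The interleaving in STATE 1 to STATE 7 can be replayed with a small, deterministic Python sketch. This is only a toy model of the sequence described above, not GlusterFS code: the `Brick` class, the `replay_race` function, its flag, and the returned strings are all hypothetical names chosen for illustration.

```python
class Brick:
    """Toy brick: maps a path to the kind of entry stored there."""

    def __init__(self, name):
        self.name = name
        self.entries = {}  # path -> "data" or "linkto"


def replay_race(unlink_linkto_in_cbk):
    """Replay STATE 1..7 in order; return the outcome of the final getxattr.

    unlink_linkto_in_cbk is a hypothetical switch: True models the behaviour
    described in the report (the lookup-everywhere callback unlinks the link
    file), False models leaving the link file alone.
    """
    path = "/file"
    brick1, brick2 = Brick("brick-1"), Brick("brick-2")

    # STATE 1-2: the cached (data) file lives on brick-1; brick-2 is added.
    brick1.entries[path] = "data"

    # STATE 3: lookup on the hashed brick (brick-2) misses.
    assert path not in brick2.entries

    # STATE 4: rebalance-1 creates the link file on brick-2.
    brick2.entries[path] = "linkto"

    # STATE 5: rebalance-1 checks that the cached file is local (succeeds).
    assert brick1.entries[path] == "data"

    # STATE 6: the other rebalance's lookup-everywhere callback sees the link.
    if unlink_linkto_in_cbk and brick2.entries.get(path) == "linkto":
        del brick2.entries[path]

    # STATE 7: rebalance-1 issues getxattr("pathinfo") on the link file.
    if path not in brick2.entries:
        return "getxattr failed"
    return "getxattr ok"
```

Calling `replay_race(True)` models the failing sequence from the report, while `replay_race(False)` models the same sequence when the callback does not remove the link file.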
REVIEW: http://review.gluster.org/8603 (cluster/dht: Fix races to avoid deletion of linkto file) posted (#1) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8606 (cluster/dht: Modified logic of linkto file deletion on non-hashed) posted (#1) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8607 (cluster/dht: Added keys in dht_lookup_everywhere_done) posted (#1) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8612 (storage/posix: Don't unlink .glusterfs-hardlink before linkto check) posted (#1) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8603 (cluster/dht: Fix races to avoid deletion of linkto file) posted (#2) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8606 (cluster/dht: Modified logic of linkto file deletion on non-hashed) posted (#2) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8607 (cluster/dht: Added keys in dht_lookup_everywhere_done) posted (#2) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8612 (storage/posix: Don't unlink .glusterfs-hardlink before linkto check) posted (#2) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8603 (cluster/dht: Fix races to avoid deletion of linkto file) posted (#3) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8606 (cluster/dht: Modified logic of linkto file deletion on non-hashed) posted (#3) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8607 (cluster/dht: Added keys in dht_lookup_everywhere_done) posted (#3) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8612 (storage/posix: Don't unlink .glusterfs-hardlink before linkto check) posted (#3) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8603 (cluster/dht: Fix races to avoid deletion of linkto file) posted (#4) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8606 (cluster/dht: Modified logic of linkto file deletion on non-hashed) posted (#4) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8607 (cluster/dht: Added keys in dht_lookup_everywhere_done) posted (#4) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8612 (storage/posix: Don't unlink .glusterfs-hardlink before linkto check) posted (#4) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8627 (cluster/dht: Added code to capture races in dht-lookup path) posted (#1) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
COMMIT: http://review.gluster.org/8603 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit 96c92dcba8b4b4dcd85230d76da05ad9b043c3cf
Author: Venkatesh Somyajulu <vsomyaju>
Date:   Thu Sep 4 14:01:34 2014 -0400

    cluster/dht: Fix races to avoid deletion of linkto file

    Explanation of Race between rebalance processes:
    https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4

    STATE 1:                  BRICK-1
    only one brick            Cached File
    in the system

    STATE 2:
    Add brick-2               BRICK-1        BRICK-2

    STATE 3: Lookup of the file on brick-2 by this node's rebalance
             will fail because the hashed file is not created yet.
             So dht_lookup_everywhere is about to get called.

    STATE 4: As part of the lookup, a link file will be created at brick-2.

    STATE 5: getxattr is done to check that the cached file belongs
             to this node.

    STATE 6: dht_lookup_everywhere_cbk detects the link created by
             rebalance-1. It will unlink it.

    STATE 7: getxattr on the link file with the "pathinfo" key will be
             called and will fail, as the link file has been deleted
             by the rebalance on node-2.

    Fix:
    So in STATE 6, we should avoid deleting the link file. Every time
    dht_lookup_everywhere gets called, lookup is performed on all the
    nodes. So to avoid STATE 6, if a linkto file is found, it is not
    deleted until a valid case is found in dht_lookup_everywhere_done.

    Case 1: The linkto file points to the cached node, and the cached
            file exists: unwind with success.

    Case 2: The linkto file does not point to the current cached node,
            and the cached file exists:
            a) Unlink the stale link file
            b) Create a new link file

    Case 3: Only the linkto file exists:
            Delete the linkto file

    Case 4: Only the cached file exists:
            Create a link file (handled even without the patch)

    Case 5: Neither the cached nor the hashed file is present:
            Return with ENOENT (handled even without the patch)

    Change-Id: Ibf53671410d8d613b8e2e7e5d0ec30fc7dcc0298
    BUG: 1138385
    Signed-off-by: Venkatesh Somyajulu <vsomyaju>
    Reviewed-on-master: http://review.gluster.org/8231
    Reviewed-by: Vijay Bellur <vbellur>
    Tested-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/8603
    Reviewed-by: Jeff Darcy <jdarcy>
    Tested-by: Gluster Build System <jenkins.com>
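The five cases listed in this commit message can be read as a small decision table. The Python sketch below encodes that table only; it is not the actual dht_lookup_everywhere_done implementation, and the function name, parameters, and returned action strings are assumptions made for illustration.

```python
from typing import Optional


def lookup_everywhere_done_action(cached_subvol: Optional[str],
                                  linkto_target: Optional[str]) -> str:
    """Pick the action for one lookup-everywhere result.

    cached_subvol: subvolume holding the data file, or None if none was found.
    linkto_target: subvolume the linkto file points to, or None if no linkto
                   file was found.
    Both parameters and all return strings are hypothetical.
    """
    if cached_subvol is None and linkto_target is None:
        return "return ENOENT"                      # Case 5
    if cached_subvol is None:
        return "delete linkto"                      # Case 3
    if linkto_target is None:
        return "create linkto"                      # Case 4
    if linkto_target == cached_subvol:
        return "unwind success"                     # Case 1
    return "unlink stale linkto, create linkto"     # Case 2
```

For example, `lookup_everywhere_done_action("brick-1", "brick-3")` falls under Case 2, because a data file exists but the link file points somewhere else.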
COMMIT: http://review.gluster.org/8606 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit 9b00f921d2da1ca7a6cc3213da4bd0b74ffa048b
Author: Venkatesh Somyajulu <vsomyaju>
Date:   Thu Sep 4 14:06:28 2014 -0400

    cluster/dht: Modified logic of linkto file deletion on non-hashed

    Currently, whenever dht_lookup_everywhere gets called, if
    dht_lookup_everywhere_cbk finds a linkto file on a non-hashed
    subvolume, the file is unlinked. But there are cases when this file
    is under migration. Under such conditions, we should avoid deleting
    the file.

    When some other rebalance process changes the layout of the parent
    such that dst_file (w.r.t. migration) falls on a non-hashed node,
    the lookup may have found it as a linkto file, but just before the
    unlink the file is under migration or already migrated. In such
    cases the unlink can be avoided.

    Race:
    -------
    Suppose we have two bricks (brick-1 and brick-2) with an initial
    file "a" under BaseDir which is hashed as well as cached on brick-1.
    Assume hashing "a" gives 44.

                        Brick-1        Brick-2
    Initial Setup:      BaseDir/a      BaseDir
                        [1-50]         [51-100]

    Now add new brick Brick-3.

    1. Rebalance-1 on node Node-1 (Brick-1 node) will reset the BaseDir
       layout.
    2. After that it will
       a) Create a linkto file on the new hashed brick (brick-2)
       b) Perform file migration.

    1. Rebalance-1 fixes the base layout:

                        Brick-1        Brick-2              Brick-3
                        ---------      ----------           ------------
                        BaseDir/a      BaseDir              BaseDir
                        [1-33]         [34-66]              [67-100]

    2. Only a) is       BaseDir/a      BaseDir/a(linkto)    BaseDir
       performed                       Create linkto file

    Now rebalance-2 on node-2 jumps in and performs steps 1 and 2-a.

    After (rebal-2, step-1), the layout of BaseDir has changed:

                        BaseDir/a      BaseDir/a(link)      BaseDir
                        [67-100]       [1-33]               [34-66]

    For (rebal-2, step-2), it will perform a lookup at Brick-3, since
    with the new layout 44 falls on brick-3. But the lookup will fail,
    so dht_lookup_everywhere gets called.

    NOTE: On brick-2, a linkto file was created by rebalance-1.

    Currently that linkto file gets deleted by the rebalance-2 lookup,
    as it is considered a stale linkto file. But with this patch, if
    rebalance is already in progress or rebalance is over, the linkto
    file will not be unlinked. If rebalance is in progress, the fd will
    be open, and if rebalance is over, then the linkto xattr will not
    be set.

    Change-Id: I3fee0d28de3c76197325536a9e30099d2413f079
    BUG: 1138385
    Signed-off-by: Venkatesh Somyajulu <vsomyaju>
    Reviewed-on-master: http://review.gluster.org/8345
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Vijay Bellur <vbellur>
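The layout walk-through in this commit message can be checked with a few lines of Python. The sketch below only models the hash ranges quoted above (with "a" hashing to 44) and is not DHT code; the `hashed_brick` helper and the layout dictionaries are hypothetical names used for illustration.

```python
from typing import Dict, Tuple

Layout = Dict[str, Tuple[int, int]]  # brick name -> inclusive hash range


def hashed_brick(layout: Layout, hash_value: int) -> str:
    """Return the brick whose range contains hash_value."""
    for brick, (low, high) in layout.items():
        if low <= hash_value <= high:
            return brick
    raise ValueError(f"hash {hash_value} is not covered by the layout")


# Ranges copied from the commit message.
initial = {"Brick-1": (1, 50), "Brick-2": (51, 100)}
after_rebal_1 = {"Brick-1": (1, 33), "Brick-2": (34, 66), "Brick-3": (67, 100)}
after_rebal_2 = {"Brick-1": (67, 100), "Brick-2": (1, 33), "Brick-3": (34, 66)}

HASH_OF_A = 44  # "Assume "a" hashing gives 44"

# Initially "a" is hashed and cached on Brick-1.
where_initial = hashed_brick(initial, HASH_OF_A)
# Rebalance-1 therefore creates its linkto file on Brick-2.
where_rebal_1 = hashed_brick(after_rebal_1, HASH_OF_A)
# Rebalance-2 looks on Brick-3, finds nothing, and falls back to
# dht_lookup_everywhere, where it sees the linkto file on Brick-2.
where_rebal_2 = hashed_brick(after_rebal_2, HASH_OF_A)
```

The three results differ (Brick-1, Brick-2, Brick-3), which is why the linkto file created by rebalance-1 looks like it sits on a non-hashed subvolume from the point of view of rebalance-2.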
COMMIT: http://review.gluster.org/8607 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit 13a044ab4d643a39d8138ab33226162ef125dbd3
Author: Venkatesh Somyajulu <vsomyaju>
Date:   Thu Sep 4 14:08:18 2014 -0400

    cluster/dht: Added keys in dht_lookup_everywhere_done

    Case where both the cached file (C1) and the hashed file are found,
    but the hashed file does not point to the above cached node (C1):
    do not unlink if either the fd is open on the hashed file or the
    linkto xattr is not found.

    Change-Id: I7ef49b88d2c88bf9d25d3aa7893714e6c0766c67
    BUG: 1138385
    Signed-off-by: Venkatesh Somyajulu <vsomyaju>

    Change-Id: I86d0a21d4c0501c45d837101ced4f96d6fedc5b9
    Signed-off-by: Venkatesh Somyajulu <vsomyaju>
    Reviewed-on-master: http://review.gluster.org/8429
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: susant palai <spalai>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/8607
    Reviewed-by: Jeff Darcy <jdarcy>
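The unlink condition stated in this commit message reduces to a two-input predicate. The following Python sketch expresses only that stated rule; the function and parameter names are hypothetical and do not correspond to identifiers in the DHT source.

```python
def may_unlink_hashed_linkto(fd_is_open: bool, has_linkto_xattr: bool) -> bool:
    """Decide whether a hashed file that points away from the cached node
    may be unlinked.

    Per the commit message: do not unlink if the fd is open on the hashed
    file (migration in progress) or if the linkto xattr is not found
    (migration already finished).
    """
    if fd_is_open:
        return False
    if not has_linkto_xattr:
        return False
    return True
```

Only the combination "no open fd and linkto xattr present" allows the unlink; every other combination leaves the file in place.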
COMMIT: http://review.gluster.org/8627 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit 226ea315d7ff63548b1163966e24f80a5e1641ab
Author: Venkatesh Somyajulu <vsomyaju>
Date:   Fri Sep 5 11:14:44 2014 -0400

    cluster/dht: Added code to capture races in dht-lookup path

    Change-Id: I9270d2d40ebd4b113ff961583dfda7754741f15b
    BUG: 1138385
    Signed-off-by: Venkatesh Somyajulu <vsomyaju>
    Reviewed-on-master: http://review.gluster.org/8430
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/8627
    Reviewed-by: Jeff Darcy <jdarcy>
REVIEW: http://review.gluster.org/8668 (cluster/dht: Added code to capture races in dht-lookup path) posted (#1) for review on release-3.6 by Vijay Bellur (vbellur)
COMMIT: http://review.gluster.org/8668 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit df31ed49a51670f30d30abb377eb020cc9e85c10
Author: Venkatesh Somyajulu <vsomyaju>
Date:   Fri Sep 5 11:14:44 2014 -0400

    cluster/dht: Added code to capture races in dht-lookup path

    Change-Id: I9270d2d40ebd4b113ff961583dfda7754741f151
    BUG: 1138385
    Signed-off-by: Venkatesh Somyajulu <vsomyaju>
    Reviewed-on-master: http://review.gluster.org/8430
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-by: Jeff Darcy <jdarcy>
    Reviewed-on: http://review.gluster.org/8668
    Tested-by: Vijay Bellur <vbellur>
REVIEW: http://review.gluster.org/8692 (cluster/dht: Modified logic of linkto file deletion on non-hashed) posted (#1) for review on release-3.6 by Vijay Bellur (vbellur)
COMMIT: http://review.gluster.org/8692 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit 8a6a4a61dc5bb74e09b2da4aa05c3099e5497a02
Author: Venkatesh Somyajulu <vsomyaju>
Date:   Thu Sep 4 14:06:28 2014 -0400

    cluster/dht: Modified logic of linkto file deletion on non-hashed

    Currently, whenever dht_lookup_everywhere gets called, if
    dht_lookup_everywhere_cbk finds a linkto file on a non-hashed
    subvolume, the file is unlinked. But there are cases when this file
    is under migration. Under such conditions, we should avoid deleting
    the file.

    When some other rebalance process changes the layout of the parent
    such that dst_file (w.r.t. migration) falls on a non-hashed node,
    the lookup may have found it as a linkto file, but just before the
    unlink the file is under migration or already migrated. In such
    cases the unlink can be avoided.

    Race:
    -------
    Suppose we have two bricks (brick-1 and brick-2) with an initial
    file "a" under BaseDir which is hashed as well as cached on brick-1.
    Assume hashing "a" gives 44.

                        Brick-1        Brick-2
    Initial Setup:      BaseDir/a      BaseDir
                        [1-50]         [51-100]

    Now add new brick Brick-3.

    1. Rebalance-1 on node Node-1 (Brick-1 node) will reset the BaseDir
       layout.
    2. After that it will
       a) Create a linkto file on the new hashed brick (brick-2)
       b) Perform file migration.

    1. Rebalance-1 fixes the base layout:

                        Brick-1        Brick-2              Brick-3
                        ---------      ----------           ------------
                        BaseDir/a      BaseDir              BaseDir
                        [1-33]         [34-66]              [67-100]

    2. Only a) is       BaseDir/a      BaseDir/a(linkto)    BaseDir
       performed                       Create linkto file

    Now rebalance-2 on node-2 jumps in and performs steps 1 and 2-a.

    After (rebal-2, step-1), the layout of BaseDir has changed:

                        BaseDir/a      BaseDir/a(link)      BaseDir
                        [67-100]       [1-33]               [34-66]

    For (rebal-2, step-2), it will perform a lookup at Brick-3, since
    with the new layout 44 falls on brick-3. But the lookup will fail,
    so dht_lookup_everywhere gets called.

    NOTE: On brick-2, a linkto file was created by rebalance-1.

    Currently that linkto file gets deleted by the rebalance-2 lookup,
    as it is considered a stale linkto file. But with this patch, if
    rebalance is already in progress or rebalance is over, the linkto
    file will not be unlinked. If rebalance is in progress, the fd will
    be open, and if rebalance is over, then the linkto xattr will not
    be set.

    Change-Id: I3fee0d28de3c76197325536a9e30099d2413f07d
    BUG: 1138385
    Signed-off-by: Venkatesh Somyajulu <vsomyaju>
    Reviewed-on-master: http://review.gluster.org/8345
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Vijay Bellur <vbellur>
REVIEW: http://review.gluster.org/8612 (storage/posix: Don't unlink .glusterfs-hardlink before linkto check) posted (#5) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
COMMIT: http://review.gluster.org/8612 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit d804cb577923bb7d834ac00a704f3cefe8a0afdf
Author: Venkatesh Somyajulu <vsomyaju>
Date:   Thu Sep 4 14:15:10 2014 -0400

    storage/posix: Don't unlink .glusterfs-hardlink before linkto check

    BUG: 1138385
    Change-Id: I90a10ac54123fbd8c7383ddcbd04e8879ae51232
    Signed-off-by: Venkatesh Somyajulu <vsomyaju>
    Reviewed-on-master: http://review.gluster.org/8559
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/8612
A beta release of GlusterFS 3.6.0 is now available. Please verify whether the release solves this bug report for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users
REVIEW: http://review.gluster.org/9382 (cluster/dht: Change log level to avoid annoying logs) posted (#1) for review on release-3.6 by Vijay Bellur (vbellur)
COMMIT: http://review.gluster.org/9382 committed in release-3.6 by Raghavendra Bhat (raghavendra)
------
commit 9eb4afbb4ea1ddb398166b87f783af00d61ddaa9
Author: Vijay Bellur <vbellur>
Date:   Mon Jan 5 14:32:24 2015 +0530

    cluster/dht: Change log level to avoid annoying logs

    [2015-01-04 08:03:23.820376] I [dht-common.c:1822:dht_lookup_cbk] 0-patchy-dht: Entry /tls missing on subvol patchy-replicate-0

    Change-Id: Id9eae47213ed39e8bf969e82cc7b935dfada4598
    BUG: 1138385
    Reviewed-on: http://review.gluster.org/9382
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra Bhat <raghavendra>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.2, please reopen this bug report.

glusterfs-3.6.2 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should already be available or become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

The fix for this bug is likely to be included in all future GlusterFS releases, i.e. releases > 3.6.2.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137