Description of problem:
This problem is hit in the test case tests/bugs/bug-1117851.t about once every 100 files, depending on the backend disk for the volume (RAM disk, SSD, or other). It occurs when the hashed and cached subvols for a file are the same and the file is renamed to a name whose hashed subvol is different.

The root cause is that both clients race to create the link and the linkto file in this scenario. The losing client deletes the linkto file in its cleanup, so the actual rename attempted by the winning client fails, and both clients end up failing to rename the file.

Changing the client that fails to create the linkto file so that it does not delete the linkto file is not sufficient, because the losing client could have won the linkto creation race (link and linkto are wound in parallel). That change is present in this review: http://review.gluster.org/#/c/8338/

The additional fix for this failure is to make the winds that create the link and the linkto sequential, so that whichever client wins the link race then goes ahead and creates the linkto file. This leaves one clear client proceeding while the other client gets the required errors.

Version-Release number of selected component (if applicable):
Gluster master

How reproducible:
About 1 in 100 renames when run on bricks on SSD or RAM disk

Steps to Reproduce:
Run the test case tests/bugs/bug-1117851.t, treating warnings on file rename failures as errors (see the comment in the test case file).

Also, this should be a fork of bug #1117851, but as this is not a data loss issue, the original bug is only referenced here.
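The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not GlusterFS code: the function name simulate, the client names "A" and "B", and the arrival-order list are all hypothetical, and the cleanup rule assumes that only the client that created the linkto file removes it (that is, the behavior with the http://review.gluster.org/#/c/8338/ style change already applied).

```python
# Toy model of two clients renaming the same file. Not GlusterFS code;
# all names here are hypothetical and exist only for illustration.

def simulate(arrival_order, sequential):
    """Return {client: True if that client's rename would succeed}.

    arrival_order is a list of (client, op) pairs, op being "link" or
    "linkto", in the order the requests reach the bricks. Each create is
    exclusive: the first request wins, later ones fail.
    """
    link_winner = None      # client whose link creation succeeded
    linkto_creator = None   # client whose linkto creation succeeded

    for client, op in arrival_order:
        if op == "link":
            if link_winner is None:
                link_winner = client
                if sequential:
                    # Sequential winds: only the link winner goes on
                    # to create the linkto file.
                    linkto_creator = client
        elif op == "linkto" and not sequential:
            # Parallel winds: linkto races independently of link.
            if linkto_creator is None:
                linkto_creator = client

    # Cleanup (assumed rule): a client that lost the link race removes
    # the linkto file if it was the one that created it.
    linkto_present = (linkto_creator is not None
                      and linkto_creator == link_winner)

    # A rename succeeds only for the link winner, and only if the
    # linkto file is still there.
    return {c: (c == link_winner and linkto_present) for c in ("A", "B")}


# A wins the link race, B wins the linkto race.
order = [("A", "link"), ("B", "linkto"), ("B", "link"), ("A", "linkto")]
print(simulate(order, sequential=False))  # both renames fail
print(simulate(order, sequential=True))   # A succeeds, B gets an error
```

With the parallel winds, the interleaving shown leaves neither client able to complete the rename, which is why not deleting a linkto file the client did not create is not enough on its own. With sequential winds, the same arrival order gives one clear winner.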
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#1) for review on master by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#2) for review on master by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#3) for review on master by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#4) for review on master by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8382 (cluster/dht: Fix rename failures when multiple clients race) posted (#5) for review on master by Shyamsundar Ranganathan (srangana)
Abandoned: http://review.gluster.org/8382

This change was made differently, because handling the linkto creation was needed first due to FUSE behavior. The changes can be found here:
http://review.gluster.org/#/c/8563/
http://review.gluster.org/#/c/8570/

With these changes the winning client no longer fails a rename if it fails to rename the linkto file. So when one client wins the link race and the other still deletes the linkto file, the rename failure seen by the winning client is not a critical failure, which resolves the issue.

The test case modified as a part of this commit will be posted as a separate commit for inclusion, after which this bug can be marked for verification.
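The change in behavior described in this comment can be sketched as a small decision function. This is a hedged illustration only: the function winner_rename_outcome and its parameters are hypothetical names, not identifiers from the DHT code, and the model reduces the rename to two steps (renaming the file itself and renaming the linkto file).

```python
# Hypothetical sketch of the winning client's rename outcome. The names
# are assumptions for illustration and do not come from GlusterFS.

def winner_rename_outcome(file_rename_ok, linkto_rename_ok,
                          linkto_failure_is_fatal):
    """Return "ok" or "fail" for the client that won the link race."""
    if not file_rename_ok:
        return "fail"
    if not linkto_rename_ok and linkto_failure_is_fatal:
        # Old behavior: a missing linkto file fails the whole rename.
        return "fail"
    # New behavior: a failed linkto rename is tolerated.
    return "ok"


# The losing client deleted the linkto file, so its rename fails.
print(winner_rename_outcome(True, False, linkto_failure_is_fatal=True))
print(winner_rename_outcome(True, False, linkto_failure_is_fatal=False))
```

The first call models the behavior before the changes and the second models the behavior after them, for the case where the losing client has already removed the linkto file.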
REVIEW: http://review.gluster.org/8579 (cluster/dht: Modified test case to note rename failures as errors) posted (#1) for review on master by Shyamsundar Ranganathan (srangana)
COMMIT: http://review.gluster.org/8579 committed in master by Vijay Bellur (vbellur)
------
commit 4adfb6fb7c371c6bc03acdaf61f1cca496388356
Author: Shyam <srangana>
Date: Tue Sep 2 12:37:07 2014 -0400

cluster/dht: Modified test case to note rename failures as errors

The bug referenced in this change had a race condition that is now fixed by the following commits, which are posted for review:
http://review.gluster.org/#/c/8563/
http://review.gluster.org/#/c/8570/

These changes make the winning client not fail a rename in case it failed to rename the linkto file. Hence when one client wins the link race and the other still deletes the linkto file, the rename failure by the winning client is not a critical failure, which resolves the issue posted in the bug.

As a result, the test case is modified to treat rename failures as errors, to catch any future issues.

Change-Id: Ibe9caac7ee87dcbc4f581cfbd36173b734859ccb
BUG: 1123950
Signed-off-by: Shyam <srangana>
Reviewed-on: http://review.gluster.org/8579
Reviewed-by: Jeff Darcy <jdarcy>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Vijay Bellur <vbellur>
REVIEW: http://review.gluster.org/8729 (cluster/dht: Modified test case to note rename failures as errors) posted (#1) for review on release-3.5 by N Balachandran (nbalacha)
REVIEW: http://review.gluster.org/8729 (cluster/dht: Modified test case to note rename failures as errors) posted (#2) for review on release-3.5 by N Balachandran (nbalacha)
http://review.gluster.org/8729 was incorrectly posted against this BZ. Moving this to Modified based on Comment#8
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user