Description of problem: Currently, the rebalance triggered as part of remove-brick intermittently leaves some files on the removed brick. There should be a warning message telling admins to check the removed bricks for any files that might not have been migrated, and to move them back into the volume through the mount point.
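A minimal sketch of how an admin might perform that check; the brick path, volume mount point, and file names below are hypothetical examples rather than anything taken from this report:

  # list regular files still present on the removed brick, skipping the
  # internal .glusterfs housekeeping directory
  find /data/brick1 -path '*/.glusterfs' -prune -o -type f -print

  # any leftover file can then be copied back into the volume through a
  # client mount of the volume (here assumed at /mnt/glustervol)
  cp -a /data/brick1/somedir/lost-file /mnt/glustervol/somedir/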
REVIEW: http://review.gluster.org/8577 (CLI: Adding warning message in case remove-brick commit executed) posted (#2) for review on master by susant palai (spalai)
REVIEW: http://review.gluster.org/8577 (CLI: Adding warning message in case remove-brick commit executed) posted (#3) for review on master by susant palai (spalai)
COMMIT: http://review.gluster.org/8577 committed in master by Vijay Bellur (vbellur)
------
commit b81cec326d4d43519593cb56b7a0e68ea5c3421c
Author: Susant Palai <spalai>
Date: Tue Sep 2 05:29:52 2014 -0400

    CLI: Adding warning message in case remove-brick commit executed

    Change-Id: Ia2f1b2cd2687ca8e739e7a1e245e668a7424ffac
    BUG: 1136702
    Signed-off-by: Susant Palai <spalai>
    Reviewed-on: http://review.gluster.org/8577
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
REVIEW: http://review.gluster.org/8664 (CLI: Show warning on remove-brick commit Signed-off-by: Susant Palai <spalai>) posted (#1) for review on master by susant palai (spalai)
COMMIT: http://review.gluster.org/8664 committed in master by Vijay Bellur (vbellur)
------
commit 1c8d4bf6ab299f8fb44dce354fb8f3232136be02
Author: Susant Palai <spalai>
Date: Tue Sep 9 06:05:24 2014 -0400

    CLI: Show warning on remove-brick commit

    Signed-off-by: Susant Palai <spalai>
    Change-Id: I48a4168f81bd272216549c76b0bc1b23e34894d6
    BUG: 1136702
    Reviewed-on: http://review.gluster.org/8664
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: N Balachandran <nbalacha>
    Reviewed-by: Vijay Bellur <vbellur>
No, commit should fail if the migration did not complete successfully. We should *know* if the migration failed. There should *not* be files missing from the volume after a remove-brick completes.
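For context, a hedged sketch of how an admin can check whether the migration finished cleanly before committing; the volume name and brick path are hypothetical:

  # start decommissioning the brick; rebalance migrates its data to the
  # remaining bricks of the volume
  gluster volume remove-brick testvol server1:/data/brick1 start

  # poll until the operation reports completed; a non-zero failures count
  # means some files were not migrated and committing would risk losing them
  gluster volume remove-brick testvol server1:/data/brick1 status

  # commit only once the migration has finished cleanly
  gluster volume remove-brick testvol server1:/data/brick1 commit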
Susant, this sounds like a real risk of data loss: currently, rebalance as part of remove-brick intermittently leaves some files on the removed brick. A warning seems like the wrong direction, and I'm generally inclined to agree with JoeJulian about this. Why do we think that having unmigrated files is acceptable at this point, and are we working on solving it completely? :)
Hi Joe/Justin, agreed with comments 6 & 7. To start with, the above patch is not a permanent fix; it is just a workaround until we find a proper solution for the problem.
From my perspective, it will only take one user losing data to poison our reputation. Having a warning and a workaround will only cause confusion, cost me immeasurable time explaining to people what they need to look for and how to fix it, and subject us to attacks on our competency. I would prefer that this be a blocker and that the problem be corrected.
The files remaining on the brick after a remove-brick seem to be due to the fact that the brick continues to accept new files created in the cluster during the removal process: any new file that is hashed onto that brick while the removal is running stands to be orphaned after the process has completed. See: https://gist.github.com/mandb/93369097139c6cc3ff98 Expected behavior would be that while a brick is in the remove-brick state, new file creation requests are relayed to another brick. The issue is likely complicated by a few scenarios: a) clients still see the brick's space as part of the volume capacity; b) files that are STILL on the brick need to remain write-available, and those files could grow, so the capacity in a) has to reflect this available write space. I will test with 3.6.2 and see whether the behavior has changed.
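A rough sketch of how the race described above could be reproduced, assuming a hypothetical volume "testvol", brick server1:/data/brick1, and a FUSE client mount at /mnt/glustervol (the linked gist shows a real occurrence):

  # begin draining the brick
  gluster volume remove-brick testvol server1:/data/brick1 start

  # while migration is still running, keep creating new files from a client;
  # some of them hash onto the brick that is being removed
  for i in $(seq 1 10000); do echo x > /mnt/glustervol/newfile.$i; done

  # once status reports completed, commit the removal
  gluster volume remove-brick testvol server1:/data/brick1 status
  gluster volume remove-brick testvol server1:/data/brick1 commit

  # files created during the migration window may now be missing from the
  # volume while still sitting on the removed brick
  find /data/brick1 -path '*/.glusterfs' -prune -o -type f -print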
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report. glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.
[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user