+++ This bug was initially created as a clone of Bug #1564198 +++

+++ This bug was initially created as a clone of Bug #1553677 +++

Description of problem:
=======================
Many files were not migrated from the decommissioned bricks; commit results in data loss.

Version-Release number of selected component (if applicable):
3.12.2-5.el7rhgs.x86_64

How reproducible:
Reporting at first occurrence

Steps to Reproduce:
===================
1) Create a x3 volume with brick-mux enabled and start it.
2) FUSE mount it on multiple clients.
3) From Client-1 : run script to create folders and files continuously
   From client-2 : start linux kernel untar
   From client-3 : while true;do find;done
   From client-4 : while true;do ls -lRt;done
4) While step-3 is in-progress, killed server-1 brick process using kill -9 <pid>. As brick mux is enabled killing single brick on the server using kill -9 would take down all the bricks on the node.
5) Now, add 3 bricks to the volume and after few secs immediately start removing old bricks.
6) Wait for remove-brick to complete.

Actual results:
===============
Many files were not migrated from the decommissioned bricks; commit results in data loss.

Expected results:
=================
Remove-brick operation should migrate all the files from the decommissioned brick.

RCA:
The logs from the previous failed runs indicate 2 problems:
1. At least one process could not read directories because the first_up_subvol was not in the list of local_subvols for the process
2. Since a brick was down, some files would not be migrated if the gfid hashed to that node-uuid

--- Additional comment from Worker Ant on 2018-04-05 12:17:33 EDT ---

REVIEW: https://review.gluster.org/19827 (cluster/dht: Wind open to all subvols) posted (#1) for review on master by N Balachandran

--- Additional comment from Worker Ant on 2018-04-06 06:46:22 EDT ---

REVIEW: https://review.gluster.org/19831 (cluster/dht: Handle file migrations when brick down) posted (#1) for review on master by N Balachandran

--- Additional comment from Worker Ant on 2018-04-11 09:19:03 EDT ---

COMMIT: https://review.gluster.org/19827 committed in master by "Shyamsundar Ranganathan" <srangana> with a commit message-

cluster/dht: Wind open to all subvols

dht_opendir should wind the open to all subvols whether or not local->subvols is set. This is because dht_readdirp winds the calls to all subvols.

Change-Id: I67a96b06dad14a08967c3721301e88555aa01017
updates: bz#1564198
Signed-off-by: N Balachandran <nbalacha>

--- Additional comment from Worker Ant on 2018-04-12 22:27:57 EDT ---

COMMIT: https://review.gluster.org/19831 committed in master by "Raghavendra G" <rgowdapp> with a commit message-

cluster/dht: Handle file migrations when brick down

The decision as to which node would migrate a file was based on the gfid of the file. Files were divided among the nodes for the replica/disperse set. However, if a brick was down when rebalance started, the nodeuuids would be saved as NULL and a set of files would not be migrated.

Now, if the nodeuuid is NULL, the first non-null entry in the set is the node responsible for migrating the file.

Change-Id: I72554c107792c7d534e0f25640654b6f8417d373
fixes: bz#1564198
Signed-off-by: N Balachandran <nbalacha>
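
For readers following the second fix, here is a minimal sketch of the selection rule described in the commit message above. It is not the actual DHT code: the function names (`responsible_node`, `should_migrate`) are hypothetical, and mapping a gfid to a slot with a plain modulo is an assumption made only for illustration. The part taken from the commit message is the fallback: if the slot the gfid maps to holds a NULL node-uuid, the first non-null entry in the set becomes responsible.

```python
import uuid
from typing import Optional, Sequence


def responsible_node(
    gfid: uuid.UUID,
    node_uuids: Sequence[Optional[uuid.UUID]],
) -> Optional[uuid.UUID]:
    """Pick the node that should migrate the file identified by gfid.

    node_uuids holds one entry per brick of the replica/disperse set;
    None stands for a brick that was down when rebalance started.
    """
    if not node_uuids:
        return None

    # Assumption: a plain modulo stands in for the real gfid-to-slot mapping.
    index = gfid.int % len(node_uuids)
    chosen = node_uuids[index]
    if chosen is not None:
        return chosen

    # Fallback from the commit message: the first non-null entry in the
    # set takes over the files that mapped to the down brick.
    for candidate in node_uuids:
        if candidate is not None:
            return candidate

    # Every brick in the set was down; no node can migrate the file.
    return None


def should_migrate(
    gfid: uuid.UUID,
    node_uuids: Sequence[Optional[uuid.UUID]],
    my_uuid: uuid.UUID,
) -> bool:
    """Return True if the local node (my_uuid) is responsible for gfid."""
    return responsible_node(gfid, node_uuids) == my_uuid
```

Without the fallback loop, every gfid that maps to a `None` slot would have no responsible node, which matches the skipped-file symptom in problem 2 of the RCA.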
Patches: https://review.gluster.org/#/c/19862/ https://review.gluster.org/#/c/19863/
REVISION POSTED: https://review.gluster.org/19862 (cluster/dht: Wind open to all subvols) posted (#2) for review on release-3.12 by N Balachandran
REVIEW: https://review.gluster.org/19862 (cluster/dht: Wind open to all subvols) posted (#2) for review on release-3.12 by N Balachandran
REVISION POSTED: https://review.gluster.org/19863 (cluster/dht: Handle file migrations when brick down) posted (#2) for review on release-3.12 by N Balachandran
REVIEW: https://review.gluster.org/19863 (cluster/dht: Handle file migrations when brick down) posted (#2) for review on release-3.12 by N Balachandran
COMMIT: https://review.gluster.org/19862 committed in release-3.12 by "Shyamsundar Ranganathan" <srangana> with a commit message-

cluster/dht: Wind open to all subvols

dht_opendir should wind the open to all subvols whether or not local->subvols is set. This is because dht_readdirp winds the calls to all subvols.

Change-Id: I67a96b06dad14a08967c3721301e88555aa01017
updates: bz#1566820
Signed-off-by: N Balachandran <nbalacha>
(cherry picked from commit c4251edec654b4e0127577e004923d9729bc323d)
COMMIT: https://review.gluster.org/19863 committed in release-3.12 by "Shyamsundar Ranganathan" <srangana> with a commit message-

cluster/dht: Handle file migrations when brick down

The decision as to which node would migrate a file was based on the gfid of the file. Files were divided among the nodes for the replica/disperse set. However, if a brick was down when rebalance started, the nodeuuids would be saved as NULL and a set of files would not be migrated.

Now, if the nodeuuid is NULL, the first non-null entry in the set is the node responsible for migrating the file.

Change-Id: I72554c107792c7d534e0f25640654b6f8417d373
fixes: bz#1566820
Signed-off-by: N Balachandran <nbalacha>
(cherry picked from commit 1f0765242a689980265c472646c64473a92d94c0)
Change-Id: Id1a6e847b0191b6a40707bea789a2a35ea3d9f68
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.9, please open a new bug report.

glusterfs-3.12.9 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2018-April/000096.html
[2] https://www.gluster.org/pipermail/gluster-users/