Description of problem:
========================
In the existing self-heal algorithm, the self-heal happens from the node which has "wise" change-logs. Let's say there are 3 bricks in the cluster (1 x 3 replicate volume). Writes are happening on a file and brick1 crashes. Writes still continue from the mount point. Brick1 is brought back online. Before self-heal can complete, brick2 crashes. At this point brick2 has change-logs of [ 265 0 0 ] on a file. Writes still continue. Then brick3 crashes (brick3 is the longest-lived brick). When both brick2 and brick3 are brought back online, self-heal happens from brick2 to brick3 and brick1, even though brick3 has the latest data. This is because the change-logs of the file on brick2 indicate it is "wise".

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.35.1u2rhs built on Oct 21 2013 14:00:58

How reproducible:
=================

Steps to Reproduce:
===================
1. On a 1 x 3 replicate volume, opened fds on 10 files and started writing data to all of them. Writes on the files were in progress the whole time (periodic writes).
2. Brought down brick1 (xfs_progs -> godown).
3. After some time, brought back brick1.
4. Before the self-heal could complete, brick2 crashed (xfs_progs -> godown). At this point the extended attributes of one of the files on brick2 were [ 265 0 0 ].
5. After some time, brick3 crashed (xfs_progs -> godown). At this point the extended attributes of the same file on brick3 were [ 266 140 1 ].
6. Brought back brick2 and brick3 at the same time.

Actual results:
===============
Self-heal happened from brick2 to brick1 and brick3 on these files.
[2013-11-06 08:57:38.623632] I [afr-self-heal-common.c:2840:afr_log_self_heal_completion_status] 0-vol_rep-replicate-0: foreground data self heal is successfully completed, from vol_rep-client-1 with 1077862400 1075886080 1077309440 sizes - Pending matrix: [ [ 2 195 56 ] [ 265 0 0 ] [ 266 140 1 ] ] on <gfid:16e4f46a-f1bb-4ab8-a8eb-36479a83fc82>

Expected results:
=================
Self-heal should have happened from brick3 to brick1 and brick2.
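A minimal sketch of why brick2 wins: in the pending matrix above, `matrix[i][j]` is the count of operations brick i records as pending against brick j, and AFR v1 treats a brick that blames itself as "fooled" and a brick with a clean self-count as "wise". This is a simplified illustration, not the actual glusterfs source-selection code, and the row comments are assumptions drawn from the log message:

```python
# Simplified sketch of AFR-v1-style self-heal source selection
# (assumption: not actual glusterfs code, just the "wise node" idea).
# pending[i][j] = ops brick i records as pending for brick j.

def pick_source(pending):
    n = len(pending)
    # A brick is "wise" only if it does not blame itself.
    wise = [i for i in range(n) if pending[i][i] == 0]
    # A wise brick that no other wise brick blames becomes the source;
    # with a single wise brick, it wins by default.
    for i in wise:
        if all(pending[j][i] == 0 for j in wise if j != i):
            return i
    return None

# Pending matrix from the self-heal completion log message above:
matrix = [
    [2, 195, 56],   # brick1: nonzero self-count, so "fooled"
    [265, 0, 0],    # brick2: clean self-count, the only "wise" brick
    [266, 140, 1],  # brick3: nonzero self-count from the crash, "fooled"
]

print(pick_source(matrix))  # -> 1 (brick2), despite brick3 having newer data
```

Because the crash leaves brick3 with a stray self-blame entry, only brick2 qualifies as wise, which matches the log showing the heal sourced from vol_rep-client-1.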
This will never happen with AFR v2. Just test it and close it.
Tested with 3.1.2 (AFR v2.0) and was not able to reproduce the reported problem. As per the developer, this is fixed as part of the v2 implementation, so marking this bug as verified.
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/ If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.