Description of problem:
If you remove a brick while self-heals are pending, the pending metadata will then point to the wrong brick. We had a replica 2 volume and one brick failed. We added another brick and started the self-heal. Wanting to see the actual status of the heal, I removed the dead brick from the volume, expecting the pending attributes for that brick to be removed and/or ignored. Instead the pending attributes remained, and the third brick was now referenced such that the attributes for the former second brick point to the third.

Version-Release number of selected component (if applicable):
3.4.3

How reproducible:
Always

Steps to Reproduce:
1. Create a replica 3 volume
2. Bring brick 2 down
3. Do some actions to the files on the volume
4. gluster volume remove-brick myvol replica 2 server2:/brick force

Actual results:
# file: ../../85/4b/854bccb8-9119-4449-9906-b57008aef492
trusted.afr.gv-swift-client-0=0x000000000000000000000000
trusted.afr.gv-swift-client-1=0x000000020000000100000000
trusted.afr.gv-swift-client-2=0x000000000000000000000000
trusted.gfid=0x854bccb8911944499906b57008aef492

# From myvol-fuse.vol:
volume myvol-replicate-0
    type cluster/replicate
    subvolumes myvol-client-0 myvol-client-1
end-volume

Expected results:
I expected the identification for the client to remain the same, i.e.

volume myvol-replicate-0
    type cluster/replicate
    subvolumes myvol-client-0 myvol-client-2
end-volume

and any entries in indices where client-0 and client-2 are clean to be removed from indices.
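For anyone reading the xattr dump in the report: each trusted.afr.* value is 12 bytes holding three 32-bit big-endian pending counters, in the order data, metadata, entry. A minimal Python sketch for decoding them follows. The function name decode_afr_pending is made up for illustration and is not part of any Gluster tool.

```python
import struct


def decode_afr_pending(hex_value):
    """Split a 12-byte trusted.afr.* value into its three pending counters.

    Hypothetical helper for illustration only; not a Gluster API.
    """
    # Accept the value with or without the leading "0x" that getfattr prints.
    if hex_value.lower().startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Three unsigned 32-bit integers in network (big-endian) byte order.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    # The client-1 value from the report above.
    print(decode_afr_pending("0x000000020000000100000000"))
```

Applied to the client-1 value above, this gives 2 pending data operations, 1 pending metadata operation and 0 pending entry operations, all of which were recorded against the removed brick but are now attributed to whichever brick is called client-1.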
http://review.gluster.org/#/c/7122
http://review.gluster.org/7155

The patches above prevent this problem. We found it during snapshot development. CC'ing Ravi.
Joe, Ravi writes really good documents :-). Check this page out for more information: http://www.gluster.org/community/documentation/index.php/Features/persistent-AFR-changelog-xattributes pranith
Can this get backported to release-3.4 and release-3.5?
I don't think that is possible. The fix was tied to the next op-version (i.e. GD_OP_VERSION_MAX, which is 4 for release 3.6) so that there are no heterogeneous nodes (i.e. the feature won't work until all nodes are upgraded to 3.6). If we backport it to previous releases and even one of the nodes is not upgraded, we have no way to detect that, which could lead to inconsistent volfiles among the nodes.
To be clearer: for a 1x3 replica, if the middle brick were removed, the nodes that have the fix will use trusted.afr.gv-swift-client-{0,2} for AFR's changelogs, while the ones that were not upgraded will still use trusted.afr.gv-swift-client-{0,1}.
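The mismatch described above can be sketched in a few lines of Python. This is only an illustration of the two naming schemes, not the glusterd code: positional_names, persistent_names and the server1/server3 brick names are hypothetical (only server2:/brick appears in the report), and the assumption is that a fixed node keeps the index a brick was given when it was added, while an unfixed node renumbers by position.

```python
def positional_names(volname, bricks):
    """Old behaviour (assumed): client index is the brick's current position."""
    return {"%s-client-%d" % (volname, i): brick for i, brick in enumerate(bricks)}


def persistent_names(volname, bricks_with_ids):
    """Fixed behaviour (assumed): client index is stored with the brick and kept."""
    return {"%s-client-%d" % (volname, bid): brick for bid, brick in bricks_with_ids}


if __name__ == "__main__":
    # 1x3 replica with the middle brick (id 1, server2:/brick) removed.
    remaining = [(0, "server1:/brick"), (2, "server3:/brick")]

    # Node without the fix: server3 silently becomes client-1.
    print(positional_names("gv-swift", [brick for _, brick in remaining]))

    # Node with the fix: server3 stays client-2.
    print(persistent_names("gv-swift", remaining))
```

With a mix of nodes, the same brick would be tracked under trusted.afr.gv-swift-client-1 by some nodes and trusted.afr.gv-swift-client-2 by others, which is the volfile inconsistency that gating the fix on the op-version avoids.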
Ravi, could you close this bug if the fix can't be backported? Pranith