Description of problem:
For a file, Brick-A has pending operations on Brick-B, Brick-B has pending operations on Brick-C, and Brick-C has pending operations on Brick-A. Since no single brick is blamed by two other bricks, any of these bricks can be considered a good copy and a heal can be done. Reads will fail until the heal happens.

The read-consistency issue we found occurs when any one of the bricks goes down in this state. If Brick-A goes down, reads are served from Brick-B; if Brick-B goes down, reads are served from Brick-C; if Brick-C goes down, reads are served from Brick-A. All of these reads could return different content.

Version-Release number of selected component (if applicable):

How reproducible:
It is extremely difficult to hit this case. We will mostly simulate it by setting breakpoints in gdb.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
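The circular blame described above can be illustrated with a small model. This is only a sketch and not AFR code: the `BLAMES` table, the `read_sources` function, and the rule "a brick is a read source if it is up and no other up brick blames it" are simplifying assumptions made for illustration.

```python
# Hypothetical model, not AFR code: BLAMES[x] is the set of bricks
# against which brick x holds pending operations.
BLAMES = {"A": {"B"}, "B": {"C"}, "C": {"A"}}


def read_sources(blames, up):
    """Return the up bricks that no other up brick blames (assumed rule)."""
    accused = set()
    for brick in up:
        # Only bricks that are up can present their pending markers.
        accused |= blames.get(brick, set())
    return sorted(b for b in up if b not in accused)


if __name__ == "__main__":
    all_bricks = set(BLAMES)
    # With every brick up, each one is blamed by another: no read source.
    print("all up:", read_sources(BLAMES, all_bricks))
    # With one brick down, a different brick becomes the source each time.
    for down in sorted(all_bricks):
        print(f"{down} down:", read_sources(BLAMES, all_bricks - {down}))
```

Under these assumptions the model gives no read source while all three bricks are up, and a different source (B, C, or A) depending on which brick is down, which is why the content returned by a read can differ.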
To fix this issue we need to change the read, write, and self-heal transactions, which are at the heart of AFR. It is better to give the change some time to stabilize upstream before taking it downstream, so this is being targeted for a later release.
Upstream patch: https://review.gluster.org/#/c/19477/