Description of problem:
This is how thin-arbiter (TA) reads work as of patch https://review.gluster.org/#/c/glusterfs/+/20994/2 :

If both data bricks are up, the read subvol is chosen based on read_subvols.
If only one data brick is up:
- First query the data brick that is up. If it blames the other brick, allow the reads.
- If it doesn't, query the TA to obtain the source of truth.

However, we need to check whether the AFR_TA_DOM_NOTIFY lock can be reused for read transactions as well, so that once ta_bad_child_index is stored in memory, subsequent reads can reuse it until shd resets it after healing.

The rough changes to the read transaction when one data brick is down would be (subject to discussion and acceptance):
0. If priv->bad_child_index is valid, go to step 4.
1. Get the AFR xattrs of the data brick that is up.
2. If they contain pending xattrs, update priv->bad_child_index with that value.
3. Otherwise:
   {
     TA LOCK (AFR_TA_DOM_NOTIFY)
     TA LOCK (AFR_TA_DOM_MODIFY)
     get afr xattr from TA
     update priv->bad_child_index if the xattr is present on TA
     TA UNLOCK (AFR_TA_DOM_MODIFY)
     if priv->bad_child_index is still AFR_CHILD_UNKNOWN {
         TA UNLOCK (AFR_TA_DOM_NOTIFY)
     } else {
         retain AFR_TA_DOM_NOTIFY
     }
   }
4. Serve the read depending on whether the data brick that is up is good or bad.

While this is simple, some problems still need solving:
- What happens when multiple read requests come in?
- What happens when reads and writes come in together?
We must ensure that only one AFR_TA_DOM_NOTIFY lock is sent to the TA from a given mount, irrespective of the number of reads and writes.
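To make the proposed read flow and the "one AFR_TA_DOM_NOTIFY lock per mount" requirement concrete, here is a minimal single-process sketch in C. It is not AFR code: ta_state_t, ta_pick_read_child, ta_state_reset_after_heal and fake_cluster_t are hypothetical names, the network fops are replaced by callbacks, the TA locks are modelled as a flag plus a counter, and the value of AFR_CHILD_UNKNOWN is assumed. Holding a mutex across the TA query is a simplification to show serialisation; a real implementation would more likely queue waiting frames asynchronously.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Value assumed for illustration; the real AFR constant may differ. */
#define AFR_CHILD_UNKNOWN (-1)

/* Hypothetical per-mount state, loosely modelled on priv->bad_child_index
 * and the AFR_TA_DOM_NOTIFY lock described in the bug. */
typedef struct {
    pthread_mutex_t mutex;  /* serialises TA queries from this mount */
    int bad_child_index;    /* cached bad data child, or AFR_CHILD_UNKNOWN */
    bool notify_held;       /* does this mount hold AFR_TA_DOM_NOTIFY? */
    int notify_lock_sends;  /* how many NOTIFY locks were sent to the TA */
} ta_state_t;

/* Hypothetical query: returns the child blamed by pending xattrs,
 * or AFR_CHILD_UNKNOWN if nothing is pending. */
typedef int (*blame_query_fn)(void *ctx);

/* Hypothetical stand-in for the xattrs the queries would read. */
typedef struct {
    int up_brick_blames; /* child blamed by the up data brick */
    int ta_blames;       /* child blamed by the xattr stored on the TA */
    int ta_reads;        /* counts TA round trips */
} fake_cluster_t;

int fake_query_up_brick(void *ctx) { return ((fake_cluster_t *)ctx)->up_brick_blames; }

int fake_query_ta(void *ctx)
{
    fake_cluster_t *c = ctx;
    c->ta_reads++;
    return c->ta_blames;
}

void ta_state_init(ta_state_t *st)
{
    pthread_mutex_init(&st->mutex, NULL);
    st->bad_child_index = AFR_CHILD_UNKNOWN;
    st->notify_held = false;
    st->notify_lock_sends = 0;
}

/* Returns the data child to read from, or -1 if the only up brick is
 * the bad one (the caller would fail the read). up_child is 0 or 1. */
int ta_pick_read_child(ta_state_t *st, int up_child, blame_query_fn query_up_brick,
                       blame_query_fn query_ta, void *ctx)
{
    assert(up_child == 0 || up_child == 1);
    pthread_mutex_lock(&st->mutex);

    /* Step 0: reuse the cached verdict if there is one. */
    if (st->bad_child_index == AFR_CHILD_UNKNOWN) {
        /* Steps 1-2: does the up data brick blame the other one? */
        int blamed = query_up_brick(ctx);
        if (blamed != AFR_CHILD_UNKNOWN) {
            st->bad_child_index = blamed;
        } else {
            /* Step 3: consult the TA, sending at most one NOTIFY lock. */
            if (!st->notify_held) {
                st->notify_held = true; /* TA LOCK (AFR_TA_DOM_NOTIFY) */
                st->notify_lock_sends++;
            }
            /* MODIFY lock/unlock would bracket this query. */
            st->bad_child_index = query_ta(ctx);
            if (st->bad_child_index == AFR_CHILD_UNKNOWN)
                st->notify_held = false; /* TA UNLOCK (AFR_TA_DOM_NOTIFY) */
        }
    }

    /* Step 4. Assumption: if nobody is blamed, the up brick is good. */
    int result = (st->bad_child_index == up_child) ? -1 : up_child;
    pthread_mutex_unlock(&st->mutex);
    return result;
}

/* Hypothetically called when shd finishes healing. */
void ta_state_reset_after_heal(ta_state_t *st)
{
    pthread_mutex_lock(&st->mutex);
    st->bad_child_index = AFR_CHILD_UNKNOWN;
    st->notify_held = false;
    pthread_mutex_unlock(&st->mutex);
}
```

In this sketch, once a read has learned from the TA that child 1 is bad, later reads on the same mount hit the cached verdict and send no further NOTIFY lock or TA query; only after ta_state_reset_after_heal does the next read that finds no pending xattrs on the up brick go back to the TA.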
Status?
Not being looked into at this moment. Bug was created to keep track of this technical debt.
This bug has been moved to https://github.com/gluster/glusterfs/issues/949, and will be tracked there from now on. Visit the GitHub issue URL for further details.