Description of problem: The tiering feature requires a volume to have multiple DHT translators, but this results in each DHT instance reserving bits from readdir's d_off field. The number of bits in the d_off field is limited; the more that are taken, the higher the probability of duplicate/missing entries on a readdir. The solution is to have only one translator in the graph encode bits. Fix 1163161 freed up some bits that AFR took; a similar change must be made for EC, and the encoding logic that currently resides in DHT should be moved to the client translator.

Additional info: An outline of the new scheme is linked below.
http://www.gluster.org/pipermail/gluster-devel/2015-January/043592.html
REVIEW: http://review.gluster.org/9688 (cluster/dht: Change the subvolume encoding in d_off to be a "global" position in the graph rather than relative (local) to a particular translator.) posted (patch sets #1-#6) for review on master by Dan Lambright (dlambrig)
COMMIT: http://review.gluster.org/9688 committed in master by Vijay Bellur (vbellur)
------
commit a216745e5db3fdb4fa8d625c971e70f8d0e34d23
Author: Dan Lambright <dlambrig>
Date: Wed Feb 18 14:49:50 2015 -0500

cluster/dht: Change the subvolume encoding in d_off to be a "global" position in the graph rather than relative (local) to a particular translator.

Encoding the volume in this way allows a single translator to manage which brick is currently being scanned for directory entries. Using a single translator minimizes allocated bits in the d_off. It also allows multiple DHT translators in the same graph to have a common frame of reference (the graph position) for which brick is being read. Multiple DHT translators are needed for the tiering feature.

The fix builds off a previous change (9332) which removed subvolume encoding from AFR. The fix makes an equivalent change to the EC translator. More background can be found in fix 9332 and gluster-devel discussions [1].

DHT and AFR/EC are responsible (as before) for choosing which brick to enumerate directory entries in over the readdir lifecycle. The client translator receiving the readdir fop encodes the d_off. It is referred to as the "leaf node" in the graph and corresponds to the brick being scanned. When DHT decodes the d_off, it translates the leaf node to a local subvolume, which represents the next node in the graph leading to the brick.

Tracking of leaf nodes is done in common utility functions. Leaf node counts and positional information are updated on a graph switch.

[1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html

Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8
BUG: 1190734
Signed-off-by: Dan Lambright <dlambrig>
Reviewed-on: http://review.gluster.org/9688
Reviewed-by: Xavier Hernandez <xhernandez>
Reviewed-by: Krishnan Parthasarathi <kparthas>
Reviewed-by: Vijay Bellur <vbellur>
Tested-by: Vijay Bellur <vbellur>
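The encode/decode round trip described in the commit message can be sketched as follows. This is an illustrative model only, not the actual GlusterFS implementation: the function names, the LEAF_BITS budget, and the choice of packing the leaf index into the high bits are all assumptions made for the example. The idea is that the leaf (client) translator stamps its global graph position into the d_off, and DHT later extracts that position to select the subvolume leading toward the brick.

```c
/* Hypothetical sketch of a "global leaf position" d_off encoding.
 * LEAF_BITS, the packing layout, and all names are illustrative. */
#include <stdint.h>

#define LEAF_BITS 8ULL                       /* assumed bit budget */
#define OFF_MASK  ((1ULL << (64 - LEAF_BITS)) - 1)

/* leaf translator: pack its global graph position into the high bits */
static uint64_t doff_encode(uint64_t raw_off, unsigned leaf_id)
{
    return ((uint64_t)leaf_id << (64 - LEAF_BITS)) | (raw_off & OFF_MASK);
}

/* DHT: recover the global leaf position to pick the next subvolume */
static unsigned doff_leaf(uint64_t d_off)
{
    return (unsigned)(d_off >> (64 - LEAF_BITS));
}

/* recover the brick-local offset to resume the directory scan */
static uint64_t doff_offset(uint64_t d_off)
{
    return d_off & OFF_MASK;
}
```

Because the leaf index is a position in the whole graph rather than a per-translator subvolume number, every DHT instance stacked above the leaf decodes the same value, which is what lets a single translator own the encoding.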
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report. glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.
[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user