Bug 1190734 - Enhancement to readdir for tiered volumes
Summary: Enhancement to readdir for tiered volumes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Dan Lambright
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-02-09 14:43 UTC by Dan Lambright
Modified: 2015-12-01 16:45 UTC
3 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-05-14 17:26:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:




Links
Red Hat Bugzilla 1163161
    Private: 0
    Priority: high
    Status: CLOSED
    Summary: With afrv2 + ext4, lookups on directories with large offsets could result in duplicate/missing entries
    Last Updated: 2021-02-22 00:41:40 UTC

Internal Links: 1163161

Description Dan Lambright 2015-02-09 14:43:55 UTC
Description of problem:

The tiering feature requires a volume to have multiple DHT translators, and each DHT instance reserves bits in readdir's d_off field. The number of bits in d_off is limited; the more that are taken, the higher the probability of duplicate or missing entries in a readdir.

The solution is to have only one translator in the graph encode bits into d_off. The fix for bug 1163161 freed up some bits that AFR had been taking. A similar change must be made for EC, and the encoding logic that currently resides in DHT should be moved to the client translator.

Additional info:

Outline of the new scheme below.

http://www.gluster.org/pipermail/gluster-devel/2015-January/043592.html

Comment 2 Anand Avati 2015-02-18 19:58:11 UTC
REVIEW: http://review.gluster.org/9688 (cluster/dht: Change the subvolume encoding in d_off to be a "global" position in the graph rather than relative (local) to a particular translator.) posted (#1) for review on master by Dan Lambright (dlambrig@redhat.com)

Comment 3 Anand Avati 2015-02-18 20:34:58 UTC
REVIEW: http://review.gluster.org/9688 (cluster/dht: Change the subvolume encoding in d_off to be a "global" position in the graph rather than relative (local) to a particular translator.) posted (#2) for review on master by Dan Lambright (dlambrig@redhat.com)

Comment 4 Anand Avati 2015-02-18 20:35:33 UTC
REVIEW: http://review.gluster.org/9688 (cluster/dht: Change the subvolume encoding in d_off to be a "global" position in the graph rather than relative (local) to a particular translator.) posted (#3) for review on master by Dan Lambright (dlambrig@redhat.com)

Comment 5 Anand Avati 2015-03-02 22:40:12 UTC
REVIEW: http://review.gluster.org/9688 (cluster/dht: Change the subvolume encoding in d_off to be a "global" position in the graph rather than relative (local) to a particular translator.) posted (#4) for review on master by Dan Lambright (dlambrig@redhat.com)

Comment 6 Anand Avati 2015-03-11 15:58:05 UTC
REVIEW: http://review.gluster.org/9688 (cluster/dht: Change the subvolume encoding in d_off to be a "global" position in the graph rather than relative (local) to a particular translator.) posted (#5) for review on master by Dan Lambright (dlambrig@redhat.com)

Comment 7 Anand Avati 2015-03-16 15:09:15 UTC
REVIEW: http://review.gluster.org/9688 (cluster/dht: Change the subvolume encoding in d_off to be a "global" position in the graph rather than relative (local) to a particular translator.) posted (#6) for review on master by Dan Lambright (dlambrig@redhat.com)

Comment 8 Anand Avati 2015-03-18 11:47:47 UTC
COMMIT: http://review.gluster.org/9688 committed in master by Vijay Bellur (vbellur@redhat.com) 
------
commit a216745e5db3fdb4fa8d625c971e70f8d0e34d23
Author: Dan Lambright <dlambrig@redhat.com>
Date:   Wed Feb 18 14:49:50 2015 -0500

    cluster/dht: Change the subvolume encoding in d_off to be a "global"
    position in the graph rather than relative (local) to a particular
    translator.
    
    Encoding the volume in this way allows a single translator to manage
    which brick is currently being scanned for directory entries. Using a
    single translator minimizes allocated bits in the d_off. It also allows
    multiple DHT translators in the same graph to have a common frame of
    reference (the graph position) for which brick is being read. Multiple
    DHT translators are needed for the Tiering feature.
    
    The fix builds off a previous change (9332) which removed subvolume
    encoding from AFR. The fix makes an equivalent change to the EC
    translator.
    
    More background can be found in fix 9332 and gluster-dev discussions [1].
    
    DHT and AFR/EC are responsible (as before) for choosing which brick to
    enumerate directory entries in over the readdir lifecycle.
    
    The client translator receiving the readdir fop encodes the d_off. It
    is referred to as the "leaf node" in the graph and corresponds to the
    brick being scanned.
    
    When DHT decodes the d_off, it translates the leaf node to a local
    subvolume, which represents the next node in the graph leading to
    the brick.
    
    Tracking of leaf nodes is done in common utility functions. Leaf node
    counts and positional information are updated on a graph switch.
    
    [1] www.gluster.org/pipermail/gluster-devel/2015-January/043592.html
    
    Change-Id: Iaf0ea86d7046b1ceadbad69d88707b243077ebc8
    BUG: 1190734
    Signed-off-by: Dan Lambright <dlambrig@redhat.com>
    Reviewed-on: http://review.gluster.org/9688
    Reviewed-by: Xavier Hernandez <xhernandez@datalab.es>
    Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
    Tested-by: Vijay Bellur <vbellur@redhat.com>

Comment 9 Niels de Vos 2015-05-14 17:26:27 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

