Bug 1453152 - [Parallel Readdir] : Mounts fail when performance.parallel-readdir is set to "off"
Summary: [Parallel Readdir] : Mounts fail when performance.parallel-readdir is set to "off"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: read-ahead
Version: 3.11
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1445246 1446516
Blocks:
 
Reported: 2017-05-22 09:29 UTC by Poornima G
Modified: 2017-05-30 18:53 UTC
CC List: 10 users

Fixed In Version: glusterfs-3.11.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1446516
Environment:
Last Closed: 2017-05-30 18:53:12 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Poornima G 2017-05-22 09:29:44 UTC
+++ This bug was initially created as a clone of Bug #1446516 +++

+++ This bug was initially created as a clone of Bug #1445246 +++

Description of problem:
-----------------------

2*2 volume, trying to mount via FUSE.

Enable parallel readdir, then set the cache limit to > 1GB, turn parallel readdir off, and try to mount the volume.

Mount fails.
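
For reference, a minimal sketch of that sequence using the gluster CLI (volume name, server and mount point are taken from the logs below; the exact size only needs to exceed 1GB):

# enable parallel-readdir, then raise the readdir-ahead cache limit above 1GB
gluster volume set testvol performance.parallel-readdir on
gluster volume set testvol performance.rda-cache-limit 2GB

# turn parallel-readdir back off; the stored 2GB value now exceeds the
# single readdir-ahead instance's 1GB maximum, so the next mount fails
gluster volume set testvol performance.parallel-readdir off
mount -t glusterfs gqas013.sbu.lab.eng.bos.redhat.com:/testvol /gluster-mount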

Snippet from the mount logs:

[2017-04-25 10:45:06.698688] I [MSGID: 100030] [glusterfsd.c:2417:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.4 (args: /usr/sbin/glusterfs --volfile-server=gqas013.sbu.lab.eng.bos.redhat.com --volfile-id=/testvol /gluster-mount)
[2017-04-25 10:45:06.706458] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-04-25 10:45:06.760460] E [MSGID: 101028] [options.c:168:xlator_option_validate_sizet] 0-testvol-readdir-ahead: '2147483648' in 'option rda-cache-limit 2GB' is out of range [0 - 1073741824]
[2017-04-25 10:45:06.760476] W [MSGID: 101029] [options.c:945:xl_opt_validate] 0-testvol-readdir-ahead: validate of rda-cache-limit returned -1
[2017-04-25 10:45:06.760484] E [MSGID: 101090] [graph.c:301:glusterfs_graph_validate_options] 0-testvol-readdir-ahead: validation failed: '2147483648' in 'option rda-cache-limit 2GB' is out of range [0 - 1073741824]
[2017-04-25 10:45:06.760490] E [MSGID: 101090] [graph.c:672:glusterfs_graph_activate] 0-graph: validate options failed
[2017-04-25 10:45:06.760779] W [glusterfsd.c:1288:cleanup_and_exit] (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x3c1) [0x7f5c85fe6471] -->/usr/sbin/glusterfs(glusterfs_process_volfp+0x1b1) [0x7f5c85fe0831] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f5c85fdfd6b] ) 0-: received signum (1), shutting down
[2017-04-25 10:45:06.760798] I [fuse-bridge.c:5803:fini] 0-fuse: Unmounting '/gluster-mount'.
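
The numbers in the validation error match the reported limits: with parallel-readdir off there is a single readdir-ahead instance, whose rda-cache-limit maximum is 1GB, while the configured value is 2GB:

# configured value : 2GB = 2147483648 bytes
# allowed range    : 0 - 1073741824 bytes (1GB)
# 2147483648 > 1073741824, so option validation fails and the graph is never activated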


Mount succeeds when the option is set back to "on".

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

3.8.4-23

How reproducible:
-----------------

Every time; consistently reproducible.

Additional info:
---------------

[root@gqas013 ~]# gluster v info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 7f5ae046-00d8-428c-a3f4-75e4f7515a82
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
performance.rda-cache-limit: 1GB
performance.parallel-readdir: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
[root@gqas013 ~]#
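
To confirm what the volume actually carries before attempting a mount, the stored options can be queried directly (a minimal sketch; gluster volume get is available on this release):

# show the readdir-ahead cache limit and parallel-readdir state stored in the volume
gluster volume get testvol performance.rda-cache-limit
gluster volume get testvol performance.parallel-readdir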

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-04-25 06:47:50 EDT ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.3.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Ambarish on 2017-04-25 06:49:35 EDT ---

The bug reproduces only when the rda-cache-limit is > 1GB.
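
In other words, values at or below the 1GB maximum mount fine; only a stored value above it trips the validation once parallel-readdir is turned off. A minimal sketch of the boundary, on the same volume:

# at or below the 1GB maximum, the mount succeeds even with parallel-readdir off
gluster volume set testvol performance.rda-cache-limit 1GB
gluster volume set testvol performance.parallel-readdir off
mount -t glusterfs gqas013.sbu.lab.eng.bos.redhat.com:/testvol /gluster-mount

# the failure only appears when the stored limit exceeds 1GB (a value that was
# accepted while parallel-readdir was still on, as in the description above)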

--- Additional comment from Ambarish on 2017-04-25 07:07:06 EDT ---

sos reports here:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1445246

gqac* are the clients, gqas* are the servers.

--- Additional comment from Mohammed Rafi KC on 2017-05-02 08:21:28 EDT ---

I couldn't see any description when I logged in using my Gmail account. Can you please provide a public description?

--- Additional comment from Poornima G on 2017-05-08 05:11:08 EDT ---

Description of problem:
-----------------------

2*2 volume, trying to mount via FUSE.

Enable parallel readdir, then set the cache limit to > 1GB, turn parallel readdir off, and try to mount the volume.

Mount fails.

Snippet from the mount logs:

[2017-04-25 10:45:06.698688] I [MSGID: 100030] [glusterfsd.c:2417:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.4 (args: /usr/sbin/glusterfs --volfile-server=gqas013.sbu.lab.eng.bos.redhat.com --volfile-id=/testvol /gluster-mount)
[2017-04-25 10:45:06.706458] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-04-25 10:45:06.760460] E [MSGID: 101028] [options.c:168:xlator_option_validate_sizet] 0-testvol-readdir-ahead: '2147483648' in 'option rda-cache-limit 2GB' is out of range [0 - 1073741824]
[2017-04-25 10:45:06.760476] W [MSGID: 101029] [options.c:945:xl_opt_validate] 0-testvol-readdir-ahead: validate of rda-cache-limit returned -1
[2017-04-25 10:45:06.760484] E [MSGID: 101090] [graph.c:301:glusterfs_graph_validate_options] 0-testvol-readdir-ahead: validation failed: '2147483648' in 'option rda-cache-limit 2GB' is out of range [0 - 1073741824]
[2017-04-25 10:45:06.760490] E [MSGID: 101090] [graph.c:672:glusterfs_graph_activate] 0-graph: validate options failed
[2017-04-25 10:45:06.760779] W [glusterfsd.c:1288:cleanup_and_exit] (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x3c1) [0x7f5c85fe6471] -->/usr/sbin/glusterfs(glusterfs_process_volfp+0x1b1) [0x7f5c85fe0831] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f5c85fdfd6b] ) 0-: received signum (1), shutting down
[2017-04-25 10:45:06.760798] I [fuse-bridge.c:5803:fini] 0-fuse: Unmounting '/gluster-mount'.


Mount succeeds when the option is set back to "on".

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

3.8.4-23

How reproducible:
-----------------

Every time; consistently reproducible.

Additional info:
---------------

[root@gqas013 ~]# gluster v info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 7f5ae046-00d8-428c-a3f4-75e4f7515a82
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:

--- Additional comment from Worker Ant on 2017-05-19 02:21:32 EDT ---

REVIEW: https://review.gluster.org/17338 (rda, glusterd: Change the max of rda-cache-limit to INFINITY) posted (#1) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-05-19 07:55:01 EDT ---

REVIEW: https://review.gluster.org/17338 (rda, glusterd: Change the max of rda-cache-limit to INFINITY) posted (#2) for review on master by Poornima G (pgurusid)

--- Additional comment from Worker Ant on 2017-05-21 01:19:19 EDT ---

COMMIT: https://review.gluster.org/17338 committed in master by Atin Mukherjee (amukherj) 
------
commit e43b40296956d132c70ffa3aa07b0078733b39d4
Author: Poornima G <pgurusid>
Date:   Fri May 19 11:09:13 2017 +0530

    rda, glusterd: Change the max of rda-cache-limit to INFINITY
    
    Issue:
    The max value of rda-cache-limit is 1GB before this patch.
    When parallel-readdir is enabled, there will be many instances of
    readdir-ahead, hence the rda-cache-limit depends on the number of
    instances. Eg: On a volume with distribute count 4, rda-cache-limit
    when parallel-readdir is enabled, will be 4GB instead of 1GB.
    Consider the following sequence of operations:
    - Enable parallel readdir
    - Set rda-cache-limit to, let's say, 3GB
    - Disable parallel-readdir, this results in one instance of readdir-ahead
      and the rda-cache-limit will be back to 1GB, but the current value is 3GB
      and hence the mount will stop working as 3GB > max 1GB.
    
    Solution:
    To fix this, we can limit the cache to 1GB even when parallel-readdir
    is enabled. But there is no necessity to limit the cache to 1GB, it
    can be increased if the system has enough resources. Hence getting rid
    of the rda-cache-limit max value is more apt. If we just change the
    rda-cache-limit max to INFINITY, we will render older (<3.11) clients
    broken when the rda-cache-limit is set to > 1GB (as the older clients
    still expect a value < 1GB). To safely change the max value of
    rda-cache-limit to INFINITY, add a check in glusterd to verify all
    the clients are > 3.11 if the value exceeds 1GB.
    
    Change-Id: Id0cdda3b053287b659c7bf511b13db2e45b92032
    BUG: 1446516
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: https://review.gluster.org/17338
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
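
To make the sizing logic above concrete: with parallel-readdir enabled, each DHT subvolume gets its own readdir-ahead instance, so the effective cache budget is the per-instance limit multiplied by the distribute count. A minimal worked example, following the commit message's numbers and this report's 2x2 volume:

# per-instance maximum x number of readdir-ahead instances = effective maximum
# distribute count 4, parallel-readdir on : 4 x 1GB = 4GB effective
# distribute count 2 (this report's 2x2)  : 2 x 1GB = 2GB effective (why setting 2GB was accepted)
# parallel-readdir off                    : 1 x 1GB = 1GB, so a stored 2GB/3GB value no longer fits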

Comment 1 Worker Ant 2017-05-22 09:31:25 UTC
REVIEW: https://review.gluster.org/17354 (rda, glusterd: Change the max of rda-cache-limit to INFINITY) posted (#1) for review on release-3.11 by Poornima G (pgurusid)

Comment 2 Worker Ant 2017-05-22 15:05:47 UTC
COMMIT: https://review.gluster.org/17354 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 42fc1abdb41817b691cda87ddc7ea94129279475
Author: Poornima G <pgurusid>
Date:   Fri May 19 11:09:13 2017 +0530

    rda, glusterd: Change the max of rda-cache-limit to INFINITY
    
    Issue:
    The max value of rda-cache-limit is 1GB before this patch.
    When parallel-readdir is enabled, there will be many instances of
    readdir-ahead, hence the rda-cache-limit depends on the number of
    instances. Eg: On a volume with distribute count 4, rda-cache-limit
    when parallel-readdir is enabled, will be 4GB instead of 1GB.
    Consider the following sequence of operations:
    - Enable parallel readdir
    - Set rda-cache-limit to, let's say, 3GB
    - Disable parallel-readdir, this results in one instance of readdir-ahead
      and the rda-cache-limit will be back to 1GB, but the current value is 3GB
      and hence the mount will stop working as 3GB > max 1GB.
    
    Solution:
    To fix this, we can limit the cache to 1GB even when parallel-readdir
    is enabled. But there is no necessity to limit the cache to 1GB, it
    can be increased if the system has enough resources. Hence getting rid
    of the rda-cache-limit max value is more apt. If we just change the
    rda-cache-limit max to INFINITY, we will render older (<3.11) clients
    broken when the rda-cache-limit is set to > 1GB (as the older clients
    still expect a value < 1GB). To safely change the max value of
    rda-cache-limit to INFINITY, add a check in glusterd to verify all
    the clients are > 3.11 if the value exceeds 1GB.
    
    >Reviewed-on: https://review.gluster.org/17338
    >Smoke: Gluster Build System <jenkins.org>
    >Reviewed-by: Atin Mukherjee <amukherj>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >(cherry picked from commit e43b40296956d132c70ffa3aa07b0078733b39d4)
    
    Change-Id: Id0cdda3b053287b659c7bf511b13db2e45b92032
    BUG: 1453152
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: https://review.gluster.org/17354
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>

Comment 3 Worker Ant 2017-05-26 10:38:49 UTC
REVIEW: https://review.gluster.org/17400 (nl-cache: Remove the max limit for nl-cache-limit and nl-cache-timeout) posted (#1) for review on release-3.11 by Poornima G (pgurusid)

Comment 4 Worker Ant 2017-05-26 16:34:21 UTC
COMMIT: https://review.gluster.org/17400 committed in release-3.11 by Shyamsundar Ranganathan (srangana) 
------
commit 1f1f66ef6662ee84f13d49911cdf72556b1c73ef
Author: Poornima G <pgurusid>
Date:   Fri May 12 10:27:28 2017 +0530

    nl-cache: Remove the max limit for nl-cache-limit and nl-cache-timeout
    
    The max limit is better left unset when it is arbitrary. Otherwise, if the
    max has to be changed in the future, it can break backward compatibility.
    
    >Reviewed-on: https://review.gluster.org/17261
    >Smoke: Gluster Build System <jenkins.org>
    >NetBSD-regression: NetBSD Build System <jenkins.org>
    >CentOS-regression: Gluster Build System <jenkins.org>
    >Reviewed-by: Jeff Darcy <jeff.us>
    >(cherry picked from commit 64f41b962b643b966e376a10a16671c569bf6299)
    
    Change-Id: I4337a3789a2d0d5cc8e2bf687a22536c97608461
    BUG: 1453152
    Signed-off-by: Poornima G <pgurusid>
    Reviewed-on: https://review.gluster.org/17400
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
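
For completeness, the options touched by this backport are tuned the same way as rda-cache-limit; a minimal sketch, assuming the usual performance.* volume-set names for the nl-cache xlator options (the values shown are purely illustrative):

# with the arbitrary upper bounds removed, glusterd no longer rejects larger values
gluster volume set testvol performance.nl-cache-limit 10MB
gluster volume set testvol performance.nl-cache-timeout 600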

Comment 5 Shyamsundar 2017-05-30 18:53:12 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.11.0, please open a new bug report.

glusterfs-3.11.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-May/000073.html
[2] https://www.gluster.org/pipermail/gluster-users/

