Bug 1445246

Summary: [Parallel Readdir] : Mounts fail when performance.parallel-readdir is set to "off"
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Ambarish <asoman>
Component: glusterfs Assignee: Poornima G <pgurusid>
Status: CLOSED ERRATA QA Contact: Ambarish <asoman>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.3 CC: amukherj, bturner, rhinduja, rhs-bugs, skoduri, vbellur, vdas
Target Milestone: ---   
Target Release: RHGS 3.3.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-26 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1446516 Environment:
Last Closed: 2017-09-21 04:39:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1438245    
Bug Blocks: 1417151, 1446516, 1453152    

Description Ambarish 2017-04-25 10:47:45 UTC
Description of problem:
-----------------------

2*2 volume, trying to mount via FUSE.

Enable parallel readdir, then set the cache limit to > 1G, turn off parallel readdir and try to mount the volume.

Mount fails.
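
The steps above can be scripted roughly as follows. This is only a sketch: the helper names repro_commands and run_repro are hypothetical, the 2GB value is taken from the mount log in this report, and it assumes it is run as root on a node that has the gluster CLI and the glusterfs FUSE client installed.

```python
import subprocess


def repro_commands(volume, server, mountpoint, cache_limit="2GB"):
    """Return the command sequence (as argv lists) for the steps described above."""
    return [
        # 1. Enable parallel readdir.
        ["gluster", "volume", "set", volume, "performance.parallel-readdir", "on"],
        # 2. Raise the readdir-ahead cache limit above 1GB.
        ["gluster", "volume", "set", volume, "performance.rda-cache-limit", cache_limit],
        # 3. Turn parallel readdir off again.
        ["gluster", "volume", "set", volume, "performance.parallel-readdir", "off"],
        # 4. Try a FUSE mount of the volume.
        ["mount", "-t", "glusterfs", f"{server}:/{volume}", mountpoint],
    ]


def run_repro(volume, server, mountpoint, cache_limit="2GB"):
    """Run the steps and return the exit status of the final mount command."""
    *set_cmds, mount_cmd = repro_commands(volume, server, mountpoint, cache_limit)
    for argv in set_cmds:
        # Stop immediately if any "volume set" step is rejected.
        subprocess.run(argv, check=True)
    return subprocess.run(mount_cmd).returncode
```

For the setup in this report the call would be run_repro("testvol", "gqas013.sbu.lab.eng.bos.redhat.com", "/gluster-mount").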

Snippet from mount logs :

[2017-04-25 10:45:06.698688] I [MSGID: 100030] [glusterfsd.c:2417:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.4 (args: /usr/sbin/glusterfs --volfile-server=gqas013.sbu.lab.eng.bos.redhat.com --volfile-id=/testvol /gluster-mount)
[2017-04-25 10:45:06.706458] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-04-25 10:45:06.760460] E [MSGID: 101028] [options.c:168:xlator_option_validate_sizet] 0-testvol-readdir-ahead: '2147483648' in 'option rda-cache-limit 2GB' is out of range [0 - 1073741824]
[2017-04-25 10:45:06.760476] W [MSGID: 101029] [options.c:945:xl_opt_validate] 0-testvol-readdir-ahead: validate of rda-cache-limit returned -1
[2017-04-25 10:45:06.760484] E [MSGID: 101090] [graph.c:301:glusterfs_graph_validate_options] 0-testvol-readdir-ahead: validation failed: '2147483648' in 'option rda-cache-limit 2GB' is out of range [0 - 1073741824]
[2017-04-25 10:45:06.760490] E [MSGID: 101090] [graph.c:672:glusterfs_graph_activate] 0-graph: validate options failed
[2017-04-25 10:45:06.760779] W [glusterfsd.c:1288:cleanup_and_exit] (-->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x3c1) [0x7f5c85fe6471] -->/usr/sbin/glusterfs(glusterfs_process_volfp+0x1b1) [0x7f5c85fe0831] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7f5c85fdfd6b] ) 0-: received signum (1), shutting down
[2017-04-25 10:45:06.760798] I [fuse-bridge.c:5803:fini] 0-fuse: Unmounting '/gluster-mount'.
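
To pick the failing option out of a longer mount log, something like the sketch below may help. The regular expression is written against the "out of range" message format seen in the snippet above and is not guaranteed to match other glusterfs versions; the name find_range_errors is hypothetical.

```python
import re

# Matches messages such as:
#   '2147483648' in 'option rda-cache-limit 2GB' is out of range [0 - 1073741824]
RANGE_ERROR = re.compile(
    r"'(?P<value>\d+)' in 'option (?P<option>\S+) (?P<raw>[^']+)' "
    r"is out of range \[(?P<low>\d+) - (?P<high>\d+)\]"
)


def find_range_errors(log_text):
    """Return one dict per log line that reports an out-of-range option."""
    errors = []
    for line in log_text.splitlines():
        match = RANGE_ERROR.search(line)
        if match:
            errors.append({
                "option": match.group("option"),
                "raw": match.group("raw"),          # value as written in the volfile
                "value": int(match.group("value")),  # value in bytes
                "low": int(match.group("low")),
                "high": int(match.group("high")),
            })
    return errors
```

Applied to the snippet above, this reports rda-cache-limit with a value of 2147483648 bytes against an upper bound of 1073741824 bytes (1GB).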


Mount succeeds when performance.parallel-readdir is set to "on" again.






Version-Release number of selected component (if applicable):
-------------------------------------------------------------

3.8.4-23

How reproducible:
-----------------

Every which way I try.

Additional info:
---------------

[root@gqas013 ~]# gluster v info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 7f5ae046-00d8-428c-a3f4-75e4f7515a82
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas008.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
performance.rda-cache-limit: 1GB
performance.parallel-readdir: on
server.allow-insecure: on
performance.stat-prefetch: off
transport.address-family: inet
nfs.disable: on
[root@gqas013 ~]#

Comment 2 Ambarish 2017-04-25 10:49:35 UTC
The bug is reproducible only when the rda cache limit is > 1GB.

Comment 4 Poornima G 2017-05-08 09:22:07 UTC
Fixing BZ: 1438245 will fix this issue as well.

Comment 5 Poornima G 2017-05-08 09:27:46 UTC
When parallel readdir is enabled, the cache limit is (dist count * 1GB). Let's say the cache limit was set to 2GB and parallel readdir was then disabled: the mount fails because there is only one rda instance (without parallel readdir) and the configured cache limit of 2GB exceeds that instance's limit (1GB).
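
The rule described in this comment can be modelled as below. This is a simplified illustration of the behaviour as described here, not the actual glusterfs code; the function names are hypothetical.

```python
GB = 1024 ** 3  # 1073741824 bytes, the upper bound shown in the mount log


def max_rda_cache_limit(dist_count, parallel_readdir):
    """Upper bound for rda-cache-limit as described in this comment.

    With parallel readdir on, the limit scales with the distribute count;
    with it off, the single rda instance allows at most 1GB.
    """
    return dist_count * GB if parallel_readdir else GB


def cache_limit_in_range(cache_limit, dist_count, parallel_readdir):
    """True if cache_limit (in bytes) falls within the allowed range."""
    return 0 <= cache_limit <= max_rda_cache_limit(dist_count, parallel_readdir)


# The 2 x 2 volume from this report has a distribute count of 2.
print(cache_limit_in_range(2 * GB, dist_count=2, parallel_readdir=True))
print(cache_limit_in_range(2 * GB, dist_count=2, parallel_readdir=False))
```

In this model the 2GB setting is within range while parallel readdir is on and out of range once it is turned off, which matches the validation failure in the mount log.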

Comment 6 Vivek Das 2017-05-16 14:40:57 UTC
This is reproducible for CIFS as well.
Enable parallel readdir, then set the cache limit to > 2G, turn off parallel readdir and try to do a CIFS mount.

Mount fails.

Comment 9 Atin Mukherjee 2017-05-19 06:45:30 UTC
upstream patch : https://review.gluster.org/#/c/17338/

Comment 10 Atin Mukherjee 2017-05-22 11:47:04 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/106815/

Comment 12 Ambarish 2017-06-09 13:51:54 UTC
Verified on 3.8.4-27.

Subsequent mounts succeed after disabling parallel readdir, even after setting the rda cache limit to a high value.

Comment 14 errata-xmlrpc 2017-09-21 04:39:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774