Bug 1337811

Summary: [GSS] - enabling Gluster/NFS with nfs.rpc-auth-allow set to many hosts fails
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prashant Dhange <pdhange>
Component: gluster-nfs
Assignee: Bipin Kunal <bkunal>
Status: CLOSED ERRATA
QA Contact: Manisha Saini <msaini>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: amukherj, asrivast, bkunal, olim, rhinduja, rhs-bugs, rnalakka, skoduri, storage-qa-internal
Target Milestone: ---
Keywords: Patch, ZStream
Target Release: RHGS 3.2.0
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-2
Doc Type: Bug Fix
Doc Text:
Previously, when 'showmount' was run, the structure of data passed from the mount protocol meant that the groupnodes defined in the nfs.rpc-auth-allow volume option were handled as a single string, which caused errors when the string of groupnodes was longer than 255 characters. This single string is now handled as a list of strings so that 'showmount' receives the correct number of hostnames.
Story Points: ---
Clone Of:
Clones: 1343286 (view as bug list)
Environment:
Last Closed: 2017-03-23 05:32:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1343286    
Bug Blocks: 1351522, 1351530    
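
As a rough shell illustration of the behaviour described in the Doc Text above (a minimal sketch; the hostnames are made up, and the actual fix lives in the gNFS mount-protocol C code, not in shell):

# Hypothetical nfs.rpc-auth-allow value (example hostnames).
ALLOW="host1.example.com,host2.example.com,host3.example.com"

# Before the fix, the whole value was effectively handled as ONE string,
# so values longer than 255 characters overran the mount protocol's
# name-length limit and broke showmount.
echo "as a single string: ${#ALLOW} characters"

# After the fix, the value is treated as a list of short strings, one
# per hostname, so the total length of the option no longer matters.
IFS=',' read -ra HOSTLIST <<< "$ALLOW"
echo "as a list: ${#HOSTLIST[@]} entries"
printf '  %s\n' "${HOSTLIST[@]}"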

Description Prashant Dhange 2016-05-20 06:45:46 UTC
Description of problem:
Volumes exported with nfs.rpc-auth-allow set to a value of more than approximately 400 characters crash the corresponding RPC services.

The following error messages appear in /var/log/glusterfs/nfs.log:
[2016-05-18 23:54:30.348639] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
The message "E [MSGID: 112067] [nfs3.c:4702:nfs3_fsstat] 0-nfs-nfsv3: Bad Handle" repeated 2 times between [2016-05-18 23:53:30.731217] and [2016-05-18 23:54:31.785566]
[2016-05-18 23:54:31.785585] W [MSGID: 112199] [nfs3-helpers.c:3418:nfs3_log_common_res] 0-nfs-nfsv3: (null) => (XID: e3929939, FSSTAT: NFS: 10001(Illegal NFS file handle), POSIX: 14(Bad address)) [No data available]
[2016-05-18 23:54:31.785763] E [nfs3.c:341:__nfs3_get_volume_id] (-->/usr/lib64/glusterfs/3.7.5/xlator/nfs/server.so(nfs3_fsstat_reply+0x41) [0x7f4508728e01] -->/usr/lib64/glusterfs/3.7.5/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x78) [0x7f4508728478] -->/usr/lib64/glusterfs/3.7.5/xlator/nfs/server.so(__nfs3_get_volume_id+0xae) [0x7f45087283be] ) 0-nfs-nfsv3: invalid argument: xl [Invalid argument]
[2016-05-18 23:54:31.960984] E [MSGID: 112067] [nfs3.c:4702:nfs3_fsstat] 0-nfs-nfsv3: Bad Handle
[2016-05-18 23:54:31.961021] W [MSGID: 112199] [nfs3-helpers.c:3418:nfs3_log_common_res] 0-nfs-nfsv3: (null) => (XID: e4929939, FSSTAT: NFS: 10001(Illegal NFS file handle), POSIX: 14(Bad address)) [Invalid argument]
[2016-05-18 23:54:31.961073] E [nfs3.c:341:__nfs3_get_volume_id] (-->/usr/lib64/glusterfs/3.7.5/xlator/nfs/server.so(nfs3_fsstat_reply+0x41) [0x7f4508728e01] -->/usr/lib64/glusterfs/3.7.5/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x78) [0x7f4508728478] -->/usr/lib64/glusterfs/3.7.5/xlator/nfs/server.so(__nfs3_get_volume_id+0xae) [0x7f45087283be] ) 0-nfs-nfsv3: invalid argument: xl [Invalid argument]
[2016-05-18 23:54:33.252289] W [rpcsvc.c:278:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 4) for 10.222.160.25:776
[2016-05-18 23:54:33.252319] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2016-05-18 23:54:33.252825] W [rpcsvc.c:278:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 4) for 10.222.160.25:776
[2016-05-18 23:54:33.252841] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully

The RPC service for Gluster/NFS fails with an invalid NFS handle.
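
A quick way to confirm that it is the RPC mount service that stops answering (a hedged diagnostic sketch; <hostname> is the affected server):

# List the RPC programs registered with the portmapper; mountd and nfs
# should still appear even while the export query below is failing.
rpcinfo -p <hostname>

# Query the export list with a bounded wait instead of the default RPC
# timeout, so the hang shows up quickly.
timeout 10 showmount -e <hostname> || echo "export query timed out"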


Version-Release number of selected component (if applicable):


How reproducible:
reproducible


Steps to Reproduce:
1. Create gluster volume
2. Allow access to a list of hosts using nfs.rpc-auth-allow (a value of more than 400 characters; see the sketch after these steps)
# gluster volume set <vol-name> nfs.rpc-auth-allow ${HOSTS}
3. Show mount
# showmount -e <hostname>
rpc mount export: RPC: Timed out
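
The long host list can be generated rather than typed by hand; a minimal sketch (the 192.168.10.x addresses are placeholders, roughly 550 characters in total):

# Build a comma-separated list of 40 addresses (well over 400 characters).
HOSTS=$(echo 192.168.10.{1..40} | tr ' ' ',')
echo "option value length: ${#HOSTS} characters"

# Apply it and query the exports; on affected builds showmount times out.
gluster volume set <vol-name> nfs.rpc-auth-allow ${HOSTS}
showmount -e <hostname>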


Actual results:
The RPC service fails and showmount queries time out.

Expected results:
Access should be allowed to all hosts in the ${HOSTS} list.

Additional info:
sosreport is uploaded at:
http://collab-shell.usersys.redhat.com/01637136/

Comment 2 Oonkwee Lim 2016-05-20 16:11:14 UTC
More information from customer:

It is completely reproducible on a test cluster (installed from the RHGS 3.1.2 ISO).
Queries time out when nfs.rpc-auth-allow exceeds 256 characters.

Steps to reproduce will follow in a private comment.
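
To locate that threshold empirically (a hedged sketch, not taken from the private comment; the volume name and addresses are placeholders), grow the allow list one host at a time and watch for the first timeout:

# Grow the allow list until showmount stops answering; the length at
# which it first times out approximates the ~256-character threshold.
for n in $(seq 1 40); do
    HOSTS=$(seq -s, -f '192.168.10.%g' 1 "$n")
    gluster volume set <vol-name> nfs.rpc-auth-allow "${HOSTS}" > /dev/null
    sleep 2   # give the gNFS server a moment to pick up the new option
    if ! timeout 10 showmount -e localhost > /dev/null 2>&1; then
        echo "first timeout at ${#HOSTS} characters (${n} hosts)"
        break
    fi
done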

Comment 9 Atin Mukherjee 2016-08-06 03:27:55 UTC
Niels,

May I know the reason for moving the component back to gluster-nfs? We have realigned the downstream components with upstream to keep parity, and upstream has an nfs component, hence the change.

Comment 10 Niels de Vos 2016-08-06 08:14:17 UTC
(In reply to Atin Mukherjee from comment #9)
> Niels,
> 
> May I know the reason for moving the component back to gluster-nfs? We
> have realigned the downstream components with upstream to keep parity, and
> upstream has an nfs component, hence the change.

This is a Gluster/NFS (gNFS) bug; we use the "nfs" component for changes to GlusterFS related to NFS-Ganesha.

Comment 12 Niels de Vos 2016-10-17 10:14:09 UTC
A patch for this has been included in RHGS-3.2.0 since it contains a rebase of GlusterFS 3.8 (http://review.gluster.org/14700).

Comment 13 Atin Mukherjee 2016-10-17 15:04:10 UTC
Rahul - can this BZ be tested with latest build?

Comment 15 Manisha Saini 2016-11-28 12:47:09 UTC
Verified this Bug on glusterfs-3.8.4-5.el7rhgs.x86_64

Steps:

1. HOSTS=$(echo 192.168.10.{1..40} | tr ' ' ',')

2. [root@dhcp47-159 ganesha]# gluster volume set Vol1 nfs.rpc-auth-allow ${HOSTS}
volume set: success

3. [root@dhcp47-159 ganesha]# showmount -e localhost
Export list for localhost:
/Vol1 192.168.10.1,192.168.10.2,192.168.10.3,192.168.10.4,192.168.10.5,192.168.10.6,192.168.10.7,192.168.10.8,192.168.10.9,192.168.10.10,192.168.10.11,192.168.10.12,192.168.10.13,192.168.10.14,192.168.10.15,192.168.10.16,192.168.10.17,192.168.10.18,192.168.10.19,192.168.10.20,192.168.10.21,192.168.10.22,192.168.10.23,192.168.10.24,192.168.10.25,192.168.10.26,192.168.10.27,192.168.10.28,192.168.10.29,192.168.10.30,192.168.10.31,192.168.10.32,192.168.10.33,192.168.10.34,192.168.10.35,192.168.10.36,192.168.10.37,192.168.10.38,192.168.10.39,192.168.10.40
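
A quick count confirms that all 40 entries survived the round trip (a one-line sketch that assumes the export line format shown above):

# Split the /Vol1 export's comma-separated host list and count it; expect 40.
showmount -e localhost | awk '/^\/Vol1/ {print $2}' | tr ',' '\n' | wc -l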


[root@dhcp46-241 ganesha]# gluster v info
 
Volume Name: Vol1
Type: Distributed-Replicate
Volume ID: 9678475b-3ecb-4f22-995b-346c5bcdecca
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.47.3:/mnt/data1/b1
Brick2: 10.70.47.159:/mnt/data1/b1
Brick3: 10.70.46.241:/mnt/data1/b1
Brick4: 10.70.46.219:/mnt/data1/b1
Brick5: 10.70.47.3:/mnt/data2/b2
Brick6: 10.70.47.159:/mnt/data2/b2
Brick7: 10.70.46.241:/mnt/data2/b2
Brick8: 10.70.46.219:/mnt/data2/b2
Brick9: 10.70.47.3:/mnt/data3/b3
Brick10: 10.70.47.159:/mnt/data3/b3
Brick11: 10.70.46.241:/mnt/data3/b3
Brick12: 10.70.46.219:/mnt/data3/b3
Options Reconfigured:
nfs.rpc-auth-allow: 192.168.10.1,192.168.10.2,192.168.10.3,192.168.10.4,192.168.10.5,192.168.10.6,192.168.10.7,192.168.10.8,192.168.10.9,192.168.10.10,192.168.10.11,192.168.10.12,192.168.10.13,192.168.10.14,192.168.10.15,192.168.10.16,192.168.10.17,192.168.10.18,192.168.10.19,192.168.10.20,192.168.10.21,192.168.10.22,192.168.10.23,192.168.10.24,192.168.10.25,192.168.10.26,192.168.10.27,192.168.10.28,192.168.10.29,192.168.10.30,192.168.10.31,192.168.10.32,192.168.10.33,192.168.10.34,192.168.10.35,192.168.10.36,192.168.10.37,192.168.10.38,192.168.10.39,192.168.10.40
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off
nfs-ganesha: disable
cluster.enable-shared-storage: disable


As the reported issue is no longer observed with this build, marking this bug as Verified.

Comment 17 Bipin Kunal 2017-03-20 12:02:14 UTC
Doc text looks good to me.

Comment 19 errata-xmlrpc 2017-03-23 05:32:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html