Bug 1000131 - Users Belonging To Many Groups Cannot Access Mounted Volume
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.4.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: GlusterFS Bugs list
Depends On:
Blocks: 1000957
Reported: 2013-08-22 15:07 EDT by Neil Stoddard
Modified: 2014-04-17 09:14 EDT

See Also:
Fixed In Version: glusterfs-3.4.3
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1000957
Environment:
Last Closed: 2014-04-17 09:14:02 EDT
Type: Bug
Description Neil Stoddard 2013-08-22 15:07:42 EDT
Description of problem:
When a user with a large number of group memberships (>100) attempts to access a mounted gluster volume, they get the error:

cannot open directory .: Transport endpoint is not connected

The mnt-gluster.log shows:

[2013-08-21 20:37:18.486289] W [xdr-rpcclnt.c:79:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2013-08-21 20:37:18.486302] E [rpc-clnt.c:1251:rpc_clnt_record_build_record] 0-gv0-client-0: Failed to build record header
[2013-08-21 20:37:18.486310] W [rpc-clnt.c:1311:rpc_clnt_record] 0-gv0-client-0: cannot build rpc-record
[2013-08-21 20:37:18.486317] W [rpc-clnt.c:1452:rpc_clnt_submit] 0-gv0-client-0: cannot build rpc-record
[2013-08-21 20:37:18.486327] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: Transport endpoint is not connected. Path: /Distributions (00000000-0000-0000-0000-000000000000)

I attached gdb to the glusterfs process and was able to break in rpc_clnt_record_build_record in rpc-clnt.  The problem happens in the call to xdr_sizeof (around line 1232, I believe).  glibc has a limit of 400 bytes for auth data when encoding this specific RPC message (see MAX_AUTH_BYTES in sunrpc/rpc/auth.h in glibc).  As a result, xdr_sizeof returns 0 for users with large numbers of group memberships.  This may be an issue that should be fixed in glibc, but since I'm rather a novice with this code I thought I'd file it as a bug here first.

This is an issue for us when integrating gluster with samba using active directory authentication where sometimes users belong to over 100 groups.

Version-Release number of selected component (if applicable):


How reproducible:
I am able to reproduce it easily with our user directory, but it relies on Active Directory.  I imagine it would also be reproducible if you create >100 groups on a single system, add a user to all of them, and then try to access a mounted gluster volume as that user, although I haven't had time to try it yet.

Steps to Reproduce:
I believe the following would reproduce it, but I have not had time to try this yet.
1. Create >100 groups.
2. Add a user to all these groups.
3. As that user, attempt to access a mounted gluster volume.

Actual results:
cannot open directory .: Transport endpoint is not connected
When trying to access a gluster mount.

Expected results:
Be able to access the gluster mount.

Additional info:
I'm currently building glibc with a larger MAX_AUTH_BYTES value in sunrpc/rpc/auth.h to see if that resolves the issue. I'll post here once it finishes building...

As I mentioned, this may be an issue to fix in glibc rather than gluster.
Comment 1 Neil Stoddard 2013-08-22 17:25:22 EDT
Rebuilding glibc with MAX_AUTH_BYTES set to 1024 (may be excessive) allows users to access a mounted gluster volume.
Comment 2 Anand Avati 2013-08-22 17:33:57 EDT
REVIEW: http://review.gluster.org/5695 (rpc: fix typo which refers glibc macro) posted (#1) for review on master by Anand Avati (avati@redhat.com)
Comment 3 Anand Avati 2013-08-22 17:44:34 EDT
Hah! Turns out to be a long-standing bug. This was a harmless typo initially, when we used RPCSVC_MAX_AUTH_DATA everywhere else and that value was also 400. When we replaced RPCSVC_MAX_AUTH_DATA with GF_MAX_AUTH_DATA as 2048, this location was left out (sed does not detect typos!), and the harmless typo became harmful :-)

Please test the patch http://review.gluster.org/5695 and vote on it. Thanks.
Comment 4 Anand Avati 2013-08-23 06:34:47 EDT
COMMIT: http://review.gluster.org/5695 committed in master by Vijay Bellur (vbellur@redhat.com) 
------
commit d64df6a92c2492812ef7c23cc133f5d7a113ec42
Author: Anand Avati <avati@redhat.com>
Date:   Thu Aug 22 14:14:22 2013 -0700

    rpc: fix typo which refers glibc macro
    
    A typo which read MAX_AUTH_BYTES instead of GF_MAX_AUTH_BYTES was
    picking the value 400 instead of the larger 2048. This causes
    failures when number of aux group ids is a large number.
    
    Change-Id: Idb8d59aee2690fd53e24c2e09f58a16fe387ef27
    BUG: 1000131
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/5695
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Amar Tumballi <amarts@redhat.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Comment 5 Neil Stoddard 2013-08-23 09:57:17 EDT
You folks rock!  I didn't expect such a fast response on this issue.  

Unfortunately the patch doesn't fix the problem.

(The fix you made looked like it was also wrong, but not this specific issue).  

I'm a novice with the gluster code base but if I'm reading things right in rpc/rpc-lib/src/rpc-clnt.c you make the following call on line 1232:

xdr_size = xdr_sizeof ((xdrproc_t)xdr_callmsg, &request);

xdr_callmsg is defined in glibc in sunrpc/rpc_cmsg.c (grep of gluster didn't find any redefinition).  It looks like xdr_sizeof builds up a list of operations to perform and then calls the passed in function (xdr_callmsg).  The source for this in turn does:

      if (cmsg->rm_call.cb_cred.oa_length > MAX_AUTH_BYTES)
        {
          return (FALSE);
        }
      if (cmsg->rm_call.cb_verf.oa_length > MAX_AUTH_BYTES)
        {
          return (FALSE);
        }
 
This causes xdr_sizeof to return 0.

And there goes my problem.  So even though you use GF_MAX_AUTH_BYTES in gluster code, a glibc function gets called that uses MAX_AUTH_BYTES.  I may totally be reading this wrong, since I just started looking at this code for the first time the day before last, but that is where I believe the problem lies.
Comment 6 Neil Stoddard 2013-08-23 10:00:27 EDT
Sorry, I misspoke.  The fix you made was RIGHT (not wrong), that code was originally incorrect too, but the patch does not fix my issue.
Comment 7 Anand Avati 2013-09-09 20:29:56 EDT
REVIEW: http://review.gluster.org/5854 (rpc: fix typo which refers glibc macro) posted (#1) for review on release-3.4 by Anand Avati (avati@redhat.com)
Comment 8 Anand Avati 2013-09-09 20:49:44 EDT
REVIEW: http://review.gluster.org/5854 (rpc: fix typo which refers glibc macro) posted (#2) for review on release-3.4 by Anand Avati (avati@redhat.com)
Comment 9 Anand Avati 2013-09-09 21:11:32 EDT
REVIEW: http://review.gluster.org/5854 (rpc: fix typo which refers glibc macro) posted (#3) for review on release-3.4 by Anand Avati (avati@redhat.com)
Comment 10 Anand Avati 2013-09-10 04:28:08 EDT
COMMIT: http://review.gluster.org/5854 committed in release-3.4 by Vijay Bellur (vbellur@redhat.com) 
------
commit f43a223ad1e53041f46b351aa260203ea0685613
Author: Anand Avati <avati@redhat.com>
Date:   Thu Aug 22 14:14:22 2013 -0700

    rpc: fix typo which refers glibc macro
    
    A typo which read MAX_AUTH_BYTES instead of GF_MAX_AUTH_BYTES was
    picking the value 400 instead of the larger 2048. This causes
    failures when number of aux group ids is a large number.
    
    Change-Id: Idb8d59aee2690fd53e24c2e09f58a16fe387ef27
    BUG: 1000131
    Signed-off-by: Anand Avati <avati@redhat.com>
    Reviewed-on: http://review.gluster.org/5854
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur@redhat.com>
Comment 11 Niels de Vos 2014-04-17 09:14:02 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.4.3, please reopen this bug report.

glusterfs-3.4.3 has been announced on the Gluster Developers mailing list [1]. Packages for several distributions should already be available or become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

The fix for this bug is likely to be included in all future GlusterFS releases, i.e. releases after 3.4.3. Along the same lines, the recent glusterfs-3.5.0 release [3] is likely to have the fix. You can verify this by reading the comments in this bug report and checking for comments mentioning "committed in release-3.5".

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/5978
[2] http://news.gmane.org/gmane.comp.file-systems.gluster.user
[3] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
