Bug 1437332 - auth failure after upgrade to GlusterFS 3.10
Summary: auth failure after upgrade to GlusterFS 3.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: protocol
Version: rhgs-3.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Atin Mukherjee
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard: brick-multiplexing
Depends On: 1429117 1433815
Blocks: 1417151
Reported: 2017-03-30 06:03 UTC by Atin Mukherjee
Modified: 2019-03-19 13:23 UTC

Fixed In Version: glusterfs-3.8.4-21
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1433815
Environment:
Last Closed: 2017-09-21 04:35:56 UTC
Embargoed:
bmekala: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2774 0 normal SHIPPED_LIVE glusterfs bug fix and enhancement update 2017-09-21 08:16:29 UTC

Description Atin Mukherjee 2017-03-30 06:03:54 UTC
+++ This bug was initially created as a clone of Bug #1433815 +++

+++ This bug was initially created as a clone of Bug #1429117 +++

Description of problem:
We enabled the IP based auth feature with
gluster volume set store_temp auth.allow xxx.xxx.xxx...
This worked fine up to GlusterFS 3.9. After upgrading to 3.10, we noticed that we cannot mount any volume from a remote client anymore.
Looking at the brick logs we found:

[2017-03-04 15:56:17.469490] I [MSGID: 115091] [server-handshake.c:659:server_setvolume] 0-store_temp-server: Failed to get client opversion
[2017-03-04 15:56:17.469520] E [MSGID: 115004] [authenticate.c:224:gf_authenticate] 0-auth: no authentication module is interested in accepting remote-client (null)
[2017-03-04 15:56:17.469602] E [MSGID: 115001] [server-handshake.c:718:server_setvolume] 0-store_temp-server: Cannot authenticate client from backupserver-9596-2017/03/04-15:56:17:438653-store_temp-client-2-0-0 3.9.1 [Permission denied]
[2017-03-04 15:56:28.472405] I [MSGID: 115036] [server.c:559:server_rpc_notify] 0-store_temp-server: disconnecting connection from backupserver-9596-2017/03/04-15:56:17:438653-store_temp-client-2-0-0
[2017-03-04 15:56:28.472518] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-store_temp-server: Shutting down connection backupserver-9596-2017/03/04-15:56:17:438653-store_temp-client-2-0-0

This problem exists even when creating completely new volumes. We have already restarted, and even rebooted, all GlusterFS peers and the clients as well. All peers and all clients have been upgraded to 3.10.


Version-Release number of selected component (if applicable):
3.10

How reproducible:
-Create a new volume 
-enable auth.allow based on IPs

Steps to Reproduce:
1. gluster volume create store_temp disperse 3 redundancy 1 ...
2. gluster volume set store_temp auth.allow xxx.xxx.xxx.xxx
3. gluster volume start store_temp
4. gluster mount ... (on a client)

Actual results:
-error message at clients "failed to set the volume [Permission denied]"
-error message at server: "no authentication module is interested in accepting remote-client (null)"

Expected results:
successful mount

Additional info:
Ubuntu 16.04

--- Additional comment from Jiffin on 2017-03-07 07:29:04 EST ---

Can you provide the complete logs, including the brick, glusterd and glusterfs client logs?
It would also be easier if you could take a tcpdump from the server and the client.

--- Additional comment from Jonathan Michalon on 2017-03-07 09:11:19 EST ---

I am stumbling on the same problem.
After setting the log level to DEBUG (gluster volume set volname diagnostics.brick-log-level DEBUG), I first got this interesting output:
  allowed = "192.168.122.186", received addr = "R"
Then some time afterwards:
  allowed = "192.168.122.186", received addr = "m"

So it looked like we were reading some random memory. And indeed, looking into the source code, the big switch/case filling peer_addr in /xlators/protocol/auth/addr/src/addr.c disappeared between 3.9 and 3.10.
I think this is enough to tell that there is some problem here :)
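
For illustration only, here is a minimal sketch of the kind of step described above. It is not the GlusterFS source: the helper names peer_addr_from_sockaddr and addr_matches are hypothetical, and the only thing taken from this report is that a switch/case on the address family fills the peer address before it is compared with the allowed value via fnmatch. If that filling step is skipped, the comparison sees whatever bytes are in the buffer, which would be consistent with received addr = "R".

```c
#include <arpa/inet.h>
#include <assert.h>
#include <fnmatch.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not a GlusterFS function): render the peer's IP
 * address as text. Returns 0 on success, -1 if the family is unhandled. */
static int peer_addr_from_sockaddr(const struct sockaddr *sa, char *buf,
                                   size_t len)
{
    assert(sa != NULL && buf != NULL && len > 0);
    buf[0] = '\0'; /* never hand an uninitialized buffer to the matcher */

    switch (sa->sa_family) {
    case AF_INET: {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        return inet_ntop(AF_INET, &in4->sin_addr, buf, (socklen_t)len) ? 0 : -1;
    }
    case AF_INET6: {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        return inet_ntop(AF_INET6, &in6->sin6_addr, buf, (socklen_t)len) ? 0 : -1;
    }
    default:
        return -1;
    }
}

/* Hypothetical helper: 1 if the peer address matches the allowed glob
 * pattern, 0 otherwise. */
static int addr_matches(const char *allowed, const char *peer)
{
    assert(allowed != NULL && peer != NULL);
    return fnmatch(allowed, peer, 0) == 0;
}
```

With a correctly filled buffer, addr_matches("192.168.122.186", buf) succeeds for that client; with a stray one-character string such as "R" it cannot.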

--- Additional comment from Atin Mukherjee on 2017-03-13 02:00:47 EDT ---

Auth failures need not be in glusterd; moving this to the core component.

--- Additional comment from Yong on 2017-03-19 03:35:18 EDT ---

I have the same issue. I think this is critical; please help.

--- Additional comment from Worker Ant on 2017-03-19 21:00:07 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#1) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-03-20 16:03:19 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-03-24 14:22:14 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#3) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-03-27 13:11:09 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#4) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-03-27 13:24:54 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#5) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-03-28 01:38:03 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#6) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Worker Ant on 2017-03-28 16:04:33 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#7) for review on master by Jeff Darcy (jeff.us)

--- Additional comment from Worker Ant on 2017-03-28 17:51:31 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#8) for review on master by Jeff Darcy (jeff.us)

--- Additional comment from Worker Ant on 2017-03-28 18:05:24 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#9) for review on master by Jeff Darcy (jeff.us)

--- Additional comment from Worker Ant on 2017-03-28 18:24:02 EDT ---

REVIEW: https://review.gluster.org/16920 (protocol : fix auth-allow regression) posted (#10) for review on master by Jeff Darcy (jeff.us)

--- Additional comment from Worker Ant on 2017-03-30 01:57:02 EDT ---

COMMIT: https://review.gluster.org/16920 committed in master by Atin Mukherjee (amukherj) 
------
commit 0bd58241143e91b683a3e5c4335aabf9eed537fe
Author: Atin Mukherjee <amukherj>
Date:   Mon Mar 20 05:15:25 2017 +0530

    protocol : fix auth-allow regression
    
    One of the brick multiplexing patches (commit 1a95fc3) had some changes
    in gf_auth () & server_setvolume () functions which caused auth-allow
    feature to be broken. mount doesn't succeed even if it's part of the
    auth-allow list. This fix does the following:
    
    1. Reintroduce the peer-info data back in gf_auth () so that fnmatch has
    valid input and it can decide on the result.
    
    2. config-params dict should capture key-value pairs for all the bricks
    in case brick multiplexing is on. In case brick multiplexing isn't
    enabled, then config-params should carry attributes from protocol/server
    such that all rpc auth related attributes stay intact in the
    dictionary.
    
    Change-Id: I007c4c6d78620a896b8858a29459a77de8b52412
    BUG: 1433815
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: https://review.gluster.org/16920
    Tested-by: Jeff Darcy <jeff.us>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jeff.us>
    Reviewed-by: MOHIT AGRAWAL <moagrawa>

Comment 2 Atin Mukherjee 2017-03-30 06:07:04 UTC
upstream patch : https://review.gluster.org/16920

Comment 4 Atin Mukherjee 2017-04-03 10:47:18 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/102295/

Comment 7 Nag Pavan Chilakam 2017-07-10 09:50:03 UTC
onqa_validation:

1) I tested by setting auth.allow to an IP (10.70.35.103). Only this client was able to access (i.e. mount) the volume; any other IP was unable to mount.

2) Tried some combinations with auth.reject, and the functionality was all sane and good.

3) However, if I set auth.allow to xxx.xxx.xxx.xxx (as mentioned in the bug description), then none of the IPs are allowed to access the volume.

Talked with Atin and learned that the fix is for scenario #1 (which was not working before this fix); the FQDN fix is not yet in and will be a separate fix.
Based on the above discussion, moving to verified.

test version:3.8.4-32
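
As a hedged illustration of scenarios 1 and 3, the sketch below assumes that each entry of the comma-separated auth.allow value is compared with the client address as a glob pattern via fnmatch (the commit message above mentions fnmatch; the rest is an assumption). The helper name match_allow_list is hypothetical and is not a GlusterFS function. Under that assumption, the literal string xxx.xxx.xxx.xxx contains no wildcard and so can only match a peer whose address is literally that string.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if peer matches any glob pattern in a
 * comma-separated allow list, 0 if none match, -1 on allocation failure. */
static int match_allow_list(const char *allow_list, const char *peer)
{
    char *copy;
    char *save = NULL;
    char *tok;
    int found = 0;

    assert(allow_list != NULL && peer != NULL);

    copy = strdup(allow_list); /* strtok_r modifies its input */
    if (copy == NULL)
        return -1;

    for (tok = strtok_r(copy, ",", &save); tok != NULL;
         tok = strtok_r(NULL, ",", &save)) {
        if (fnmatch(tok, peer, 0) == 0) {
            found = 1;
            break;
        }
    }

    free(copy);
    return found;
}
```

In this sketch, match_allow_list("10.70.35.103", "10.70.35.103") accepts the client, while match_allow_list("xxx.xxx.xxx.xxx", "10.70.35.103") rejects it; a pattern such as "10.70.35.*" would accept it.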

Comment 8 Nag Pavan Chilakam 2017-07-10 09:50:49 UTC
(In reply to nchilaka from comment #7)
> onqa_validation:
> 
> 1) I tested by setting auth.allow to an IP (10.70.35.103). Only this client
> was able to access (i.e. mount) the volume; any other IP was unable to mount.
> 
> 2) Tried some combinations with auth.reject, and the functionality was all
> sane and good.
> 
> 3) However, if I set auth.allow to xxx.xxx.xxx.xxx (as mentioned in the bug
> description), then none of the IPs are allowed to access the volume.
> 
> Talked with Atin and learned that the fix is for scenario #1 (which was
> not working before this fix); the FQDN fix is not yet in and will be a
> separate fix.
> Based on the above discussion, moving to verified.
> 
> test version:3.8.4-32



Atin, can you confirm if this is fine?

Comment 12 errata-xmlrpc 2017-09-21 04:35:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774

