Previously, the brick processes did not consider rebalance processes to be trusted clients. As a consequence, if the auth.allow option was set for a volume, connections from the rebalance processes for that volume were rejected by the brick processes, causing rebalance to hang. With this fix, the rebalance process is treated as a trusted client by the brick processes. Now, the rebalance works even if the auth.allow option is set for a volume.
Description (Cedric Buissart, 2015-04-21 13:52:26 UTC)
Description of problem:
When setting auth.allow, rebalance will get stuck unless the IPs of the gluster nodes themselves are included.
The rebalance status stays 'in progress', but the rebalanced size stays at 0 bytes.
Node             Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
---------------  ----------------  ------  -------  --------  -------  -----------  ----------------
localhost        0                 0Bytes  0        0         0        in progress  0.00
rhs30-node3      0                 0Bytes  0        0         0        in progress  0.00
rhs30-node4      0                 0Bytes  0        0         0        in progress  0.00
192.168.100.206  0                 0Bytes  0        0         0        in progress  0.00
volume rebalance: thingluster: success:
In the brick logs, we can see the authentication being rejected:
[2015-04-21 13:43:03.131329] E [server-handshake.c:589:server_setvolume] 0-thingluster-server: Cannot authenticate client from cbuissar-rhs30-node1-8521-2015/04/21-13:42:58:108057-thingluster-client-0-0-0 3.6.0.53
[2015-04-21 13:43:08.419405] E [authenticate.c:239:gf_authenticate] 0-auth: no authentication module is interested in accepting remote-client (null)
Version-Release number of selected component (if applicable): tested on 3.0u3 and 3.0u4
How reproducible: 100%/easy
Steps to Reproduce:
1. set auth.allow to some client IP
2. mount and move files
3. start rebalance
Actual results:
rebalance is hung, authentication errors are shown in the brick logs
Expected results:
Rebalance should still work if we restrict auth.allow.
Additional info:
Workaround: add the IPs of all the gluster nodes to auth.allow.
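A minimal sketch of the workaround in Python, for illustration only. It builds (but does not run) a `gluster volume set <volume> auth.allow <list>` command whose list contains both the client IPs and the gluster node IPs. The helper name `build_auth_allow_command` and every address except 192.168.100.206 are hypothetical, not taken from this report.

```python
def build_auth_allow_command(volume, client_ips, node_ips):
    """Return the argv that sets auth.allow to the clients plus the gluster nodes.

    Hypothetical helper: it only assembles the command line, it never calls gluster.
    """
    allowed = []
    # Keep the original order and drop duplicates.
    for ip in list(client_ips) + list(node_ips):
        if ip not in allowed:
            allowed.append(ip)
    # auth.allow takes a comma-separated list of addresses.
    return ["gluster", "volume", "set", volume, "auth.allow", ",".join(allowed)]


# Example addresses are assumptions, except 192.168.100.206 from the status output.
command = build_auth_allow_command(
    "thingluster",
    client_ips=["192.168.100.50"],
    node_ips=["192.168.100.203", "192.168.100.206"],
)
print(" ".join(command))
```

The resulting command would still have to be run by hand on one of the nodes. On builds that contain the fix, adding the node IPs should no longer be needed.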
The rebalance-<volume>.log shows the matching client-side failures:
[2015-04-21 13:43:08.412805] W [client-handshake.c:1108:client_setvolume_cbk] 0-thingluster-client-3: failed to set the volume (Permission denied)
[2015-04-21 13:43:08.412821] W [client-handshake.c:1134:client_setvolume_cbk] 0-thingluster-client-3: failed to get 'process-uuid' from reply dict
[2015-04-21 13:43:08.412828] E [client-handshake.c:1140:client_setvolume_cbk] 0-thingluster-client-3: SETVOLUME on remote-host failed: Authentication failed
[2015-04-21 13:43:08.412834] I [client-handshake.c:1225:client_setvolume_cbk] 0-thingluster-client-3: sending AUTH_FAILED even
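To check whether a hung rebalance matches this bug, the brick and rebalance logs can be scanned for the messages quoted above. A rough Python sketch follows, assuming plain substring matching is good enough. The function name `find_auth_failures` is hypothetical and the marker strings are copied from the log excerpts in this report, so other versions may word them differently.

```python
# Message fragments copied from the log excerpts in this report.
AUTH_FAILURE_MARKERS = (
    "Cannot authenticate client",
    "no authentication module is interested in accepting remote-client",
    "failed to set the volume (Permission denied)",
    "SETVOLUME on remote-host failed: Authentication failed",
)


def find_auth_failures(lines):
    """Return the log lines that contain one of the known failure fragments."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in AUTH_FAILURE_MARKERS)
    ]


# Sample lines taken from the rebalance log quoted in this report.
sample = [
    "[2015-04-21 13:43:08.412805] W [client-handshake.c:1108:client_setvolume_cbk] "
    "0-thingluster-client-3: failed to set the volume (Permission denied)",
    "[2015-04-21 13:43:08.412821] W [client-handshake.c:1134:client_setvolume_cbk] "
    "0-thingluster-client-3: failed to get 'process-uuid' from reply dict",
    "[2015-04-21 13:43:08.412828] E [client-handshake.c:1140:client_setvolume_cbk] "
    "0-thingluster-client-3: SETVOLUME on remote-host failed: Authentication failed",
]

for hit in find_auth_failures(sample):
    print(hit)
```

In practice the lines would come from an open log file instead of the `sample` list; the log location depends on the installation.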
Tested with build "glusterfs-3.7.1-12". After setting auth.allow and nfs.rpc-auth-allow, the rebalance job ran without any problem, so marking this bug as verified.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2015-1845.html