1) I created a regular volume. The volume info below was taken after the tier attach described in step 3:

[root@rhs-client21 ~]# gluster v info newvol

Volume Name: newvol
Type: Tier
Volume ID: d38264e9-6ce8-4c46-b052-ffd5e55554e1
Status: Started
Number of Bricks: 16
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: rhs-client20:/rhs/brick5/newvol_hot
Brick2: rhs-client21:/rhs/brick5/newvol_hot
Brick3: rhs-client20:/rhs/brick4/newvol_hot
Brick4: rhs-client21:/rhs/brick4/newvol_hot
Cold Tier:
Cold Tier Type : Distribute
Number of Bricks: 12
Brick5: rhs-client4:/rhs/brick1/newvol
Brick6: rhs-client20:/rhs/brick1/newvol
Brick7: rhs-client21:/rhs/brick1/newvol
Brick8: rhs-client30:/rhs/brick1/newvol
Brick9: 10.70.37.59:/rhs/brick1/newvol
Brick10: 10.70.37.150:/rhs/brick1/newvol
Brick11: rhs-client4:/rhs/brick2/newvol
Brick12: rhs-client20:/rhs/brick2/newvol
Brick13: rhs-client21:/rhs/brick2/newvol
Brick14: rhs-client30:/rhs/brick2/newvol
Brick15: 10.70.37.59:/rhs/brick2/newvol
Brick16: 10.70.37.150:/rhs/brick2/newvol
Options Reconfigured:
performance.readdir-ahead: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
features.ctr-enabled: on
cluster.tier-mode: cache

2) I then started IOs from two clients as below:
a) rhsauto070: linux untar
b) rhsauto026: dd command creating files in a loop, 50 files of 300MB each

3) While the above was going on, with some dd files already created, I attached the tier (shown in the vol info above). On rhsauto026 the dd failed immediately for the current file and for all pending files.

Also, the client log shows the following:

[2016-01-19 19:03:40.677980] W [MSGID: 114043] [client-handshake.c:1114:client_setvolume_cbk] 2-newvol-client-0: failed to set the volume [Permission denied]
[2016-01-19 19:03:40.678114] W [MSGID: 114007] [client-handshake.c:1143:client_setvolume_cbk] 2-newvol-client-0: failed to get 'process-uuid' from reply dict [Invalid argument]
[2016-01-19 19:03:40.678133] E [MSGID: 114044] [client-handshake.c:1149:client_setvolume_cbk] 2-newvol-client-0: SETVOLUME on remote-host failed [Permission denied]
[2016-01-19 19:03:40.678145] I [MSGID: 114049] [client-handshake.c:1240:client_setvolume_cbk] 2-newvol-client-0: sending AUTH_FAILED event
[2016-01-19 19:03:40.678159] E [fuse-bridge.c:5200:notify] 0-fuse: Server authenication failed. Shutting down.
[2016-01-19 19:03:40.678171] I [fuse-bridge.c:5669:fini] 0-fuse: Unmounting '/mnt/newvol'.
[2016-01-19 19:03:40.678271] I [fuse-bridge.c:4965:fuse_thread_proc] 0-fuse: unmounting /mnt/newvol
[2016-01-19 19:03:40.678872] W [glusterfsd.c:1236:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f49315b7dc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f4932c22905] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f4932c22789] ) 0-: received signum (15), shutting down

Also, I am now unable to cd to the mount location; it fails with a transport endpoint error.

Note: the kernel untar on rhsauto070 was still going on without any interference.
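To check other client logs for the same failure signature, a small script along the following lines may help. It is only a sketch: the regular expression is inferred from the log lines quoted in this report rather than from any GlusterFS specification, and the helper names (parse_line, setvolume_failures) are made up for illustration. It treats MSGID 114044 ("SETVOLUME on remote-host failed") as the marker, since that is the error logged here.

```python
import re

# Pattern inferred from the log lines quoted in this report (an assumption,
# not an official format definition). The MSGID field is optional.
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


def setvolume_failures(lines):
    """Return parsed entries whose MSGID is 114044 (SETVOLUME failed)."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["msgid"] == "114044"]


if __name__ == "__main__":
    sample = [
        "[2016-01-19 19:03:40.677980] W [MSGID: 114043] "
        "[client-handshake.c:1114:client_setvolume_cbk] 2-newvol-client-0: "
        "failed to set the volume [Permission denied]",
        "[2016-01-19 19:03:40.678133] E [MSGID: 114044] "
        "[client-handshake.c:1149:client_setvolume_cbk] 2-newvol-client-0: "
        "SETVOLUME on remote-host failed [Permission denied]",
    ]
    for entry in setvolume_failures(sample):
        print(entry["ts"], entry["domain"], entry["msg"])
```

Lines that do not fit the pattern (for example the cleanup_and_exit line with the embedded backtrace) are skipped rather than guessed at.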
Created attachment 1116638: client error
It is very inconsistently reproducible.

RCA: This is a race between a graph change in the client graph and an option change in the server graph. During server_reconfigure we authenticate each connected client against the current options. To do this authentication we store the previous values in a dictionary during the connection establishment phase (server_setvolume). If the authentication fails during reconfigure, we disconnect the transport. This introduces a race between server_setvolume and reconfigure: if reconfigure is called before setvolume has been done, the transport will be disconnected. After a three-second time-out the transport will be reconnected.
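The ordering problem in the RCA can be illustrated with a toy model. This is not GlusterFS code: the class, method and credential names are hypothetical, and treating "no saved values yet" as an authentication failure is my reading of the RCA text. The three-second reconnect is not modelled.

```python
class ToyBrickServer:
    """Toy stand-in for the server side described in the RCA (hypothetical)."""

    def __init__(self):
        self.connected = set()   # transports that have connected
        self.saved_auth = {}     # transport -> values saved during setvolume

    def connect(self, client):
        self.connected.add(client)

    def setvolume(self, client, credentials):
        # Connection establishment: remember the values used to authenticate.
        self.saved_auth[client] = credentials

    def reconfigure(self, is_allowed):
        # Re-authenticate every connected transport against the new options.
        # A transport with nothing saved yet cannot pass and is dropped.
        dropped = [
            client for client in sorted(self.connected)
            if client not in self.saved_auth
            or not is_allowed(self.saved_auth[client])
        ]
        for client in dropped:
            self.connected.discard(client)
            self.saved_auth.pop(client, None)
        return dropped


def run_events(events, client="client-0"):
    """Replay events in order and return the transports that were dropped."""
    server = ToyBrickServer()
    dropped = []
    for event in events:
        if event == "connect":
            server.connect(client)
        elif event == "setvolume":
            server.setvolume(client, {"username": "hypothetical-user"})
        elif event == "reconfigure":
            dropped += server.reconfigure(lambda creds: "username" in creds)
        else:
            raise ValueError("unknown event: " + event)
    return dropped


if __name__ == "__main__":
    print(run_events(["connect", "setvolume", "reconfigure"]))  # normal order
    print(run_events(["connect", "reconfigure"]))               # racy order
```

In the normal order nothing is dropped; in the racy order, where reconfigure runs between connect and setvolume, the client is disconnected even though its credentials would have been accepted.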
Changing the component, since this can be reproduced on any volume and the bug falls into the protocol layer. NOTE: With the RCA given in comment 3, the failure should not cause an unmount.
Upstream master patch merged: http://review.gluster.org/#/c/13271/
release 3.7: http://review.gluster.org/#/c/13280/
Created attachment 1117135: mount log
The patches mentioned in comment 5 are merged upstream, so the fix will be available for 3.2 as part of the rebase.
Moving to MODIFIED. Patch available downstream as commit 30e4d0d.
As tier is not being actively developed, I'm closing this bug. Feel free to reopen it if necessary.