Description of problem:
=======================
The "gluster volume status <vol-name> inode" command times out if the number of files in the FUSE mount is >= 10000.

cli error message:
------------------
]# gluster volume status replica inode
Error : Request timed out

number of files in the mount point (zero-size files):
--------------------------------
mnt]# ls |wc -l
10000

error messages in the glusterd log:
-----------------------------------
[2016-05-27 05:34:14.661100] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume replica
[2016-05-27 05:35:18.640983] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume replica
[2016-05-27 05:36:51.757363] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume replica
[2016-05-27 05:38:36.911552] I [MSGID: 106499] [glusterd-handler.c:4330:__glusterd_handle_status_volume] 0-management: Received status volume req for volume replica
[2016-05-27 05:43:44.196443] I [socket.c:3508:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2016-05-27 05:43:44.196462] E [rpcsvc.c:1314:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2016-05-27 05:43:44.196480] E [MSGID: 106430] [glusterd-utils.c:474:glusterd_submit_reply] 0-glusterd: Reply submission failed

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-6

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Set up a two-node cluster with a replica volume (1x2)
2. FUSE-mount the volume
3. Create some 10000 files in the mount point (e.g. touch files{1..10000})
4. While the file creation is in progress or after it is done (step 3), issue "gluster volume status <vol-name> inode" multiple times. // this command will time out once the creation of 10000 files in the mount point is done.

Actual results:
===============
The volume status inode command times out when the mount point contains many files.

Expected results:
==================
The command should not time out when many files are present in the FUSE mount point.

Additional info:
================
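The file-creation step of the reproduction (step 3) can be sketched as a standalone snippet; the `/tmp/repro` path below is a stand-in for the actual FUSE mount point of the replica volume, used here only so the snippet runs anywhere:

```shell
#!/bin/bash
# Sketch of reproduction step 3: bulk creation of zero-size files.
# MNT stands in for the FUSE mount point of the replica volume.
MNT=${MNT:-/tmp/repro}
mkdir -p "$MNT"
cd "$MNT" || exit 1

# Create 10000 zero-size files via brace expansion, as in the report.
touch file{1..10000}

# Same count check used in the report; should print 10000.
ls | wc -l
```

After this completes, running "gluster volume status <vol-name> inode" against the real mount reproduces the timeout described above.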
This is a known issue. "volume status <volname> inode" issues brick ops, which are costly here given the number of files you have in the volume. In GlusterFS 3.8 (to be rebased to rhgs-3.1.2 in downstream) I have introduced a timeout option in the CLI, where you can configure the timeout value for a particular CLI command, and that should help here.
http://review.gluster.org/13882 introduced a --timeout option by which you can increase the CLI timeout for a command that takes a long time to execute. So, to avoid timeout issues here, we can use 'gluster --timeout=600 volume status inode'.
We have introduced a configurable --timeout value for the CLI so that heavy-lifting commands do not get timed out by GlusterD. The timeout option needs to be passed to the CLI when running status inode to get around this.
Upstream mainline : http://review.gluster.org/13882
Upstream 3.8 : Available through branching

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
Verified this bug using the build glusterfs-3.8.4-2.

When the number of files in the mount point is 20000, the command fails with timeout and "Another transaction" messages. (FYI - no multiple commands were issued on the cluster nodes; only this single command was issued on one of the cluster nodes.)

mnt]# ls |wc -l
20000

[root@ ~]# gluster --timeout=600 volume status replica inode
Error : Request timed out
[root@ ~]#
[root@ ~]# gluster --timeout=1200 volume status replica inode
Another transaction is in progress for replica. Please try again after sometime.
[root@ ~]#

Moving back to assigned state.
(In reply to Byreddy from comment #11)
> Verified this bug using the build - glusterfs-3.8.4-2
>
> When number of files in the mount point is 20000, command is failing with
> timeout & Another transaction messages ( fyi - No multiple commands issued
> on the cluster nodes, only this single command is issued on one of the
> cluster node )

There is no guarantee that the command will not time out with x seconds of timeout configured for y number of inodes. You did the right thing in trying a bigger timeout value; however, please note that the previous command might still not have finished, hence getting an 'another transaction is in progress' message is expected. I *can not* accept this BZ as failed QA.

> mnt]# ls |wc -l
> 20000
>
> [root@ ~]# gluster --timeout=600 volume status replica inode
> Error : Request timed out
> [root@ ~]#
> [root@ ~]# gluster --timeout=1200 volume status replica inode
> Another transaction is in progress for replica. Please try again after
> sometime.
> [root@ ~]#
>
> Moving back to assigned state.
Moving this bug to ON_QA for re-verification. The required --timeout value is directly proportional to the number of files, so a very large CLI timeout value has to be set when the mount contains many files (the exact timeout to set is not specified here).
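Since the required --timeout grows with the file count, one practical approach is to derive the value from the current count at the mount point. This is a minimal sketch; the scaling constants (120 s base, 1 s per 2 files) and the mount path are illustrative assumptions, not values specified in this bug:

```shell
#!/bin/bash
# Hypothetical heuristic for choosing a --timeout value:
# base timeout plus a per-file allowance. Tune the constants
# for your environment; they are assumptions, not tested values.
MNT=${MNT:-/mnt/replica}
BASE=120
NFILES=$(ls "$MNT" 2>/dev/null | wc -l)
TIMEOUT=$((BASE + NFILES / 2))

# Print the command rather than running it, since gluster may
# not be installed where this sketch is tried out.
echo "gluster --timeout=$TIMEOUT volume status replica inode"
```

With an empty or missing mount path this degenerates to the base timeout; with 20000 files it would suggest --timeout=10120, in line with the --timeout=10000 value that worked in comment 13's re-verification.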
Verified this based on the details in comment 13; the failure mentioned in comment 11 worked well with --timeout=10000.

There is a problem with CLI option value handling: the --timeout option accepts non-numeric and negative values, which is incorrect; it should throw a proper error message. I will raise a separate bug to track it.

Moving the bug to verified state, as the behavior expected from this bug is working fine.
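The missing validation described above can be sketched as a small shell check; `validate_timeout` is a hypothetical helper name for illustration, not a function in the gluster CLI:

```shell
#!/bin/sh
# Hypothetical sketch of the --timeout validation the CLI should do:
# accept only positive integers, reject empty, non-numeric, negative,
# and zero values with a clear error message.
validate_timeout() {
    case "$1" in
        ''|*[!0-9]*|0)
            # '-5' and 'abc' both contain a non-digit, so they land here.
            echo "invalid --timeout value: '$1' (must be a positive integer)" >&2
            return 1 ;;
        *)
            return 0 ;;
    esac
}

validate_timeout 600 && echo "600 accepted"
validate_timeout -5 2>/dev/null || echo "-5 rejected"
validate_timeout abc 2>/dev/null || echo "abc rejected"
```

This mirrors the behavior the separate bug asks for: 600 passes, while -5 and abc are rejected with an error instead of being silently accepted.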
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html