Bug 1340338
Summary: | "volume status inode" command is getting timed out if number of files are more in the mount point | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Byreddy <bsrirama>
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj>
Status: | CLOSED ERRATA | QA Contact: | Byreddy <bsrirama>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.1 | CC: | rcyriac, rhinduja, rhs-bugs, storage-qa-internal, vbellur
Target Milestone: | --- | |
Target Release: | RHGS 3.2.0 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.8.4-1 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-03-23 05:33:08 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1351522 | |
Description
Byreddy
2016-05-27 05:53:04 UTC
This is a known issue. 'volume status <volname> inode' issues brick ops, which are costly here given the number of files you have in the volume. In GlusterFS 3.8 (to be rebased to rhgs-3.1.2 downstream) I have introduced a timeout option in the CLI with which you can configure the timeout value for a particular CLI command, and that should help here.

http://review.gluster.org/13882 introduced a --timeout option with which you can increase the CLI timeout for a command that takes longer to execute. So, to avoid timeouts here, we can use 'gluster --timeout=600 volume status inode'.

We have introduced a configurable --timeout value for the CLI so that heavy-lifting commands do not get timed out by GlusterD. The timeout option needs to be passed to the CLI when running 'status inode' to get around this.

Upstream mainline: http://review.gluster.org/13882
Upstream 3.8: Available through branching

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Verified this bug using the build glusterfs-3.8.4-2.

When the number of files in the mount point is 20000, the command fails with timeout and "another transaction" messages (fyi: no multiple commands were issued on the cluster nodes; only this single command was issued on one of the cluster nodes).

mnt]# ls |wc -l
20000

[root@ ~]# gluster --timeout=600 volume status replica inode
Error : Request timed out
[root@ ~]#
[root@ ~]# gluster --timeout=1200 volume status replica inode
Another transaction is in progress for replica. Please try again after sometime.
[root@ ~]#

Moving back to assigned state.

(In reply to Byreddy from comment #11)
> Verified this bug using the build - glusterfs-3.8.4-2
>
> When number of files in the mount point is 20000, command is failing with
> timeout & Another transaction messages ( fyi - No multiple commands issued
> on the cluster nodes, only this single command is issued on one of the
> cluster node )

There is no guarantee that the command will not time out with x seconds of timeout configured for y number of inodes. You did the right thing by trying a bigger timeout value; however, please note that the previous command might still not have finished, so getting an 'another transaction is in progress' message is expected. I *can not* accept this BZ as failed QA.

> mnt]# ls |wc -l
> 20000
>
> [root@ ~]# gluster --timeout=600 volume status replica inode
> Error : Request timed out
> [root@ ~]#
> [root@ ~]# gluster --timeout=1200 volume status replica inode
> Another transaction is in progress for replica. Please try again after
> sometime.
> [root@ ~]#
>
> Moving back to assigned state.

Moving this bug to ON_QA to re-verify, because the required --timeout value is directly proportional to the number of files, so a huge CLI timeout value has to be set if the number of files in the mount is large (the exact timeout to set is not specified here).

Verified this based on the details in comment 13; the failure mentioned in comment 11 worked fine with --timeout=10000. There is a problem with the CLI option value handling: --timeout accepts non-numeric and negative values, which is incorrect; it should throw a proper error message. I will raise a separate bug to track it. Moving the bug to verified state, as what is expected from this bug works fine.
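For quick reference, a minimal sketch of the workaround discussed in this report. The volume name ('replica'), the timeout values, and the file count are taken from the comments above; they are examples, not recommendations, since the required timeout grows with the number of files in the volume.

```sh
# Workaround sketch: raise the gluster CLI timeout (in seconds) so that
# 'volume status <volname> inode' is not aborted before the brick ops finish.
# Values below are the ones tried in this report (volume 'replica',
# ~20000 files on the mount); larger file counts need a larger timeout.

# Fails with "Request timed out" when the timeout is too small:
gluster --timeout=600 volume status replica inode

# Worked during verification with a much larger timeout:
gluster --timeout=10000 volume status replica inode
```

Note that if an earlier attempt timed out on the client but is still running in glusterd, an immediate retry can report "Another transaction is in progress" until that transaction completes.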
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html