Bug 1340338
Summary: | "volume status inode" command is getting timed out if number of files are more in the mount point | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Byreddy <bsrirama>
Component: | glusterd | Assignee: | Atin Mukherjee <amukherj>
Status: | CLOSED ERRATA | QA Contact: | Byreddy <bsrirama>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.1 | CC: | rcyriac, rhinduja, rhs-bugs, storage-qa-internal, vbellur
Target Milestone: | --- | |
Target Release: | RHGS 3.2.0 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.8.4-1 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-03-23 05:33:08 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1351522 | |
Description
Byreddy
2016-05-27 05:53:04 UTC
This is a known issue. 'volume status <volname> inode' issues brick ops, which are costly here given the number of files you have in the volume. In GlusterFS 3.8 (to be rebased to rhgs-3.1.2 downstream) I have introduced a timeout option in the CLI with which you can configure the timeout value for a particular CLI command, and that should help here.

http://review.gluster.org/13882 introduced a --timeout option with which you can increase the CLI timeout for a command that takes longer to execute. So, to avoid timeouts here, we can use 'gluster --timeout=600 volume status inode'.

We have introduced a configurable --timeout value for the CLI so that heavy-lifting commands do not get timed out by GlusterD. The timeout option needs to be passed to the CLI when running 'status inode' to get around this.

Upstream mainline: http://review.gluster.org/13882
Upstream 3.8: Available through branching

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

Verified this bug using the build glusterfs-3.8.4-2.

When the number of files in the mount point is 20000, the command fails with timeout and "another transaction" messages (fyi: no multiple commands were issued on the cluster nodes; only this single command was issued on one of the cluster nodes).

mnt]# ls |wc -l
20000

[root@ ~]# gluster --timeout=600 volume status replica inode
Error : Request timed out
[root@ ~]#
[root@ ~]# gluster --timeout=1200 volume status replica inode
Another transaction is in progress for replica. Please try again after sometime.
[root@ ~]#

Moving back to assigned state.

(In reply to Byreddy from comment #11)
> Verified this bug using the build - glusterfs-3.8.4-2
>
> When number of files in the mount point is 20000, command is failing with
> timeout & Another transaction messages ( fyi - No multiple commands issued
> on the cluster nodes, only this single command is issued on one of the
> cluster node )

There is no guarantee that the command will not time out with x seconds of timeout configured for y number of inodes. You did the right thing by trying a bigger timeout value; however, please note that the previous command might still not have finished, so getting an 'another transaction is in progress' message is expected. I *can not* accept this BZ as failed QA.

> mnt]# ls |wc -l
> 20000
>
> [root@ ~]# gluster --timeout=600 volume status replica inode
> Error : Request timed out
> [root@ ~]#
> [root@ ~]# gluster --timeout=1200 volume status replica inode
> Another transaction is in progress for replica. Please try again after
> sometime.
> [root@ ~]#
>
> Moving back to assigned state.

Moving this bug to ON_QA to re-verify, because the required --timeout value is directly proportional to the number of files, so a huge CLI timeout value has to be set if the number of files in the mount is large (the exact timeout to set is not specified here).

Verified this based on the details in comment 13; the failure mentioned in comment 11 worked fine with --timeout=10000. There is a problem with the CLI option value handling: --timeout accepts non-numeric and negative values, which is incorrect; it should throw a proper error message. I will raise a separate bug to track it. Moving the bug to verified state, as what is expected from this bug works fine.
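For quick reference, a minimal sketch of the workaround discussed in this report. The volume name ('replica'), the timeout values, and the file count are taken from the comments above; they are examples, not recommendations, since the required timeout grows with the number of files in the volume.

```sh
# Workaround sketch: raise the gluster CLI timeout (in seconds) so that
# 'volume status <volname> inode' is not aborted before the brick ops finish.
# Values below are the ones tried in this report (volume 'replica',
# ~20000 files on the mount); larger file counts need a larger timeout.

# Fails with "Request timed out" when the timeout is too small:
gluster --timeout=600 volume status replica inode

# Worked during verification with a much larger timeout:
gluster --timeout=10000 volume status replica inode
```

Note that if an earlier attempt timed out on the client but is still running in glusterd, an immediate retry can report "Another transaction is in progress" until that transaction completes.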
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html