Bug 1337836
| Summary: | [Volume Scale] heketi-cli should not attempt to stop and delete a volume as soon as it receives a CLI timeout (120 sec) but instead wait until the frame-timeout of 600 sec | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Prasanth <pprakash> |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
| Status: | CLOSED ERRATA | QA Contact: | Prasanth <pprakash> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.1 | CC: | amainkar, asrivast, madam, pprakash, rcyriac, rhinduja, rhs-bugs, vbellur |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | RHGS 3.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | aplo | | |
| Fixed In Version: | glusterfs-3.8.4-1 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-03-23 05:32:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1351522 | | |
Description (Prasanth, 2016-05-20 08:09:55 UTC)
Hi, I have been taking a look at this bug, but the code does not wait for 120 secs; it waits for 10 mins max on a command. If the timeout is from the gluster command coming back after 120 secs, then the bug seems to be in gluster, right? We do not use sshexec in OpenShift; we now use kubeexec. I'll close until this shows up again. Please retry using the latest.

---

This issue is due to the gluster CLI returning from the command before the command is completed. It returns with an error code to Heketi, so Heketi thinks something bad went wrong. The CLI is supposed to wait until the command is completed and return success or failure.

---

And I will reply to Luis's query on comment 12 once I am in front of my laptop.

---

(In reply to Luis Pabón from comment #12)
> This issue is due to the gluster CLI returning from the command before the
> command is completed. It returns with an error code to Heketi, so Heketi
> thinks something bad went wrong.
>
> The CLI is supposed to wait until the command is completed and return
> success or failure.

Ideally yes, it is supposed to, but the question is: how long? Will it wait forever if it doesn't get a response back from GlusterD? There could be different reasons why GlusterD may take time processing the request. Alternatively, GlusterD could be deadlocked, or an RPC timeout could very well be a possibility, where the RPC callback will only be hit after 600 secs, i.e. the default frame-timeout value. With all these considerations, it was decided that the average wait time for the CLI shouldn't be more than 120 secs, and that it should time out even if GlusterD is still processing the request. IMHO we cannot have the CLI wait indefinitely for GlusterD to respond. If you have any other suggestion, please do let us know.

Now the good news is that this issue is already fixed upstream with http://review.gluster.org/14849. The RCA of this issue is the same as for BZ 1336267, BZ 1337477 and BZ 1337495, and the fix is targeted for the RHGS-3.2 release.
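The trade-off discussed above (a 120 sec CLI timeout versus the 600 sec frame-timeout during which glusterd may still be processing the request) can be handled on the caller's side by polling for the volume's state until the frame-timeout passes, instead of treating the CLI timeout as a definitive failure. This is only an illustrative sketch, not Heketi's actual code: `check_started` is a hypothetical callable standing in for a real status query such as `gluster volume info <vol>`.

```python
import time

# Illustrative sketch: after a CLI timeout (120 sec), keep polling until the
# glusterd frame-timeout (600 sec) before concluding the operation really
# failed and starting any cleanup. The timeouts mirror the defaults
# discussed in this bug; the callables are injected so the logic can be
# exercised without a live gluster cluster.
CLI_TIMEOUT = 120      # seconds the gluster CLI waits before giving up
FRAME_TIMEOUT = 600    # seconds glusterd may keep processing the request

def wait_for_volume(check_started, poll_interval=10,
                    deadline=FRAME_TIMEOUT,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until the volume reports started or the deadline passes.

    Returns True if glusterd finished the request within the deadline,
    False if the deadline expired (only then is cleanup justified).
    """
    start = clock()
    while clock() - start < deadline:
        if check_started():
            return True   # glusterd completed the request after all
        sleep(poll_interval)
    return False
```

The `clock` and `sleep` parameters exist purely so the deadline logic can be simulated deterministically; a real caller would use the defaults.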
I've a question for Prasanth/Luis.

"Actual results: Even though the volume creation is successful on the gluster nodes, glusterd somehow fails to start the volume and eventually the CLI times out after 120 sec. However, glusterd still tries to process this request in the background until the frame-timeout of 600 sec happens. However, on receiving the "Request timed out" message after 120 sec, heketi-cli immediately tries to stop and delete this volume, followed by umount --> lvremove --> rmdir, etc. Since gluster volume stop itself fails, as it is unable to acquire the lock for that volume, all the subsequent actions are also going to fail."

If volume stop fails because it could not acquire the lock, why does heketi-cli go on to delete the volume followed by the other cleanups? That will always lead to inconsistency across the layers. Shouldn't heketi do this only when the CLI succeeds?

---

Since the patch is already posted and merged upstream, moving this bug to POST state.

---

(In reply to Atin Mukherjee from comment #16)
> I've a question for Prashanth/Luis.
>
> "Actual results: Even though the volume creation is successful in the
> gluster nodes, glusterd somehow fails to start the volume and eventually CLI
> times out after 120sec. However, glusterd still tries to process this
> request in the background until the frame-timeout of 600sec happens.
> However, on receiving the "Request timed out" message after 120sec,
> heketi-cli immediately tries to stop and delete this volume followed by
> umount --> lvremove --> rmdir, etc. Since gluster volume stop itself fails
> as it is Unable to acquire lock for that volume, all the subsequent actions
> are also going to fail."
>
> if volume stop fails since it failed to acquire the lock why heketi cli goes
> for deleting the volume followed by the other clean ups? That will always
> lead to inconsistency across the layers. Shouldn't heketi do this only when
> CLI succeeds?
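The inconsistency raised above could be avoided by gating each destructive step on the success of the previous one. The following is a hypothetical Python sketch of that ordering, not Heketi's actual implementation: the step functions are injected callables standing in for real gluster/LVM commands (`volume stop`, `volume delete`, `umount`, `lvremove`, `rmdir`).

```python
# Hypothetical sketch of the cleanup ordering discussed in this bug: only
# run the destructive follow-up steps when the preceding step actually
# succeeded. If `volume stop` fails (e.g. it could not acquire the volume
# lock), nothing further is torn down, so the heketi db and gluster do not
# end up disagreeing about the volume's state.
def cleanup_volume(stop, delete, teardown_steps):
    """Run cleanup steps in order; return the names of steps performed.

    stop, delete   -- callables returning True on success
    teardown_steps -- list of (name, callable) pairs, e.g. umount/lvremove/rmdir
    """
    performed = []
    if not stop():
        # volume stop failed: abort instead of cascading into delete etc.
        return performed
    performed.append("stop")
    if not delete():
        return performed
    performed.append("delete")
    for name, step in teardown_steps:
        if not step():
            break
        performed.append(name)
    return performed
```

With this guard, a lock-acquisition failure on `volume stop` leaves the volume alone for a later retry, rather than producing the half-deleted state described in the report.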
Atin, you are right, heketi should NOT do a full cleanup in this situation, as it directly leads to inconsistency between the data in the heketi db and gluster. And this is the exact reason for me logging this bug; I've put the same thoughts in the expected result as well. So I believe it's a good question for Luis to explain this current behaviour! :)

---

I think the problem here is that this CLI command looks like the only one Heketi uses which could be asynchronous. We either need a synchronous command which blocks until the request is complete, or we need a method/command from GlusterD to get the status of this command once it times out.

The second option must be well documented and tested.

---

(In reply to Luis Pabón from comment #22)
> I think the problem here is that this CLI command looks like is the only one
> that Heketi uses which could be Asynchronous. We either need a synchronous
> command which blocks until the request is complete, or we need a
> method/command from GlusterD to get status for this command once it times
> out.
>
> The second option must be well documented and tested.

Also, timing out is inconsistent. Either the command is always asynchronous, or the command is always synchronous. If we need a new gluster command to make it always asynchronous or always synchronous, that would be fine.

---

Upstream mainline: http://review.gluster.org/14849
Upstream 3.8: http://review.gluster.org/14860

And the fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

---

The reported issue seems to be fixed in glusterfs-3.8.4, and I was able to scale gluster volumes using heketi-cli even beyond 300 volumes without any issues.

######

# gluster --version
glusterfs 3.8.4 built on Feb 20 2017 03:15:38
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

# heketi-cli volume list | wc -l
500

# gluster volume list | wc -l
500

# gluster peer status
Number of Peers: 2

Hostname: dhcp46-150.lab.eng.blr.redhat.com
Uuid: c4bdf1ad-04ab-4301-b9fe-f144272079ef
State: Peer in Cluster (Connected)

Hostname: 10.70.47.163
Uuid: fcd44049-a3b9-4f85-851c-79915812cf3f
State: Peer in Cluster (Connected)

# gluster volume info vol291
Volume Name: vol291
Type: Replicate
Volume ID: 48eeed36-e50e-429b-b474-10e4e336ffca
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.150:/var/lib/heketi/mounts/vg_1dae8c1c2feb1c16338a0440f64bcfed/brick_6f278e35a09ea7cff70b008784cb99c1/brick
Brick2: 10.70.47.161:/var/lib/heketi/mounts/vg_cf48e4fe475f69149d157bbfae86db75/brick_5cc9403c371cff1f2c4b6504bca5f2e9/brick
Brick3: 10.70.47.163:/var/lib/heketi/mounts/vg_94e77f6c32ac54b0c819ceee0899981f/brick_a6fb045df6a79ee353031f9da40309c5/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

# gluster volume info vol500
Volume Name: vol500
Type: Replicate
Volume ID: 72c9be57-3284-4e12-8449-84889a782c23
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.163:/var/lib/heketi/mounts/vg_814d6fa82429363cf09aaafe5ba7d850/brick_1f6b1df64468e950ee64aebe57f498d9/brick
Brick2: 10.70.46.150:/var/lib/heketi/mounts/vg_60357bad265972bb79fd3155e6d473fa/brick_d14fc99faef482004d606480d405df7d/brick
Brick3: 10.70.47.161:/var/lib/heketi/mounts/vg_893e765b342697a5086bf56d58332501/brick_3f4e9c68a58d6435ab377d85d08f90ff/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

# gluster pool list
UUID                                    Hostname                            State
c4bdf1ad-04ab-4301-b9fe-f144272079ef    dhcp46-150.lab.eng.blr.redhat.com   Connected
fcd44049-a3b9-4f85-851c-79915812cf3f    10.70.47.163                        Connected
a614a3d8-478f-48d4-8542-e6d9c3b526ad    localhost                           Connected

######

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html