Description of problem:

On processing a volume create request, heketi-cli should not attempt to stop and delete a volume as soon as it receives a CLI timeout (120 sec); it should instead wait until the frame-timeout of 600 sec.

Version-Release number of selected component (if applicable):

# heketi --version
Heketi v1.3.0-dev-12-g2f52279

glusterfs-3.7.9-5.el7rhgs.x86_64
glusterfs-server-3.7.9-5.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a gluster cluster of 3 RHGS 3.1.3 nodes with glusterd.service MemoryLimit=32G.
2. Using heketi-cli, create and start around 300 gluster volumes in a loop.
3. Check the command output and the heketi and gluster logs.

Actual results:

Even though the volume creation is successful on the gluster nodes, glusterd somehow fails to start the volume and the CLI eventually times out after 120 sec. glusterd still tries to process this request in the background until the frame-timeout of 600 sec happens. However, on receiving the "Request timed out" message after 120 sec, heketi-cli immediately tries to stop and delete this volume, followed by umount --> lvremove --> rmdir, etc. Since gluster volume stop itself fails because it is unable to acquire the lock for that volume, all the subsequent actions are also going to fail.

See logs from heketi:

#######
May 20 13:01:06 dhcp42-197 heketi: [sshexec] ERROR 2016/05/20 13:01:06 /src/github.com/heketi/utils/ssh/ssh.go:155: Failed to run command [sudo gluster volume start vol296] on 10.70.42.222:22: Err[Process exited with: 1. Reason was: ()]: Stdout [Error : Request timed out
May 20 13:01:06 dhcp42-197 heketi: ]: Stderr []
May 20 13:01:06 dhcp42-197 heketi: [sshexec] ERROR 2016/05/20 13:01:06 /src/github.com/heketi/utils/ssh/ssh.go:155: Failed to run command [sudo gluster --mode=script volume stop vol296 force] on 10.70.42.222:22: Err[Process exited with: 1. Reason was: ()]: Stdout []: Stderr [volume stop: vol296: failed: Another transaction is in progress for vol296. Please try again after sometime.
May 20 13:01:06 dhcp42-197 heketi: ]
May 20 13:01:06 dhcp42-197 heketi: [sshexec] ERROR 2016/05/20 13:01:06 /src/github.com/heketi/heketi/executors/sshexec/volume.go:146: Unable to stop volume vol296: Process exited with: 1. Reason was: ()
May 20 13:01:07 dhcp42-197 heketi: [negroni] Started GET /queue/1899fe9e4701655e57691aa9dae27385
May 20 13:01:07 dhcp42-197 heketi: [negroni] Completed 200 OK in 82.184µs
May 20 13:01:08 dhcp42-197 heketi: [negroni] Started GET /queue/1899fe9e4701655e57691aa9dae27385
May 20 13:01:08 dhcp42-197 heketi: [negroni] Completed 200 OK in 111.284µs
May 20 13:01:08 dhcp42-197 heketi: [sshexec] ERROR 2016/05/20 13:01:08 /src/github.com/heketi/utils/ssh/ssh.go:155: Failed to run command [sudo gluster --mode=script volume delete vol296] on 10.70.42.222:22: Err[Process exited with: 1. Reason was: ()]: Stdout []: Stderr [volume delete: vol296: failed: Another transaction is in progress for vol296. Please try again after sometime.
May 20 13:01:08 dhcp42-197 heketi: ]
May 20 13:01:08 dhcp42-197 heketi: [sshexec] ERROR 2016/05/20 13:01:08 /src/github.com/heketi/heketi/executors/sshexec/volume.go:158: Unable to delete volume vol296: Process exited with: 1. Reason was: ()
May 20 13:01:08 dhcp42-197 heketi: [heketi] INFO 2016/05/20 13:01:08 Deleting brick a20b78834b0ae628ab21a867fbc3906b
May 20 13:01:08 dhcp42-197 heketi: [heketi] INFO 2016/05/20 13:01:08 Deleting brick a4d290acfc7bdd8cb4d32537ca734509
May 20 13:01:08 dhcp42-197 heketi: [heketi] INFO 2016/05/20 13:01:08 Deleting brick baabad7c391b6cc6d6918fd3d3d71820
May 20 13:01:08 dhcp42-197 heketi: [heketi] INFO 2016/05/20 13:01:08 Deleting brick 83749ea4e93e0d9ab11b9c3cb37ef8a2
May 20 13:01:08 dhcp42-197 heketi: [heketi] INFO 2016/05/20 13:01:08 Deleting brick 928a0fc2285f7e9a3c4a4cf12e6967b6
May 20 13:01:08 dhcp42-197 heketi: [heketi] INFO 2016/05/20 13:01:08 Deleting brick ae5390d0bceea4dc0cdd32aa91c5499f
May 20 13:01:08 dhcp42-197 heketi: [sshexec] ERROR 2016/05/20 13:01:08 /src/github.com/heketi/utils/ssh/ssh.go:155: Failed to run command [sudo umount /var/lib/heketi/mounts/vg_5013101181241c140bd579f226cfd3d1/brick_baabad7c391b6cc6d6918fd3d3d71820] on 10.70.43.158:22: Err[Process exited with: 32. Reason was: ()]: Stdout []: Stderr [umount: /var/lib/heketi/mounts/vg_5013101181241c140bd579f226cfd3d1/brick_baabad7c391b6cc6d6918fd3d3d71820: target is busy.
#######

This ends up in an inconsistent state of the cluster/volume: the same volume is seen as "Created" from the gluster side with some bricks missing, while the heketi db has NO information about this volume at all.

#####
[root@dhcp43-158 ~]# gluster vol info vol296

Volume Name: vol296
Type: Distributed-Replicate
Volume ID: 7b500717-3406-4a29-be63-0deaffa18bbc
Status: Created
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.42.222:/var/lib/heketi/mounts/vg_0974b8b4fdf60a1bc086fa3c03a4cf24/brick_a20b78834b0ae628ab21a867fbc3906b/brick
Brick2: 10.70.42.85:/var/lib/heketi/mounts/vg_65f837f8e26d1afb8b2050194afd0203/brick_a4d290acfc7bdd8cb4d32537ca734509/brick
Brick3: 10.70.43.158:/var/lib/heketi/mounts/vg_5013101181241c140bd579f226cfd3d1/brick_baabad7c391b6cc6d6918fd3d3d71820/brick
Brick4: 10.70.42.222:/var/lib/heketi/mounts/vg_13f8363b53cac7068787f533316cd884/brick_83749ea4e93e0d9ab11b9c3cb37ef8a2/brick
Brick5: 10.70.42.85:/var/lib/heketi/mounts/vg_3589dc63d585700961b1e0f2d2be4801/brick_928a0fc2285f7e9a3c4a4cf12e6967b6/brick
Brick6: 10.70.43.158:/var/lib/heketi/mounts/vg_84d8689748532a237c0815823c5c6294/brick_ae5390d0bceea4dc0cdd32aa91c5499f/brick
Options Reconfigured:
performance.readdir-ahead: on
#####

Expected results:

heketi-cli should not try to clean up the volume immediately; it should instead wait for the frame-timeout to happen, to avoid ending up in this situation.
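For illustration only, a minimal Go sketch of the behaviour the expected results ask for: treat a gluster CLI "Request timed out" reply as an indeterminate outcome, wait out the frame-timeout window, and re-check the volume before deciding to roll back. The function names and constants below are assumptions for the sketch, not heketi's actual code.

// Hypothetical sketch (not heketi's actual code): treat a gluster CLI
// timeout as "outcome unknown" and re-check the volume after the
// glusterd frame-timeout, instead of rolling back immediately.
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

const frameTimeout = 600 * time.Second // glusterd default frame-timeout

// startVolume runs `gluster volume start` and reports whether the
// outcome is known. A "Request timed out" reply means glusterd may
// still be processing the request in the background.
func startVolume(vol string) (started bool, known bool) {
	out, err := exec.Command("gluster", "volume", "start", vol).CombinedOutput()
	if err == nil {
		return true, true
	}
	if strings.Contains(string(out), "Request timed out") {
		return false, false // indeterminate: do NOT clean up yet
	}
	return false, true // definite failure: safe to roll back
}

func main() {
	started, known := startVolume("vol296")
	switch {
	case started:
		fmt.Println("volume started")
	case !known:
		// Wait out the frame-timeout, then re-check actual state
		// (e.g. via `gluster volume info`) before any cleanup.
		time.Sleep(frameTimeout)
		fmt.Println("re-checking volume state before any cleanup")
	default:
		fmt.Println("definite failure: roll back the volume create")
	}
}

The key design point is the three-way split: success, definite failure (safe to roll back), and indeterminate (glusterd may still complete the start in the background, so tearing down bricks immediately races against it).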
https://github.com/heketi/heketi/issues/348
Hi, I have been taking a look at this bug, but the code does not wait only 120 secs; it waits up to 10 mins per command. If the timeout comes from the gluster command itself returning after 120 secs, then the bug seems to be in gluster, right?
We do not use sshexec in OpenShift. We now use kubeexec. I'll close until this shows up again.
Please retry using the latest.
This issue is due to the gluster CLI returning before the command has completed. It returns an error code to Heketi, so Heketi thinks something went badly wrong.

The CLI is supposed to wait until the command is completed and return success or failure.
And I will reply to Luis's query in comment 12 once I am in front of my laptop.
(In reply to Luis Pabón from comment #12)
> This issue is due to the gluster CLI returning before the command has
> completed. It returns an error code to Heketi, so Heketi thinks something
> went badly wrong.
>
> The CLI is supposed to wait until the command is completed and return
> success or failure.

Ideally, yes, it is supposed to, but the question is: for how long? Will it wait forever if it doesn't get a response back from GlusterD? There are different reasons why GlusterD may take time processing the request. Alternatively, GlusterD could be deadlocked, or an RPC timeout could very well occur, in which case the RPC callback will only be hit after 600 secs, i.e. the default frame-timeout value. With all these considerations, it was decided that the average wait time for the CLI shouldn't be more than 120 secs, and it should time out even if GlusterD is still processing the request. IMHO we cannot have the CLI wait indefinitely for GlusterD to respond. If you have any other suggestion, please do let us know.

Now the good news: this issue is already fixed upstream with http://review.gluster.org/14849. The RCA is the same as for BZ 1336267, BZ 1337477 & BZ 1337495, and the fix is targeted for the RHGS-3.2 release.

I have a question for Prashanth/Luis.

"Actual results: Even though the volume creation is successful on the gluster nodes, glusterd somehow fails to start the volume and the CLI eventually times out after 120 sec. glusterd still tries to process this request in the background until the frame-timeout of 600 sec happens. However, on receiving the "Request timed out" message after 120 sec, heketi-cli immediately tries to stop and delete this volume, followed by umount --> lvremove --> rmdir, etc. Since gluster volume stop itself fails because it is unable to acquire the lock for that volume, all the subsequent actions are also going to fail."

If volume stop fails because it could not acquire the lock, why does heketi-cli go on to delete the volume and perform the other cleanups? That will always lead to inconsistency across the layers. Shouldn't heketi do this only when the CLI succeeds?
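As a side note on the lock errors seen in the logs ("Another transaction is in progress"): since that error is transient by design, a rough Go sketch of how a caller such as heketi could retry the stop and gate all further cleanup on its success might look as follows. The retry count and interval are illustrative assumptions, not heketi's actual behaviour.

// Hypothetical sketch: retry `gluster volume stop` while glusterd still
// holds the volume lock, instead of cascading straight into
// delete/umount/lvremove after the first failure.
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

var errLocked = errors.New("volume lock held by another transaction")

func stopVolume(vol string) error {
	out, err := exec.Command("gluster", "--mode=script",
		"volume", "stop", vol, "force").CombinedOutput()
	if err == nil {
		return nil
	}
	if strings.Contains(string(out), "Another transaction is in progress") {
		return errLocked
	}
	return fmt.Errorf("volume stop failed: %s", out)
}

func stopWithRetry(vol string, attempts int, wait time.Duration) error {
	for i := 0; i < attempts; i++ {
		err := stopVolume(vol)
		if !errors.Is(err, errLocked) {
			return err // success, or a non-retryable failure
		}
		time.Sleep(wait) // the lock is transient; give glusterd time
	}
	return errLocked
}

func main() {
	// Only proceed to delete/umount/lvremove if the stop succeeded.
	if err := stopWithRetry("vol296", 10, 30*time.Second); err != nil {
		fmt.Println("not cleaning up:", err)
		return
	}
	fmt.Println("stop succeeded; safe to continue cleanup")
}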
Since the patch has already been posted and merged upstream, moving this bug to POST state.
(In reply to Atin Mukherjee from comment #16)
> I have a question for Prashanth/Luis.
>
> "Actual results: Even though the volume creation is successful on the
> gluster nodes, glusterd somehow fails to start the volume and the CLI
> eventually times out after 120 sec. glusterd still tries to process this
> request in the background until the frame-timeout of 600 sec happens.
> However, on receiving the "Request timed out" message after 120 sec,
> heketi-cli immediately tries to stop and delete this volume, followed by
> umount --> lvremove --> rmdir, etc. Since gluster volume stop itself fails
> because it is unable to acquire the lock for that volume, all the
> subsequent actions are also going to fail."
>
> If volume stop fails because it could not acquire the lock, why does
> heketi-cli go on to delete the volume and perform the other cleanups? That
> will always lead to inconsistency across the layers. Shouldn't heketi do
> this only when the CLI succeeds?

Atin, you are right: heketi should NOT do a full cleanup in this situation, as it directly leads to inconsistency between the data in the heketi db and gluster. This is the exact reason I logged this bug, and I've put the same thought in the expected results as well. So I believe it's a good question for Luis to explain the current behaviour! :)
I think the problem here is that this CLI command appears to be the only one Heketi uses that could be asynchronous. We either need a synchronous command that blocks until the request is complete, or we need a method/command from GlusterD to get the status of this command once it times out.

The second option must be well documented and tested.
(In reply to Luis Pabón from comment #22)
> I think the problem here is that this CLI command appears to be the only
> one Heketi uses that could be asynchronous. We either need a synchronous
> command that blocks until the request is complete, or we need a
> method/command from GlusterD to get the status of this command once it
> times out.
>
> The second option must be well documented and tested.

Timing out is also inconsistent: the command should be either always asynchronous or always synchronous. If we need a new gluster command to make it one or the other, that would be fine.
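A rough Go sketch of what the second option could look like from the caller's side: after a CLI timeout, poll `gluster volume info` and read the real state of the volume before deciding anything. This assumes the plain-text "Status:" field is stable enough to parse (an assumption for the sketch; glusterd does not expose a dedicated per-transaction status command here).

// Hypothetical sketch of the "status after timeout" option: poll
// `gluster volume info <vol>` until the Status: field reaches a
// terminal state or a deadline (e.g. the 600s frame-timeout) expires.
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// volumeStatus extracts the "Status:" line from `gluster volume info`.
func volumeStatus(vol string) (string, error) {
	out, err := exec.Command("gluster", "volume", "info", vol).Output()
	if err != nil {
		return "", err
	}
	sc := bufio.NewScanner(bytes.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "Status:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "Status:")), nil
		}
	}
	return "", fmt.Errorf("no Status field in volume info for %s", vol)
}

func main() {
	deadline := time.Now().Add(600 * time.Second)
	for time.Now().Before(deadline) {
		if status, err := volumeStatus("vol296"); err == nil && status == "Started" {
			fmt.Println("start eventually succeeded; no rollback needed")
			return
		}
		time.Sleep(10 * time.Second)
	}
	fmt.Println("deadline passed without the volume reaching Started")
}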
Upstream mainline: http://review.gluster.org/14849
Upstream 3.8: http://review.gluster.org/14860

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.
The reported issue seems to be fixed in glusterfs-3.8.4, and I was able to scale gluster volumes using heketi-cli even beyond 300 volumes without any issues.

######################
# gluster --version
glusterfs 3.8.4 built on Feb 20 2017 03:15:38
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

# heketi-cli volume list | wc -l
500

# gluster volume list | wc -l
500

# gluster peer status
Number of Peers: 2

Hostname: dhcp46-150.lab.eng.blr.redhat.com
Uuid: c4bdf1ad-04ab-4301-b9fe-f144272079ef
State: Peer in Cluster (Connected)

Hostname: 10.70.47.163
Uuid: fcd44049-a3b9-4f85-851c-79915812cf3f
State: Peer in Cluster (Connected)

# gluster volume info vol291

Volume Name: vol291
Type: Replicate
Volume ID: 48eeed36-e50e-429b-b474-10e4e336ffca
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.46.150:/var/lib/heketi/mounts/vg_1dae8c1c2feb1c16338a0440f64bcfed/brick_6f278e35a09ea7cff70b008784cb99c1/brick
Brick2: 10.70.47.161:/var/lib/heketi/mounts/vg_cf48e4fe475f69149d157bbfae86db75/brick_5cc9403c371cff1f2c4b6504bca5f2e9/brick
Brick3: 10.70.47.163:/var/lib/heketi/mounts/vg_94e77f6c32ac54b0c819ceee0899981f/brick_a6fb045df6a79ee353031f9da40309c5/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

# gluster volume info vol500

Volume Name: vol500
Type: Replicate
Volume ID: 72c9be57-3284-4e12-8449-84889a782c23
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.70.47.163:/var/lib/heketi/mounts/vg_814d6fa82429363cf09aaafe5ba7d850/brick_1f6b1df64468e950ee64aebe57f498d9/brick
Brick2: 10.70.46.150:/var/lib/heketi/mounts/vg_60357bad265972bb79fd3155e6d473fa/brick_d14fc99faef482004d606480d405df7d/brick
Brick3: 10.70.47.161:/var/lib/heketi/mounts/vg_893e765b342697a5086bf56d58332501/brick_3f4e9c68a58d6435ab377d85d08f90ff/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

# gluster pool list
UUID                                    Hostname                                State
c4bdf1ad-04ab-4301-b9fe-f144272079ef    dhcp46-150.lab.eng.blr.redhat.com       Connected
fcd44049-a3b9-4f85-851c-79915812cf3f    10.70.47.163                            Connected
a614a3d8-478f-48d4-8542-e6d9c3b526ad    localhost                               Connected
######################
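For reference, a scale run like the one above can be driven by a small loop. A hypothetical Go sketch using os/exec follows; the server URL, volume size, and count are assumptions for illustration.

// Hypothetical sketch of the scale test: create N small volumes in a
// loop via heketi-cli and stop at the first failure.
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	const count = 500
	for i := 1; i <= count; i++ {
		out, err := exec.Command("heketi-cli",
			"--server", "http://127.0.0.1:8080",
			"volume", "create", "--size=1").CombinedOutput()
		if err != nil {
			fmt.Printf("create #%d failed: %v\n%s", i, err, out)
			return
		}
	}
	fmt.Println("created", count, "volumes")
	// Cross-check afterwards: `heketi-cli volume list | wc -l`
	// should match `gluster volume list | wc -l`.
}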
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html