Description of problem:
Using vdsClient over SSL, we got an "SSLError: The read operation timed out" error when issuing a command that took more than 60 seconds in VDSM. More precisely, hosted-engine-setup calls connectStoragePool for an iSCSI storage domain while a different NFS storage domain is failing due to an unrelated issue. VDSM takes more than 60 seconds because of the NFS storage issue, and in the meantime hosted-engine gets "SSLError: The read operation timed out".

Version-Release number of selected component (if applicable):
rhel7 installed on both hosts
rhev 3.5 vt12

How reproducible:
?

Steps to Reproduce:
1.
2.
3.

Actual results:
If VDSM takes more than 60 seconds, the call fails with "SSLError: The read operation timed out".

Expected results:
connectStoragePool is a synchronous request; it should wait indefinitely for the storage call to either succeed or fail, avoiding any failure due to SSL timeouts.

Additional info:
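The failure mode described above can be shown in miniature with plain sockets. This is only a sketch under stated assumptions: it uses Python 3 and an unencrypted local TCP connection instead of vdsClient and SSL, and the names slow_server and call_with_timeout are hypothetical helpers, not part of VDSM. The server answers only after a delay; a client whose read timeout is shorter than the delay gives up, while a client with no timeout waits for the answer.

```python
import socket
import threading
import time


def slow_server(listener, delay):
    """Accept one connection, wait `delay` seconds, then reply."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(1024)        # read the request
        time.sleep(delay)      # stand-in for a slow storage operation
        try:
            conn.sendall(b"done")
        except OSError:
            pass               # the client already gave up and closed


def call_with_timeout(delay, timeout):
    """Send one request to a server that needs `delay` seconds to answer.

    `timeout` is the client-side socket timeout in seconds; None means
    block until the server answers.
    """
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(
        target=slow_server, args=(listener, delay), daemon=True)
    worker.start()
    client = socket.create_connection(listener.getsockname(), timeout=timeout)
    try:
        client.sendall(b"connectStoragePool")
        return client.recv(1024).decode()
    except socket.timeout:
        # Over SSL the same condition surfaces as
        # "The read operation timed out".
        return "timed out"
    finally:
        client.close()
        worker.join()
        listener.close()
```

With a delay longer than the timeout the call reports a timeout even though the server would eventually have answered, which is the situation hit here when the storage rescan outlasted the 60-second limit.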
This sounds like normal behavior, which depends on the request timeout. If vdsm currently does not provide a way to set such a timeout, we should allow it.
Simone, why is hosted-engine-setup still using vdsClient? vdsClient is a horrid little script, which nobody loves or wants to support. Whenever possible, just use

    from vdsm import vdscli
    s = vdscli.connect()
    ...
    s.connectStoragePool()

which would give you a lot more control over the arguments, return value, and socket options (like timeout).
Before adding such a timeout, can we confirm that this is not a storage issue? 60 seconds is indeed a long delay to wait. Is it reproducible? Can you specify steps to reproduce it and share the logs? In parallel, we will consider whether a timeout is the right way to go, but maybe we are facing a storage bug whose fix would remove that long delay.
Simone, can you please respond to comment #2? Is it doable for 3.5.1?
(In reply to Dan Kenigsberg from comment #2)
> Simone, why is hosted-engine-setup still using vdsClient? vdsClient is a
> horrid little script, which nobody loves or wants to support.

(In reply to Barak from comment #4)
> Simone , can you please respond on comment #2,
> Is it doable for 3.5.1 ?

We already pushed a patch for it ( http://gerrit.ovirt.org/#/c/33996/ ). It's under review, but it needs to be tested more thoroughly because it touches many different areas. So I don't think it will be ready for 3.5.1.
(In reply to Yaniv Bronhaim from comment #3)
> before adding such timeout, can we confirm that this is not storage issue?
> 60seconds is indeed long delay to wait. is it reproducible ? can you specify
> steps to see that and share the logs?

The original bug, with the relevant logs, is here:
https://bugzilla.redhat.com/show_bug.cgi?id=1169290

It's not that easy to reproduce. We had two storage domains: an iSCSI one for hosted engine and an NFS one. The NFS one is failing for external reasons. The connectStoragePool on the iSCSI one hit the timeout because VDSM needs to re-scan all the storage domains, and that can take more than 60 seconds because of the failing one.

> parallel to that we think if timeout is the right way to go . but maybe we
> face storage bug that can solve that long delay

The storage people said that 60 seconds might not be enough ( https://bugzilla.redhat.com/show_bug.cgi?id=1169290#c11 https://bugzilla.redhat.com/show_bug.cgi?id=1169290#c13 ).
We are also going to move to vdscli, which can solve this one as well if it already provides a way to increase that timeout value.
(In reply to Simone Tiraboschi from comment #5)
> (In reply to Dan Kenigsberg from comment #2)
> > Simone, why is hosted-engine-setup still using vdsClient? vdsClient is a
> > horrid little script, which nobody loves or wants to support.
>
> (In reply to Barak from comment #4)
> > Simone , can you please respond on comment #2,
> > Is it doable for 3.5.1 ?
>
> We already pushed a patch for it ( http://gerrit.ovirt.org/#/c/33996/ ) ,
> it's under review process but it need to be better tested cause it's quite
> widespread over different areas. So I don't think that it will be ready for
> 3.5.1.

So why isn't this enough? Let's continue the review and check whether this reproduces with those scripts. It shouldn't; the call directly blocks forever, AFAIK.
(In reply to Yaniv Bronhaim from comment #7)
> (In reply to Simone Tiraboschi from comment #5)
> > (In reply to Dan Kenigsberg from comment #2)
> > > Simone, why is hosted-engine-setup still using vdsClient? vdsClient is a
> > > horrid little script, which nobody loves or wants to support.
> >
> > (In reply to Barak from comment #4)
> > > Simone , can you please respond on comment #2,
> > > Is it doable for 3.5.1 ?
> >
> > We already pushed a patch for it ( http://gerrit.ovirt.org/#/c/33996/ ) ,
> > it's under review process but it need to be better tested cause it's quite
> > widespread over different areas. So I don't think that it will be ready for
> > 3.5.1.
>
> so why doesn't this work enough? lets continue the review and check if this
> reproduces with those scripts. it shouldn't.. the call directly blocks
> forever afaik

Ok, try to move that part to vdscli.
The full deprecation of vdsClient by hostedengine is targeted for 3.6.0.
Until we get a clear reproducer, this bug will be treated as medium severity. I would prefer pushing the solution suggested in comment #2 to 3.5.x instead of adding this timeout to the vdsCli (which will probably be changed in 3.6, due to the expected move of the client to json-rpc).
Done, but I still haven't found a way to tweak the socket options in vdscli in order to get a longer SSL timeout.
Infra's Yeela should be able to assist in finding this.
Simone, this requires adding the option in the vdscli code. I will gladly help with that.
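For illustration, here is one way a per-call socket timeout can be exposed on an XML-RPC client. This is a sketch using only the Python 3 standard library, not vdscli's actual code: TimeoutTransport, connect(url, timeout) and start_slow_server are hypothetical names, the transport is plain HTTP, and whether vdscli can be extended the same way is an assumption. For HTTPS, subclassing xmlrpc.client.SafeTransport in the same manner would be the analogous approach.

```python
import socket
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class TimeoutTransport(xmlrpc.client.Transport):
    """XML-RPC transport that applies a socket timeout to its connection."""

    def __init__(self, timeout=None):
        super().__init__()
        self._timeout = timeout    # seconds; None means wait indefinitely

    def make_connection(self, host):
        conn = super().make_connection(host)
        # http.client reads this attribute when it opens the socket.
        conn.timeout = self._timeout
        return conn


def connect(url, timeout=None):
    """Return a server proxy whose calls give up after `timeout` seconds."""
    return xmlrpc.client.ServerProxy(url, transport=TimeoutTransport(timeout))


def start_slow_server(delay):
    """Start a local test server whose only method sleeps before replying."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)

    def slow_call():
        time.sleep(delay)          # stand-in for a slow storage operation
        return "OK"

    server.register_function(slow_call, "slowCall")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, "http://%s:%d/" % (host, port)
```

A caller would then pick the timeout per operation, for example connect(url, timeout=None) for slow storage verbs, and catch socket.timeout where a bounded wait is wanted.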
(In reply to Yeela Kaplan from comment #12)
> Simone, this requires adding the option in the vdscli code. I will gladly
> help with that.

Please open a BZ for tracking this and make it block this bug.
(In reply to Sandro Bonazzola from comment #13)
> (In reply to Yeela Kaplan from comment #12)
> > Simone, this requires adding the option in the vdscli code. I will gladly
> > help with that.
>
> Please open a BZ for tracking this and make it blocking this bug.

You tried to make it work with vdscli. What was the result?
I'm not able to reproduce it using vdscli, so it should be OK to just use vdscli instead of vdsClient for the slow storage operations.
Moving to modified as per comment #15. No additional change is needed.
Automated message: can you please update doctext or set it as not required?
If failing as for: https://bugzilla.redhat.com/show_bug.cgi?id=1190636#c5
*** This bug has been marked as a duplicate of bug 1190207 ***