Bug 1173638 - vdsClient/vdscli SSLError timeout error
Summary: vdsClient/vdscli SSLError timeout error
Keywords:
Status: CLOSED DUPLICATE of bug 1190207
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.6.0
Assignee: Yeela Kaplan
QA Contact: Jiri Belka
URL:
Whiteboard: infra
Depends On:
Blocks: 1190207 1190636 1234915
 
Reported: 2014-12-12 14:59 UTC by Simone Tiraboschi
Modified: 2016-02-10 19:26 UTC
CC List: 15 users

Fixed In Version: ovirt-3.6.0-alpha1
Doc Type: Bug Fix
Doc Text:
Cause: hosted-engine communicated with VDSM by wrapping the vdsClient utility.
Consequence: long-running synchronous commands could fail with SSL timeout errors.
Fix: use the vdscli library instead of vdsClient for storage operations.
Result: no more SSL timeouts.
Clone Of:
Clones: 1190207
Environment:
Last Closed: 2015-03-23 08:38:13 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1169290 | 0 | high | CLOSED | [hosted-engine] [iSCSI support] connectStoragePools fails with "SSLError: The read operation timed out" while adding a n... | 2021-02-22 00:41:40 UTC
oVirt gerrit | 36798 | 0 | master | MERGED | vdsm: use vdscli instead of vdsClient for storage operations | Never
oVirt gerrit | 37559 | 0 | ovirt-hosted-engine-setup-1.2 | MERGED | vdsm: use vdscli instead of vdsClient for storage operations | Never

Internal Links: 1169290

Description Simone Tiraboschi 2014-12-12 14:59:32 UTC
Description of problem:
Using vdsClient over SSL, we got an "SSLError: The read operation timed out" error when issuing a command that took more than 60 seconds to complete in VDSM.

More precisely, hosted-engine-setup calls connectStoragePool for an iSCSI storage domain while a separate NFS storage domain is failing because of an unrelated issue.
VDSM takes more than 60 seconds because of the NFS storage issue, and in the meantime hosted-engine gets "SSLError: The read operation timed out".


Version-Release number of selected component (if applicable):
rhel7 installed on both hosts
rhev 3.5 vt12

How reproducible:
?

Steps to Reproduce:
1.
2.
3.

Actual results:
If VDSM takes more than 60 seconds, the call fails with "SSLError: The read operation timed out".

Expected results:
connectStoragePool is a synchronous request, so the client should wait indefinitely for the storage call to either succeed or fail, avoiding any failure due to SSL timeouts.
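A minimal sketch of the difference between the actual and expected results, assuming the failure comes from a client-side socket read timeout. To stay self-contained it uses plain TCP sockets from Python's standard library instead of SSL (the SSL layer surfaces the same condition as the error quoted above) and scales the 60-second limit down to fractions of a second. `start_slow_server` and `call` are hypothetical helpers: the server stands in for a VDSM verb blocked on storage, and the client either gives up after `timeout` seconds or, with `timeout=None`, waits for the real result.

```python
import socket
import threading
import time


def start_slow_server(delay):
    """Hypothetical one-shot server that answers `delay` seconds after a request.

    Stands in for a VDSM verb blocked on storage; returns the bound port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve():
        conn, _ = srv.accept()
        with conn:
            conn.recv(1024)        # read the request
            time.sleep(delay)      # simulate the slow storage operation
            try:
                conn.sendall(b"done")
            except OSError:
                pass               # the client may already have given up
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


def call(delay, timeout):
    """Send one request and wait for the reply, using `timeout` as the read timeout.

    timeout=None blocks until the server answers.
    """
    port = start_slow_server(delay)
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.settimeout(timeout)
        sock.sendall(b"connectStoragePool")
        try:
            return sock.recv(1024).decode()
        except socket.timeout:
            return "read timed out"
```

For example, `call(0.5, 0.1)` gives up while the server is still "working" and returns `"read timed out"`, while `call(0.3, None)` waits and returns `"done"`. The server-side work is identical in both cases; only the client's read timeout decides the outcome.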


Additional info:

Comment 1 Yaniv Bronhaim 2014-12-16 11:59:09 UTC
This sounds like normal behavior that depends on the request timeout. If VDSM currently does not provide a way to set such a timeout, we should allow it.

Comment 2 Dan Kenigsberg 2015-01-02 14:18:49 UTC
Simone, why is hosted-engine-setup still using vdsClient? vdsClient is a horrid little script, which nobody loves or wants to support.

Whenever possible, just use

 from vdsm import vdscli
 s = vdscli.connect()
 ...
 s.connectStoragePool()

which would give you a lot more control over the arguments, return value, and socket options (like timeout).
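For illustration, a sketch of how an XML-RPC client can expose the socket timeout mentioned above, assuming a client built on Python's standard `xmlrpc.client`. This is not vdscli's actual code or API: `TimeoutTransport`, `start_stub_server`, the `slowVerb` verb and `call_slow_verb` are hypothetical names, the reply is made up, and the sketch uses plain HTTP where a real VDSM client would need an SSL transport (the same `make_connection` override pattern applies to `xmlrpc.client.SafeTransport`).

```python
import socket
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class TimeoutTransport(xmlrpc.client.Transport):
    """Plain-HTTP XML-RPC transport with a caller-chosen socket timeout.

    timeout=None makes every call block until the server replies.
    """

    def __init__(self, timeout=None):
        super().__init__()
        self._timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        # HTTPConnection passes .timeout to the socket when it connects,
        # so setting it here (before the first request) is enough.
        conn.timeout = self._timeout
        return conn


def start_stub_server():
    """Start a hypothetical VDSM stand-in exposing one slow verb; return its port."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)

    def slow_verb(seconds):
        time.sleep(seconds)                    # simulate a slow storage operation
        return {"code": 0, "message": "Done"}  # made-up reply for the example

    server.register_function(slow_verb, "slowVerb")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_address[1]


def call_slow_verb(port, seconds, timeout):
    """Call slowVerb with the given client timeout; report a timeout as a string."""
    proxy = xmlrpc.client.ServerProxy(
        "http://127.0.0.1:%d/" % port, transport=TimeoutTransport(timeout))
    try:
        return proxy.slowVerb(seconds)
    except socket.timeout:
        return "timed out"
```

Passing `timeout=None` gives the blocking behavior requested in the Expected results, while a finite value reproduces the reported failure; keeping the timeout a constructor parameter is one way a client library could let callers choose per use case.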

Comment 3 Yaniv Bronhaim 2015-01-04 15:17:15 UTC
Before adding such a timeout, can we confirm that this is not a storage issue? 60 seconds is indeed a long delay. Is it reproducible? Can you specify the steps to reproduce it and share the logs?

In parallel, we are considering whether a timeout is the right way to go, but maybe we are facing a storage bug whose fix would eliminate that long delay.

Comment 4 Barak 2015-01-06 12:29:05 UTC
Simone, can you please respond to comment #2?
Is it doable for 3.5.1?

Comment 5 Simone Tiraboschi 2015-01-07 11:25:20 UTC
(In reply to Dan Kenigsberg from comment #2)
> Simone, why is hosted-engine-setup still using vdsClient? vdsClient is a
> horrid little script, which nobody loves or wants to support.

(In reply to Barak from comment #4)
> Simone, can you please respond to comment #2?
> Is it doable for 3.5.1?

We already pushed a patch for it (http://gerrit.ovirt.org/#/c/33996/). It's under review, but it needs to be tested more thoroughly because it touches many different areas, so I don't think it will be ready for 3.5.1.

Comment 6 Simone Tiraboschi 2015-01-07 11:33:22 UTC
(In reply to Yaniv Bronhaim from comment #3)
> Before adding such a timeout, can we confirm that this is not a storage
> issue? 60 seconds is indeed a long delay. Is it reproducible? Can you
> specify the steps to reproduce it and share the logs?

The original bug, with the relevant logs, is here: https://bugzilla.redhat.com/show_bug.cgi?id=1169290

It's not that easy to reproduce.
We had two storage domains: an iSCSI one for hosted engine and an NFS one.
The NFS one was failing for external reasons. connectStoragePool on the iSCSI one hit the timeout because VDSM needs to rescan all the storage domains, which can take more than 60 seconds because of the failing one.

> In parallel, we are considering whether a timeout is the right way to go,
> but maybe we are facing a storage bug whose fix would eliminate that long
> delay.

The storage team said that 60 seconds might not be enough (see https://bugzilla.redhat.com/show_bug.cgi?id=1169290#c11 and https://bugzilla.redhat.com/show_bug.cgi?id=1169290#c13).

We are also going to move to vdscli; that could solve this one as well if it already provides a way to increase the timeout value.

Comment 7 Yaniv Bronhaim 2015-01-07 14:30:20 UTC
(In reply to Simone Tiraboschi from comment #5)
> (In reply to Dan Kenigsberg from comment #2)
> > Simone, why is hosted-engine-setup still using vdsClient? vdsClient is a
> > horrid little script, which nobody loves or wants to support.
> 
> (In reply to Barak from comment #4)
> > Simone, can you please respond to comment #2?
> > Is it doable for 3.5.1?
> 
> We already pushed a patch for it (http://gerrit.ovirt.org/#/c/33996/).
> It's under review, but it needs to be tested more thoroughly because it
> touches many different areas, so I don't think it will be ready for
> 3.5.1.

So why isn't this enough? Let's continue the review and check whether this reproduces with those scripts. It shouldn't: as far as I know, the direct call blocks forever.

Comment 8 Simone Tiraboschi 2015-01-09 11:33:54 UTC
(In reply to Yaniv Bronhaim from comment #7)
> (In reply to Simone Tiraboschi from comment #5)
> > (In reply to Dan Kenigsberg from comment #2)
> > > Simone, why is hosted-engine-setup still using vdsClient? vdsClient is a
> > > horrid little script, which nobody loves or wants to support.
> > 
> > (In reply to Barak from comment #4)
> > > Simone, can you please respond to comment #2?
> > > Is it doable for 3.5.1?
> > 
> > We already pushed a patch for it (http://gerrit.ovirt.org/#/c/33996/).
> > It's under review, but it needs to be tested more thoroughly because it
> > touches many different areas, so I don't think it will be ready for
> > 3.5.1.
> 
> So why isn't this enough? Let's continue the review and check whether this
> reproduces with those scripts. It shouldn't: as far as I know, the direct
> call blocks forever.

OK, I'll try to move that part to vdscli. Hosted-engine's complete move away from vdsClient is targeted for 3.6.0.

Comment 9 Barak 2015-01-11 15:09:50 UTC
Until we get a clear reproducer, this bug will be treated as medium severity.
I would prefer pushing the solution suggested in comment #2 to 3.5.x instead of adding this timeout to vdscli (which will probably change in 3.6 due to the expected move of the client to JSON-RPC).

Comment 10 Simone Tiraboschi 2015-01-12 12:56:14 UTC
Done, but I still haven't found a way to tweak the socket options in vdscli to get a longer SSL timeout.

Comment 11 Dan Kenigsberg 2015-01-12 13:26:07 UTC
Infra's Yeela should be able to assist in finding this.

Comment 12 Yeela Kaplan 2015-01-13 15:21:47 UTC
Simone, this requires adding the option in the vdscli code. I will gladly help with that.

Comment 13 Sandro Bonazzola 2015-02-05 14:39:44 UTC
(In reply to Yeela Kaplan from comment #12)
> Simone, this requires adding the option in the vdscli code. I will gladly
> help with that.

Please open a BZ to track this and make it block this bug.

Comment 14 Oved Ourfali 2015-02-05 17:08:18 UTC
(In reply to Sandro Bonazzola from comment #13)
> (In reply to Yeela Kaplan from comment #12)
> > Simone, this requires adding the option in the vdscli code. I will gladly
> > help with that.
> 
> Please open a BZ to track this and make it block this bug.

You tried to make it work with vdscli. What was the result?

Comment 15 Simone Tiraboschi 2015-02-06 14:34:16 UTC
I'm not able to reproduce it using vdscli, so it should be OK to just use vdscli instead of vdsClient for the slow storage operations.

Comment 16 Sandro Bonazzola 2015-02-06 16:27:50 UTC
Moving to MODIFIED as per comment #15. No additional change is needed.

Comment 18 Sandro Bonazzola 2015-02-20 11:08:20 UTC
Automated message: can you please update doctext or set it as not required?

Comment 20 Simone Tiraboschi 2015-03-12 12:34:30 UTC
It is failing as described in:
https://bugzilla.redhat.com/show_bug.cgi?id=1190636#c5

Comment 21 Yaniv Bronhaim 2015-03-23 08:38:13 UTC

*** This bug has been marked as a duplicate of bug 1190207 ***

