Bug 1433718

Summary: Add reasonable timeout for the Python SDK requests
Product: [oVirt] ovirt-hosted-engine-setup Reporter: Artyom <alukiano>
Component: General Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE QA Contact: Artyom <alukiano>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 2.1.0.4 CC: bugs, ykaul
Target Milestone: ovirt-4.1.2 Keywords: Reopened, Triaged
Target Release: 2.1.0.6 Flags: rule-engine: ovirt-4.1+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-05-23 08:11:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1446582    

Description Artyom 2017-03-19 13:19:49 UTC
Description of problem:
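In my tests and in manual deployments, I sometimes hit cases where a request to the Python SDK gets stuck and the code has to wait half an hour for the timeout. In the log below, host status polling stalls from 13:24:32 until the request times out at 13:54:34: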
2017-03-19 13:24:26 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2017-03-19 13:24:27 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2017-03-19 13:24:29 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2017-03-19 13:24:30 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2017-03-19 13:24:31 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2017-03-19 13:24:32 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in installing state
2017-03-19 13:54:34 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:86 Error fetching host state: [ERROR]::oVirt API connection failure, (28, 'Operation timed out after 1800000 milliseconds with 0 out of -1 bytes received')
2017-03-19 13:54:34 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in  state
2017-03-19 13:54:35 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:92 VDSM host in up state
2017-03-19 13:54:35 INFO otopi.plugins.gr_he_setup.engine.add_host add_host._wait_host_ready:103 The VDSM Host is now operational
2017-03-19 13:54:35 DEBUG otopi.plugins.gr_he_setup.engine.add_host add_host._closeup:672 Setting CPU for the cluster
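
The length of the stall can be read off the two timestamps around the gap in the log above. A minimal sketch using only the standard library; the timestamps and the millisecond value are copied from the log, and the variable names are illustrative:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Last successful poll before the stall and the first log line after it,
# both copied from the log above.
last_ok = datetime.strptime("2017-03-19 13:24:32", FMT)
timed_out = datetime.strptime("2017-03-19 13:54:34", FMT)

stall = timed_out - last_ok
print(stall.total_seconds())  # 1802.0

# Timeout reported in the error message, converted from ms to minutes.
reported_timeout_ms = 1800000
print(reported_timeout_ms / 1000 / 60)  # 30.0
```

The two-second difference between the stall and the reported 30 minutes is consistent with a single request started shortly after the last successful poll.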

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.1.0.4-1.el7ev.noarch

How reproducible:
5-10%

Steps to Reproduce:
1. Run hosted engine deploy

Actual results:
The deployment can hang on a Python SDK call for half an hour.

Expected results:
If a Python SDK call gets stuck, it should fail after a reasonable timeout (I believe one minute would be more than enough).

Additional info:
I checked the Python SDK documentation: a timeout parameter can be passed when initializing a new API instance.
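
To illustrate what a per-request timeout buys, here is a rough sketch using the standard library's http.client rather than the oVirt SDK itself, so no SDK calls are assumed; the exact name and units of the SDK's timeout parameter should be checked in its documentation. The helper `get_status` and the silent local listener are hypothetical and exist only for this demonstration:

```python
import http.client
import socket


def get_status(host, port, timeout_s):
    """GET / with a per-request timeout; return the HTTP status, or None on timeout."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    except socket.timeout:
        # Raised when no data arrives within timeout_s seconds.
        return None
    finally:
        conn.close()


# Stand-in for a stuck server: the listen backlog lets the TCP connection
# complete, but nothing ever sends a response.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
silent.listen(1)
port = silent.getsockname()[1]

result = get_status("127.0.0.1", port, timeout_s=0.5)
print(result)  # None: the call gave up after about 0.5 s instead of hanging
silent.close()
```

With a short timeout the caller regains control quickly and can decide whether to retry, instead of blocking for the full default.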

Comment 1 Yaniv Kaul 2017-03-19 21:48:30 UTC
Unfortunately, it may take 15m to install a host, if the RPMs download is slllllooowww..

Comment 2 Artyom 2017-03-20 08:32:36 UTC
In the hosted-engine host deployment we sample the host status every second, so that will not be a problem; also, the HE host already has most of the relevant packages. So maybe we can reduce the timeout.
What do you think?
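
The argument above is that a short timeout on each status request does not limit how long the installation itself may take, because the deployment polls repeatedly. A minimal sketch of that idea follows; it is not the actual ovirt-hosted-engine-setup code. `fetch_state` is a hypothetical callable that performs one status request with its own short timeout and raises TimeoutError when it expires, and the 900-second overall deadline is an assumption based on the 15 minutes mentioned in comment 1:

```python
import time


def wait_host_ready(fetch_state, poll_interval_s=1.0, overall_deadline_s=900.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it returns 'up' or the overall deadline passes."""
    deadline = clock() + overall_deadline_s
    while clock() < deadline:
        try:
            state = fetch_state()
        except TimeoutError:
            # A single stuck request only costs one short timeout; retry.
            state = None
        if state == "up":
            return True
        sleep(poll_interval_s)
    return False


# Demonstration with canned responses, including one timed-out request.
responses = iter(["installing", TimeoutError(), "installing", "up"])


def fake_fetch_state():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


result = wait_host_ready(fake_fetch_state, sleep=lambda _s: None)
print(result)  # True
```

The per-request timeout bounds the cost of one stuck call, while the overall deadline bounds the whole wait; the two can be tuned independently.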

Comment 3 Yaniv Kaul 2017-03-20 08:39:45 UTC
(In reply to Artyom from comment #2)
> In the hosted-engine host deployment we sample the host status every
> second, so that will not be a problem; also, the HE host already has most
> of the relevant packages. So maybe we can reduce the timeout.
> What do you think?

I see no reason to query every second actually, but I also don't see the real value in reducing this timeout.

Comment 4 Artyom 2017-03-20 11:33:07 UTC
The real value is saving time when a Python API request gets stuck, as you can see in the log I provided. My host deployment finished after 2 minutes, but because of some problem with the Python SDK I had to wait half an hour. Also, I am not asking for a general timeout reduction; I am asking to reduce the timeout only in this specific case, when we deploy the first HE host to the engine.

Comment 5 Yaniv Kaul 2017-03-20 11:41:08 UTC
(In reply to Artyom from comment #4)
> The real value is saving time when a Python API request gets stuck, as
> you can see in the log I provided. My host deployment finished after 2
> minutes, but because of some problem with the Python SDK I had to wait
> half an hour. Also, I am not asking for a general timeout reduction; I am
> asking to reduce the timeout only in this specific case, when we deploy
> the first HE host to the engine.

I suggest finding the real problem in the SDK.

Comment 6 Simone Tiraboschi 2017-03-20 13:44:00 UTC
Artyom is right: we were passing the wrong value to the API constructor of the Python SDK, which resulted in an unacceptably long timeout.

Comment 7 Artyom 2017-05-07 14:08:35 UTC
Verified on ovirt-hosted-engine-setup-2.1.0.6-1.el7ev.noarch