Bug 854535

Summary: PRD32 - bootstrap: support longer bootstrap duration
Product: Red Hat Enterprise Virtualization Manager
Reporter: Alon Bar-Lev <alonbl>
Component: ovirt-engine
Assignee: Alon Bar-Lev <alonbl>
Status: CLOSED ERRATA
QA Contact: Tareq Alayan <talayan>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: unspecified
CC: abaron, bazulay, dyasny, iheim, lpeer, oramraz, Rhev-m-bugs, sgrinber, yeylon, ykaul, yzaslavs
Target Milestone: ---
Keywords: Improvement
Target Release: 3.2.0
Flags: dyasny: Triaged+
Hardware: Unspecified
OS: Unspecified
Whiteboard: infra
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Soft and hard timeouts have been introduced for bootstrap operations. The soft timeout, defined by the SSHInactivityTimoutSeconds parameter, is 5 minutes. The hard timeout, defined by the SSHInactivityHardTimoutSeconds parameter, is 30 minutes.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-10 21:09:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 866889, 875920, 915537

Description Alon Bar-Lev 2012-09-05 09:45:49 UTC
The current implementation of bootstrap installation has a hard limit of 10 minutes.

This is not enough when only a slow connection to the internet is available.

The underlying ssh library supports two types of timeouts:

Hard timeout - timeout of the entire operation.

Soft timeout - timeout since last network activity.

Having both types of timeout lets us extend the duration of the entire process while still detecting a lack of progress.
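
To illustrate the difference between the two timeouts, here is a minimal Python sketch. It is not the engine's code and not the ssh library's API; the class name TimeoutTracker and its methods are hypothetical, chosen only for this example. Activity resets the soft timer, while the hard timer always runs from the start of the operation.

```python
import time


class TimeoutTracker:
    """Hypothetical helper: tracks a hard and a soft timeout for one operation."""

    def __init__(self, soft_seconds, hard_seconds, clock=time.monotonic):
        self._soft = soft_seconds      # max time since last network activity
        self._hard = hard_seconds      # max time for the entire operation
        self._clock = clock            # injectable so the logic can be tested
        self._start = clock()
        self._last_activity = self._start

    def touch(self):
        """Record network activity. Only the soft timer is reset."""
        self._last_activity = self._clock()

    def expired(self):
        """Return 'hard', 'soft', or None if neither timeout has been reached."""
        now = self._clock()
        if now - self._start >= self._hard:
            return "hard"
        if now - self._last_activity >= self._soft:
            return "soft"
        return None
```

A caller would invoke touch() whenever data arrives on the connection and poll expired() between reads, aborting the operation as soon as it returns a value other than None.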

Comment 1 Alon Bar-Lev 2012-09-05 09:47:12 UTC
commit df5892854a6ca1da4c680ca7eb5df1e496258b7e
Author: Alon Bar-Lev <alonbl>
Date:   Tue Sep 4 14:05:27 2012 +0300

    bootstrap: introduce soft timeouts for operations
    
    The only place non default ssh timeout was enforced is when getting node
    id.
     AddVdsCommand::canDoAction()
    
    So it relatively simple to add soft timeout / hard timeout support by
    just modify the defaults.
    
    Hard timeout - maximum duration of command.
    Soft timeout - maximum duration since last network activity.
    
    Reduce the default of SSHInactivityTimoutSeconds to 5 minutes, it is now
    the soft timeout limit.
    
    Add a new configuration parameter of SSHInactivityHardTimoutSeconds
    which is 30 minutes.
    
    Change-Id: Ic37e4384fda412f92bffd7a8aa809d0dfd4d8157
    Signed-off-by: Alon Bar-Lev <alonbl>

http://gerrit.ovirt.org/#/c/7734/

Comment 2 Alon Bar-Lev 2012-09-05 09:52:56 UTC
Example of a user report [1]

[1] http://www.mail-archive.com/users@ovirt.org/msg03271.html

Comment 4 Alon Bar-Lev 2013-03-24 13:06:21 UTC
Test notes:

1. Use a vanilla host.

2. Make sure the host is connected to the yum mirror over a slow network connection.

3. Perform host-deploy.

You should notice:

1. "Downloading xxx" messages in the event log at least every minute.

2. Defaults: SSHInactivityHardTimoutSeconds=1800, SSHInactivityTimoutSeconds=300. This means the connection is not dropped unless there are 5 minutes of inactivity, and the maximum duration is 30 minutes.

3. As long as the whole operation does not take more than 30 minutes it should succeed.

In the previous implementation the whole operation had to complete within about 10 minutes, and it hung for the full 10 minutes even if there was no activity.
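
The expected outcomes above can be checked on paper with a small simulation. This is only a sketch: the function deploy_outcome is hypothetical, times are seconds since the start of host-deploy, and the previous behaviour is modelled as a single 600 second limit (soft and hard both 600), which is an assumption based on the description above.

```python
def deploy_outcome(activity_times, finish_time, soft=300, hard=1800):
    """Return (result, time) for a simulated deploy.

    activity_times: seconds at which network activity is seen.
    finish_time: second at which the deploy would complete if not cut off.
    result is 'success', 'soft timeout' or 'hard timeout'.
    """
    events = sorted(t for t in activity_times if t < finish_time)
    events.append(finish_time)
    last = 0.0  # time of the most recent activity
    for t in events:
        soft_deadline = last + soft
        deadline = min(soft_deadline, hard)
        if t >= deadline:
            # The connection is dropped before this event is reached.
            kind = "hard timeout" if hard <= soft_deadline else "soft timeout"
            return kind, deadline
        last = t
    return "success", finish_time


# A 25 minute download with a progress message every minute.
slow_download = range(60, 1500, 60)

print(deploy_outcome(slow_download, 1500))                      # new defaults
print(deploy_outcome(slow_download, 1500, soft=600, hard=600))  # old 10 minute limit
print(deploy_outcome([60, 120], 1500))                          # stalls after 2 minutes
```

With the defaults, the first call succeeds, the second is cut off at 600 seconds, and the third is dropped at 420 seconds, 5 minutes after the last activity.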

Comment 5 Cheryn Tan 2013-04-03 06:51:44 UTC
This bug is currently attached to errata RHEA-2013:14491. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.

* Consequence: What happens when the bug presents.

* Fix: What was done to fix the bug.

* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.

Comment 6 Alon Bar-Lev 2013-04-03 07:21:23 UTC
No doc is required.

Comment 7 errata-xmlrpc 2013-06-10 21:09:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0888.html