Bug 1455667 - Increase SSHInactivityTimeoutSeconds for Upgrade host action
Summary: Increase SSHInactivityTimeoutSeconds for Upgrade host action
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Infra
Version: 4.1.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.1.3
Target Release: 4.1.3.2
Assignee: Martin Perina
QA Contact: Jiri Belka
URL:
Whiteboard:
Depends On:
Blocks: 1450831
Reported: 2017-05-25 18:01 UTC by Ryan Barry
Modified: 2017-07-06 13:19 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: During NGN installation/upgrade, extensive I/O operations are performed without producing any output on the SSH connection, which may cause the SSH connection to time out and the whole process to fail. The timeout is set by the SSHInactivityTimeoutSeconds option, defaults to 300 seconds, and can be changed with engine-config. However, we do not want to change the default for all SSH-related operations, because that may lengthen timeouts where it is not necessary. We therefore decided to double the value of SSHInactivityTimeoutSeconds for the host installation/upgrade flow only, to prevent timeouts on long-running operations. Consequence: Fix: Result:
Clone Of:
Environment:
Last Closed: 2017-07-06 13:19:53 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.1+
rule-engine: exception+
lsvaty: testing_ack+



Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 77572 0 master MERGED core: Double SSH timeout for install/upgrade 2017-06-06 10:06:59 UTC
oVirt gerrit 77869 0 ovirt-engine-4.1 MERGED core: Double SSH timeout for install/upgrade 2017-06-07 07:44:04 UTC

Description Ryan Barry 2017-05-25 18:01:30 UTC
Description of problem:
When updating oVirt Node, we are bumping up against the timeout, and upgrades sometimes fail. Failing to upgrade from engine becomes more likely on systems with slower disks, and as fixes in Node increase the upgrade time.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install oVirt Node NGN 4.1.1
2. Upgrade to oVirt Node 4.1.2 on a system with slow disks

Actual results:
Upgrade fails sometimes at 300s. Succeeds every time at 600s.

Expected results:
Upgrade succeeds.

Additional info:

Comment 1 Red Hat Bugzilla Rules Engine 2017-05-25 18:03:17 UTC
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 2 Martin Perina 2017-05-31 08:13:07 UTC
(In reply to Ryan Barry from comment #0)
> Description of problem:
> When updating oVirt Node, we are bumping up against the timeout, and
> upgrades sometimes fail. Especially on systems with slower disks, or as
> problems are fixed in Node which increases the upgrade time, failing to
> upgrade from engine is a more likely scenario
> 
> Version-Release number of selected component (if applicable):
> 
> 
> How reproducible:
> 
> 
> Steps to Reproduce:
> 1. Install oVirt Node NGN 4.1.1
> 2. Upgrade to oVirt Node 4.1.2 on a system with slow disks
> 
> Actual results:
> Upgrade fails sometimes at 300s. Succeeds every time at 600s.

The issue here is that the same timeout is used for all SSH connections, so if we increase it to 10 minutes, it may in some cases also increase the time needed to detect a failure during SSH Soft Fencing, which could delay the restart of HA VMs.

So I'd prefer to increase the timeout only for the Upgrade action, using 2 * SSHInactivityTimeoutSeconds as the SSH timeout during upgrade.
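
The proposal above can be illustrated with a minimal sketch. This is not the engine's actual code (ovirt-engine is written in Java); the operation names and the function `effective_ssh_timeout` are hypothetical and exist only for illustration. The only values taken from this bug are the 300-second default and the factor of 2.

```python
from enum import Enum

# Default of SSHInactivityTimeoutSeconds as stated in this bug report.
DEFAULT_SSH_INACTIVITY_TIMEOUT_SECONDS = 300


class SshOperation(Enum):
    """Hypothetical operation kinds, named for illustration only."""
    HOST_INSTALL = "host_install"
    HOST_UPGRADE = "host_upgrade"
    SOFT_FENCING = "soft_fencing"


# Operations that may run for a long time without producing SSH output.
LONG_RUNNING = {SshOperation.HOST_INSTALL, SshOperation.HOST_UPGRADE}


def effective_ssh_timeout(operation, configured_seconds=DEFAULT_SSH_INACTIVITY_TIMEOUT_SECONDS):
    """Return the SSH inactivity timeout (seconds) to use for an operation.

    Install/upgrade flows get twice the configured value; every other
    operation keeps the configured value unchanged.
    """
    if configured_seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    if operation in LONG_RUNNING:
        return 2 * configured_seconds
    return configured_seconds
```

With the default of 300 seconds, an upgrade would get 600 seconds, the value reported in comment #0 as succeeding every time, while Soft Fencing keeps 300 seconds so failure detection is not delayed.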

Comment 3 Jiri Belka 2017-06-21 09:31:43 UTC
ok, ovirt-engine-4.1.3.4-0.1.el7.noarch

tested while upgrading to rhvh-4.1-0.20170609.0+1

