Bug 1304969

Summary: [Docs][RFE] Document latency tolerance thresholds for the ovirtmgmt network
Product: Red Hat Enterprise Virtualization Manager
Reporter: Lucy Bopf <lbopf>
Component: Documentation
Assignee: rhev-docs <rhev-docs>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: rhev-docs <rhev-docs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.6.0
CC: lsurette, pstehlik, ratamir, rbalakri, srevivo, ykaul, ylavi, zdover
Target Milestone: ---
Keywords: FutureFeature, Improvement
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-05 12:25:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Docs
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Lucy Bopf 2016-02-05 07:20:57 UTC
The documentation should provide some information about latency tolerance for the ovirtmgmt (RHEV management) network in the following scenarios:

- GUI to Manager
- Manager to hosts
- Storage (NFS/Gluster/iSCSI)
- Live migration

This should include latency estimates for each, and an estimate of the point at which users may see errors or timeouts.

Comment 4 Yaniv Lavi 2016-11-22 13:00:19 UTC
What is needed:
1. An estimate of the point at which the latency between GUI and Manager renders RHV unusable
2. An estimate of the point at which the latency between the Manager and the hosts renders RHV unusable
3. An estimate of the point at which the latency between the Storage and the engine renders RHV unusable
4. An estimate of the point at which you should assume that live migration is not going to work and has timed out

Can we provide this?
Is the scale effort planning to address this?

Comment 5 Yaniv Kaul 2016-11-22 13:07:51 UTC
This is quite an effort, but it is all tests, no development. The only plan we have to improve things here is the move to GWT-RPC, which is unlikely to happen in 4.1 (and may require re-testing item 1).

I don't think any of it belongs to the scale team, but regardless, moving NEEDINFO to Gil.

Note that no. 4 is quite impossible to estimate. Essentially, there's a race between the speed of migration (affected by latency, bandwidth and available CPU) and the speed at which the VM dirties its pages. Latency just gives the VM more chance to 'win', but we can't really tell by how much.
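
To illustrate the race described above, here is a rough sketch of a pre-copy migration loop. It is a toy model, not how RHV, libvirt or QEMU actually compute anything: the function names, the 4 MB TCP window, the 0.5 s downtime budget and the 30-round cap are all assumptions chosen for illustration. Latency enters only through the usual window/RTT bound on TCP throughput; the result still depends entirely on the guest's dirty rate, which is why no single latency threshold can be quoted.

```python
def effective_bandwidth(link_mb_s, rtt_s, window_mb=4.0):
    """Throughput bounded by the link and by window/RTT (assumed 4 MB window)."""
    if rtt_s <= 0:
        return link_mb_s
    return min(link_mb_s, window_mb / rtt_s)


def simulate_precopy(mem_mb, dirty_mb_s, link_mb_s, rtt_s,
                     max_downtime_s=0.5, max_rounds=30, window_mb=4.0):
    """Toy pre-copy model. Returns (converged, rounds, remaining_mb).

    Each round copies the memory still outstanding; while that copy runs,
    the guest dirties more pages, which form the next round's workload.
    Migration converges once the outstanding memory can be sent within the
    downtime budget (the final stop-and-copy phase).
    """
    bandwidth = effective_bandwidth(link_mb_s, rtt_s, window_mb)
    remaining = float(mem_mb)
    for round_no in range(1, max_rounds + 1):
        copy_time = remaining / bandwidth
        if copy_time <= max_downtime_s:
            return True, round_no, remaining
        # Pages dirtied during this round, capped at total guest memory.
        remaining = min(float(mem_mb), dirty_mb_s * copy_time)
    return False, max_rounds, remaining


# Hypothetical 4 GB guest dirtying 200 MB/s over a 1000 MB/s link.
low_latency = simulate_precopy(4096, 200, 1000, rtt_s=0.001)
high_latency = simulate_precopy(4096, 200, 1000, rtt_s=0.05)
print("1 ms RTT:", low_latency)
print("50 ms RTT:", high_latency)
```

In this model the same guest converges at 1 ms RTT but never at 50 ms, because the window/RTT bound drops the effective bandwidth below the dirty rate. A guest with a lower dirty rate would still converge at 50 ms, so the threshold moves with the workload.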

Comment 6 Gil Klein 2016-12-05 18:47:55 UTC
Don't we have a clear timeout that we can just document for all of the scenarios described in comment #4?

Comment 7 Yaniv Lavi 2016-12-06 23:43:26 UTC
(In reply to Gil Klein from comment #6)
> Don't we have a clear timeout that we can just document for all of the
> scenarios described in comment #4?

Usability is not the same as the failure point. We need testing on this.

Comment 8 Gil Klein 2017-01-03 11:51:36 UTC
(In reply to Yaniv Dary from comment #4)
> What is needed:
> 1. An estimate of the point at which the latency between GUI and Manager
> renders RHV unusable
Do we have any way to test this?
> 2. An estimate of the point at which the latency between the Manager and the
> hosts renders RHV unusable
Pavel, can you help measure this and reply back?
> 3. An estimate of the point at which the latency between the Storage and the
> engine renders RHV unusable
Raz, can you help measure this and reply back?
> 4. An estimate of the point at which you should assume that live migration
> is not going to work, but has timed-out
I agree with Yaniv K's insight in comment #5. It depends on the guest memory activity. As this is not RHV specific, I think we should check whether the platform team has any docs about it.
> 
> Can we provide this?
> Is the scale effort planning to address this?

Comment 9 Yaniv Lavi 2017-02-07 08:56:25 UTC
Moving to future until info is provided.