The documentation should provide some information about latency tolerance for the ovirtmgmt (RHEV management) network in the following scenarios:
- GUI to Manager
- Manager to hosts
- Storage (NFS/Gluster/iSCSI)
- Live migration

This should include latency estimates for each, and an estimate of the point at which users may see errors or timeouts.
What is needed:
1. An estimate of the point at which the latency between GUI and Manager renders RHV unusable
2. An estimate of the point at which the latency between the Manager and the hosts renders RHV unusable
3. An estimate of the point at which the latency between the Storage and the engine renders RHV unusable
4. An estimate of the point at which you should assume that live migration is not going to work, but has timed-out

Can we provide this?
Is the scale effort planning to address this?
This is quite an effort, but it is all tests, no development. The only plan we have to improve things here is the move to GWT-RPC, which is unlikely to happen in 4.1 (and may require re-testing item 1). I don't think any of it belongs to the scale team, but regardless, moving NEEDINFO to Gil.

Note that no. 4 is quite impossible to estimate. Essentially, there's a race between the speed of migration (affected by latency, bandwidth and available CPU) and the speed at which the VM dirties its pages. Latency just adds more chance for the VM to 'win', but we can't really tell by how much.
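To illustrate the race described above, here is a minimal sketch of an iterative pre-copy model. It is not a model of how RHV, libvirt, or QEMU actually schedule a migration; the function names, the single-stream TCP window cap (throughput limited to window size divided by round-trip time), and all numbers are assumptions chosen only for illustration.

```python
def effective_bandwidth(link_mbps, rtt_ms, window_mb):
    """Usable throughput in MB/s for one TCP stream (assumed model).

    Throughput is capped both by the link speed and by window / RTT,
    which is how latency reduces migration speed in this sketch.
    """
    link_mb_s = link_mbps / 8.0
    if rtt_ms <= 0:
        return link_mb_s
    window_limit_mb_s = window_mb / (rtt_ms / 1000.0)
    return min(link_mb_s, window_limit_mb_s)


def precopy_rounds(memory_mb, dirty_mb_s, bandwidth_mb_s, stop_copy_mb,
                   max_rounds=30):
    """Count pre-copy rounds until the remaining dirty data is small
    enough for a final stop-and-copy, or return None if the VM keeps
    dirtying memory as fast as it is sent (the VM 'wins')."""
    remaining = float(memory_mb)
    for round_no in range(1, max_rounds + 1):
        transfer_time_s = remaining / bandwidth_mb_s
        # Memory dirtied while this round was on the wire; it can never
        # exceed the total guest memory.
        remaining = min(float(memory_mb), dirty_mb_s * transfer_time_s)
        if remaining <= stop_copy_mb:
            return round_no
    return None


if __name__ == "__main__":
    # Hypothetical guest: 4 GB RAM, dirtying 50 MB/s, 1 Gbit/s link,
    # 4 MB TCP window, 32 MB allowed for the final stop-and-copy.
    for rtt_ms in (1, 100):
        bw = effective_bandwidth(1000, rtt_ms, 4)
        rounds = precopy_rounds(4096, 50, bw, 32)
        print(f"rtt={rtt_ms} ms bandwidth={bw:.0f} MB/s rounds={rounds}")
```

With these assumed numbers the migration converges at 1 ms round-trip time but never converges at 100 ms, because the window cap pushes usable bandwidth below the guest's dirty rate. The crossover point depends entirely on the guest's memory activity, which is why a single documented latency threshold is hard to give.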
Don't we have a clear timeout that we can just document for all of the scenarios described in comment #4?
(In reply to Gil Klein from comment #6)
> Don't we have a clear timeout that we can just document for all of the
> scenarios described in comment #4?

Usability is not the same as the failure point. We need testing on this.
(In reply to Yaniv Dary from comment #4)
> What is needed:
> 1. An estimate of the point at which the latency between GUI and Manager
> renders RHV unusable

Do we have any way to test this?

> 2. An estimate of the point at which the latency between the Manager and the
> hosts renders RHV unusable

Pavel, can you help measure this and reply back?

> 3. An estimate of the point at which the latency between the Storage and the
> engine renders RHV unusable

Raz, can you help measure this and reply back?

> 4. An estimate of the point at which you should assume that live migration
> is not going to work, but has timed-out

I agree with Yaniv K's insight provided in comment #5. It depends on the guest memory activity. As this is not RHV specific, I think we should check if the platform has any docs about it.

>
> Can we provide this?
> Is the scale effort planning to address this?
Moving to future until info is provided.