Description of problem:

The customer requests a tool (i.e. web tool on labs), or guidance in the Docs (table?) about estimating the required network bandwidth for successful live migration of VMs.

It could be based on these factors:
* RHV Migration Policy Selected
* Dirty Ratio during migration
* VM Memory Size and Configuration

For example, it would show an estimate of the bandwidth required for a VM to migrate with the following:
+ Minimal Downtime Policy
+ 512GB of Memory
+ Low/Mid/High Dirty Ratio
= estimated_bandwidth required

So that customers can have a better idea of the network requirements for migrations, whether it's 10G or 40G that they need.
Could also have a note on how the customer can get the dirty ratio, even though it changes constantly:

# virsh -r domjobinfo <VM Name>
Job type:         Unbounded
Operation:        Outgoing migration
Time elapsed:     6789 ms
Data processed:   517.500 MiB
Data remaining:   27.680 MiB
Data total:       4.345 GiB
Memory processed: 517.500 MiB
Memory remaining: 27.680 MiB
Memory total:     4.345 GiB
Memory bandwidth: 101.060 MiB/s
Dirty rate:       203242 pages/s   <-----------
Page size:        4096 bytes
Iteration:        2
Constant pages:   1009395
Normal pages:     130008
Normal data:      507.844 MiB
Expected downtime: 100 ms
Setup time:       56 ms
Compression cache: 64.000 MiB
Compressed data:  0.000 B
Compressed pages: 0
Compression cache misses: 504
Compression overflows: 0
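To compare the dirty rate with a link speed, the "Dirty rate" (pages/s) and "Page size" (bytes) fields from the output above can be multiplied to get bytes per second. The following is a minimal sketch, assuming the text layout shown in this comment; the helper name `dirty_rate_bytes_per_sec` is hypothetical and not part of any libvirt tooling.

```python
import re


def dirty_rate_bytes_per_sec(domjobinfo_text: str) -> int:
    """Return the dirty rate in bytes/s from `virsh domjobinfo` text output.

    Assumes the "Dirty rate: N pages/s" and "Page size: N bytes" lines
    appear as in the sample quoted in this bug.
    """
    rate = re.search(r"Dirty rate:\s*(\d+)\s*pages/s", domjobinfo_text)
    size = re.search(r"Page size:\s*(\d+)\s*bytes", domjobinfo_text)
    if rate is None or size is None:
        raise ValueError("Dirty rate or Page size not found in output")
    return int(rate.group(1)) * int(size.group(1))


# Values taken from the domjobinfo sample above.
sample = """Dirty rate:       203242 pages/s
Page size:        4096 bytes"""

bps = dirty_rate_bytes_per_sec(sample)
print(f"{bps / 2**20:.1f} MiB/s")     # 793.9 MiB/s
print(f"{bps * 8 / 1e9:.2f} Gbit/s")  # 6.66 Gbit/s
```

Since the dirty rate changes constantly, a single reading like this is only a snapshot; sampling it several times during a migration gives a more useful range.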
Martin, please give some input here, either on the idea for a tool (comment 0), or guidance in the Docs (table?) about estimating the required network bandwidth for successful live migration of VMs.
Ales/Arik, any recommendations?
For post-copy I think it doesn't matter; it depends on the performance degradation the user is willing to accept. Unless the channel disconnects, the migration should complete successfully.

For pre-copy, there are two relevant phases:
1. To reach the watermark at which we pause the guest
2. To copy the remaining dirty pages while the guest is paused

I don't think we should document example values for which the migration would complete successfully, but maybe we can provide more data, e.g., how long we wait to reach the watermark in the former phase, or how long we allow the guest to be paused in the latter phase (e.g., with the minimal-downtime policy it's 500ms). That would assist users in setting the bandwidth while also taking into consideration things like the latency of the network, the number of migrations that will happen simultaneously on that network, and soon also the number of connections.

I wouldn't go with a tool: as Germano wrote in comment 2, it's hard to retrieve those values, and if the tool is expected to apply some calculation to properties specified by the user, then I think it would be simpler to write the formula somewhere.

Milan, what do you think?
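As a rough illustration of the two pre-copy phases described above, here is a simplified model that assumes a constant dirty rate and a constant usable bandwidth. It is a sketch for reasoning only, not how RHV, libvirt, or QEMU actually decide convergence: it ignores auto-converge throttling, compression, network latency, and parallel migrations. The function name and parameters are hypothetical.

```python
def precopy_estimate(memory_bytes, dirty_bytes_per_s, bandwidth_bytes_per_s,
                     max_downtime_s, max_iterations=50):
    """Simplified pre-copy model with a constant dirty rate (an assumption).

    Returns (converged, copy_iterations, total_seconds).
    """
    remaining = float(memory_bytes)
    elapsed = 0.0
    for iteration in range(1, max_iterations + 1):
        final_copy_time = remaining / bandwidth_bytes_per_s
        # Phase 2: the remainder fits within the allowed pause, so stop here.
        if final_copy_time <= max_downtime_s:
            return True, iteration - 1, elapsed + final_copy_time
        # Phase 1: copy while the guest keeps running and dirtying pages.
        elapsed += final_copy_time
        remaining = min(memory_bytes, dirty_bytes_per_s * final_copy_time)
    return False, max_iterations, elapsed


# Hypothetical inputs: 512 GiB guest, 200 MiB/s dirty rate, 10 Gbit/s link,
# 500 ms allowed pause (the minimal-downtime value mentioned above).
print(precopy_estimate(512 * 2**30, 200 * 2**20, 10e9 / 8, 0.5))
```

Under this model the migration can only converge when the usable bandwidth exceeds the dirty rate; the closer the two are, the more iterations it takes. Real environments still need to be tested, since neither value is actually constant.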
I agree. It's hard to predict the required bandwidth, but there are things that can be considered. There are several factors involved, some of them dynamic in nature: besides the dirty rate (which also depends on autoconverge!), there is the question of whether a single CPU can saturate the available network bandwidth (we are working on implementing multiple migration connections, which allow using more CPU power for migrations and make the predictions even more complicated at the same time). There can be theoretical formulas, but I'm afraid that in the end the customer must always test what works best in the given environment. Helping customers understand how things work, and what they can realistically expect from various tweaks, so that they can make informed decisions looks like a better idea than providing a tool that's likely to be not much more than a gimmick.

BTW, I think there are some efforts on the platform to provide dirty rate information for a VM even when it's not migrating, which could be useful in this context.
Just for context, because I came back here to look at the solution again and I am guessing it's hidden in a private comment:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_virtualization/migrating-virtual-machines_configuring-and-managing-virtualization#migrating-a-virtual-machine-using-the-cli_migrating-virtual-machines

# virsh domdirtyrate-calc vm-name 30
# virsh domstats vm-name --dirtyrate
That is correct, thank you for the clarification, Klaas!