Bug 2024126

Summary: No guidelines on how to optimize for concurrent migrations
Product: Container Native Virtualization (CNV) Reporter: Fabian Deutsch <fdeutsch>
Component: Documentation Assignee: Shikha Jhala <sjhala>
Status: CLOSED CURRENTRELEASE QA Contact: zhe peng <zpeng>
Severity: high Docs Contact:
Priority: medium    
Version: 4.9.0 CC: apinnick, jhopper, zpeng
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: bug tracker 4.13
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-07-03 19:41:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Fabian Deutsch 2021-11-17 11:42:28 UTC
Document URL: https://docs.openshift.com/container-platform/4.9/virt/install/preparing-cluster-for-virt.html

Section Number and Name: 

Describe the issue: No paragraph describing how to calculate the spare memory required in the cluster in order to enable a certain number of concurrent migrations

Suggestions for improvement: A paragraph describing the relevance of migrations, i.e. it's important to ensure there is enough network bandwidth and to use shared storage to enable fast migrations, as live migration is commonly used to ensure proper cluster upgrades. Further: There must be enough free memory on nodes for them to be able to act as live migration targets for VMs. While the system limits concurrent live migrations per cluster to 5 by default (link to https://docs.openshift.com/container-platform/4.9/virt/live_migration/virt-live-migration-limits.html), the number of live migrations is further limited by the availability of free memory on nodes. An admin needs to ensure that there will always be enough memory on the cluster to live migrate at least one VM at a time, or ideally more. The exact amount of required free memory depends on the size of the VMs.
Baseline can be: Minimum required free memory: Sum of memory of 5 largest VMs on the cluster.
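
To illustrate the baseline above, here is a minimal sketch of the arithmetic. The function name, the GiB unit, and the sample VM sizes are made up for illustration; only the rule itself (sum of the memory of the 5 largest VMs, 5 being the default concurrent migration limit) comes from this report.

```python
import heapq

def baseline_free_memory_gib(vm_memory_gib, concurrent_migrations=5):
    """Return the sum of the memory of the N largest VMs.

    vm_memory_gib: iterable of per-VM memory sizes (hypothetical unit: GiB).
    concurrent_migrations: cluster-wide limit on parallel live migrations
    (5 is the default mentioned in this report).
    """
    return sum(heapq.nlargest(concurrent_migrations, vm_memory_gib))

# Hypothetical cluster with seven VMs of different sizes.
vm_sizes = [4, 8, 16, 32, 64, 2, 128]
print(baseline_free_memory_gib(vm_sizes))  # 128 + 64 + 32 + 16 + 8 = 248
```

If the cluster has fewer VMs than the limit, the sketch simply sums all of them.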

Additional information:

Comment 1 Jenifer Abrams 2021-11-17 20:04:26 UTC
Regarding memory availability, consider that you need enough spare cluster capacity to fully migrate num VMs * num nodes that are allowed to drain in parallel (MCP maxUnavailable * number of MCPs that update in parallel). Otherwise there could be a case where VMs migrate off draining nodes up to the point where no more memory requests can be satisfied, which could end up blocking all node drains from completing (blocked on the migration PDB). From a network bandwidth perspective we do want multiple nodes draining a few VMs at the same time, but we want to make sure to prevent a "stuck" drain scenario.
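
A rough sketch of the estimate described in this comment, under stated assumptions: the function name and the sample data are hypothetical, memory is given in GiB, and the worst case is assumed to be that the nodes holding the most VM memory drain at the same time. It does not check that the spare memory is actually located on the nodes that stay up.

```python
import heapq

def spare_memory_for_drains_gib(vm_memory_per_node_gib,
                                max_unavailable=1, parallel_mcps=1):
    """Estimate spare memory needed so parallel node drains do not get stuck.

    vm_memory_per_node_gib: dict mapping a node name to a list of VM memory
    sizes on that node (hypothetical input shape).
    max_unavailable: MCP maxUnavailable value.
    parallel_mcps: number of MCPs that update in parallel.
    """
    # Number of nodes that may drain at the same time.
    nodes_draining = max_unavailable * parallel_mcps
    # Total VM memory that must leave each node when it drains.
    node_totals = [sum(vms) for vms in vm_memory_per_node_gib.values()]
    # Assumed worst case: the most heavily loaded nodes drain together.
    return sum(heapq.nlargest(nodes_draining, node_totals))

# Hypothetical three-node cluster.
cluster = {"node-a": [8, 16], "node-b": [32], "node-c": [4, 4, 4]}
print(spare_memory_for_drains_gib(cluster, max_unavailable=2))  # 32 + 24 = 56
```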

Comment 2 ctomasko 2022-12-13 13:46:03 UTC
Documentation team is experiencing capacity constraints. This bug is unassigned until a writer is available.

Comment 4 Shikha Jhala 2023-06-06 19:13:31 UTC
@fdeutsch Can you please review the PR: https://github.com/openshift/openshift-docs/pull/60910. Thanks.

Comment 5 Shikha Jhala 2023-06-30 01:51:19 UTC
@zpeng Can you please provide QE review for this fix: https://github.com/openshift/openshift-docs/pull/60910. If the PR looks good, can you please move the BZ status to Verified? Thank you.

Comment 6 zhe peng 2023-06-30 03:47:53 UTC
I've reviewed the PR; the concurrent migrations NOTES have been added to the doc.
Moving this bug to verified.

Comment 7 Shikha Jhala 2023-07-03 19:41:46 UTC
PR is merged. Changes are live on the customer-facing site: https://docs.openshift.com/container-platform/4.13/virt/install/preparing-cluster-for-virt.html#live-migration_preparing-cluster-for-virt