Bug 1834979
Summary: | Topology Manager policy not respected when creating pods concurrently | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Victor Pickard <vpickard> |
Component: | Node | Assignee: | Victor Pickard <vpickard> |
Status: | CLOSED WONTFIX | QA Contact: | Sunil Choudhary <schoudha> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.4 | CC: | aos-bugs, asimonel, ddharwar, jokerman, mburke, rphillips, schoudha, vpickard, zshi |
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | x86_64 | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Known Issue | |
Doc Text: |
Cause: An issue in Topology Manager could result in NUMA resources not being aligned to the same NUMA node if Guaranteed QoS pods are created simultaneously on the same node.
Consequence: Resources requested in the pod spec may not be NUMA aligned.
Workaround (if any): Do not create pods with Guaranteed QoS concurrently on the same node. If this does occur, delete and recreate the pod.
Result: Pod resources should be NUMA aligned after deleting and recreating the pod with Guaranteed QoS resource requests.
|
Story Points: | --- |
Clone Of: | 1813567 | Environment: | |
Last Closed: | 2020-05-12 19:35:05 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1813567 | ||
Bug Blocks: |
Description
Victor Pickard
2020-05-12 19:25:26 UTC
This issue is resolved in 4.5 with the k8s rebase that pulls in k8s 1.18. Resolving it in 4.4.z would require a fairly significant backport. We are not planning to fix this in 4.4, so we will add documentation on how to avoid this scenario, along with a workaround if the bug is encountered.

The workaround for this bug is as follows:

1. Don't spin up multiple pods on a node simultaneously. This is likely to trigger this issue, and resources would not be NUMA aligned.
2. If NUMA resource alignment fails due to 1 above, the workaround is to delete the pod, then create the pod again.

Let me adjust the workaround text to make it a little clearer.

1. Don't spin up multiple pods with a Guaranteed QoS on a node simultaneously. This is likely to trigger this issue, and resources may not be NUMA aligned as requested in the pod spec.
2. If this bug is encountered, the workaround is to delete and then recreate the pod.

PR to add this to Known Issues in the 4.4 release notes with the workaround:
https://github.com/openshift/openshift-docs/pull/22058
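
Since the workaround only concerns pods in the Guaranteed QoS class, it may help to check which pod specs fall into that class before scheduling them one at a time. The following is a minimal sketch, not part of the original report. It assumes the usual Kubernetes rule that a pod is Guaranteed when every container sets CPU and memory limits and its requests, if present, equal those limits. The function name `is_guaranteed_qos` and the dict layout are hypothetical, and quantities are compared as plain strings, so equivalent values written differently (for example `1` and `1000m`) are treated as unequal.

```python
def is_guaranteed_qos(pod: dict) -> bool:
    """Return True if the pod manifest (as a dict) looks like Guaranteed QoS.

    Simplified check: every container and init container must set cpu and
    memory limits, and any request that is present must equal its limit.
    Quantities are compared as strings, which is an assumption of this sketch.
    """
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    if not containers:
        return False
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            if limit is None:
                return False
            # An omitted request defaults to the limit in Kubernetes.
            if requests.get(name, limit) != limit:
                return False
    return True


if __name__ == "__main__":
    example = {
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "resources": {
                        "limits": {"cpu": "2", "memory": "1Gi"},
                        "requests": {"cpu": "2", "memory": "1Gi"},
                    },
                }
            ]
        }
    }
    print(is_guaranteed_qos(example))
```

Pods for which this returns True are the ones to avoid creating concurrently on the same node in 4.4.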