Bug 1722846
| Summary: | Incorrect instance count on some of the compute nodes. | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | sawaghma |
| Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
| Status: | CLOSED EOL | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 10.0 (Newton) | CC: | dasmith, dbarhate, eglynn, gkadam, jhakimra, kchamart, lyarwood, mark.a.sloan, mburns, nlevinki, sbauza, sgordon, smooney, vromanso |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-07 09:44:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
sawaghma
2019-06-21 13:39:09 UTC
Hi Team, Any update on the registered issue? Regards, Sagar W

Might be related to [1] and its bug [2], in which the periodic task doesn't catch an exception raised from the hardware module.

[1] https://review.opendev.org/#/c/661208/
[2] https://launchpad.net/bugs/1829349

Hi Team, Any update? Regards, Sagar W

Hi Team, We are waiting for a response! Regards, Sagar W

The exception indicates the host has a VM that is requesting pinning to cores

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 20, 21, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 103, 104, 105, 106, 108, 109]

however only the following cores are free:

[0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 88, 89, 91, 92, 93, 94, 95, 96, 97, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109]

This indicates that either the VM was live migrated and the cores used on the source host were not free on the destination host, or the vcpu_pin_set was modified on a host with running VMs. In the former case we have a downstream-only check to prevent live migration with CPU pinning if the CPUs are not free on the destination host. We also added a workaround config option to allow you to disable the downstream-only behavior; however, if you disable that feature and continue to live migrate NUMA instances, the operator is required to ensure the destination host has the same CPUs free. Similarly, if the admin modifies the vcpu_pin_set without first removing all pinned VMs from the host, it is their responsibility to ensure that the new vcpu_pin_set is valid for all VMs currently on the host.

It is unclear that the upstream patch is a correct fix, as incorrect NUMA topology information could result in VMs being killed if there is not enough RAM available to support a new instance. As such, it is not clear that the upstream review should be merged or backported.
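For reference, the mismatch described above can be checked mechanically by diffing the two core lists from the exception (a minimal Python sketch; the sets are copied verbatim from the traceback quoted above):

```python
# Cores the VM's pinning requests (from the exception above).
requested = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18,
             20, 21, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100,
             101, 103, 104, 105, 106, 108, 109}

# Cores the host actually has free (from the exception above).
free = {0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
        21, 88, 89, 91, 92, 93, 94, 95, 96, 97, 99, 100, 101, 102, 103,
        104, 105, 106, 107, 108, 109}

# Requested cores that are not free on this host.
missing = sorted(requested - free)
print(missing)  # [2, 10, 90, 98]
```

Note that the difference contains not only cores 2 and 10 but also 90 and 98, which on a 112-thread host like this one would plausibly be their hyperthread siblings; that pattern is consistent with cores 2 and 10 (and their siblings) having been removed from the host's available set.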
In this specific case I think we need to confirm with the customer whether they have set cpu_pinning_migration_quick_fail=false in their nova.conf, and whether they have performed live migrations of pinned guests or modified the vcpu_pin_set to remove cores 2 and 10.

Hello Sean, Please note that the user is not able to find the "cpu_pinning_migration_quick_fail=false" entry in nova.conf. Also, the user has done some live migration activities for a few instances, though he is not sure how to verify whether they were pinned instances. Kindly let us know how to find out whether the instances are pinned. Regards, Sagar W

Hello Team, Appreciate it if someone can provide the current status on this bug.
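On the question of identifying pinned instances: in Nova, CPU pinning is requested via the flavor extra spec hw:cpu_policy=dedicated (or the image property hw_cpu_policy=dedicated), so inspecting the instance's flavor with `openstack flavor show <flavor>` (or, on the compute node, looking for a `<cputune>` section with `vcpupin` elements in `virsh dumpxml <instance>`) will show whether it is pinned. A minimal sketch of the flavor-side check (the helper name is illustrative, not a Nova API):

```python
def is_pinned(extra_specs, image_props=None):
    """Return True if an instance built from these flavor extra specs
    (or image properties) requests dedicated CPU pinning."""
    image_props = image_props or {}
    # Nova pins vCPUs when hw:cpu_policy=dedicated is set on the flavor,
    # or hw_cpu_policy=dedicated on the image.
    return (extra_specs.get("hw:cpu_policy") == "dedicated"
            or image_props.get("hw_cpu_policy") == "dedicated")

print(is_pinned({"hw:cpu_policy": "dedicated"}))  # True
print(is_pinned({"hw:cpu_policy": "shared"}))     # False
```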