Bug 1559314 - CPUPinningInvalid causes 'Error updating resources' & shows wrong ram usage for a Hypervisor
Keywords:
Status: CLOSED DUPLICATE of bug 1222414
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Stephen Finucane
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-22 09:36 UTC by Jaison Raju
Modified: 2023-03-21 18:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-23 12:13:55 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-13594 0 None None None 2022-03-13 15:16:50 UTC

Description Jaison Raju 2018-03-22 09:36:55 UTC
Description of problem:
When instances using the dedicated cpu_policy are migrated to a compute node where existing instances are already pinned to the pCPU cores used by the migrating instance, nova-scheduler still migrates the instance.
The problem is that, on a compute node where instances have overlapping pCPUs, the periodic resource tracker thread fails and Horizon shows almost all RAM as used.
That is, RAM (used) = RAM (total) - 2G.
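
For illustration, the overlap condition described above can be checked with a small script. This is a minimal sketch, not nova code: the function name `find_pcpu_overlaps` and the pinning values are hypothetical, since the report does not list the actual pinned cores.

```python
from itertools import combinations


def find_pcpu_overlaps(pinning):
    """Return (instance_a, instance_b, shared_pcpus) for every pair that overlaps.

    `pinning` maps an instance name to the set of host pCPU ids it is pinned to.
    """
    overlaps = []
    for a, b in combinations(sorted(pinning), 2):
        shared = set(pinning[a]) & set(pinning[b])
        if shared:
            overlaps.append((a, b, sorted(shared)))
    return overlaps


# Hypothetical pinning, for illustration only (not taken from this report).
example = {"rhel1": {0, 1}, "rhel2": {2, 3}, "rhel3": {1, 4}}
print(find_pcpu_overlaps(example))
```

With dedicated pinning, any non-empty result means two instances claim the same host core.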


Version-Release number of selected component (if applicable):
RHOS10

How reproducible:
Yes

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
The tracker thread should report RAM usage correctly.

Additional info:

Comment 1 Jaison Raju 2018-03-22 11:22:38 UTC
This is from my test environment. I can provide access if required for investigation.
$ nova hypervisor-list
+----+-----------------------+-------+---------+
| ID | Hypervisor hostname   | State | Status  |
+----+-----------------------+-------+---------+
| 1  | compute-1.localdomain | up    | enabled |
| 2  | compute-0.localdomain | up    | enabled |
+----+-----------------------+-------+---------+
(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 1
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 151                                      |
| free_disk_gb              | 156                                      |
| free_ram_mb               | 29602                                    |
| host_ip                   | 192.168.26.19                            |
| hypervisor_hostname       | compute-1.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 1                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 30                                       |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 3072                                     |
| running_vms               | 1                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-1.localdomain                    |
| service_id                | 15                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 2                                        |
+---------------------------+------------------------------------------+
(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 2
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 121                                      |
| free_disk_gb              | 126                                      |
| free_ram_mb               | 27554                                    |
| host_ip                   | 192.168.26.11                            |
| hypervisor_hostname       | compute-0.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 2                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 60                                       |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 5120                                     |
| running_vms               | 2                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-0.localdomain                    |
| service_id                | 16                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 4                                        |
+---------------------------+------------------------------------------+
(overcloud) [stack@undercloud-10 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+---------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                  |
+--------------------------------------+-------+--------+------------+-------------+---------------------------+
| 9762beab-9f4d-49fd-ae15-96034731401d | rhel1 | ACTIVE | -          | Running     | provider-183=10.65.219.66 |
| 87c5acdb-9623-4d1a-99e1-38b6e65fc9ee | rhel2 | ACTIVE | -          | Running     | provider-183=10.65.219.72 |
| 480fb05b-d15d-4aa7-8bb8-861ad3e7ec77 | rhel3 | ACTIVE | -          | Running     | provider-183=10.65.219.67 |
+--------------------------------------+-------+--------+------------+-------------+---------------------------+

After live migration from Horizon:

(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 1
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 181                                      |
| free_disk_gb              | 186                                      |
| free_ram_mb               | 31650                                    |
| host_ip                   | 192.168.26.19                            |
| hypervisor_hostname       | compute-1.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 1                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 0                                        |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 1024                                     |
| running_vms               | 0                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-1.localdomain                    |
| service_id                | 15                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 0                                        |
+---------------------------+------------------------------------------+
(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 2
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 91                                       |
| free_disk_gb              | 126                                      |
| free_ram_mb               | 27554                                    |
| host_ip                   | 192.168.26.11                            |
| hypervisor_hostname       | compute-0.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 2                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 4                                        |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 30597                                    |  <<<<<<<<<<<<<<----------------
| running_vms               | 2                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-0.localdomain                    |
| service_id                | 16                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 6                                        |
+---------------------------+------------------------------------------+
(overcloud) [stack@undercloud-10 ~]$
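
The figures in the output above can be cross-checked with a little arithmetic. This is a sketch using only values copied from the `nova hypervisor-show` tables; the assumption (not confirmed in this report) is that the drop in `memory_mb_used` on compute-1 equals the RAM of the migrated instance.

```python
# Values copied from the hypervisor-show output above (all in MB).
memory_mb = 32674
used_before = {"compute-1": 3072, "compute-0": 5120}
used_after = {"compute-1": 1024, "compute-0": 30597}

# Assumption: RAM released on the source equals the migrated instance's RAM.
moved = used_before["compute-1"] - used_after["compute-1"]

# What compute-0 would be expected to report after receiving that instance.
expected_after = used_before["compute-0"] + moved

# What is actually left according to memory_mb_used on compute-0.
headroom = memory_mb - used_after["compute-0"]

print(moved, expected_after, headroom)
```

Under that assumption compute-0 would be expected to report 7168 MB used, while the reported 30597 MB leaves only 2077 MB, which matches the "total - 2G" observation. Note also that `free_ram_mb` on compute-0 is still 27554 (32674 - 5120), so the two fields disagree after the migration.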

Comment 2 Stephen Finucane 2018-03-22 15:39:18 UTC
Could you provide the version of the openstack-nova package that this is occurring with?

Comment 4 Jaison Raju 2018-03-23 05:36:42 UTC
I apologize for the confusion.

The issue should not have occurred in the first place during live migration,
but it did because nova bypasses the scheduler when the destination host is specified.
Hence we ended up with instances using the same pCPUs.

However, I found it odd that when the resource tracker failed with this traceback, the reported RAM usage immediately changed to almost the maximum.
I am not sure whether this value is calculated by nova-compute or by the server.

If you think we should consider an improvement to handle such situations, we can take this as a low-priority RFE, for example:
i. If resource_tracker fails, the reported resource usage remains what it was earlier.
ii. Or, if possible, resource_tracker still sends a correct report for whatever resources it was able to collect successfully.

If this is not possible / feasible / logical, I am okay with having this bz closed.
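
The two suggestions in this comment could look roughly like the sketch below. It is a generic illustration only, not nova's resource tracker: the class `UsageReporter` and its collector callables are hypothetical names introduced here.

```python
class UsageReporter:
    """Hypothetical reporter that tolerates failures of individual collectors."""

    def __init__(self, collectors):
        # collectors: mapping of resource name -> zero-argument callable
        self._collectors = collectors
        self._last_good = {}

    def report(self):
        result = {}
        for name, collect in self._collectors.items():
            try:
                value = collect()
            except Exception:
                # Suggestion (i): keep the previously reported value, if any.
                if name in self._last_good:
                    result[name] = self._last_good[name]
                continue
            # Suggestion (ii): report whatever was collected successfully.
            self._last_good[name] = value
            result[name] = value
        return result
```

A collector that raises on a later run then leaves its earlier value in the report instead of producing a misleading figure, while the other resources are still refreshed.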

Comment 5 Stephen Finucane 2018-03-23 12:13:55 UTC
(In reply to Jaison Raju from comment #4)
> I apologize for the confusion.
> 
> The issue should have not occurred in the first place while live-migration,
> but it did because if the destination host is specified,  nova bypasses the
> scheduler.
> Hence we ended up with instances using same pcpus.

Aha, yes, this would result in that. The resolution for this is tracked in bz 1222414.

> Although i found it odd that tracker resource failed with this traceback,
> then the ram resource usage reported immediately change to almost max.
> I am not sure if this value is calculated from the nova-compute or from
> server.
> 
> If you think we should consider any improvement to handle such situations,
> then we can take this as a low priority RFE,  like:
> i. If resource_tracker fails, the reported resource usage can remain what it
> was earlier.
> ii. Or if possible can resource_tracker still send right report on what ever
> resource it was able to collect successfully?
> 
> If this is not possible / feasible / logical , i am okay to have this bz
> closed.

Given the existing issues we have with live migration of pinned instances and force migrations in general, I'd be reluctant to add yet another similar RFE for this. We can use 1222414 to track it instead. I'm going to close this as a duplicate of that issue.

*** This bug has been marked as a duplicate of bug 1222414 ***

