Bug 1559314
| Summary: | CPUPinningInvalid causes 'Error updating resources' & shows wrong ram usage for a Hypervisor | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jaison Raju <jraju> |
| Component: | openstack-nova | Assignee: | Stephen Finucane <stephenfin> |
| Status: | CLOSED DUPLICATE | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 10.0 (Newton) | CC: | berrange, dasmith, eglynn, jhakimra, jraju, kchamart, sbauza, sferdjao, sgordon, srevivo, stephenfin, vromanso |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-03-23 12:13:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Description
Jaison Raju
2018-03-22 09:36:55 UTC
This is from my test environment. I can provide access if required for investigation.

$ nova hypervisor-list
+----+-----------------------+-------+---------+
| ID | Hypervisor hostname   | State | Status  |
+----+-----------------------+-------+---------+
| 1  | compute-1.localdomain | up    | enabled |
| 2  | compute-0.localdomain | up    | enabled |
+----+-----------------------+-------+---------+

(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 1
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 151                                      |
| free_disk_gb              | 156                                      |
| free_ram_mb               | 29602                                    |
| host_ip                   | 192.168.26.19                            |
| hypervisor_hostname       | compute-1.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 1                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 30                                       |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 3072                                     |
| running_vms               | 1                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-1.localdomain                    |
| service_id                | 15                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 2                                        |
+---------------------------+------------------------------------------+

(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 2
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 121                                      |
| free_disk_gb              | 126                                      |
| free_ram_mb               | 27554                                    |
| host_ip                   | 192.168.26.11                            |
| hypervisor_hostname       | compute-0.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 2                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 60                                       |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 5120                                     |
| running_vms               | 2                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-0.localdomain                    |
| service_id                | 16                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 4                                        |
+---------------------------+------------------------------------------+

(overcloud) [stack@undercloud-10 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+---------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                  |
+--------------------------------------+-------+--------+------------+-------------+---------------------------+
| 9762beab-9f4d-49fd-ae15-96034731401d | rhel1 | ACTIVE | -          | Running     | provider-183=10.65.219.66 |
| 87c5acdb-9623-4d1a-99e1-38b6e65fc9ee | rhel2 | ACTIVE | -          | Running     | provider-183=10.65.219.72 |
| 480fb05b-d15d-4aa7-8bb8-861ad3e7ec77 | rhel3 | ACTIVE | -          | Running     | provider-183=10.65.219.67 |
+--------------------------------------+-------+--------+------------+-------------+---------------------------+

Live migrated from horizon:

(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 1
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 181                                      |
| free_disk_gb              | 186                                      |
| free_ram_mb               | 31650                                    |
| host_ip                   | 192.168.26.19                            |
| hypervisor_hostname       | compute-1.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 1                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 0                                        |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 1024                                     |
| running_vms               | 0                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-1.localdomain                    |
| service_id                | 15                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 0                                        |
+---------------------------+------------------------------------------+

(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 2
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 91                                       |
| free_disk_gb              | 126                                      |
| free_ram_mb               | 27554                                    |
| host_ip                   | 192.168.26.11                            |
| hypervisor_hostname       | compute-0.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 2                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 4                                        |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 30597                                    | <<<<<<<<<<<<<<----------------
| running_vms               | 2                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-0.localdomain                    |
| service_id                | 16                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 6                                        |
+---------------------------+------------------------------------------+
(overcloud) [stack@undercloud-10 ~]$

Could you provide the version of the openstack-nova package that this is occurring with?

I apologize for the confusion.

The issue should not have occurred in the first place during live migration, but it did because nova bypasses the scheduler when a destination host is specified.
Hence we ended up with instances using the same pCPUs.

Still, I found it odd that once the resource tracker failed with this traceback, the reported RAM usage immediately changed to almost the maximum.
I am not sure whether this value is calculated by nova-compute or on the server.

If you think we should consider any improvement to handle such situations, then we can take this as a low-priority RFE, like:
i. If resource_tracker fails, the reported resource usage can remain what it was earlier.
ii. Or, if possible, can resource_tracker still send a correct report on whatever resources it was able to collect successfully?

If this is not possible, feasible, or logical, I am okay with having this bz closed.

(In reply to Jaison Raju from comment #4)
> I apologize for the confusion.
>
> The issue should not have occurred in the first place during live migration,
> but it did because nova bypasses the scheduler when a destination host is
> specified.
> Hence we ended up with instances using the same pCPUs.

Aha, yes, that would cause it. The fix for this is tracked in bz 1222414.

> Still, I found it odd that once the resource tracker failed with this
> traceback, the reported RAM usage immediately changed to almost the maximum.
> I am not sure whether this value is calculated by nova-compute or on the
> server.
>
> If you think we should consider any improvement to handle such situations,
> then we can take this as a low-priority RFE, like:
> i. If resource_tracker fails, the reported resource usage can remain what it
> was earlier.
> ii. Or, if possible, can resource_tracker still send a correct report on
> whatever resources it was able to collect successfully?
>
> If this is not possible, feasible, or logical, I am okay with having this bz
> closed.

Given the existing issues we have with live migration of pinned instances and forced migrations in general, I'd be reluctant to add yet another similar RFE for this. We can use 1222414 to track it instead. I'm going to close this as a duplicate of that issue.

*** This bug has been marked as a duplicate of bug 1222414 ***
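
For anyone re-reading the figures in the description, the short sketch below checks each `nova hypervisor-show` report for internal consistency. It assumes that free_ram_mb should equal memory_mb minus memory_mb_used; that relation is inferred from the three reports where it holds exactly, not taken from nova's source. The dictionary keys and function names are made up for illustration, and the numbers are copied from the output above.

```python
# Figures copied from the `nova hypervisor-show` output in this report.
# Assumption (inferred from the data, not from nova's code):
#   free_ram_mb == memory_mb - memory_mb_used
REPORTS = {
    "compute-1 before": {"memory_mb": 32674, "memory_mb_used": 3072, "free_ram_mb": 29602},
    "compute-0 before": {"memory_mb": 32674, "memory_mb_used": 5120, "free_ram_mb": 27554},
    "compute-1 after": {"memory_mb": 32674, "memory_mb_used": 1024, "free_ram_mb": 31650},
    "compute-0 after": {"memory_mb": 32674, "memory_mb_used": 30597, "free_ram_mb": 27554},
}


def ram_mismatch_mb(report):
    """Return free_ram_mb minus (memory_mb - memory_mb_used); 0 means consistent."""
    expected_free = report["memory_mb"] - report["memory_mb_used"]
    return report["free_ram_mb"] - expected_free


def inconsistent_hosts(reports):
    """Return the labels of reports whose RAM figures do not add up."""
    return [name for name, report in reports.items() if ram_mismatch_mb(report) != 0]


if __name__ == "__main__":
    for name, report in REPORTS.items():
        print(name, ram_mismatch_mb(report))
```

Under that assumption, only the post-migration report for compute-0 is inconsistent: memory_mb_used rose from 5120 to 30597 while free_ram_mb stayed at 27554, which suggests the two fields were not updated together when the resource update failed.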