Description of problem:
When an instance using the dedicated cpu_policy is migrated to a compute node
where existing instances already use the pCPU cores needed by the migrating
instance, nova-scheduler still migrates the instance. On such a compute node,
where instances end up with overlapping pCPUs, the periodic resource tracker
thread fails, and Horizon then shows almost all RAM as used, i.e.
RAM (used) = RAM (total) - 2G.

Version-Release number of selected component (if applicable):
RHOS10

How reproducible:
Yes

Steps to Reproduce:
1. Boot instances using a flavor with hw:cpu_policy=dedicated so that pinned
   pCPUs are in use on both compute nodes.
2. Live-migrate one instance to the other compute node, explicitly specifying
   the destination host, so that its pinned pCPUs overlap with an existing
   instance's (see the CLI sketch under Additional info).
3. Check the hypervisor statistics with nova hypervisor-show after the
   resource tracker's periodic task has run.

Actual results:
The resource tracker periodic task fails with a traceback, and memory_mb_used
jumps to almost the full memory_mb.

Expected results:
RAM usage should be correctly reported by the resource tracker thread.

Additional info:
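A reproduction along these lines can be sketched with the CLI. This is a
hedged sketch: the flavor name, image name, and NET_ID below are placeholders,
not taken from this environment; only the instance and host names come from
the output further down.

$ nova flavor-create pinned.flavor auto 2048 20 2
$ nova flavor-key pinned.flavor set hw:cpu_policy=dedicated
$ # NET_ID is the UUID of the provider-183 network; rhel-guest is a
$ # placeholder image name.
$ nova boot --flavor pinned.flavor --image rhel-guest --nic net-id=$NET_ID rhel1
$ # Specifying the destination host explicitly bypasses the scheduler:
$ nova live-migration rhel1 compute-0.localdomain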
This is from my test environment. I can provide access if required for investigation.

$ nova hypervisor-list
+----+-----------------------+-------+---------+
| ID | Hypervisor hostname   | State | Status  |
+----+-----------------------+-------+---------+
| 1  | compute-1.localdomain | up    | enabled |
| 2  | compute-0.localdomain | up    | enabled |
+----+-----------------------+-------+---------+

(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 1
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 151                                      |
| free_disk_gb              | 156                                      |
| free_ram_mb               | 29602                                    |
| host_ip                   | 192.168.26.19                            |
| hypervisor_hostname       | compute-1.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 1                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 30                                       |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 3072                                     |
| running_vms               | 1                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-1.localdomain                    |
| service_id                | 15                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 2                                        |
+---------------------------+------------------------------------------+

(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 2
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 121                                      |
| free_disk_gb              | 126                                      |
| free_ram_mb               | 27554                                    |
| host_ip                   | 192.168.26.11                            |
| hypervisor_hostname       | compute-0.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 2                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 60                                       |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 5120                                     |
| running_vms               | 2                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-0.localdomain                    |
| service_id                | 16                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 4                                        |
+---------------------------+------------------------------------------+

(overcloud) [stack@undercloud-10 ~]$ nova list
+--------------------------------------+-------+--------+------------+-------------+---------------------------+
| ID                                   | Name  | Status | Task State | Power State | Networks                  |
+--------------------------------------+-------+--------+------------+-------------+---------------------------+
| 9762beab-9f4d-49fd-ae15-96034731401d | rhel1 | ACTIVE | -          | Running     | provider-183=10.65.219.66 |
| 87c5acdb-9623-4d1a-99e1-38b6e65fc9ee | rhel2 | ACTIVE | -          | Running     | provider-183=10.65.219.72 |
| 480fb05b-d15d-4aa7-8bb8-861ad3e7ec77 | rhel3 | ACTIVE | -          | Running     | provider-183=10.65.219.67 |
+--------------------------------------+-------+--------+------------+-------------+---------------------------+

Live migrated from horizon:

(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 1
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 181                                      |
| free_disk_gb              | 186                                      |
| free_ram_mb               | 31650                                    |
| host_ip                   | 192.168.26.19                            |
| hypervisor_hostname       | compute-1.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 1                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 0                                        |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 1024                                     |
| running_vms               | 0                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-1.localdomain                    |
| service_id                | 15                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 0                                        |
+---------------------------+------------------------------------------+

(overcloud) [stack@undercloud-10 ~]$ nova hypervisor-show 2
+---------------------------+------------------------------------------+
| Property                  | Value                                    |
+---------------------------+------------------------------------------+
| cpu_info_arch             | x86_64                                   |
| cpu_info_features         | ["pge", "avx", "xsaveopt", "clflush",    |
|                           | "sep", "syscall", "tsc_adjust", "tsc-    |
|                           | deadline", "dtes64", "invpcid", "tsc",   |
|                           | "fsgsbase", "xsave", "vmx", "erms",      |
|                           | "xtpr", "cmov", "smep", "pcid", "est",   |
|                           | "pat", "monitor", "smx", "pbe", "lm",    |
|                           | "msr", "nx", "fxsr", "tm", "sse4.1",     |
|                           | "pae", "sse4.2", "pclmuldq", "acpi",     |
|                           | "fma", "vme", "popcnt", "mmx",           |
|                           | "osxsave", "cx8", "mce", "de", "rdtscp", |
|                           | "ht", "dca", "lahf_lm", "abm", "pdcm",   |
|                           | "mca", "pdpe1gb", "apic", "sse", "f16c", |
|                           | "pse", "ds", "invtsc", "pni", "tm2",     |
|                           | "avx2", "aes", "sse2", "ss", "ds_cpl",   |
|                           | "arat", "bmi1", "bmi2", "ssse3", "fpu",  |
|                           | "cx16", "pse36", "mtrr", "movbe",        |
|                           | "rdrand", "cmt", "x2apic"]               |
| cpu_info_model            | Haswell-noTSX                            |
| cpu_info_topology_cells   | 2                                        |
| cpu_info_topology_cores   | 6                                        |
| cpu_info_topology_sockets | 1                                        |
| cpu_info_topology_threads | 1                                        |
| cpu_info_vendor           | Intel                                    |
| current_workload          | 0                                        |
| disk_available_least      | 91                                       |
| free_disk_gb              | 126                                      |
| free_ram_mb               | 27554                                    |
| host_ip                   | 192.168.26.11                            |
| hypervisor_hostname       | compute-0.localdomain                    |
| hypervisor_type           | QEMU                                     |
| hypervisor_version        | 2009000                                  |
| id                        | 2                                        |
| local_gb                  | 186                                      |
| local_gb_used             | 4                                        |
| memory_mb                 | 32674                                    |
| memory_mb_used            | 30597                                    | <<<<<<<<<<<<<<----------------
| running_vms               | 2                                        |
| service_disabled_reason   | None                                     |
| service_host              | compute-0.localdomain                    |
| service_id                | 16                                       |
| state                     | up                                       |
| status                    | enabled                                  |
| vcpus                     | 6                                        |
| vcpus_used                | 6                                        |
+---------------------------+------------------------------------------+
(overcloud) [stack@undercloud-10 ~]$
Could you provide the version of the openstack-nova package that this is occurring with?
I apologize for the confusion.

The overlap should not have happened during live migration in the first
place, but it did because when a destination host is specified, nova
bypasses the scheduler. Hence we ended up with instances pinned to the same
pCPUs.

Still, I found it odd that the resource tracker failed with this traceback
and the reported RAM usage then immediately changed to almost the maximum.
I am not sure whether this value is calculated by nova-compute or on the
server side.

If you think we should consider an improvement to handle such situations,
we could take this as a low-priority RFE, for example:
i. If the resource tracker fails, the reported resource usage could remain
   at its previous value.
ii. Or, if possible, could the resource tracker still report correct values
    for whatever resources it was able to collect successfully?

If this is not possible, feasible, or logical, I am okay with having this BZ
closed.
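(For reference, the traceback can be located on the affected compute node
with something like the following; this assumes a default RHOS log layout,
and the exact periodic-task error message may differ by version:

$ sudo grep -A 20 'Error during ComputeManager.update_available_resource' \
      /var/log/nova/nova-compute.log
)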
(In reply to Jaison Raju from comment #4)
> I apologize for the confusion.
>
> The overlap should not have happened during live migration in the first
> place, but it did because when a destination host is specified, nova
> bypasses the scheduler. Hence we ended up with instances pinned to the
> same pCPUs.

Aha, yes, this would result in that. The fix is tracked in bz 1222414.

> Still, I found it odd that the resource tracker failed with this traceback
> and the reported RAM usage then immediately changed to almost the maximum.
> I am not sure whether this value is calculated by nova-compute or on the
> server side.
>
> If you think we should consider an improvement to handle such situations,
> we could take this as a low-priority RFE, for example:
> i. If the resource tracker fails, the reported resource usage could remain
>    at its previous value.
> ii. Or, if possible, could the resource tracker still report correct values
>     for whatever resources it was able to collect successfully?
>
> If this is not possible, feasible, or logical, I am okay with having this
> BZ closed.

Given the existing issues we have with live migration of pinned instances,
and with force migrations in general, I'd be reluctant to add yet another
similar RFE for this. We can use bz 1222414 to track it instead. I'm going
to close this as a duplicate of that issue.

*** This bug has been marked as a duplicate of bug 1222414 ***