Bug 1500157
| Summary: | compute_capabilities_filter does not filter out soft deleted compute_nodes & ends up choosing soft deleted nodes with wrong profiles | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jaison Raju <jraju> |
| Component: | openstack-nova | Assignee: | Sylvain Bauza <sbauza> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Joe H. Rahme <jhakimra> |
| Severity: | medium | Priority: | medium |
| Version: | 10.0 (Newton) | Target Release: | 10.0 (Newton) |
| Target Milestone: | async | Keywords: | Triaged, ZStream |
| Hardware: | All | OS: | Linux |
| Type: | Bug | Last Closed: | 2018-03-26 08:47:29 UTC |
| CC: | berrange, dasmith, eglynn, jraju, kchamart, lyarwood, mbooth, sbauza, sferdjao, sgordon, srevivo, vromanso | | |
Description (Jaison Raju, 2017-10-10 05:39:45 UTC)

Created attachment 1336613 [details]: cmds / logs
By design, filters are not responsible for verifying the liveness of the host they get as a parameter (even the ComputeFilter verifies the *service* liveness, not the host itself). Rather, when we build the list of nodes to verify, we call the DB and ask it (in the Newton timeframe) to return a list of compute node records here:

https://github.com/openstack/nova/blob/e30b75097840019c38e0619e70924ddc9f9487a0/nova/scheduler/host_manager.py#L588

(Once we get that list of nodes, we later run each filter against each of them, one by one.)

If you look into the object internals at how we SQL-query the compute_nodes records, we check the deleted state and, by default, we don't ask for soft-deleted entries:

https://github.com/openstack/nova/blob/stable/newton/nova/db/sqlalchemy/api.py#L589-L590

Since we build a context object that has read_deleted=no by default, in theory we should only get back a list of non-soft-deleted nodes.

Now, looking at your attachment, I can see two records for the same hypervisor_hostname value:

```
*************************** 3. row ***************************
created_at: 2017-02-17 06:01:29
updated_at: 2017-09-24 01:41:48
deleted_at: 2017-09-24 01:41:51
id: 3
service_id: NULL
vcpus: 0
memory_mb: 0
local_gb: 0
vcpus_used: 2
memory_mb_used: 8192
local_gb_used: 80
hypervisor_type: ironic
hypervisor_version: 1
cpu_info:
disk_available_least: -278
free_ram_mb: -8192
free_disk_gb: -80
current_workload: 0
running_vms: 3
hypervisor_hostname: 561a3dea-aed9-472f-bc5c-eb55f2aab183
deleted: 3
host_ip: 10.65.176.41
supported_instances: [["x86_64", "baremetal", "hvm"]]
pci_stats: {"nova_object.version": "1.1", "nova_object.changes": ["objects"], "nova_object.name": "PciDevicePoolList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}
metrics: []
extra_resources: NULL
stats: {"profile": "ceph-storage", "cpu_arch": "x86_64", "num_proj_1ce9d46a310d4500987afc96ded596e3": "3", "io_workload": "3", "num_instances": "3", "num_vm_building": "3", "num_task_None": "3", "boot_option": "local", "num_os_type_None": "3"}
numa_topology: NULL
host: ibm-x3630m4-5.gsslab.pnq.redhat.com
ram_allocation_ratio: 1
cpu_allocation_ratio: 0
uuid: b8685471-e7eb-4ba0-98d7-44539828bfeb
disk_allocation_ratio: 0
*************************** 10. row ***************************
created_at: 2017-09-24 01:44:53
updated_at: 2017-10-02 17:38:02
deleted_at: NULL
id: 10
service_id: NULL
vcpus: 12
memory_mb: 16384
local_gb: 278
vcpus_used: 0
memory_mb_used: 0
local_gb_used: 0
hypervisor_type: ironic
hypervisor_version: 1
cpu_info:
disk_available_least: 278
free_ram_mb: 16384
free_disk_gb: 278
current_workload: 0
running_vms: 0
hypervisor_hostname: 561a3dea-aed9-472f-bc5c-eb55f2aab183
deleted: 0
host_ip: 10.65.176.41
supported_instances: [["x86_64", "baremetal", "hvm"]]
pci_stats: {"nova_object.version": "1.1", "nova_object.changes": ["objects"], "nova_object.name": "PciDevicePoolList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}
metrics: []
extra_resources: NULL
stats: {"profile": "compute", "cpu_arch": "x86_64", "cpu_hugepages": "true", "cpu_txt": "true", "cpu_vt": "true", "boot_option": "local", "cpu_aes": "true", "cpu_hugepages_1g": "true"}
numa_topology: NULL
host: ibm-x3630m4-5.gsslab.pnq.redhat.com
ram_allocation_ratio: 1
cpu_allocation_ratio: 0
uuid: c7829239-2100-40e9-830b-2efb88145b8a
disk_allocation_ratio: 0
```

That is fine, because we have a unique key on (host, hypervisor_hostname, deleted), which means you can have the same node twice or more as long as only one of the records is active. Accordingly, the HostState should be updated with the new compute node resources, including the stats.

Could you please try something like this:

- create a new Ironic node
- verify you can see it in the compute_nodes table
- modify the Ironic node like you did
- verify it creates a separate compute_nodes entry (deleting the previous one)
- check whether the scheduler filter misbehaves
- if so, run the scheduler again and see whether that fixes it

Thanks,
-Sylvain

I tested this via tripleo-quickstart on Ocata RDO, but couldn't reproduce the issue.
The DB entry was immediately changed once I ran ironic node-update.
No new entries were created; existing ones were updated.
```
MariaDB [nova]> select hypervisor_hostname,stats from compute_nodes\G;
*************************** 1. row ***************************
hypervisor_hostname: 4d640cad-270c-4baa-b218-b7b1ffc78023
stats: {"profile": "compute", "cpu_arch": "x86_64", "boot_option": "local"}
*************************** 2. row ***************************
hypervisor_hostname: 22a2c202-6c18-468f-a2a7-043a9170dc2f
stats: {"profile": "control", "cpu_arch": "x86_64", "boot_option": "local"}
2 rows in set (0.00 sec)
```
I will test this again on the original environment, which was RHOS 11. If I am not able to reproduce the same behavior using ironic node-update, I think something during the repeated stack-delete and stack-create may have brought the compute_nodes table into this state.
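As background for reproducing this state: the soft-delete bookkeeping Sylvain describes above (a unique key on (host, hypervisor_hostname, deleted), with deleted set to the row's own id on soft deletion, as visible in the row dumps) can be modeled with a toy in-memory table. This is a simplified sketch, not nova's actual schema or API; the columns and the `get_all` helper are trimmed illustrations:

```python
import sqlite3

# Toy model of nova's compute_nodes soft-delete semantics (simplified, not
# the real schema). deleted=0 means live; a soft-deleted row gets
# deleted=<its id>, so the UNIQUE key allows one live row plus any number
# of soft-deleted rows for the same (host, hypervisor_hostname).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY,
        host TEXT,
        hypervisor_hostname TEXT,
        deleted INTEGER DEFAULT 0,
        stats TEXT,
        UNIQUE (host, hypervisor_hostname, deleted)
    )
""")

def soft_delete(node_id):
    # Mirrors the deleted=id convention seen in the row dumps above.
    conn.execute("UPDATE compute_nodes SET deleted = id WHERE id = ?",
                 (node_id,))

def get_all(read_deleted="no"):
    # Rough analogue of querying with a context whose read_deleted='no':
    # only rows with deleted=0 come back.
    if read_deleted == "no":
        rows = conn.execute(
            "SELECT id, stats FROM compute_nodes WHERE deleted = 0")
    else:
        rows = conn.execute("SELECT id, stats FROM compute_nodes")
    return rows.fetchall()

# Old record with the wrong profile; soft-delete it, then re-register the
# same node with the new profile (allowed despite the unique key).
conn.execute(
    "INSERT INTO compute_nodes (id, host, hypervisor_hostname, stats) "
    "VALUES (?, ?, ?, ?)",
    (3, "host1", "561a3dea", '{"profile": "ceph-storage"}'))
soft_delete(3)
conn.execute(
    "INSERT INTO compute_nodes (id, host, hypervisor_hostname, stats) "
    "VALUES (?, ?, ?, ?)",
    (10, "host1", "561a3dea", '{"profile": "compute"}'))

# A read_deleted='no' query must only see the live row (id 10).
print(get_all("no"))   # [(10, '{"profile": "compute"}')]
print(get_all("yes"))  # both rows, id 3 and id 10
```

If the scheduler ever picks the ceph-storage profile here, something returned the soft-deleted row despite the default read_deleted filter, which is the behavior this bug alleges.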
I tested this again on RHOS 11, where the issue was initially noticed (to work around it, I had deleted all entries in compute_nodes earlier). The behavior is the same as in the tripleo-quickstart environment: after ironic node-update, the compute_nodes entry is updated with the new profile, and no new entries are created. It seems I was wrong about how these entries were created, and I still don't know how they were. I will try to force this behavior by adding similar entries before the actual entry, with deleted = $id.

Okay, let me know the outcomes of the tests. If you can reproduce the issue again, I would suspect a problem with the Ironic virt driver providing the list of nodes.

(In reply to Sylvain Bauza from comment #6)
> Okay, let me know the outcomes of the tests. If you can reproduce the
> issue again, I would suspect a problem with the Ironic virt driver
> providing the list of nodes.

I am facing some different issues while testing this. I took all the insert commands to be run against compute_nodes and deleted all the entries, then created a duplicate insert statement for one of the computes with a much older (soft-deleted) id and a different profile. But when testing, the scheduler immediately fails in the retry itself, and I can't make out why. I am also not sure how multiple entries get created during repeated redeployment, but I have noticed this multiple times. Please suggest what we could do next.

Honestly, given we can't really reproduce the issue, I don't know how to help here. The only possible way forward would be to look at the DB and see whether an Ironic node modification creates a new compute node record or just modifies the existing one, but given it's not possible to verify that at the moment, I'm closing the bug now. Please reopen it if you are able to reproduce the problem so we can discuss it further.
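For anyone reopening this: the check suggested above, i.e. whether an Ironic node modification leaves behind extra compute_nodes records instead of updating the existing one in place, amounts to looking for hypervisor_hostname values with more than one record, soft-deleted rows included. A minimal sketch against a toy SQLite table follows; the real check would run the same GROUP BY query against nova's MariaDB compute_nodes table, and the sample data below is made up (UUIDs trimmed):

```python
import sqlite3

# Toy stand-in for nova's compute_nodes table (simplified columns).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE compute_nodes (
    id INTEGER PRIMARY KEY, hypervisor_hostname TEXT,
    deleted INTEGER DEFAULT 0, stats TEXT)""")
conn.executemany(
    "INSERT INTO compute_nodes (id, hypervisor_hostname, deleted, stats) "
    "VALUES (?, ?, ?, ?)",
    [(3, "561a3dea", 3, '{"profile": "ceph-storage"}'),  # soft-deleted
     (10, "561a3dea", 0, '{"profile": "compute"}'),      # live duplicate
     (11, "22a2c202", 0, '{"profile": "control"}')])     # live, unique

# Hostnames with more than one record (live + soft-deleted) are the
# suspicious ones: an in-place node update should not multiply rows.
dupes = conn.execute("""
    SELECT hypervisor_hostname, COUNT(*) AS n
    FROM compute_nodes
    GROUP BY hypervisor_hostname
    HAVING n > 1
""").fetchall()
print(dupes)  # [('561a3dea', 2)]
```

An empty result after a node update would support the "updated in place" behavior both reporters observed; any hit would reproduce the table state this bug describes.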