Bug 1533196
Summary: | nova-scheduler reports dead compute nodes but nova-compute is enabled and up | ||
---|---|---|---|
Product: | [Community] RDO | Reporter: | David Manchado <dmanchad> |
Component: | openstack-nova | Assignee: | Eoghan Glynn <eglynn> |
Status: | CLOSED UPSTREAM | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | Ocata | CC: | kforde, mwitt, whayutin |
Target Milestone: | --- | ||
Target Release: | trunk | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-01-12 00:35:04 UTC | Type: | Bug |
Description
David Manchado
2018-01-10 17:18:34 UTC
@lyarwood @eglynn this is reported in RDO Cloud, how do you want to handle it? If it's not a packaging issue, it should be moved upstream, but we don't want it to get lost there.

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-pike/3a9d14b/console.txt.gz
https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp_1ceph-featureset024-pike/1ccf4eb/console.txt.gz

It's difficult to guess what's going wrong without being able to see the nova-* service logs at the time the failure occurs. I'll try to make a guess based on the info you've given.

You mentioned the cluster was upgraded to Ocata three weeks ago and this problem has been happening for one week. Have there been any changes in the compute hosts? That is, have any been swapped in or out of the cluster?

Starting in Ocata, there's a concept in Nova called Cells v2, and at deployment time a few nova-manage commands are required [1] to set up the mappings that API-related services like the scheduler need in order to find compute hosts that are in a cell. If you make any changes to compute hosts, namely adding new ones, you have to run 'nova-manage cell_v2 discover_hosts' to make them visible to the API and allow scheduling to them.

The cluster will have three databases: nova_cell0, nova_api, and nova. In the nova_api database, you should see all of your compute hosts in the host_mappings table. If any are missing, you need to run the discover_hosts command, and then you should see them appear in host_mappings. Also in the nova_api database, you should see two cells in the cell_mappings table: one called 'cell0' (for instances that failed to schedule) containing its database connection, and another, probably called 'cell1', containing its database and message queue connections.

If all looks fine there, we'll need to take a look at the nova-* service logs for the failure to dig deeper.
[1] https://docs.openstack.org/nova/latest/user/cells.html#upgrade-minimal

Since it's not an RDO packaging issue, moved upstream: https://bugs.launchpad.net/nova/+bug/1742827
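
For anyone repeating the Cells v2 check described in the comment above, here is a minimal sketch of the comparison in Python. It is not part of Nova. The function names `find_unmapped_hosts` and `check_cell_mappings` and the sample host names are hypothetical. It assumes you have already collected the compute host names, the host names listed in `host_mappings`, and the cell names listed in `cell_mappings` by whatever means you prefer (for example, by querying the nova_api database by hand).

```python
def find_unmapped_hosts(compute_hosts, mapped_hosts):
    """Return the compute hosts that have no entry in host_mappings.

    Both arguments are iterables of host names gathered by the operator.
    A non-empty result suggests 'nova-manage cell_v2 discover_hosts'
    still needs to be run.
    """
    return sorted(set(compute_hosts) - set(mapped_hosts))


def check_cell_mappings(cell_names):
    """Return a list of problems found in the cell names from cell_mappings.

    The comment above expects a 'cell0' entry plus one regular cell,
    probably (but not necessarily) named 'cell1'.
    """
    names = list(cell_names)
    problems = []
    if "cell0" not in names:
        problems.append("cell0 mapping is missing")
    if not any(name != "cell0" for name in names):
        problems.append("no regular cell (for example 'cell1') is mapped")
    return problems


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the reported cluster.
    computes = ["compute-0", "compute-1", "compute-2"]
    mapped = ["compute-0", "compute-2"]
    print("unmapped hosts:", find_unmapped_hosts(computes, mapped))
    print("cell problems:", check_cell_mappings(["cell0", "cell1"]))
```

If the first function returns any hosts, running discover_hosts and then re-checking `host_mappings` is the next step suggested in the thread. If both checks come back clean, the nova-* service logs are the place to look.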