Bug 1506038
Summary: | openstack-ironic: errors: Only 14 nodes are exposed to Nova of 15 requests. | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> |
Component: | openstack-tripleo-common | Assignee: | Dmitry Tantsur <dtantsur> |
Status: | CLOSED ERRATA | QA Contact: | mlammon |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 12.0 (Pike) | CC: | bfournie, cmuresan, dtantsur, jjoyce, mburns, rhel-osp-director-maint, sasha, slinaber, srevivo |
Target Milestone: | z3 | Keywords: | Triaged, ZStream |
Target Release: | 12.0 (Pike) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-common-7.6.13-1.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-08-20 12:58:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Alexander Chuzhoy
2017-10-24 21:08:06 UTC
Reproduced. Note the following:

```
(undercloud) [stack@undercloud-0 ~]$ bash overcloud_deploy.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 0076d996-cfdf-4903-af41-a9ff241555be
Waiting for messages on queue '684d93c8-120f-4cb2-8ae2-ac1628149cdb' with no timeout.
{u'errors': [u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-e3efba33-ab0a-4d62-a1b6-a7770e349e43'], u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'_info': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0}, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 4, u'available_count': 15}, u'warnings': []}
ERRORS [u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.
```

Then ran `ironic node-list` (didn't change anything, just wanted to see if there was anything wrong; nothing was wrong):

```
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| eef80645-e3ec-4119-ad25-8cf898acfddb | compute-3    | None          | power off   | available          | False       |
| a7891d59-ebc6-4410-82d7-d1b0013ba712 | compute-2    | None          | power off   | available          | False       |
| 1d7b63a8-7704-448f-9e37-d39a0636ebc0 | controller-0 | None          | power off   | available          | False       |
| 928c0492-357a-4dd5-9ee2-de4e886aa749 | compute-1    | None          | power off   | available          | False       |
| 941a100f-5cd6-458b-b2cf-05516c73e80d | compute-0    | None          | power off   | available          | False       |
| 6c5f48d6-197d-4a96-9904-d6e95209f019 | controller-2 | None          | power off   | available          | False       |
| 3e2bda73-24b3-42b3-8218-df5cc118aecd | controller-1 | None          | power off   | available          | False       |
| b26b9a92-b32a-4d3f-975d-c437dfc139da | ceph-1       | None          | power off   | available          | False       |
| c4b9aac3-d3d7-43e0-8a68-527462b3eb1c | ceph-0       | None          | power off   | available          | False       |
| b82cfdfd-6cab-48bb-a9b8-059b34ca3a2a | compute-5    | None          | power off   | available          | False       |
| bd9ad19c-8be6-49d8-bb4b-335a0ee35946 | compute-4    | None          | power off   | available          | False       |
| a85e54aa-1afc-48d4-bd4e-152d18c2d8a7 | ceph-5       | None          | power off   | available          | False       |
| fd17cb7d-8f39-47df-b784-d4162c52741c | ceph-4       | None          | power off   | available          | False       |
| 8be552cd-4529-4fdc-b777-90c8f9bb708e | ceph-3       | None          | power off   | available          | False       |
| 4e025f01-1938-4045-9011-d9c1c1de122e | ceph-2       | None          | power off   | available          | False       |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
```

```
(undercloud) [stack@undercloud-0 ~]$ bash overcloud_deploy.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 19687a0b-4779-43f2-a4d4-a32c962e23f0
Waiting for messages on queue '5ce23751-a433-4f9f-9478-9e224e4330b5' with no timeout.
{u'errors': [u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-49a7d40c-9a52-4617-afb8-76e06d7a3abd'], u'memory_mb': 153600, u'current_workload': 0, u'vcpus': 56, u'running_vms': 0, u'free_disk_gb': 476, u'disk_available_least': 476, u'_info': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 153600, u'current_workload': 0, u'vcpus': 56, u'running_vms': 0, u'free_disk_gb': 476, u'disk_available_least': 476, u'local_gb': 476, u'free_ram_mb': 153600, u'memory_mb_used': 0}, u'local_gb': 476, u'free_ram_mb': 153600, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 4, u'available_count': 15}, u'warnings': []}
ERRORS [u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.
```

^ See how suddenly there are 14 nodes out of 15 (vs 0 out of 15). On the third attempt it worked smoothly (with no changes on my side).

I'll take a look tomorrow. I suspect the validation code in tripleo-common does not wait long enough: nova can take 2 minutes to sync with ironic. Could you attach a sosreport just in case?

For the record: this is the line that is causing a failure. I think the whole check_nodes_count workflow https://github.com/openstack/tripleo-common/blob/6d3ce6329db353e848e453656798d1d1c0c5d387/workbooks/validations.yaml#L515 has to be retried several times with a delay before failing. I'll look into it.

I noticed that I tried it on an SSL-enabled undercloud and was restarting networking and keepalived close to the deployment (maybe too close). During automated tests, where I have a 60-second pause before the deployment, I didn't reproduce this.

Yep, 60 seconds should be enough for nova to make up its mind. Still, that's a valid bug IMO. Let's assume the retry fixes it.

Installed the latest OSP 12, puddle 2018-07-27.2.
Environment: openstack-tripleo-common-containers-7.6.13-3.el7ost.noarch
Deployed a 15-node (3 controller, 6 ceph, 6 compute) overcloud as described in the description of the bug. Have deleted and redeployed the overcloud several times and no longer see any of the errors reported in the bug. At this time we will mark it verified. Please re-open the bug if seen again.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2331
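The comments in this bug suggest that the node-count validation should be retried several times with a delay before failing, because nova can take about two minutes to pick up ironic nodes (the runs above went from 0 to 14 to all 15 nodes). A minimal Python sketch of that retry-with-delay idea follows. It is an illustration only, not the shipped fix: in tripleo-common the check lives in a Mistral workflow (the validations.yaml link above). The names `nodes_count_ok` and `wait_for_enough_nodes`, the attempt count, and the delay are hypothetical; `get_exposed_count` is assumed to return the hypervisor statistics `count` shown in the output above.

```python
import time
from typing import Callable, Tuple

# Hint text copied from the validation error message in this bug.
HINT = 'Check that enough nodes are in "available" state with maintenance mode off.'


def nodes_count_ok(exposed: int, requested: int) -> Tuple[bool, str]:
    """Hypothetical stand-in for the node-count check: pass if Nova sees enough nodes."""
    if exposed >= requested:
        return True, ""
    return False, f"Only {exposed} nodes are exposed to Nova of {requested} requests. {HINT}"


def wait_for_enough_nodes(
    get_exposed_count: Callable[[], int],
    requested: int,
    attempts: int = 7,
    delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, str]:
    """Re-run the check until it passes or the attempts are used up.

    With these illustrative defaults the function sleeps at most
    6 * 20 = 120 seconds, roughly the time nova may need to sync with ironic.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    message = ""
    for attempt in range(attempts):
        ok, message = nodes_count_ok(get_exposed_count(), requested)
        if ok:
            return True, ""
        if attempt < attempts - 1:
            sleep(delay)  # give nova time to sync with ironic before re-checking
    return False, message  # report the last failure, as the validation would


if __name__ == "__main__":
    # Replay the counts seen in this bug: 0, then 14, then all 15 nodes.
    counts = iter([0, 14, 15])
    ok, message = wait_for_enough_nodes(lambda: next(counts), requested=15, sleep=lambda s: None)
    print(ok)  # True
```

Injecting `sleep` keeps the sketch testable without real waiting; a fixed delay is the simplest choice here, and an exponential backoff would work equally well under the same assumptions.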