Bug 1506038 - openstack-ironic: errors: Only 14 nodes are exposed to Nova of 15 requests.
Summary: openstack-ironic: errors: Only 14 nodes are exposed to Nova of 15 requests.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z3
Target Release: 12.0 (Pike)
Assignee: Dmitry Tantsur
QA Contact: mlammon
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-24 21:08 UTC by Alexander Chuzhoy
Modified: 2023-10-06 17:40 UTC
CC: 9 users

Fixed In Version: openstack-tripleo-common-7.6.13-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-20 12:58:30 UTC
Target Upstream Version:
Embargoed:




Links:
OpenStack gerrit 534325 (MERGED): Retry the check_default_nodes_count workflow for 2 minutes (last updated 2020-07-12 05:22:54 UTC)
Red Hat Issue Tracker OSP-17099 (last updated 2022-07-09 11:11:35 UTC)
Red Hat Product Errata RHSA-2018:2331 (last updated 2018-08-20 12:59:12 UTC)

Description Alexander Chuzhoy 2017-10-24 21:08:06 UTC
Environment:
puppet-ironic-11.3.1-0.20171006212724.56f526a.el7ost.noarch
instack-undercloud-7.4.2-0.20171010064304.el7ost.noarch
openstack-ironic-api-9.1.2-0.20171019051035.f0b0521.el7ost.noarch
python-ironicclient-1.17.0-0.20170906171257.cdff7a0.el7ost.noarch
python-ironic-lib-2.10.0-0.20170906171416.1fa0a5f.el7ost.noarch
openstack-ironic-inspector-6.0.1-0.20170920142417.77e2b1a.el7ost.noarch
openstack-ironic-common-9.1.2-0.20171019051035.f0b0521.el7ost.noarch
python-ironic-inspector-client-2.1.0-0.20170915002324.bdcab9f.el7ost.noarch
openstack-ironic-conductor-9.1.2-0.20171019051035.f0b0521.el7ost.noarch


Steps to reproduce:
15 Ironic nodes registered in the Ironic database.
Removed the previous deployment and started a new deployment that includes all 15 nodes.

Result:
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: bc9968b1-5faf-40a9-bc79-c105450d7a3d
Waiting for messages on queue '14be2eab-8e8d-4b0e-b8a5-8f231b9a897f' with no timeout.
{u'errors': [u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-dce029a2-9b61-4a98-8508-70dbe6f9bf37'], u'memory_mb': 155648, u'current_workload': 0, u'vcpus': 58, u'running_vms': 0, u'free_disk_gb': 506, u'disk_available_least': 506, u'_info': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 155648, u'current_workload': 0, u'vcpus': 58, u'running_vms': 0, u'free_disk_gb': 506, u'disk_available_least': 506, u'local_gb': 506, u'free_ram_mb': 155648, u'memory_mb_used': 0}, u'local_gb': 506, u'free_ram_mb': 155648, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 4, u'available_count': 15}, u'warnings': []}
ERRORS
[u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.

(undercloud) [stack@undercloud-0 ~]$ ironic node-list
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| 0c3d6ccd-a7f4-4237-9659-38de7bb42928 | compute-1    | None          | power off   | available          | False       |
| 26548d77-aa06-4a28-9487-a2c25a74f532 | compute-0    | None          | power off   | available          | False       |
| e7151d16-0e2d-4f86-b7ee-a42882ac308b | ceph-1       | None          | power off   | available          | False       |
| 4ea6de31-74d9-458e-82fa-305e4c4ba7e0 | ceph-0       | None          | power off   | available          | False       |
| 722d06e2-d0b4-49ee-9d31-85624d85dfe7 | ceph-5       | None          | power off   | available          | False       |
| ac4ee02f-4a6f-4bcd-b082-667e4e695ce8 | ceph-4       | None          | power off   | available          | False       |
| e03c0e3e-533f-44b4-8905-3e64a5b273a9 | ceph-3       | None          | power off   | available          | False       |
| 47fca516-df53-4f76-88d4-511fd205bdfe | ceph-2       | None          | power off   | available          | False       |
| 20e40641-31a5-47e8-abe5-e78ceff13bf9 | compute-5    | None          | power off   | available          | False       |
| b7fa5df3-39ba-4d6a-93ab-1de4b2fc5faa | compute-4    | None          | power off   | available          | False       |
| 512dfa70-7bb1-4bec-87e8-52ae6ee8eb77 | compute-3    | None          | power off   | available          | False       |
| 1ceb8a35-9d86-4e93-8183-2d336b00ead3 | compute-2    | None          | power off   | available          | False       |
| b020d86e-40f5-4ee2-a20a-2cdde71b56bd | controller-2 | None          | power off   | available          | False       |
| b6c5ed91-0ffc-4324-bd2b-e659bcfdc0c0 | controller-1 | None          | power off   | available          | False       |
| 843f079f-d086-4880-827f-c103402040a0 | controller-0 | None          | power off   | available          | False       |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+


[root@undercloud-0 ~]# cat /home/stack/overcloud_deploy.sh
openstack overcloud deploy --templates \
--libvirt-type kvm \
-n /home/stack/network_data.yaml \
-r /home/stack/roles_data.yaml \
-e /home/stack/templates/nodes_data.yaml \
-e  /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/inject-trust-anchor-hiera.yaml \
-e /home/stack/rhos12.yaml


[root@undercloud-0 ~]# cat /home/stack/templates/nodes_data.yaml
parameter_defaults:
    ControllerCount: 3
    OvercloudControlFlavor: controller
    Compute1Count: 2
    OvercloudCompute1Flavor: compute
    Compute2Count: 2
    OvercloudCompute2Flavor: compute
    Compute3Count: 2
    OvercloudCompute3Flavor: compute
    CephStorage1Count: 2
    OvercloudCephStorage1Flavor: ceph
    CephStorage2Count: 2
    OvercloudCephStorage2Flavor: ceph
    CephStorage3Count: 2
    OvercloudCephStorage3Flavor: ceph
    NtpServer: ["clock.redhat.com","clock2.redhat.com"]




Note:
Simply retrying the same deployment command, without changing anything, did not reproduce the error.

Comment 1 Alexander Chuzhoy 2017-10-26 17:18:35 UTC
Reproduced. Note the following:

(undercloud) [stack@undercloud-0 ~]$ bash overcloud_deploy.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 0076d996-cfdf-4903-af41-a9ff241555be
Waiting for messages on queue '684d93c8-120f-4cb2-8ae2-ac1628149cdb' with no timeout.
{u'errors': [u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-e3efba33-ab0a-4d62-a1b6-a7770e349e43'], u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'_info': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0}, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 4, u'available_count': 15}, u'warnings': []}
ERRORS
[u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.



Then ran ironic node-list (didn't change anything, just wanted to see if anything was wrong; nothing was).



+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| eef80645-e3ec-4119-ad25-8cf898acfddb | compute-3    | None          | power off   | available          | False       |
| a7891d59-ebc6-4410-82d7-d1b0013ba712 | compute-2    | None          | power off   | available          | False       |
| 1d7b63a8-7704-448f-9e37-d39a0636ebc0 | controller-0 | None          | power off   | available          | False       |
| 928c0492-357a-4dd5-9ee2-de4e886aa749 | compute-1    | None          | power off   | available          | False       |
| 941a100f-5cd6-458b-b2cf-05516c73e80d | compute-0    | None          | power off   | available          | False       |
| 6c5f48d6-197d-4a96-9904-d6e95209f019 | controller-2 | None          | power off   | available          | False       |
| 3e2bda73-24b3-42b3-8218-df5cc118aecd | controller-1 | None          | power off   | available          | False       |
| b26b9a92-b32a-4d3f-975d-c437dfc139da | ceph-1       | None          | power off   | available          | False       |
| c4b9aac3-d3d7-43e0-8a68-527462b3eb1c | ceph-0       | None          | power off   | available          | False       |
| b82cfdfd-6cab-48bb-a9b8-059b34ca3a2a | compute-5    | None          | power off   | available          | False       |
| bd9ad19c-8be6-49d8-bb4b-335a0ee35946 | compute-4    | None          | power off   | available          | False       |
| a85e54aa-1afc-48d4-bd4e-152d18c2d8a7 | ceph-5       | None          | power off   | available          | False       |
| fd17cb7d-8f39-47df-b784-d4162c52741c | ceph-4       | None          | power off   | available          | False       |
| 8be552cd-4529-4fdc-b777-90c8f9bb708e | ceph-3       | None          | power off   | available          | False       |
| 4e025f01-1938-4045-9011-d9c1c1de122e | ceph-2       | None          | power off   | available          | False       |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
(undercloud) [stack@undercloud-0 ~]$ bash overcloud_deploy.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 19687a0b-4779-43f2-a4d4-a32c962e23f0
Waiting for messages on queue '5ce23751-a433-4f9f-9478-9e224e4330b5' with no timeout.

{u'errors': [u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-49a7d40c-9a52-4617-afb8-76e06d7a3abd'], u'memory_mb': 153600, u'current_workload': 0, u'vcpus': 56, u'running_vms': 0, u'free_disk_gb': 476, u'disk_available_least': 476, u'_info': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 153600, u'current_workload': 0, u'vcpus': 56, u'running_vms': 0, u'free_disk_gb': 476, u'disk_available_least': 476, u'local_gb': 476, u'free_ram_mb': 153600, u'memory_mb_used': 0}, u'local_gb': 476, u'free_ram_mb': 153600, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 4, u'available_count': 15}, u'warnings': []}
ERRORS
[u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.




^ see how there are suddenly 14 nodes out of 15 (vs. 0 out of 15 on the previous attempt)



On the third attempt it worked smoothly (with no changes on my side).

Comment 2 Dmitry Tantsur 2017-10-26 17:25:15 UTC
I'll take a look tomorrow. I suspect the validation code in tripleo-common does not wait long enough; Nova can take up to 2 minutes to sync with Ironic.

Could you attach a sosreport just in case?

Comment 3 Dmitry Tantsur 2017-10-27 13:28:50 UTC
For the record, this is the line that is causing the failure. I think the whole check_nodes_count workflow (https://github.com/openstack/tripleo-common/blob/6d3ce6329db353e848e453656798d1d1c0c5d387/workbooks/validations.yaml#L515) has to be retried several times, with a delay between attempts, before failing.

I'll look into it.
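
To illustrate the shape of such a retry (this is only a sketch, not the merged patch; the task name, workflow name, and timing values here are assumptions), the Mistral v2 DSL lets a task declare a retry policy, so the task that invokes the node-count check could be re-run every 10 seconds for roughly 2 minutes before the validation gives up:

# Illustrative workbook task fragment, not the actual change:
run_check_default_nodes_count:
  workflow: check_default_nodes_count
  retry:
    count: 12   # 12 attempts...
    delay: 10   # ...10 seconds apart, about 2 minutes in total
  # The real fix would also need a condition (e.g. a continue-on
  # expression) so the task keeps retrying while Nova still reports
  # fewer hypervisors than Ironic has available nodes.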

Comment 4 Alexander Chuzhoy 2017-10-27 13:48:26 UTC
I noticed that I tried this on an SSL-enabled undercloud and was restarting networking and keepalived close to the deployment (maybe too close).

During automated tests, where I have a 60-second pause before the deployment, this didn't reproduce.

Comment 5 Dmitry Tantsur 2017-10-27 13:49:35 UTC
Yep, 60 seconds should be enough for nova to make up its mind. Still, that's a valid bug IMO.
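
Until that retry is in place, a minimal workaround sketch (assuming the standard /home/stack/stackrc, the OSP 12 openstack CLI, and the deploy script path from the description; the 2-minute budget and 5-second interval are arbitrary choices) is to poll Nova's hypervisor statistics and start the deploy only once the count has caught up with the number of available Ironic nodes, instead of relying on a fixed 60-second pause:

#!/bin/bash
# Workaround sketch, not part of the product: wait until Nova's hypervisor
# count catches up with Ironic before starting the overcloud deploy.
source /home/stack/stackrc

# Number of Ironic nodes currently in the "available" provision state.
expected=$(openstack baremetal node list --provision-state available -f value -c UUID | wc -l)

for attempt in $(seq 1 24); do      # up to ~2 minutes, checking every 5 seconds
    exposed=$(openstack hypervisor stats show -f value -c count)
    if [ "$exposed" -ge "$expected" ]; then
        echo "Nova sees $exposed of $expected nodes, starting deployment"
        exec bash /home/stack/overcloud_deploy.sh
    fi
    echo "Nova sees $exposed of $expected nodes, waiting (attempt $attempt)"
    sleep 5
done

echo "Timed out waiting for Nova to expose all $expected Ironic nodes" >&2
exit 1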

Comment 8 Dmitry Tantsur 2018-01-16 14:29:35 UTC
Let's assume the retry fixes it.

Comment 12 mlammon 2018-08-02 15:19:10 UTC
Installed the latest OSP 12, puddle 2018-07-27.2.

Environment:
openstack-tripleo-common-containers-7.6.13-3.el7ost.noarch


Deployed a 15-node (3 controller, 6 ceph, 6 compute) overcloud as described in the bug description. Deleted the overcloud and redeployed several times; no longer seeing any of the errors reported in this bug. At this time we will mark it verified.

Please re-open the bug if this is seen again.

Comment 14 errata-xmlrpc 2018-08-20 12:58:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2331

