Environment:
puppet-ironic-11.3.1-0.20171006212724.56f526a.el7ost.noarch
instack-undercloud-7.4.2-0.20171010064304.el7ost.noarch
openstack-ironic-api-9.1.2-0.20171019051035.f0b0521.el7ost.noarch
python-ironicclient-1.17.0-0.20170906171257.cdff7a0.el7ost.noarch
python-ironic-lib-2.10.0-0.20170906171416.1fa0a5f.el7ost.noarch
openstack-ironic-inspector-6.0.1-0.20170920142417.77e2b1a.el7ost.noarch
openstack-ironic-common-9.1.2-0.20171019051035.f0b0521.el7ost.noarch
python-ironic-inspector-client-2.1.0-0.20170915002324.bdcab9f.el7ost.noarch
openstack-ironic-conductor-9.1.2-0.20171019051035.f0b0521.el7ost.noarch

Steps to reproduce:
15 ironic nodes in ironic database.
Removed previous deployment and started a new deployment that includes all.

Result:
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: bc9968b1-5faf-40a9-bc79-c105450d7a3d
Waiting for messages on queue '14be2eab-8e8d-4b0e-b8a5-8f231b9a897f' with no timeout.
{u'errors': [u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-dce029a2-9b61-4a98-8508-70dbe6f9bf37'], u'memory_mb': 155648, u'current_workload': 0, u'vcpus': 58, u'running_vms': 0, u'free_disk_gb': 506, u'disk_available_least': 506, u'_info': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 155648, u'current_workload': 0, u'vcpus': 58, u'running_vms': 0, u'free_disk_gb': 506, u'disk_available_least': 506, u'local_gb': 506, u'free_ram_mb': 155648, u'memory_mb_used': 0}, u'local_gb': 506, u'free_ram_mb': 155648, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 4, u'available_count': 15}, u'warnings': []}
ERRORS
[u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.

(undercloud) [stack@undercloud-0 ~]$ ironic node-list
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| 0c3d6ccd-a7f4-4237-9659-38de7bb42928 | compute-1    | None          | power off   | available          | False       |
| 26548d77-aa06-4a28-9487-a2c25a74f532 | compute-0    | None          | power off   | available          | False       |
| e7151d16-0e2d-4f86-b7ee-a42882ac308b | ceph-1       | None          | power off   | available          | False       |
| 4ea6de31-74d9-458e-82fa-305e4c4ba7e0 | ceph-0       | None          | power off   | available          | False       |
| 722d06e2-d0b4-49ee-9d31-85624d85dfe7 | ceph-5       | None          | power off   | available          | False       |
| ac4ee02f-4a6f-4bcd-b082-667e4e695ce8 | ceph-4       | None          | power off   | available          | False       |
| e03c0e3e-533f-44b4-8905-3e64a5b273a9 | ceph-3       | None          | power off   | available          | False       |
| 47fca516-df53-4f76-88d4-511fd205bdfe | ceph-2       | None          | power off   | available          | False       |
| 20e40641-31a5-47e8-abe5-e78ceff13bf9 | compute-5    | None          | power off   | available          | False       |
| b7fa5df3-39ba-4d6a-93ab-1de4b2fc5faa | compute-4    | None          | power off   | available          | False       |
| 512dfa70-7bb1-4bec-87e8-52ae6ee8eb77 | compute-3    | None          | power off   | available          | False       |
| 1ceb8a35-9d86-4e93-8183-2d336b00ead3 | compute-2    | None          | power off   | available          | False       |
| b020d86e-40f5-4ee2-a20a-2cdde71b56bd | controller-2 | None          | power off   | available          | False       |
| b6c5ed91-0ffc-4324-bd2b-e659bcfdc0c0 | controller-1 | None          | power off   | available          | False       |
| 843f079f-d086-4880-827f-c103402040a0 | controller-0 | None          | power off   | available          | False       |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+

[root@undercloud-0 ~]# cat /home/stack/overcloud_deploy.sh
openstack overcloud deploy --templates \
--libvirt-type kvm \
-n /home/stack/network_data.yaml \
-r /home/stack/roles_data.yaml \
-e /home/stack/templates/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/enable-tls.yaml \
-e /home/stack/virt/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/inject-trust-anchor-hiera.yaml \
-e /home/stack/rhos12.yaml

[root@undercloud-0 ~]# cat /home/stack/templates/nodes_data.yaml
parameter_defaults:
    ControllerCount: 3
    OvercloudControlFlavor: controller
    Compute1Count: 2
    OvercloudCompute1Flavor: compute
    Compute2Count: 2
    OvercloudCompute2Flavor: compute
    Compute3Count: 2
    OvercloudCompute3Flavor: compute
    CephStorage1Count: 2
    OvercloudCephStorage1Flavor: ceph
    CephStorage2Count: 2
    OvercloudCephStorage2Flavor: ceph
    CephStorage3Count: 2
    OvercloudCephStorage3Flavor: ceph
    NtpServer: ["clock.redhat.com","clock2.redhat.com"]

Note: Simply retried the same deployment command without changing anything and the error didn't reproduce.
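
For reference, the "15 requests" in the error matches the sum of the per-role counts in nodes_data.yaml. The snippet below is only a quick arithmetic check: the counts are copied from the file above, while the `requested_nodes` helper and the "sum every key ending in Count" rule are assumptions made for illustration, not how tripleo-common computes the number.

```python
# Per-role node counts copied from nodes_data.yaml in this report.
parameter_defaults = {
    "ControllerCount": 3,
    "Compute1Count": 2,
    "Compute2Count": 2,
    "Compute3Count": 2,
    "CephStorage1Count": 2,
    "CephStorage2Count": 2,
    "CephStorage3Count": 2,
}


def requested_nodes(params):
    """Sum every parameter whose name ends in 'Count' (hypothetical helper)."""
    return sum(value for key, value in params.items() if key.endswith("Count"))


total = requested_nodes(parameter_defaults)
print("requested nodes:", total)
```

3 controllers + 6 computes + 6 ceph nodes gives the 15 nodes that are also listed by ironic node-list.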
Reproduced. Note the following:

(undercloud) [stack@undercloud-0 ~]$ bash overcloud_deploy.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 0076d996-cfdf-4903-af41-a9ff241555be
Waiting for messages on queue '684d93c8-120f-4cb2-8ae2-ac1628149cdb' with no timeout.
{u'errors': [u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-e3efba33-ab0a-4d62-a1b6-a7770e349e43'], u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'_info': {u'count': 0, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 0, u'current_workload': 0, u'vcpus': 0, u'running_vms': 0, u'free_disk_gb': 0, u'disk_available_least': 0, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0}, u'local_gb': 0, u'free_ram_mb': 0, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 4, u'available_count': 15}, u'warnings': []}
ERRORS
[u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 0 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.

Then ran: ironic node-list (didn't change anything, just wanted to see if there's anything wrong - nothing wrong).
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| eef80645-e3ec-4119-ad25-8cf898acfddb | compute-3    | None          | power off   | available          | False       |
| a7891d59-ebc6-4410-82d7-d1b0013ba712 | compute-2    | None          | power off   | available          | False       |
| 1d7b63a8-7704-448f-9e37-d39a0636ebc0 | controller-0 | None          | power off   | available          | False       |
| 928c0492-357a-4dd5-9ee2-de4e886aa749 | compute-1    | None          | power off   | available          | False       |
| 941a100f-5cd6-458b-b2cf-05516c73e80d | compute-0    | None          | power off   | available          | False       |
| 6c5f48d6-197d-4a96-9904-d6e95209f019 | controller-2 | None          | power off   | available          | False       |
| 3e2bda73-24b3-42b3-8218-df5cc118aecd | controller-1 | None          | power off   | available          | False       |
| b26b9a92-b32a-4d3f-975d-c437dfc139da | ceph-1       | None          | power off   | available          | False       |
| c4b9aac3-d3d7-43e0-8a68-527462b3eb1c | ceph-0       | None          | power off   | available          | False       |
| b82cfdfd-6cab-48bb-a9b8-059b34ca3a2a | compute-5    | None          | power off   | available          | False       |
| bd9ad19c-8be6-49d8-bb4b-335a0ee35946 | compute-4    | None          | power off   | available          | False       |
| a85e54aa-1afc-48d4-bd4e-152d18c2d8a7 | ceph-5       | None          | power off   | available          | False       |
| fd17cb7d-8f39-47df-b784-d4162c52741c | ceph-4       | None          | power off   | available          | False       |
| 8be552cd-4529-4fdc-b777-90c8f9bb708e | ceph-3       | None          | power off   | available          | False       |
| 4e025f01-1938-4045-9011-d9c1c1de122e | ceph-2       | None          | power off   | available          | False       |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+

(undercloud) [stack@undercloud-0 ~]$ bash overcloud_deploy.sh
Started Mistral Workflow tripleo.validations.v1.check_pre_deployment_validations. Execution ID: 19687a0b-4779-43f2-a4d4-a32c962e23f0
Waiting for messages on queue '5ce23751-a433-4f9f-9478-9e224e4330b5' with no timeout.
{u'errors': [u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.'], u'result': {u'enough_nodes': False, u'statistics': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'manager': {u'api': {u'server_groups': None, u'keypairs': None, u'servers': None, u'server_external_events': None, u'server_migrations': None, u'agents': None, u'instance_action': None, u'glance': None, u'hypervisor_stats': None, u'virtual_interfaces': None, u'flavors': None, u'availability_zones': None, u'user_id': None, u'cloudpipe': None, u'os_cache': False, u'quotas': None, u'migrations': None, u'usage': None, u'logger': None, u'project_id': None, u'neutron': None, u'quota_classes': None, u'project_name': None, u'aggregates': None, u'flavor_access': None, u'services': None, u'list_extensions': None, u'limits': None, u'hypervisors': None, u'cells': None, u'versions': None, u'client': None, u'hosts': None, u'volumes': None, u'assisted_volume_snapshots': None, u'certs': None}}, u'x_openstack_request_ids': [u'req-49a7d40c-9a52-4617-afb8-76e06d7a3abd'], u'memory_mb': 153600, u'current_workload': 0, u'vcpus': 56, u'running_vms': 0, u'free_disk_gb': 476, u'disk_available_least': 476, u'_info': {u'count': 14, u'vcpus_used': 0, u'local_gb_used': 0, u'memory_mb': 153600, u'current_workload': 0, u'vcpus': 56, u'running_vms': 0, u'free_disk_gb': 476, u'disk_available_least': 476, u'local_gb': 476, u'free_ram_mb': 153600, u'memory_mb_used': 0}, u'local_gb': 476, u'free_ram_mb': 153600, u'memory_mb_used': 0, u'_loaded': True}, u'requested_count': 4, u'available_count': 15}, u'warnings': []}
ERRORS
[u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.', u'Only 14 nodes are exposed to Nova of 15 requests. Check that enough nodes are in "available" state with maintenance mode off.']
Configuration has 2 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.

^ see how suddenly there are 14 nodes out of 15 (vs 0 out of 15)

On the third attempt it worked smoothly (with no changes on my side).
I'll take a look tomorrow. I suspect the validation code in tripleo-common does not wait long enough: Nova can take 2 minutes to sync with Ironic. Could you attach a sosreport just in case?
For the record, this is the line that is causing the failure: https://github.com/openstack/tripleo-common/blob/6d3ce6329db353e848e453656798d1d1c0c5d387/workbooks/validations.yaml#L515
I think the whole check_nodes_count workflow has to be retried several times, with a delay between attempts, before failing. I'll look into it.
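
To illustrate the retry-with-delay idea, here is a rough Python sketch. It is not the tripleo-common or Mistral implementation: `check_nodes_count` below is a hypothetical stand-in that only mirrors the error text from this report, and `retry_check`, `get_exposed_count`, `attempts` and `delay` are names and values assumed for the example.

```python
import time


def check_nodes_count(exposed_count, requested_count):
    """Return a list of errors; empty when enough nodes are exposed.

    Hypothetical stand-in for the real validation. The message mirrors
    the error text quoted in this report.
    """
    if exposed_count < requested_count:
        return [
            'Only %d nodes are exposed to Nova of %d requests. Check that '
            'enough nodes are in "available" state with maintenance mode '
            'off.' % (exposed_count, requested_count)
        ]
    return []


def retry_check(get_exposed_count, requested_count, attempts=12, delay=10,
                sleep=time.sleep):
    """Re-run the check until it passes or the attempts run out.

    attempts/delay are illustrative (12 tries, 10 s apart, is roughly the
    two minutes Nova may need); they are not taken from tripleo-common.
    """
    errors = []
    for attempt in range(attempts):
        # Ask for a fresh count on every attempt, since Nova may have
        # picked up more Ironic nodes in the meantime.
        errors = check_nodes_count(get_exposed_count(), requested_count)
        if not errors:
            return []
        if attempt < attempts - 1:
            sleep(delay)  # no point sleeping after the final attempt
    return errors
```

With a count source that reports 0, then 14, then 15 nodes (the sequence seen in this report), `retry_check` returns an empty list on the third attempt instead of failing on the first. The `sleep` argument is injectable only so the loop can be exercised without real waiting.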
I noticed that I had tried it on an SSL-enabled undercloud and was restarting networking and keepalived close to the deployment (maybe too close). During automated tests, where I have a 60-second pause before the deployment, this didn't reproduce.
Yep, 60 seconds should be enough for nova to make up its mind. Still, that's a valid bug IMO.
Let's assume the retry fixes it.
Installed latest OSP 12, puddle 2018-07-27.2.

Environment:
openstack-tripleo-common-containers-7.6.13-3.el7ost.noarch

Deployed a 15-node (3 controller, 6 ceph, 6 compute) overcloud as described in the description of the bug. Deleted and redeployed the overcloud several times and no longer see any of the errors reported in the bug.

At this time we will mark it verified. Please re-open the bug if it is seen again.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2331