Bug 1240679
| Summary: | Deploy fails or has non-zero return code - ERROR No valid host was found. There are not enough hosts available. Code: 500 | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Ronelle Landy <rlandy> |
| Component: | python-rdomanager-oscplugin | Assignee: | Dougal Matthews <dmatthew> |
| Status: | CLOSED ERRATA | QA Contact: | Udi Kalifon <ukalifon> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.0 (Kilo) | CC: | akrivoka, calfonso, dmacpher, jslagle, mburns, rhel-osp-director-maint, rrosa, rybrown, whayutin |
| Target Milestone: | ga | Keywords: | Automation |
| Target Release: | Director | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | python-rdomanager-oscplugin-0.0.8-29.el7ost | Doc Type: | Bug Fix |
| Doc Text: | During deployment, the heat engine logs a "no valid host was found" error and the deploy returns a non-zero exit code, even though the ironic logs show nodes available. This happens because the director does not set nodes to "available" when the "openstack baremetal introspection" command completes. This fix sets the nodes to "available" after introspection completes, so the director now sees the nodes when deploying the Overcloud. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-08-05 13:58:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
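
The Doc Text above describes the fix as setting nodes to "available" once introspection completes. Below is a minimal sketch of how such a transition can be driven with python-ironicclient; the client setup, placeholder credentials, and the manageable-to-available transition via the "provide" verb are assumptions about the Kilo-era workflow, not an excerpt of the shipped patch.

```python
# Hypothetical sketch only -- not the actual python-rdomanager-oscplugin patch.
# Assumes the Kilo-era Ironic state machine, where the "provide" verb moves a
# node from "manageable" to "available", and placeholder undercloud credentials.
from ironicclient import client as ironic_client

ironic = ironic_client.get_client(
    1,
    os_auth_url='http://192.0.2.1:5000/v2.0',  # placeholder Keystone endpoint
    os_username='admin',                       # placeholder credentials
    os_password='password',
    os_tenant_name='admin',
)

# After "openstack baremetal introspection bulk start" finishes, flip every
# introspected node to "available" so the Nova scheduler can pick it up
# before "openstack overcloud deploy" runs.
for node in ironic.node.list(detail=True):
    if node.provision_state == 'manageable':
        ironic.node.set_provision_state(node.uuid, 'provide')
```

Doing this right after introspection, rather than in the middle of the deploy command, is what gives the Nova scheduler time to notice the new capacity before Heat asks for instances.
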
Description
Ronelle Landy
2015-07-07 13:44:50 UTC
Looked into this a bit this afternoon, and it looks like we are not setting the nodes to available when the CLI introspection command completes [1], but instead doing it in the middle of the deploy command [2]. In the instack scripts, we did this immediately after introspection [3]. The problem with doing it in the middle of the deploy command is that it takes a minute or so for the Nova scheduler to get updated [4], which creates a race. This is mitigated somewhat by Heat retrying the deploy, but we still get spurious CI failures because we end up with a non-zero exit code. Looking at the CLI bulk introspection code, I do not see an obvious place to put the state transition, as we only have commands for starting and polling introspection. In any case, we should move the nodes to available some time before the deploy command.

[1] https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/baremetal.py#L123-L163
[2] https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/overcloud_deploy.py#L359-L362
[3] https://github.com/rdo-management/instack-undercloud/blob/master/scripts/instack-ironic-deployment#L158
[4] https://bugs.launchpad.net/ironic/+bug/1248022

We can add this to the end of the command that starts introspection. The tricky bit is that waiting for introspection to finish is optional; if we move the state change there, we need to wait for introspection to complete every time. So this will be a slight regression, removing a small feature.

Midstream patch: https://review.gerrithub.io/#/c/238962/

I am unable to reproduce this to fully verify the issue, but based on comment 5, the above review moves the provisioning-state change so that it happens earlier in the process. Is there a way we can tell when the Nova scheduler is updated?

Verified: python-rdomanager-oscplugin-0.0.8-41.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549
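
One of the comments above asks whether there is a way to tell when the Nova scheduler has been updated. A hedged sketch of one possible check is to poll Nova's aggregate hypervisor statistics until the expected bare-metal capacity is visible before starting the deploy; the credentials, node count, and timeout below are assumptions for illustration, not the change shipped in this erratum.

```python
# Hypothetical polling loop -- an assumption about how one *could* wait for the
# Nova scheduler to see the ironic nodes; not the fix shipped in this erratum.
import time

from novaclient import client as nova_client

# Kilo-era novaclient credential arguments; all values are placeholders.
nova = nova_client.Client(
    2,
    'admin', 'password', 'admin',
    'http://192.0.2.1:5000/v2.0',
)

expected_nodes = 4                   # assumed node count for this environment
deadline = time.time() + 600         # give the scheduler up to 10 minutes

while time.time() < deadline:
    stats = nova.hypervisor_stats.statistics()
    # Once every node is registered as a hypervisor and reports memory,
    # the scheduler should be able to place the overcloud instances.
    if stats.count >= expected_nodes and stats.memory_mb > 0:
        break
    time.sleep(10)
else:
    raise RuntimeError('Nova scheduler never saw the expected bare-metal capacity')
```

Polling the aggregate statistics avoids guessing at the scheduler's internal refresh interval mentioned in the linked Launchpad bug, at the cost of a configurable wait before the deploy starts.
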