Bug 1240679 - Deploy fails or has non-zero return code - ERROR No valid host was found. There are not enough hosts available. Code: 500
Summary: Deploy fails or has non-zero return code - ERROR No valid host was found. Th...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rdomanager-oscplugin
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ga
: Director
Assignee: Dougal Matthews
QA Contact: Udi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-07-07 13:44 UTC by Ronelle Landy
Modified: 2015-08-05 13:58 UTC (History)
10 users

Fixed In Version: python-rdomanager-oscplugin-0.0.8-29.el7ost
Doc Type: Bug Fix
Doc Text:
During deployment, the overcloud deploy fails with a non-zero return code because no valid hosts are found, even though the ironic logs show nodes available. The error appears in the heat engine logs. This happens because the director does not set nodes to "available" when the "openstack baremetal introspection" command completes. This fix sets the nodes to "available" after introspection completes, so the director sees the nodes when deploying the Overcloud.
Clone Of:
Environment:
Last Closed: 2015-08-05 13:58:41 UTC
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Gerrithub.io 238962 None None None Never
Red Hat Product Errata RHEA-2015:1549 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform director Release 2015-08-05 17:49:10 UTC

Description Ronelle Landy 2015-07-07 13:44:50 UTC
Description of problem:

Deploying the overcloud either fails to deploy certain nodes, or shows STACK_COMPLETE but returns a non-zero (1) exit code due to a "No valid host was found. There are not enough hosts" 500 error. This error shows up in the heat engine logs.

However, ironic shows the nodes are available:

00:45:47 cmd:
00:45:47 source /home/stack/stackrc; instack-ironic-deployment --show-profile;
00:45:47 
00:45:47 start:
00:45:47 2015-07-06 20:45:43.220294
00:45:47 
00:45:47 end:
00:45:47 2015-07-06 20:45:47.442309
00:45:47 
00:45:47 delta:
00:45:47 0:00:04.222015
00:45:47 
00:45:47 stdout:
00:45:47 Preparing for deployment...
00:45:47   Querying assigned profiles ... 
00:45:47 
00:45:47     7d3f8c6f-7eb5-4609-b157-fe19c70f7fb6
00:45:47       "boot_option:local"
00:45:47 
00:45:47     ff42ed8b-4663-43d5-a96d-4c4d630cb951
00:45:47       "boot_option:local"
00:45:47 
00:45:47     99a67fa9-3dea-4ff6-a955-a4902ce3eae8
00:45:47       "boot_option:local"
00:45:47 
00:45:47     a8914beb-b7ea-4f9d-8a06-5e7a741b6cf8
00:45:47       "boot_option:local"
00:45:47 
00:45:47     458d482f-f855-4325-a02b-3f7b1deb113d
00:45:47       "boot_option:local"
00:45:47 
00:45:47 
00:45:47   DONE.
00:45:47 
00:45:47 Prepared.


Version-Release number of selected component (if applicable):

[stack@host15 ~]$ rpm -qa | grep openstack
openstack-neutron-openvswitch-2015.1.0-10.el7ost.noarch
openstack-nova-api-2015.1.0-14.el7ost.noarch
openstack-utils-2014.2-1.el7ost.noarch
openstack-heat-api-cloudwatch-2015.1.0-4.el7ost.noarch
openstack-tuskar-0.4.18-3.el7ost.noarch
openstack-nova-compute-2015.1.0-14.el7ost.noarch
openstack-nova-conductor-2015.1.0-14.el7ost.noarch
openstack-swift-account-2.3.0-1.el7ost.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-heat-api-2015.1.0-4.el7ost.noarch
openstack-ceilometer-central-2015.1.0-6.el7ost.noarch
openstack-tripleo-common-0.0.1.dev6-0.git49b57eb.el7ost.noarch
openstack-heat-api-cfn-2015.1.0-4.el7ost.noarch
openstack-ceilometer-api-2015.1.0-6.el7ost.noarch
openstack-ironic-api-2015.1.0-8.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-3.el7ost.noarch
openstack-nova-common-2015.1.0-14.el7ost.noarch
openstack-tripleo-image-elements-0.9.6-5.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch
openstack-ceilometer-notification-2015.1.0-6.el7ost.noarch
openstack-ceilometer-collector-2015.1.0-6.el7ost.noarch
openstack-ironic-common-2015.1.0-8.el7ost.noarch
openstack-tempest-kilo-20150507.2.el7ost.noarch
openstack-swift-2.3.0-1.el7ost.noarch
openstack-neutron-ml2-2015.1.0-10.el7ost.noarch
openstack-nova-novncproxy-2015.1.0-14.el7ost.noarch
openstack-nova-scheduler-2015.1.0-14.el7ost.noarch
openstack-swift-object-2.3.0-1.el7ost.noarch
openstack-nova-cert-2015.1.0-14.el7ost.noarch
openstack-dashboard-theme-2015.1.0-10.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
openstack-nova-console-2015.1.0-14.el7ost.noarch
openstack-neutron-common-2015.1.0-10.el7ost.noarch
openstack-neutron-2015.1.0-10.el7ost.noarch
openstack-heat-engine-2015.1.0-4.el7ost.noarch
openstack-ceilometer-common-2015.1.0-6.el7ost.noarch
openstack-ironic-conductor-2015.1.0-8.el7ost.noarch
openstack-selinux-0.6.35-1.el7ost.noarch
openstack-swift-container-2.3.0-1.el7ost.noarch
openstack-puppet-modules-2015.1.7-5.el7ost.noarch
openstack-dashboard-2015.1.0-10.el7ost.noarch
openstack-swift-proxy-2.3.0-1.el7ost.noarch
python-django-openstack-auth-1.2.0-3.el7ost.noarch
openstack-tripleo-heat-templates-0.8.6-23.el7ost.noarch
openstack-glance-2015.1.0-6.el7ost.noarch
python-openstackclient-1.0.3-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-4.el7ost.noarch
openstack-ceilometer-alarm-2015.1.0-6.el7ost.noarch
openstack-keystone-2015.1.0-4.el7ost.noarch
openstack-tuskar-ui-0.3.0-8.el7ost.noarch
openstack-heat-common-2015.1.0-4.el7ost.noarch
openstack-tripleo-0.0.7-0.1.1664e566.el7ost.noarch

[stack@host15 ~]$ rpm -qa | grep plugin
yum-rhn-plugin-2.0.1-5.el7.noarch
yum-plugin-priorities-1.1.31-29.el7.noarch
redhat-access-plugin-openstack-7.0.0-0.el7ost.noarch
openstack-swift-plugin-swift3-1.7-3.el7ost.noarch

How reproducible:
Not always, but fairly often; more so with bare metal and HA deployments, which involve more nodes.


Steps to Reproduce:
1. Install ops-director from poodle/puddle bits
2. Run instack-ironic-deployment --show-profile to check that the overcloud nodes are registered and available
3. Deploy the overcloud

Actual results:
Return code 1 and/or the deploy fails, with some nodes left in the BUILD state.

Expected results:
Deploy passes

Additional info:

Comment 5 John Trowbridge 2015-07-07 20:40:24 UTC
Looked into this a bit this afternoon, and it looks like we are not setting the nodes to available when the CLI inspection command completes[1], but instead doing it in the middle of the deploy command[2]. In the instack scripts, we did this immediately after inspection[3].

The problem with doing it in the middle of the deploy command is that it takes a minute or so for the Nova scheduler to get updated[4], so this creates a race. Heat retrying the deploy mitigates this somewhat, but we still get spurious CI failures because we end up with a non-zero exit code.

Looking at the CLI bulk introspection code, I do not see an obvious place to put the state transition, as we only have commands for starting and polling inspection. In any case, we should move the nodes to available some time before the deploy command.

[1] https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/baremetal.py#L123-L163
[2] https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/overcloud_deploy.py#L359-L362
[3] https://github.com/rdo-management/instack-undercloud/blob/master/scripts/instack-ironic-deployment#L158
[4] https://bugs.launchpad.net/ironic/+bug/1248022
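The suggested move can be sketched in a few lines. This is an illustrative sketch, not the plugin's actual code: `bm_client` stands in for a python-ironicclient-style baremetal client, and the helper name is hypothetical. In Kilo-era ironic, the "provide" verb is what moves a managed node to the "available" state.

```python
def set_nodes_available(bm_client, node_uuids):
    """Move each inspected node to "available" as soon as introspection
    finishes, rather than in the middle of the deploy command.

    `bm_client` is assumed to expose a python-ironicclient-style
    node.set_provision_state(uuid, verb) call; names here are
    illustrative.
    """
    for uuid in node_uuids:
        # The 'provide' state transition takes a node from "manageable"
        # to "available", making it visible to the Nova scheduler.
        bm_client.node.set_provision_state(uuid, 'provide')
```

Running this right after introspection completes would give the Nova scheduler its minute or so to catch up before `overcloud deploy` asks for hosts, closing the race described above.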

Comment 6 Dougal Matthews 2015-07-08 08:38:19 UTC
We can add this to the end of the command that starts introspection. The tricky bit is that waiting for introspection to finish is currently optional; if we move the state change there, we need to wait for it to complete every time.

So this will cause a slight regression by removing a small feature (the option not to wait).
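The "wait every time" behaviour amounts to a polling loop over per-node introspection status. A minimal sketch, assuming a `get_status(uuid)` callable that returns a dict with a 'finished' key (mirroring the discoverd status API); the function name and parameters are illustrative, not the plugin's actual implementation:

```python
import time

def wait_for_introspection(get_status, node_uuids, poll=10, timeout=3600,
                           sleep=time.sleep, clock=time.monotonic):
    """Block until introspection reports finished for every node.

    `get_status(uuid)` is assumed to return a dict with a boolean
    'finished' key. `sleep` and `clock` are injectable for testing.
    Raises RuntimeError if any node is still pending at the deadline.
    """
    deadline = clock() + timeout
    pending = set(node_uuids)
    while pending:
        # Drop every node whose introspection has completed.
        pending = {u for u in pending if not get_status(u).get('finished')}
        if not pending:
            break
        if clock() > deadline:
            raise RuntimeError(
                'introspection timed out for: %s' % sorted(pending))
        sleep(poll)
```

With a loop like this made unconditional, the state transition to "available" could safely follow it at the end of the introspection command.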

Comment 7 Dougal Matthews 2015-07-09 11:27:30 UTC
Midstream patch https://review.gerrithub.io/#/c/238962/

Comment 8 Dougal Matthews 2015-07-09 11:28:44 UTC
I am unable to reproduce this to fully verify the issue, but based on comment 5, the above review moves the provisioning state change so that it happens earlier in the process.

Is there a way we can tell when the Nova scheduler is updated?
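One possible answer, sketched with injectable callables rather than live clients: poll the scheduler's view of resources until it matches the number of available ironic nodes. `count_hypervisors` and `count_available_nodes` are hypothetical stand-ins for something like `nova hypervisor-stats` and a filtered `ironic node-list`; this is an assumption-laden sketch, not a confirmed approach.

```python
import time

def wait_for_scheduler_sync(count_hypervisors, count_available_nodes,
                            poll=5, timeout=300, sleep=time.sleep,
                            clock=time.monotonic):
    """Wait until Nova's resource view covers all available ironic nodes.

    Both arguments are callables returning integers; they are
    placeholders for real client calls. Raises RuntimeError on timeout.
    """
    deadline = clock() + timeout
    while count_hypervisors() < count_available_nodes():
        if clock() > deadline:
            raise RuntimeError('Nova scheduler did not catch up in time')
        sleep(poll)
```

If a check like this worked, the deploy command could gate on it instead of relying on Heat retries to paper over the race.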

Comment 11 Udi 2015-07-21 11:56:04 UTC
Verified:
python-rdomanager-oscplugin-0.0.8-41.el7ost.noarch

Comment 13 errata-xmlrpc 2015-08-05 13:58:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

