Bug 1240679
| Summary: | Deploy fails or has non-zero return code - ERROR No valid host was found. There are not enough hosts available. Code: 500 | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Ronelle Landy <rlandy> |
| Component: | python-rdomanager-oscplugin | Assignee: | Dougal Matthews <dmatthew> |
| Status: | CLOSED ERRATA | QA Contact: | Udi Kalifon <ukalifon> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.0 (Kilo) | CC: | akrivoka, calfonso, dmacpher, jslagle, mburns, rhel-osp-director-maint, rrosa, rybrown, whayutin |
| Target Milestone: | ga | Keywords: | Automation |
| Target Release: | Director | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | python-rdomanager-oscplugin-0.0.8-29.el7ost | Doc Type: | Bug Fix |
| Doc Text: | During deployment, the heat engine logs a "no valid host was found" error and the deploy returns a non-zero exit code, even though the ironic logs show nodes available. This happens because the director does not set nodes to "available" when the "openstack baremetal introspection" command completes. This fix sets the nodes to "available" after introspection completes, so the director now sees the nodes when deploying the Overcloud. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-08-05 13:58:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
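
The Doc Text above describes the fix as setting nodes to "available" once introspection completes. Below is a minimal sketch of how such a transition can be driven with python-ironicclient; the client setup, placeholder credentials, and the manageable-to-available transition via the "provide" verb are assumptions about the Kilo-era workflow, not an excerpt of the shipped patch.

```python
# Hypothetical sketch only -- not the actual python-rdomanager-oscplugin patch.
# Assumes the Kilo-era Ironic state machine, where the "provide" verb moves a
# node from "manageable" to "available", and placeholder undercloud credentials.
from ironicclient import client as ironic_client

ironic = ironic_client.get_client(
    1,
    os_auth_url='http://192.0.2.1:5000/v2.0',  # placeholder Keystone endpoint
    os_username='admin',                       # placeholder credentials
    os_password='password',
    os_tenant_name='admin',
)

# After "openstack baremetal introspection bulk start" finishes, flip every
# introspected node to "available" so the Nova scheduler can pick it up
# before "openstack overcloud deploy" runs.
for node in ironic.node.list(detail=True):
    if node.provision_state == 'manageable':
        ironic.node.set_provision_state(node.uuid, 'provide')
```

Doing this right after introspection, rather than in the middle of the deploy command, is what gives the Nova scheduler time to notice the new capacity before Heat asks for instances.
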
Description
Ronelle Landy
2015-07-07 13:44:50 UTC
Looked into this a bit this afternoon, and it looks like we are not setting the nodes to available when the CLI introspection command completes [1], but instead doing it in the middle of the deploy command [2]. In the instack scripts, we did this immediately after introspection [3]. The problem with doing it in the middle of the deploy command is that it takes a minute or so for the Nova scheduler to get updated [4], which creates a race. This is mitigated somewhat by Heat retrying the deploy, but we still get spurious CI failures because we end up with a non-zero exit code. Looking at the CLI bulk introspection code, I do not see an obvious place to put the state transition, as we only have commands for starting and polling introspection. In any case, we should move the nodes to available some time before the deploy command.

[1] https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/baremetal.py#L123-L163
[2] https://github.com/rdo-management/python-rdomanager-oscplugin/blob/master/rdomanager_oscplugin/v1/overcloud_deploy.py#L359-L362
[3] https://github.com/rdo-management/instack-undercloud/blob/master/scripts/instack-ironic-deployment#L158
[4] https://bugs.launchpad.net/ironic/+bug/1248022

We can add this to the end of the command that starts introspection. The tricky bit is that waiting for introspection to finish is optional; if we move the state change there, we need to wait for introspection to complete every time. So this will be a slight regression, removing a small feature.

Midstream patch: https://review.gerrithub.io/#/c/238962/

I am unable to reproduce this to fully verify the issue, but based on comment 5, the above review moves the provisioning-state change so that it happens earlier in the process. Is there a way we can tell when the Nova scheduler is updated?

Verified: python-rdomanager-oscplugin-0.0.8-41.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549
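
One of the comments above asks whether there is a way to tell when the Nova scheduler has been updated. A hedged sketch of one possible check is to poll Nova's aggregate hypervisor statistics until the expected bare-metal capacity is visible before starting the deploy; the credentials, node count, and timeout below are assumptions for illustration, not the change shipped in this erratum.

```python
# Hypothetical polling loop -- an assumption about how one *could* wait for the
# Nova scheduler to see the ironic nodes; not the fix shipped in this erratum.
import time

from novaclient import client as nova_client

# Kilo-era novaclient credential arguments; all values are placeholders.
nova = nova_client.Client(
    2,
    'admin', 'password', 'admin',
    'http://192.0.2.1:5000/v2.0',
)

expected_nodes = 4                   # assumed node count for this environment
deadline = time.time() + 600         # give the scheduler up to 10 minutes

while time.time() < deadline:
    stats = nova.hypervisor_stats.statistics()
    # Once every node is registered as a hypervisor and reports memory,
    # the scheduler should be able to place the overcloud instances.
    if stats.count >= expected_nodes and stats.memory_mb > 0:
        break
    time.sleep(10)
else:
    raise RuntimeError('Nova scheduler never saw the expected bare-metal capacity')
```

Polling the aggregate statistics avoids guessing at the scheduler's internal refresh interval mentioned in the linked Launchpad bug, at the cost of a configurable wait before the deploy starts.
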