Bug 1323274

Summary: nodes in "deploy failed" state can't be reset to "available"
Product: Red Hat OpenStack Reporter: Dan Yocum <dyocum>
Component: python-ironicclient Assignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG QA Contact: Shai Revivo <srevivo>
Severity: high Docs Contact:
Priority: medium    
Version: 8.0 (Liberty) CC: apevec, gkeegan, lhh, lmartins, mburns, rhel-osp-director-maint
Target Milestone: --- Keywords: ZStream
Target Release: 8.0 (Liberty)   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-08-18 14:19:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Dan Yocum 2016-04-01 16:40:35 UTC
Description of problem:

When the /mnt directory on the director node is on a filesystem that is too small, the bare-metal deployment fails. The node is then set to provision state "deploy failed" and can't be reset to "available".

Version-Release number of selected component (if applicable):

python-ironicclient-0.5.1-12.el7ost.noarch
openstack-ironic-discoverd-1.1.0-8.el7ost.noarch
openstack-ironic-common-2015.1.2-2.el7ost.noarch
openstack-ironic-api-2015.1.2-2.el7ost.noarch
openstack-ironic-conductor-2015.1.2-2.el7ost.noarch
python-ironic-discoverd-1.1.0-8.el7ost.noarch


How reproducible:

Every time

Steps to Reproduce:
1. Put /mnt on a partition that is too small (<2GB)
2. Deploy the overcloud; it will fail
3. Run: ironic node-list


Actual results:

| 2c06aaa3-bc9e-45ee-83c9-25160ee4cb49 | None | None          | power off   | deploy failed   | False       |

ironic node-set-provision-state 2c06aaa3-bc9e-45ee-83c9-25160ee4cb49 available

usage: ironic node-set-provision-state [--config-drive <config-drive>]
                                       <node> <provision-state>
ironic node-set-provision-state: error: argument <provision-state>: invalid choice: 'available' (choose from 'active', 'deleted', 'rebuild', 'inspect', 'provide', 'manage')
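
The choices listed in that error suggest the argument is an action (a verb) rather than the name of a target state, which would explain why 'available' is rejected. A minimal Python sketch of that check follows. The verb list is copied from the error message above; `check_provision_target` is a hypothetical helper written for illustration, not part of python-ironicclient.

```python
# Verbs accepted by this client version, copied from the error message above.
ACCEPTED_VERBS = ("active", "deleted", "rebuild", "inspect", "provide", "manage")


def check_provision_target(target):
    """Return (ok, message) for a node-set-provision-state argument.

    Hypothetical helper: it only mirrors the client's argument check and
    does not contact the Ironic API.
    """
    if target in ACCEPTED_VERBS:
        return True, "'%s' is an accepted verb" % target
    return False, "'%s' is not an accepted verb; choose from: %s" % (
        target,
        ", ".join(ACCEPTED_VERBS),
    )


if __name__ == "__main__":
    for candidate in ("available", "deleted"):
        ok, message = check_provision_target(candidate)
        print(candidate, ok, message)
```

Passing a state name such as 'available' fails this check, while a verb such as 'deleted' or 'provide' passes it.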

Expected results:

Node is set to 'available' state.


Additional info:

Comment 3 Dan Yocum 2016-04-26 16:59:53 UTC
This continues to be an issue in OSP-d v8:

openstack-ironic-api-4.2.2-4.el7ost.noarch
openstack-ironic-common-4.2.2-4.el7ost.noarch
openstack-ironic-conductor-4.2.2-4.el7ost.noarch
openstack-ironic-inspector-2.2.5-2.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch
python-ironic-inspector-client-1.2.0-6.el7ost.noarch


i.e., 

| 9983c951-e820-4023-859e-357692547d91 | None | None          | power off   | error              | False       |

[stack@ops2 ~]$ ironic  node-set-provision-state 9983c951-e820-4023-859e-357692547d91 provide 
The requested action "provide" can not be performed on node "9983c951-e820-4023-859e-357692547d91" while it is in state "error". (HTTP 400)

Comment 4 Dan Yocum 2016-04-26 17:10:36 UTC
Manually updating the database is no fun, but here's the command to get it back to a known good state:

MariaDB [ironic]> update nodes set provision_state='available', target_provision_state=null, last_error=null, instance_info='{}' where uuid='<UUID OF ERRORED NODE>';

Comment 6 Lucas Alvares Gomes 2016-08-18 14:19:33 UTC
From the "deploy failed" state you should use the "deleted" verb:

$ ironic node-set-provision-state <uuid> deleted

It's also possible to bring the node back to "manageable" state and from there go to "available":

$ ironic node-set-provision-state <uuid> manage
$ ironic node-set-provision-state <uuid> provide

Please take a look at the node states diagram: http://docs.openstack.org/developer/ironic/dev/states.html
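
As a rough summary of what this thread reports, the (state, verb) pairs mentioned above can be written as a small lookup table. This is only a sketch of the evidence in this bug, not the full Ironic state machine; the names `REPORTED` and `reported_accepted` are hypothetical, and pairs the thread does not mention are deliberately left unknown.

```python
# (current provision state, verb) -> whether this thread reports it as accepted.
# True: suggested as working in comment 6.
# False: rejected with HTTP 400 in comment 3.
REPORTED = {
    ("deploy failed", "deleted"): True,
    ("manageable", "provide"): True,
    ("error", "provide"): False,
}


def reported_accepted(state, verb):
    """Return True/False if the thread reports an outcome, else None.

    None means "not mentioned in this bug", so the sketch never guesses
    at transitions it has no evidence for.
    """
    return REPORTED.get((state, verb))


if __name__ == "__main__":
    print(reported_accepted("deploy failed", "deleted"))
    print(reported_accepted("error", "provide"))
    print(reported_accepted("active", "deleted"))
```

For anything outside these three pairs, the node states diagram linked above is the place to check.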