Created attachment 1139323
templates-for-ironic-bug.tgz

Description of problem:
I deployed 8.0 (from 2016-03-18) on a virtual environment with bad network isolation templates (I'll attach them to the bug for the sake of reproduction).

The deploy command I used was:

openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan,gre --ntp-server clock.redhat.com -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml -e network-environment.yaml

Deployment failed with a timeout after 4 hours. I deleted the stack with heat stack-delete, but in ironic no nodes were deleted at all. Nodes were stuck in the "deploy failed" provision state and still had their nova instance attached to them.

I removed the nova instance from the ironic nodes with "ironic node-update $id remove instance_uuid" and that worked. I then tried "ironic node-delete" and got:

Can not delete node "194f3a85-deee-40a9-b2d3-650addf1b1c1" while it is in provision state "deploy failed". Valid provision states to perform deletion are: "('available', None, 'manageable', 'enroll')"
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1565, in destroy_node
    raise exception.InvalidState(msg)

InvalidState: Can not delete node "194f3a85-deee-40a9-b2d3-650addf1b1c1" while it is in provision state "deploy failed". Valid provision states to perform deletion are: "('available', None, 'manageable', 'enroll')" (HTTP 409)

So I tried to set the provision state to something else with "ironic node-set-provision-state 194f3a85-deee-40a9-b2d3-650addf1b1c1 provide", and I got:

The requested action "provide" can not be performed on node "194f3a85-deee-40a9-b2d3-650addf1b1c1" while it is in state "deploy failed". (HTTP 400)

I am completely stuck and can't delete the nodes! I have to reprovision and reinstall everything.

Version-Release number of selected component (if applicable):
openstack-ironic-common-4.2.2-4.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch
python-ironic-inspector-client-1.2.0-6.el7ost.noarch
openstack-ironic-conductor-4.2.2-4.el7ost.noarch
openstack-ironic-inspector-2.2.4-3.el7ost.noarch
openstack-ironic-api-4.2.2-4.el7ost.noarch

Steps to Reproduce:
1. Deploy in a virtual environment with the above deploy command and the attached network isolation files.
2. When the deployment fails, delete the stack and check whether the ironic nodes were freed as well.

Actual results:
Ironic nodes can't be freed.

Additional info:
I think my mistake in the deployment was that I assigned static IP addresses to the controller nodes only and forgot to do the same for the compute and ceph nodes.
(In reply to Udi from comment #0)
> So I tried to set the provision state to something else by "ironic
> node-set-provision-state 194f3a85-deee-40a9-b2d3-650addf1b1c1 provide", and
> I got:
>
> The requested action "provide" can not be performed on node
> "194f3a85-deee-40a9-b2d3-650addf1b1c1" while it is in state "deploy failed".
> (HTTP 400)
>

Right, this may be because of the way the state machine works. Since the deployment failed and the nova instance was still associated with the node, you should set the provision state to deleted (so the node goes back to available). If you don't care about it being available again, you can move it to the manageable state instead (in case you need to update something that caused the deployment to fail). So:

$ ironic node-set-provision-state $NODE_UUID manage

or

$ ironic node-set-provision-state $NODE_UUID deleted

Now you can delete the node from Ironic's inventory:

$ ironic node-delete $NODE_UUID

...

It is also worth adding that if the node is completely broken (say, some hardware failure), you can always put the node into maintenance, which allows you to delete it in any state:

$ ironic node-set-maintenance $NODE_UUID on
$ ironic node-delete $NODE_UUID

...

Also, please take a look at our state machine diagram to see which node transitions are supported: http://docs.openstack.org/developer/ironic/dev/states.html
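
For reference, the deletion rule described in this thread can be summarized in a few lines of Python. This is only a rough sketch, not Ironic's actual code: the tuple of deletable states is copied from the HTTP 409 message in comment #0, the maintenance exception comes from the reply above, and the names DELETABLE_STATES and can_delete are made up for illustration.

```python
# Assumption: these states are copied from the HTTP 409 error message quoted
# in this bug, not taken from the Ironic source tree.
DELETABLE_STATES = ("available", None, "manageable", "enroll")


def can_delete(provision_state, maintenance=False):
    """Return True if "ironic node-delete" is expected to be accepted.

    Hypothetical helper: it only models the two rules stated in this thread.
    """
    # Per the reply above, a node in maintenance can be deleted in any state.
    if maintenance:
        return True
    # Otherwise the node must be in one of the states listed in the error.
    return provision_state in DELETABLE_STATES


if __name__ == "__main__":
    cases = [("deploy failed", False), ("deploy failed", True), ("available", False)]
    for state, maint in cases:
        print(state, maint, can_delete(state, maint))
```

Under these assumptions, a node stuck in "deploy failed" is rejected until it either reaches one of the listed states or is put into maintenance, which matches the behaviour reported in comment #0.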