Description of problem:
When an OC deployment fails, the Provisioning State and Power State shown by "openstack baremetal node list" are not updated. In my test the deployment failed with "DeploymentError: Heat Stack create failed.", but the baremetal nodes still show power state "power on" and provisioning state "deploying". As a result, the end user cannot delete the failed OC.

Version-Release number of selected component (if applicable):
version: core_puddle_version 2018-12-07.2
rpm -qa | grep openstack-ironic
openstack-ironic-common-11.1.1-0.20181012152841.el7ost.noarch

How reproducible:
Sometimes

Steps to Reproduce:
Note: These OC deployment steps are related to another task, but they led to a failed OC deployment.
1. Deploy UC
2. Deploy Ironic nodes and introspect
3. Follow these steps: http://tripleo.org/install/advanced_deployment/ansible_deploy_interface.html
   A: Custom ansible playbooks: steps 1-2
   B: Installing/update UC: steps 1-5
      Note: for step 5, run: "sudo chmod 777 /var/lib/ironic/ipa-ssh"
   C: Skip "Enabling Temporary URLs"; not needed for OSP14
   D: Configure Nodes
   E: Editing Playbooks: steps 1-2
      Note: in step 1, change dest: "{{ tmp_rootfs_mount }}/etc/default/grub" --> path: "{{ tmp_rootfs_mount }}/etc/default/grub" in grub.yaml
4. Prepare for any failed OC deployment
5. Run the OC deployment; it will fail.

Actual results:
  File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/deployment.py", line 106, in deploy_and_wait
    raise exceptions.DeploymentError("Heat Stack create failed.")
DeploymentError: Heat Stack create failed.
END return value: 1

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| f3ded335-a9b7-4aa4-8117-caebc67b9a66 | compute-0    | 0b54a343-8d71-44f7-a204-4f22a70c895b | power on    | deploying          | False       |
| 65d42111-a299-4bb8-8e12-33fba1d513a9 | controller-0 | 665b8f4a-dacf-4176-b138-e6035435fff7 | power on    | deploying          | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+

Expected results:
The baremetal nodes should not remain in the deploying state after the OC deployment has failed.

Additional info:
No workaround
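For anyone triaging a similar failure, here is a minimal sketch of how the node list output above could be checked for nodes left in the deploying state. It only parses the ASCII table text. The function names and the SAMPLE constant are hypothetical and not part of any OpenStack tool.

```python
# Sketch only: parse_node_table and nodes_stuck_deploying are hypothetical
# helpers; they parse the table text printed by
# `openstack baremetal node list` and do not call any OpenStack API.

SAMPLE = """\
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| f3ded335-a9b7-4aa4-8117-caebc67b9a66 | compute-0    | 0b54a343-8d71-44f7-a204-4f22a70c895b | power on    | deploying          | False       |
| 65d42111-a299-4bb8-8e12-33fba1d513a9 | controller-0 | 665b8f4a-dacf-4176-b138-e6035435fff7 | power on    | deploying          | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
"""


def parse_node_table(text):
    """Turn the ASCII table into a list of dicts keyed by column header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data-like line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def nodes_stuck_deploying(rows):
    """Return the names of nodes whose provisioning state is 'deploying'."""
    return [row["Name"] for row in rows if row["Provisioning State"] == "deploying"]


if __name__ == "__main__":
    print(nodes_stuck_deploying(parse_node_table(SAMPLE)))
```

Run after the Heat stack has already reported failure, a non-empty result would match the symptom described in this report.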
Getting these DeployFailures in ironic-conductor.log:

[conductor] ***************************************************************\nMETA: ran handlers\n\nTASK [add_host] ****************************************************************\ntask path: /var/lib/ironic/playbooks/add-ironic-nodes.yaml:4\ncreating host via \'add_host\': hostname=65d42111-a299-4bb8-8e12-33fba1d513a9\nchanged: [conductor] => (item={u\'ip\': u\'192.168.24.18\', u\'user\': u\'root\', u\'name\': u\'65d42111-a299-4bb8-8e12-33fba1d513a9\', u\'extra\': {u\'hardware_swift_object\': u\'extra_hardware-65d42111-a299-4bb8-8e12-33fba1d513a9\'}}) => {\n "add_host": {\n "groups": [\n "ironic"\n ], \n "host_name": "65d42111-a299-4bb8-8e12-33fba1d513a9", \n "host_vars": {\n "ansible_host": "192.168.24.18", \n "ansible_user": "root", \n "group": "ironic", \n "ironic_extra": {\n "hardware_swift_object": "extra_hardware-65d42111-a299-4bb8-8e12-33fba1d513a9"\n }\n }\n }, \n "changed": true, \n "item": {\n "extra": {\n "hardware_swift_object": "extra_hardware-65d42111-a299-4bb8-8e12-33fba1d513a9"\n }, \n "ip": "192.168.24.18", \n "name": "65d42111-a299-4bb8-8e12-33fba1d513a9", \n "user": "root"\n }\n}\nMETA: ran handlers\nMETA: ran handlers\n\nPLAY [ironic] ******************************************************************\n\nTASK [Gathering Facts] *********************************************************\ntask path: /var/lib/ironic/playbooks/deploy.yaml:4\nUsing module file /usr/lib/python2.7/site-packages/ansible/modules/system/setup.py\n<192.168.24.18> ESTABLISH CONNECTION FOR USER: root on PORT 22 TO 192.168.24.18\nfatal: [65d42111-a299-4bb8-8e12-33fba1d513a9]: UNREACHABLE!
=> {\n "changed": false, \n "msg": "[Errno 13] Permission denied: u\'/var/lib/ironic/ipa-ssh\'", \n "unreachable": true\n}\n\nPLAY RECAP *********************************************************************\n65d42111-a299-4bb8-8e12-33fba1d513a9 : ok=0 changed=0 unreachable=1 failed=0 \nconductor : ok=1 changed=1 unreachable=0 failed=0 \n\n'
Stderr: u' [WARNING]: Ignoring invalid attribute: state\n [WARNING]: Ignoring invalid attribute: path\n [WARNING]: Ignoring invalid attribute: line\n'
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor Traceback (most recent call last):
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/agent_base_vendor.py", line 310, in heartbeat
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     self.continue_deploy(task)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 60, in wrapped
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     result = f(*args, **kwargs)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ansible/deploy.py", line 564, in continue_deploy
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     self._ansible_deploy(task, node_address)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ansible/deploy.py", line 428, in _ansible_deploy
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     _run_playbook(node, playbook, extra_vars, key)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ansible/deploy.py", line 160, in _run_playbook
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     raise exception.InstanceDeployFailure(reason=e)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor InstanceDeployFailure: Failed to deploy instance: Unexpected error while running command.
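The PLAY RECAP embedded in the conductor log is hard to read because the playbook stdout is logged with escaped newlines. Here is a rough sketch of how the recap counters could be pulled out of such a string to see which host was unreachable. The helper names and the shortened SAMPLE string are assumptions made for illustration.

```python
import re

# Sketch only: parse_play_recap and problem_hosts are hypothetical helpers.
# SAMPLE is a shortened copy of the recap from the log above, kept as a raw
# string so the two-character "\n" sequences stay escaped as in the log.
SAMPLE = (
    r"PLAY RECAP ****\n"
    r"65d42111-a299-4bb8-8e12-33fba1d513a9 : ok=0 changed=0 unreachable=1 failed=0 \n"
    r"conductor : ok=1 changed=1 unreachable=0 failed=0 \n\n"
)

RECAP_LINE = re.compile(
    r"^(?P<host>\S+)\s*:\s*ok=(?P<ok>\d+)\s+changed=(?P<changed>\d+)"
    r"\s+unreachable=(?P<unreachable>\d+)\s+failed=(?P<failed>\d+)",
    re.MULTILINE,
)


def parse_play_recap(stdout):
    """Return {host: {counter: int}} for each recap line found in stdout."""
    # The conductor logs stdout as a repr, so newlines appear as literal "\n".
    text = stdout.replace("\\n", "\n")
    recap = {}
    for match in RECAP_LINE.finditer(text):
        recap[match.group("host")] = {
            key: int(match.group(key))
            for key in ("ok", "changed", "unreachable", "failed")
        }
    return recap


def problem_hosts(recap):
    """Hosts with at least one unreachable or failed task, sorted by name."""
    return sorted(
        host for host, counts in recap.items()
        if counts["unreachable"] or counts["failed"]
    )


if __name__ == "__main__":
    print(problem_hosts(parse_play_recap(SAMPLE)))
```

In the log above, the unreachable host is the node being deployed, and the accompanying message is the permission error on /var/lib/ironic/ipa-ssh.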
I suspect that when I make the change to the grub.yaml file, set the --extra param, and run the OC deployment, it prevents the ironic node from switching from the deploying state to active during the deployment, and the deployment times out. When I run it without making any changes to the grub.yaml file, the state appears to change correctly throughout the OC deployment.

nova-compute.log
2018-12-11 18:53:05.217 1 DEBUG nova.virt.ironic.driver [-] [instance: 428de4e5-bfd1-42bf-8f21-9236ab1351bb] Still waiting for ironic node 5fb90416-7382-4418-94c2-7e07e8d264be to become ACTIVE: power_state="power on", target_power_state=None, provision_state="deploying", target_provision_state="active" _log_ironic_polling /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:131
2018-12-11 18:53:06.376 1 DEBUG nova.virt.ironic.driver [-] [instance: 4f548d20-0d3a-43a6-9c87-f5b9108ef9fe] Still waiting for ironic node 6c1d076a-8310-4136-b6a1-0b32fdeca417 to become ACTIVE: power_state="power on", target_power_state=None, provision_state="deploying", target_provision_state="active" _log_ironic_polling /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:131
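To make the polling messages easier to follow over a long log, here is a small sketch that extracts the node UUID and its provision states from "Still waiting for ironic node" lines. The regular expression is written against the two lines quoted above only. The function name and SAMPLE_LOG are hypothetical.

```python
import re

# Sketch only: the pattern is derived from the two nova-compute.log lines
# quoted in this report; other nova versions may format the message differently.
POLL_RE = re.compile(
    r"Still waiting for ironic node (?P<node>[0-9a-f-]{36}) to become ACTIVE: "
    r'power_state="(?P<power_state>[^"]*)", '
    r"target_power_state=(?P<target_power_state>[^,]+), "
    r'provision_state="(?P<provision_state>[^"]*)", '
    r'target_provision_state="(?P<target_provision_state>[^"]*)"'
)

SAMPLE_LOG = (
    '2018-12-11 18:53:05.217 1 DEBUG nova.virt.ironic.driver [-] [instance: 428de4e5-bfd1-42bf-8f21-9236ab1351bb] '
    'Still waiting for ironic node 5fb90416-7382-4418-94c2-7e07e8d264be to become ACTIVE: power_state="power on", '
    'target_power_state=None, provision_state="deploying", target_provision_state="active" _log_ironic_polling\n'
    '2018-12-11 18:53:06.376 1 DEBUG nova.virt.ironic.driver [-] [instance: 4f548d20-0d3a-43a6-9c87-f5b9108ef9fe] '
    'Still waiting for ironic node 6c1d076a-8310-4136-b6a1-0b32fdeca417 to become ACTIVE: power_state="power on", '
    'target_power_state=None, provision_state="deploying", target_provision_state="active" _log_ironic_polling\n'
)


def latest_provision_states(log_text):
    """Map each node UUID to its last seen (provision_state, target_provision_state)."""
    states = {}
    for match in POLL_RE.finditer(log_text):
        # Later lines overwrite earlier ones, so the last poll wins.
        states[match.group("node")] = (
            match.group("provision_state"),
            match.group("target_provision_state"),
        )
    return states


if __name__ == "__main__":
    for node, (current, target) in latest_provision_states(SAMPLE_LOG).items():
        print(node, current, "->", target)
```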
Workaround: on Director, issue these commands:
#sudo docker restart ironic_conductor
#source stackrc && openstack baremetal node undeploy controller-0
#source stackrc && openstack baremetal node undeploy compute-0
Another workaround, which is faster:
#sudo docker restart ironic_conductor
#source stackrc && openstack stack delete -y overcloud
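The two workarounds differ only in the step after the conductor restart. Here is a dry-run sketch that assembles the command strings from this comment thread for a given list of node names. It prints the commands and executes nothing. The function name and its parameters are hypothetical.

```python
import shlex

# Sketch only: workaround_commands is a hypothetical helper. It builds the
# command strings quoted in this report and does not run them.


def workaround_commands(node_names=(), delete_stack=False):
    """Return the workaround as a list of shell command strings."""
    commands = ["sudo docker restart ironic_conductor"]
    if delete_stack:
        # Faster variant: delete the whole overcloud stack.
        commands.append("source stackrc && openstack stack delete -y overcloud")
    else:
        # Per-node variant: undeploy each node named by the caller.
        for name in node_names:
            commands.append(
                "source stackrc && openstack baremetal node undeploy "
                + shlex.quote(name)
            )
    return commands


if __name__ == "__main__":
    for command in workaround_commands(["controller-0", "compute-0"]):
        print(command)
```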
Update on how to trigger the failure: in step 3 "E", the playbook has to be edited with incorrect spacing. This triggers the failure and causes the OC deploy to fail.
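The "Ignoring invalid attribute: state / path / line" warnings in the conductor stderr are consistent with module arguments ending up at the task level because of the spacing. Here is a rough sketch of that situation using already-parsed task dictionaries. The task contents and the helper name are made up for illustration and are not the real grub.yaml.

```python
# Sketch only: the task dictionaries below are invented examples, not the
# real grub.yaml contents, and misplaced_module_args is a hypothetical helper.

LINEINFILE_ARGS = {"path", "line", "state"}


def misplaced_module_args(task, module_args):
    """Return module arguments that appear as task-level keys, sorted."""
    return sorted(key for key in task if key in module_args)


# Arguments nested under the module: what correct spacing would parse to.
well_spaced = {
    "name": "example task",
    "lineinfile": {"path": "/example/grub", "line": "EXAMPLE=1", "state": "present"},
}

# Arguments at the same level as the module name: what lost indentation
# under "lineinfile:" would parse to.
badly_spaced = {
    "name": "example task",
    "lineinfile": None,
    "path": "/example/grub",
    "line": "EXAMPLE=1",
    "state": "present",
}

if __name__ == "__main__":
    print(misplaced_module_args(well_spaced, LINEINFILE_ARGS))
    print(misplaced_module_args(badly_spaced, LINEINFILE_ARGS))
```

A check like this would have to run against the playbook before deployment and is separate from anything Ironic does during the deploy.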
Per Comment 17, this is an incorrect grub setting in the playbook, not a bug, and not something that can be detected. Closing.