Bug 1658331
Summary: | Overcloud deployment fails not changing provisioning state or powering off baremetal servers | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | bjacot |
Component: | openstack-ironic | Assignee: | RHOS Maint <rhos-maint> |
Status: | CLOSED NOTABUG | QA Contact: | bjacot |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 14.0 (Rocky) | CC: | bfournie, dtantsur, jkreger, mburns |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-01-03 13:46:04 UTC | Type: | Bug |
Description
bjacot
2018-12-11 19:02:10 UTC
Getting these DeployFailures in ironic-conductor.log:

[conductor] ***************************************************************
META: ran handlers

TASK [add_host] ****************************************************************
task path: /var/lib/ironic/playbooks/add-ironic-nodes.yaml:4
creating host via 'add_host': hostname=65d42111-a299-4bb8-8e12-33fba1d513a9
changed: [conductor] => (item={u'ip': u'192.168.24.18', u'user': u'root', u'name': u'65d42111-a299-4bb8-8e12-33fba1d513a9', u'extra': {u'hardware_swift_object': u'extra_hardware-65d42111-a299-4bb8-8e12-33fba1d513a9'}}) => {
    "add_host": {
        "groups": [
            "ironic"
        ],
        "host_name": "65d42111-a299-4bb8-8e12-33fba1d513a9",
        "host_vars": {
            "ansible_host": "192.168.24.18",
            "ansible_user": "root",
            "group": "ironic",
            "ironic_extra": {
                "hardware_swift_object": "extra_hardware-65d42111-a299-4bb8-8e12-33fba1d513a9"
            }
        }
    },
    "changed": true,
    "item": {
        "extra": {
            "hardware_swift_object": "extra_hardware-65d42111-a299-4bb8-8e12-33fba1d513a9"
        },
        "ip": "192.168.24.18",
        "name": "65d42111-a299-4bb8-8e12-33fba1d513a9",
        "user": "root"
    }
}
META: ran handlers
META: ran handlers

PLAY [ironic] ******************************************************************

TASK [Gathering Facts] *********************************************************
task path: /var/lib/ironic/playbooks/deploy.yaml:4
Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/setup.py
<192.168.24.18> ESTABLISH CONNECTION FOR USER: root on PORT 22 TO 192.168.24.18
fatal: [65d42111-a299-4bb8-8e12-33fba1d513a9]: UNREACHABLE! => {
    "changed": false,
    "msg": "[Errno 13] Permission denied: u'/var/lib/ironic/ipa-ssh'",
    "unreachable": true
}

PLAY RECAP *********************************************************************
65d42111-a299-4bb8-8e12-33fba1d513a9 : ok=0 changed=0 unreachable=1 failed=0
conductor : ok=1 changed=1 unreachable=0 failed=0

Stderr:
 [WARNING]: Ignoring invalid attribute: state
 [WARNING]: Ignoring invalid attribute: path
 [WARNING]: Ignoring invalid attribute: line

2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor Traceback (most recent call last):
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/agent_base_vendor.py", line 310, in heartbeat
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     self.continue_deploy(task)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 60, in wrapped
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     result = f(*args, **kwargs)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ansible/deploy.py", line 564, in continue_deploy
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     self._ansible_deploy(task, node_address)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ansible/deploy.py", line 428, in _ansible_deploy
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     _run_playbook(node, playbook, extra_vars, key)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor   File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/ansible/deploy.py", line 160, in _run_playbook
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor     raise exception.InstanceDeployFailure(reason=e)
2018-12-11 10:24:01.867 1 ERROR ironic.drivers.modules.agent_base_vendor InstanceDeployFailure: Failed to deploy instance: Unexpected error while running command.

I suspect that when I change the grub.yaml file, set the --extra parameter, and run the overcloud (OC) deployment, the ironic node fails to switch from the deploying state to active, and the deployment times out. When I run the deployment without making any changes to grub.yaml, the node changes state correctly throughout the OC deployment.

nova-compute.log

2018-12-11 18:53:05.217 1 DEBUG nova.virt.ironic.driver [-] [instance: 428de4e5-bfd1-42bf-8f21-9236ab1351bb] Still waiting for ironic node 5fb90416-7382-4418-94c2-7e07e8d264be to become ACTIVE: power_state="power on", target_power_state=None, provision_state="deploying", target_provision_state="active" _log_ironic_polling /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:131
2018-12-11 18:53:06.376 1 DEBUG nova.virt.ironic.driver [-] [instance: 4f548d20-0d3a-43a6-9c87-f5b9108ef9fe] Still waiting for ironic node 6c1d076a-8310-4136-b6a1-0b32fdeca417 to become ACTIVE: power_state="power on", target_power_state=None, provision_state="deploying", target_provision_state="active" _log_ironic_polling /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:131

Workaround: on the director, issue these commands:

#sudo docker restart ironic_conductor
#source stackrc && openstack baremetal node undeploy controller-0
#source stackrc && openstack baremetal node undeploy compute-0

Another workaround, which is faster:

#sudo docker restart ironic_conductor
#source stackrc && openstack stack delete -y overcloud

Update on how to trigger the failure: in step 3 "E", the playbook must be modified with incorrect spacing. This triggers the failure and causes the OC deploy to fail.

Per Comment 17, this is an incorrect grub setting in the playbook, not a bug, and not something that can be detected. Closing.
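To spot nodes that never leave the deploying state, the nova-compute.log polling messages quoted in this report can be scanned with a small script. The following is a minimal sketch, not part of the original report: the function name `stuck_nodes` and the regular expression are illustrative assumptions based only on the two log lines shown above, and the message format may differ in other releases.

```python
import re

# Assumed pattern, derived from the "Still waiting for ironic node" lines
# quoted in this report; not taken from the nova source.
POLL_RE = re.compile(
    r'Still waiting for ironic node (?P<node>[0-9a-f-]{36}) to become ACTIVE: '
    r'power_state="(?P<power_state>[^"]*)", '
    r'target_power_state=(?P<target_power_state>[^,]+), '
    r'provision_state="(?P<provision_state>[^"]*)", '
    r'target_provision_state="(?P<target_provision_state>[^"]*)"'
)


def stuck_nodes(log_text):
    """Map each polled node UUID to the last provision_state seen in the log."""
    states = {}
    for match in POLL_RE.finditer(log_text):
        # Later lines overwrite earlier ones, so the final value is the latest state.
        states[match.group("node")] = match.group("provision_state")
    return states


if __name__ == "__main__":
    sample = (
        '2018-12-11 18:53:05.217 1 DEBUG nova.virt.ironic.driver [-] '
        '[instance: 428de4e5-bfd1-42bf-8f21-9236ab1351bb] Still waiting for '
        'ironic node 5fb90416-7382-4418-94c2-7e07e8d264be to become ACTIVE: '
        'power_state="power on", target_power_state=None, '
        'provision_state="deploying", target_provision_state="active"'
    )
    for node, state in stuck_nodes(sample).items():
        print(node, state)
```

A node that still reports "deploying" in the most recent polling lines is a candidate for the undeploy workaround described in this report.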