Description of problem:
In OSP14 with networking-ansible enabled I am getting the following error when deleting nova instances (backed by BM nodes):

2019-02-19 17:58:16.052 1 ERROR ironicclient.common.http [req-1c8e02f9-30e8-40f6-9176-63d3c24e2c0e 85ace131b01b49598469689274636c63 7ad8b8ac7c3a43beaa75042e46379230 - default default] Error contacting Ironic server: Node 678ca4b1-b1b7-46d3-b8c9-a1fbc3ce197d can not be updated while a state transition is in progress. (HTTP 409). Attempt 61 of 61: Conflict: Node 678ca4b1-b1b7-46d3-b8c9-a1fbc3ce197d can not be updated while a state transition is in progress. (HTTP 409)
2019-02-19 17:58:17.458 1 INFO nova.compute.manager [-] [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1] Took 1.40 seconds to deallocate network for instance.
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [req-1c8e02f9-30e8-40f6-9176-63d3c24e2c0e 85ace131b01b49598469689274636c63 7ad8b8ac7c3a43beaa75042e46379230 - default default] [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1] Setting instance vm_state to ERROR: Conflict: Node 678ca4b1-b1b7-46d3-b8c9-a1fbc3ce197d can not be updated while a state transition is in progress. (HTTP 409)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1] Traceback (most recent call last):
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2670, in do_terminate_instance
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     self._delete_instance(context, instance, bdms)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/nova/hooks.py", line 154, in inner
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     rv = f(*args, **kwargs)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2613, in _delete_instance
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     self._shutdown_instance(context, instance, bdms)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2505, in _shutdown_instance
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     requested_networks)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     self.force_reraise()
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     six.reraise(self.type_, self.value, self.tb)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2492, in _shutdown_instance
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     block_device_info)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1256, in destroy
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     self._cleanup_deploy(node, instance, network_info)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 478, in _cleanup_deploy
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     self._remove_instance_info_from_node(node, instance)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 413, in _remove_instance_info_from_node
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     self.ironicclient.call('node.update', node.uuid, patch)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/client_wrapper.py", line 170, in call
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     return self._multi_getattr(client, method)(*args, **kwargs)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/ironicclient/v1/node.py", line 359, in update
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     params=params)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/ironicclient/common/base.py", line 232, in _update
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     resp, body = self.api.json_request(method, url, **kwargs)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/ironicclient/common/http.py", line 678, in json_request
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     resp = self._http_request(url, method, **kwargs)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/ironicclient/common/http.py", line 287, in wrapper
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     return func(self, url, method, **kwargs)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]   File "/usr/lib/python2.7/site-packages/ironicclient/common/http.py", line 660, in _http_request
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]     error_json.get('debuginfo'), method, url)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1] Conflict: Node 678ca4b1-b1b7-46d3-b8c9-a1fbc3ce197d can not be updated while a state transition is in progress. (HTTP 409)
2019-02-19 17:58:17.459 1 ERROR nova.compute.manager [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1]
2019-02-19 17:58:17.594 1 INFO nova.compute.manager [-] [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1] During sync_power_state the instance has a pending task (deleting). Skip.
2019-02-19 17:58:17.770 1 INFO nova.compute.manager [req-1c8e02f9-30e8-40f6-9176-63d3c24e2c0e 85ace131b01b49598469689274636c63 7ad8b8ac7c3a43beaa75042e46379230 - default default] [instance: ae870e98-b3bb-45c9-9cc6-59b22cfe64f1] Successfully reverted task state from deleting on failure for instance.

This might be networking-ansible related, since I haven't seen this error when using a flat network for Ironic. It seems that only nova is unable to get the correct state; according to Ironic's output, all the cleaned-up nodes do go back to a happy state:

(chrisj-osp14) [stack@undercloud-osp14 ~]$ openstack baremetal node list
+--------------------------------------+------------------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name                         | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------------------------+---------------+-------------+--------------------+-------------+
| 68cd5104-fc2f-48ef-b02e-cd7ecdb8a941 | ASRock-J1900D2Y-172.31.9.31  | None          | power off   | available          | False       |
| a067cb0d-796e-4d7e-b686-7009f309a8d8 | ASRock-J1900D2Y-172.31.9.32  | None          | power off   | available          | False       |
| e455da6a-0692-4459-ac8b-551545a92086 | ASRock-J1900D2Y-172.31.9.33  | None          | power off   | available          | False       |
| 678ca4b1-b1b7-46d3-b8c9-a1fbc3ce197d | Supermicro-A1SAi-172.31.9.34 | None          | power off   | available          | False       |
| 3dc357ba-565e-4c8a-9666-bc91bca8f23f | Supermicro-A1SRi-172.31.9.35 | None          | power off   | available          | False       |
+--------------------------------------+------------------------------+---------------+-------------+--------------------+-------------+

Version-Release number of selected component (if applicable):
OSP14

How reproducible:
80% of the time

Steps to Reproduce:
1. Deploy OSP14 with Ironic and networking-ansible enabled
2. Spawn BM node from nova
3. Delete BM node from nova

Actual results:
Error

Expected results:
Nova removes the instance

Additional info:
The logs seem to be too large to attach to this BZ, so I am including the link: http://chrisj.cloud/containers.tar.gz
We've encountered an issue with similar symptoms on a QE environment. The cause was that the Dell machines took a long time to boot after restart: about 5 minutes from the point where the node was powered on until it reached the boot loader phase. The workaround was to bump the api_max_retries setting for nova's ironic client. The original issue is described in bug 1647005. Chris, any chance you can change this value and re-test to see if you still hit the issue? I'm pasting the Doc Text from bug 1647005 here for your convenience:

Nova-compute's ironic driver tries to update the BM node while the node is being cleaned up. The cleaning takes approximately five minutes, but nova-compute attempts to update the node for only approximately two minutes. After the timeout, nova-compute gives up and puts the nova instance into the ERROR state. As a workaround, set the following configuration option for the nova-compute service:

[ironic]
api_max_retries = 180

As a result, nova-compute continues attempting to update the BM node for longer and eventually succeeds.
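The timing described above can be sketched as a retry loop like the one nova's ironic client wrapper effectively runs. This is only an illustration, not nova's actual code: `update_node_with_retries` and the `Conflict` class are stand-ins, but the defaults (60 retries at 2-second intervals, roughly a two-minute window) line up with the "Attempt 61 of 61" message in the log.

```python
import time


class Conflict(Exception):
    """Stand-in for the HTTP 409 error raised by python-ironicclient."""


def update_node_with_retries(update_fn, api_max_retries=60, api_retry_interval=2):
    """Retry a node update while Ironic reports a state transition in progress.

    With the defaults (60 retries x 2 s) the loop gives up after about two
    minutes, while cleaning can take about five -- hence the ERROR state.
    Raising api_max_retries to 180 extends the window to roughly six minutes.
    """
    for attempt in range(1, api_max_retries + 1):
        try:
            return update_fn()
        except Conflict:
            if attempt == api_max_retries:
                raise  # out of retries: nova puts the instance into ERROR
            time.sleep(api_retry_interval)
```

So the fix is not making the conflict go away; it is just widening the window during which nova keeps retrying until cleaning finishes.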
Adjusting the value of api_max_retries has worked around this issue:

[ironic]
api_max_retries = 180

The extra time that networking-ansible needs to configure the port in my environment must push the operation past the default retry limit. I wonder if we could inject this into the templates as an Ironic default. BM nodes take longer than VMs to delete, due to the cleanup steps and whatever needs to be done on the switch for multi-tenancy.
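For reference, injecting the override from TripleO could look roughly like the snippet below. This is only a sketch: the filename is hypothetical, and I am assuming the puppet-nova `nova::config::nova_config` hieradata hook is the right way to reach nova-compute's config on the relevant role; the eventual templated default may use a different mechanism.

```yaml
# environments/nova-ironic-retries.yaml (hypothetical filename)
parameter_defaults:
  # ExtraConfig for whichever role runs nova-compute for Ironic
  ControllerExtraConfig:
    nova::config::nova_config:
      ironic/api_max_retries:
        value: 180
```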
(In reply to Chris Janiszewski from comment #2)
> Adjusting value for api_max_retries has worked around this issue.
> [ironic]
> api_max_retries = 180
>
> that extra time that networking-ansible need to take to set the port in my
> environment must make a difference to go over the default retries value. I
> wonder if we could inject this into templates for ironic defaults. The BM
> nodes to take longer then VMs to delete, due to cleanup steps and whatever
> needs to be done on the switch for multi-tenancy.

We should. I'm working on it here: https://review.openstack.org/#/c/638119/

I'm closing this bug as a duplicate of bug 1647005, but thanks for your findings. I raised the priority of the original bug and started working on it right away.

*** This bug has been marked as a duplicate of bug 1647005 ***