Description of problem:

Commonly, but not always, 'openstack baremetal introspection bulk start' throws the error "Error contacting Ironic server: Node <uuid-node-3> is locked by host director, please retry after the current operation is completed."

Version-Release number of selected component (if applicable):

[stack@undercloud nic-configs]$ rpm -qa | grep ironic
python-ironic-inspector-client-1.2.0-6.el7ost.noarch
openstack-ironic-conductor-4.2.5-1.el7ost.noarch
openstack-ironic-common-4.2.5-1.el7ost.noarch
openstack-ironic-inspector-2.2.6-1.el7ost.noarch
openstack-ironic-api-4.2.5-1.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch

How reproducible:

About 50% of the time in this specific environment.

Steps to Reproduce:

[stack@rh-director ~]$ openstack baremetal introspection bulk start
Setting nodes for introspection to manageable...
Starting introspection of node: <uuid-node-1>
Starting introspection of node: <uuid-node-2>
Starting introspection of node: <uuid-node-3>
Waiting for introspection to finish...
Introspection for UUID <uuid-node-1> finished successfully.
Introspection for UUID <uuid-node-2> finished successfully.
Introspection for UUID <uuid-node-3> finished successfully.
Setting manageable nodes to available...
Node <uuid-node-1> has been set to available.
Node <uuid-node-2> has been set to available.
Request returned failure status.
Error contacting Ironic server: Node <uuid-node-3> is locked by host director, please retry after the current operation is completed.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1151, in do_provisioning_action
    % action) as task:
  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)
  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 204, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()
  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()
  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()
  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)
  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)
  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])
NodeLocked: Node <uuid-node-3> is locked by host director, please retry after the current operation is completed. (HTTP 409). Attempt 1 of 6
Node <uuid-node-3> has been set to available.
Introspection completed.
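For context, the "Attempt 1 of 6" in the traceback comes from the conductor retrying the node reservation when another operation still holds the lock. A minimal sketch of that retry loop (not ironic's actual code; the `NodeLocked` class and `fake_db_reserve` helper here are illustrative stand-ins):

```python
import time


class NodeLocked(Exception):
    """Hypothetical stand-in for ironic.common.exception.NodeLocked,
    raised when another conductor holds the node's reservation."""


def reserve_node(db_reserve, node_id, attempts=6, delay=0.0):
    """Try to take the node reservation, retrying on NodeLocked.

    Mirrors the behaviour visible in the traceback: the reservation
    is retried a fixed number of times, and only if every attempt
    fails is the lock error re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return db_reserve(node_id)
        except NodeLocked:
            if attempt == attempts:
                raise  # still locked after all attempts: propagate
            time.sleep(delay)


# Simulated DB layer: the lock is released after two failed attempts.
calls = {"n": 0}

def fake_db_reserve(node_id):
    calls["n"] += 1
    if calls["n"] < 3:
        raise NodeLocked("Node %s is locked by host director" % node_id)
    return {"id": node_id, "reservation": "director"}


reserve_node(fake_db_reserve, "uuid-node-3")
```

In the reported case the retry eventually succeeds (the node is later set to available), which is why the error is cosmetic rather than harmful.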
This error does not appear to be harmful, just ugly to the end user, as the operation is retried successfully. It is not clear why there would be lock contention, since nothing else should be running at the time, but regardless, this should not throw an ugly Python traceback at the user.

Where are you experiencing the behavior? What environment?

About 50% of the time when doing a 3-node introspection.

[stack@rh-director ~]$ more instackenv.json
{
  "nodes":[
    {
      "mac":[ "00:00:00:00:00:01" ],
      "pm_type":"pxe_ilo",
      "pm_user":"Administrator",
      "pm_password":"password",
      "pm_addr":"192.168.0.11",
      "capabilities":"profile:control,boot_option:local"
    },
    {
      "mac":[ "00:00:00:00:00:02" ],
      "pm_type":"pxe_ilo",
      "pm_user":"Administrator",
      "pm_password":"password",
      "pm_addr":"192.168.0.13",
      "capabilities":"profile:compute,boot_option:local"
    },
    {
      "mac":[ "00:00:00:00:00:03" ],
      "pm_type":"pxe_ilo",
      "pm_user":"Administrator",
      "pm_password":"password",
      "pm_addr":"192.168.0.12",
      "capabilities":"profile:compute,boot_option:local"
    }
  ]
}

[stack@rh-director ~]$ ironic node-list
+---------------+------+---------------+-------------+--------------------+-------------+
| UUID          | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+---------------+------+---------------+-------------+--------------------+-------------+
| <uuid-node-1> | None | None          | power off   | available          | False       |
| <uuid-node-2> | None | None          | power off   | available          | False       |
| <uuid-node-3> | None | None          | power off   | available          | False       |
+---------------+------+---------------+-------------+--------------------+-------------+

Additional info:
This should likely be handled more gracefully by Ironic itself instead of throwing an exception at the user.
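One way the client side could handle this more gracefully is to swallow the 409 conflict and retry the provision-state call until the conductor releases its reservation, rather than dumping the traceback. A minimal sketch under that assumption (the `Conflict` class and `fake_client_call` helper are hypothetical illustrations, not the real ironicclient API):

```python
import time


class Conflict(Exception):
    """Hypothetical stand-in for the HTTP 409 "node is locked" error
    that the client currently surfaces to the user as a traceback."""


def set_state_with_retry(client_call, node_uuid, state,
                         retries=5, delay=0.0):
    """Retry the provision-state call while the node is locked.

    Instead of letting the Conflict propagate on the first attempt,
    keep retrying quietly; only give up (and surface the error) if
    the node is still locked after all retries.
    """
    for attempt in range(retries + 1):
        try:
            return client_call(node_uuid, state)
        except Conflict:
            if attempt == retries:
                raise  # still locked after all retries: give up
            time.sleep(delay)


# Simulated client: the node stays locked for the first two calls.
state_of = {}
pending = {"locked_calls": 2}

def fake_client_call(node_uuid, state):
    if pending["locked_calls"] > 0:
        pending["locked_calls"] -= 1
        raise Conflict("Node %s is locked by host director" % node_uuid)
    state_of[node_uuid] = state
    return state


set_state_with_retry(fake_client_call, "uuid-node-3", "available")
```

With handling like this, the bulk-start output would simply report the node as available once the lock clears, matching what actually happens today after the noisy retry.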
Hi! We've rewritten this logic completely in OSP10, and now it should not show ugly warnings to users.
*** This bug has been marked as a duplicate of bug 1287848 ***