Bug 1270370

Summary: [RDO-Manager] bulk introspection moving the nodes from available to manageable too quickly [getting: NodeLocked:]
Product: [Community] RDO
Reporter: Omri Hochman <ohochman>
Component: rdo-manager
Assignee: Hugh Brock <hbrock>
Status: CLOSED CURRENTRELEASE
QA Contact: Shai Revivo <srevivo>
Severity: medium
Priority: high
Version: Liberty
CC: chris.brown, dyasny, goneri, jtrowbri, mburns
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2017-06-18 06:15:28 UTC
Type: Bug

Description Omri Hochman 2015-10-09 19:32:27 UTC
[RDO-Manager] bulk introspection moving the nodes from available to manageable too quickly [getting: NodeLocked:]


Environment:
-------------
python-tripleoclient-0.0.11-dev62.el7.centos.noarch
instack-undercloud-2.1.3-dev224.el7.centos.noarch
instack-0.0.8-dev4.el7.centos.noarch
openstack-ironic-inspector-2.2.1-dev7.el7.centos.noarch
openstack-ironic-common-4.2.1-dev26.el7.centos.noarch
python-ironic-inspector-client-1.2.1-dev7.el7.centos.noarch
python-ironicclient-0.8.2-dev14.el7.centos.noarch

Description:
------------

- Bare-metal environment with 1 undercloud and 7 nodes.

It seems that the command "openstack baremetal introspection bulk start" moves the nodes between the available and manageable states too quickly, and as a result it sometimes raises a NodeLocked error (a possible client-side workaround is sketched after the console output below):

[stack@rhos-compute-node-18 ~]$ openstack baremetal introspection bulk start
Setting available nodes to manageable...
Starting introspection of node: 9a1d87b9-ea80-414d-842e-d0ac5b373ce6
Starting introspection of node: abd4664f-e55f-49cd-abf4-01d773893cba
Starting introspection of node: c1013049-5cc2-4f83-b337-719012e3ad05
Starting introspection of node: 7471db83-194a-48ab-a6d8-6125c149498e
Starting introspection of node: e9274c3f-1331-4c4e-9f88-fe00ba2e4e57
Starting introspection of node: c026eb52-0710-4138-bd84-2927cf7e2d94
Starting introspection of node: 8bd03f31-4118-44eb-9c7a-4512e2da36b0
Waiting for introspection to finish...
Introspection for UUID 9a1d87b9-ea80-414d-842e-d0ac5b373ce6 finished successfully.
Introspection for UUID abd4664f-e55f-49cd-abf4-01d773893cba finished successfully.
Introspection for UUID c1013049-5cc2-4f83-b337-719012e3ad05 finished successfully.
Introspection for UUID e9274c3f-1331-4c4e-9f88-fe00ba2e4e57 finished successfully.
Introspection for UUID 7471db83-194a-48ab-a6d8-6125c149498e finished successfully.
Introspection for UUID c026eb52-0710-4138-bd84-2927cf7e2d94 finished successfully.
Introspection for UUID 8bd03f31-4118-44eb-9c7a-4512e2da36b0 finished successfully.
Setting manageable nodes to available...
Node 9a1d87b9-ea80-414d-842e-d0ac5b373ce6 has been set to available.
Node abd4664f-e55f-49cd-abf4-01d773893cba has been set to available.
Node c1013049-5cc2-4f83-b337-719012e3ad05 has been set to available.
Node 7471db83-194a-48ab-a6d8-6125c149498e has been set to available.
Node e9274c3f-1331-4c4e-9f88-fe00ba2e4e57 has been set to available.
Node c026eb52-0710-4138-bd84-2927cf7e2d94 has been set to available.
Request returned failure status.
Error contacting Ironic server: Node 8bd03f31-4118-44eb-9c7a-4512e2da36b0 is locked by host rhos-compute-node-18.osqe.lab.eng.bos.redhat.com, please retry aft
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1149, in do_provisioning_action
    % action) as task:
  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])   

NodeLocked: Node 8bd03f31-4118-44eb-9c7a-4512e2da36b0 is locked by host rhos-compute-node-18.osqe.lab.eng.bos.redhat.com, please retry after the current opera
 (HTTP 409). Attempt 1 of 6
  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])   

NodeLocked: Node 8bd03f31-4118-44eb-9c7a-4512e2da36b0 is locked by host rhos-compute-node-18.osqe.lab.eng.bos.redhat.com, please retry after the current opera
 (HTTP 409). Attempt 2 of 6
Node 8bd03f31-4118-44eb-9c7a-4512e2da36b0 has been set to available.
Introspection completed.
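
A possible client-side workaround, given that the conductor releases its lock on its own shortly afterwards, is to retry the manageable-to-available transition when the API returns HTTP 409. The following is only a minimal sketch using python-ironicclient; the credentials are placeholders, and the exception class exc.Conflict is assumed to be the one raised for HTTP 409:

import time

from ironicclient import client as ir_client
from ironicclient import exc

# Placeholder credentials -- replace with the undercloud's actual values.
ironic = ir_client.get_client(
    1,
    os_username='admin',
    os_password='password',
    os_tenant_name='admin',
    os_auth_url='http://192.0.2.1:5000/v2.0',
)

def provide_with_retry(node_uuid, attempts=6, delay=10):
    """Move a node from 'manageable' to 'available', retrying on NodeLocked."""
    for attempt in range(1, attempts + 1):
        try:
            ironic.node.set_provision_state(node_uuid, 'provide')
            return
        except exc.Conflict:  # assumed to map to HTTP 409 / NodeLocked
            if attempt == attempts:
                raise
            time.sleep(delay)

for node in ironic.node.list():
    if node.provision_state == 'manageable':
        provide_with_retry(node.uuid)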






Ironic-api.log:
----------------
2015-10-09 14:31:21.223 16485 WARNING wsme.api [req-79cb36df-972c-4084-af67-52d61f3444a6 ] Client-side error: Node 081660c4-2031-45b3-99b9-e7d779ff5675 is locked by host rhos-compute-node-18.osqe.lab.eng.bos.redhat.com, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1149, in do_provisioning_action
    % action) as task:

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])

NodeLocked: Node 081660c4-2031-45b3-99b9-e7d779ff5675 is locked by host rhos-compute-node-18.osqe.lab.eng.bos.redhat.com, please retry after the current opera
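
The traceback above shows that the conductor already wraps the node reservation in a retry loop (the "Attempt N of 6" lines) via the retrying library before finally surfacing NodeLocked. The following illustrates that pattern only; it is not Ironic's actual code, the attempt count and interval are illustrative, and the exception class is a stand-in:

from retrying import retry


class NodeLocked(Exception):
    """Stand-in for ironic.common.exception.NodeLocked."""


def _is_node_locked(error):
    return isinstance(error, NodeLocked)


@retry(retry_on_exception=_is_node_locked,
       stop_max_attempt_number=6,   # illustrative; compare the "Attempt N of 6" output
       wait_fixed=2000)             # milliseconds between attempts
def reserve_node(dbapi, tag, node_id):
    # dbapi.reserve_node() raises NodeLocked while another task holds the node.
    return dbapi.reserve_node(tag, node_id)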

Comment 1 Omri Hochman 2015-10-09 19:33:25 UTC
Despite the exception, it seems that the introspection completed and all nodes were returned to the available state:

[stack@rhos-compute-node-18 ~]$ ironic node-list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| 9a1d87b9-ea80-414d-842e-d0ac5b373ce6 | None | None          | power off   | available          | False       |
| abd4664f-e55f-49cd-abf4-01d773893cba | None | None          | power off   | available          | False       |
| c1013049-5cc2-4f83-b337-719012e3ad05 | None | None          | power off   | available          | False       |
| 7471db83-194a-48ab-a6d8-6125c149498e | None | None          | power off   | available          | False       |
| e9274c3f-1331-4c4e-9f88-fe00ba2e4e57 | None | None          | power off   | available          | False       |
| c026eb52-0710-4138-bd84-2927cf7e2d94 | None | None          | power off   | available          | False       |
| 8bd03f31-4118-44eb-9c7a-4512e2da36b0 | None | None          | power off   | available          | False       |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
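
Since the nodes do end up unlocked and available, one way to avoid the race entirely is to wait for the conductor lock to be released before issuing the next provision-state change. A minimal sketch, assuming the ironic client object from the earlier sketch and that the node's reservation field names the conductor currently holding the lock:

import time


def wait_until_unlocked(ironic, node_uuid, timeout=60, poll=5):
    """Return once the node has no conductor reservation; raise on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        node = ironic.node.get(node_uuid)
        if not node.reservation:   # empty/None when no conductor holds the lock
            return
        time.sleep(poll)
    raise RuntimeError('Node %s is still locked by %s'
                       % (node_uuid, node.reservation))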

Comment 4 Christopher Brown 2017-06-17 18:56:15 UTC
This is fine; I have not seen this for years, so it can be closed.