Bug 1356292 - 'NodeLocked' error during introspection
Summary: 'NodeLocked' error during introspection
Keywords:
Status: CLOSED DUPLICATE of bug 1287848
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-13 22:19 UTC by Andreas Karis
Modified: 2019-11-14 08:42 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-03 15:37:51 UTC
Target Upstream Version:
Embargoed:



Description Andreas Karis 2016-07-13 22:19:43 UTC
Description of problem:
Commonly, but not always, the 'openstack baremetal introspection bulk start' command throws the error
"Error contacting Ironic server: Node <uuid-node-3> is locked by host director, please retry after the current operation is completed."

Version-Release number of selected component (if applicable):
[stack@undercloud nic-configs]$ rpm -qa | grep ironic
python-ironic-inspector-client-1.2.0-6.el7ost.noarch
openstack-ironic-conductor-4.2.5-1.el7ost.noarch
openstack-ironic-common-4.2.5-1.el7ost.noarch
openstack-ironic-inspector-2.2.6-1.el7ost.noarch
openstack-ironic-api-4.2.5-1.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch


How reproducible:
50% of the time in this specific environment

Steps to Reproduce:
[stack@rh-director ~]$ openstack baremetal introspection bulk start
Setting nodes for introspection to manageable...
Starting introspection of node: <uuid-node-1>
Starting introspection of node: <uuid-node-2>
Starting introspection of node: <uuid-node-3>
Waiting for introspection to finish...
Introspection for UUID <uuid-node-1> finished successfully.
Introspection for UUID <uuid-node-2> finished successfully.
Introspection for UUID <uuid-node-3> finished successfully.
Setting manageable nodes to available...
Node <uuid-node-1> has been set to available.
Node <uuid-node-2> has been set to available.
Request returned failure status.
Error contacting Ironic server: Node <uuid-node-3> is locked by host director, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1151, in do_provisioning_action
    % action) as task:

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 204, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])

NodeLocked: Node <uuid-node-3> is locked by host director, please retry after the current operation is completed.
 (HTTP 409). Attempt 1 of 6
Node <uuid-node-3> has been set to available.
Introspection completed.

This error does not appear to be harmful, just ugly for the end user, as the operation is retried and completes successfully.

It is not clear why there would be lock contention, since nothing else should be operating on these nodes; but regardless, this should not throw an ugly Python traceback at the user.
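
For illustration, a minimal sketch of the kind of client-side retry that would hide this from the user. The 'client.node.set_provision_state' call and the lock-message check are assumptions standing in for the real python-ironicclient/tripleoclient code, not the actual implementation:

import time

def provide_node(client, node_uuid, attempts=6, delay=2):
    # Move a manageable node to 'available', retrying while the
    # conductor still holds its NodeLocked reservation (HTTP 409).
    for attempt in range(1, attempts + 1):
        try:
            client.node.set_provision_state(node_uuid, 'provide')
            return
        except Exception as exc:
            # Re-raise anything that is not the transient lock,
            # or the lock error itself on the final attempt.
            if 'is locked' not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay)  # give the conductor time to release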

Where are you experiencing the behavior?  What environment?

About 50% of the time doing a 3 node introspection

[stack@rh-director ~]$ more instackenv.json
{
    "nodes":[
        {
            "mac":[
                "00:00:00:00:00:01"
            ],
            "pm_type":"pxe_ilo",
            "pm_user":"Administrator",
            "pm_password":"password",
            "pm_addr":"192.168.0.11",
            "capabilities":"profile:control,boot_option:local"
        },
        {
            "mac":[
                "00:00:00:00:00:02"
            ],
            "pm_type":"pxe_ilo",
            "pm_user":"Administrator",
            "pm_password":"password",
            "pm_addr":"192.168.0.13",
            "capabilities":"profile:compute,boot_option:local"
        },
        {
            "mac":[
                "00:00:00:00:00:03"
            ],
            "pm_type":"pxe_ilo",
            "pm_user":"Administrator",
            "pm_password":"password",
            "pm_addr":"192.168.0.12",
            "capabilities":"profile:compute,boot_option:local"
        }
    ]
}

[stack@rh-director ~]$ ironic node-list
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+
| <uuid-node-1> | None | None          | power off   | available          | False       |
| <uuid-node-2> | None | None          | power off   | available          | False       |
| <uuid-node-3> | None | None          | power off   | available          | False       |
+--------------------------------------+------+---------------+-------------+--------------------+-------------+

Additional info:

Comment 6 Andreas Karis 2016-07-18 21:42:57 UTC
This should likely be handled more gracefully by ironic itself instead of throwing an exception at the user.

Comment 7 Dmitry Tantsur 2016-10-03 15:37:51 UTC
Hi! We've rewritten this logic completely in OSP10, and now it should not show ugly warnings to users.

Comment 8 Miles Gould 2016-10-03 15:41:38 UTC

*** This bug has been marked as a duplicate of bug 1287848 ***

