Bug 1287848 - Node locked error during node introspection
Node locked error during node introspection
Status: CLOSED CURRENTRELEASE
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient (Show other bugs)
8.0 (Liberty)
Unspecified Unspecified
medium Severity medium
: ---
: 8.0 (Liberty)
Assigned To: RHOS Maint
Arik Chernetsky
: Reopened, ZStream
: 1356292 (view as bug list)
Depends On:
Blocks: 1261979
  Show dependency treegraph
 
Reported: 2015-12-02 15:17 EST by Chris Dearborn
Modified: 2016-10-03 11:44 EDT (History)
21 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-03 11:44:28 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
ironic logs from the director node (1.12 MB, application/x-gzip)
2016-01-28 15:05 EST, Chris Dearborn
no flags Details

  None (edit)
Description Chris Dearborn 2015-12-02 15:17:03 EST
Description of problem:
Initial node introspection fails while setting nodes to available with:
Error contacting Ironic server: Node 2eb0d12c-c49a-41df-aeb8-f54c03b67d55 is locked by host director-r13.rcbd.lab, please retry after the current operation is completed.

Note that running introspection again seems to succeed.

Version-Release number of selected component (if applicable):
OSP-D 8 Beta

How reproducible:
Run "openstack baremetal introspection bulk start" to launch node introspection.

Steps to Reproduce:
1. Run "openstack baremetal introspection bulk start"
2. Node introspection errors out

Actual results:
Node introspection fails.

Expected results:
Node introspection should successfully set all nodes to available.

Additional info:

The full error follows:
[stack@director-r13 ~]$ openstack baremetal introspection bulk start
Setting available nodes to manageable...		
Starting introspection of node: a34bc8c3-0968-4e60-bed0-da39ada8e7f0
Starting introspection of node: fc7e23e4-a104-4882-b3af-98b326b92a80
Starting introspection of node: 4d6a461f-a7de-439e-ab47-c46abdc158a7
Starting introspection of node: 08523959-cb92-45aa-b32f-e5e8b854ea0a
Starting introspection of node: 93ae4d69-5f31-484a-84e5-1b5dbf99aced
Starting introspection of node: 0a6b3382-2c67-4792-900a-05624c7251af
Starting introspection of node: 5ddc6c88-bd93-4fcd-8671-cf2a9b43ca53
Starting introspection of node: 2eb0d12c-c49a-41df-aeb8-f54c03b67d55
Starting introspection of node: 80d914b9-0107-42d0-987a-83da84356982
Waiting for introspection to finish...
Introspection for UUID a34bc8c3-0968-4e60-bed0-da39ada8e7f0 finished successfully.
Introspection for UUID 4d6a461f-a7de-439e-ab47-c46abdc158a7 finished successfully.
Introspection for UUID fc7e23e4-a104-4882-b3af-98b326b92a80 finished successfully.
Introspection for UUID 08523959-cb92-45aa-b32f-e5e8b854ea0a finished successfully.
Introspection for UUID 0a6b3382-2c67-4792-900a-05624c7251af finished successfully.
Introspection for UUID 5ddc6c88-bd93-4fcd-8671-cf2a9b43ca53 finished successfully.
Introspection for UUID 80d914b9-0107-42d0-987a-83da84356982 finished successfully.
Introspection for UUID 93ae4d69-5f31-484a-84e5-1b5dbf99aced finished successfully.
Introspection for UUID 2eb0d12c-c49a-41df-aeb8-f54c03b67d55 finished successfully.
Setting manageable nodes to available...
Node a34bc8c3-0968-4e60-bed0-da39ada8e7f0 has been set to available.
Node fc7e23e4-a104-4882-b3af-98b326b92a80 has been set to available.
Node 4d6a461f-a7de-439e-ab47-c46abdc158a7 has been set to available.
Node 08523959-cb92-45aa-b32f-e5e8b854ea0a has been set to available.
Node 93ae4d69-5f31-484a-84e5-1b5dbf99aced has been set to available.
Node 0a6b3382-2c67-4792-900a-05624c7251af has been set to available.
Node 5ddc6c88-bd93-4fcd-8671-cf2a9b43ca53 has been set to available.
Request returned failure status.
Error contacting Ironic server: Node 2eb0d12c-c49a-41df-aeb8-f54c03b67d55 is locked by host director-r13.rcbd.lab, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1149, in do_provisioning_action
    % action) as task:

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])

NodeLocked: Node 2eb0d12c-c49a-41df-aeb8-f54c03b67d55 is locked by host director-r13.rcbd.lab, please retry after the current operation is completed.
 (HTTP 409). Attempt 1 of 6
Node 2eb0d12c-c49a-41df-aeb8-f54c03b67d55 has been set to available.
Node 80d914b9-0107-42d0-987a-83da84356982 has been set to available.
Introspection completed.
[stack@director-r13 ~]$
Comment 2 Chris Dearborn 2015-12-03 16:29:42 EST
I slicked and reloaded my director node, and now node introspection is failing with the same error every time even after multiple runs.
Comment 3 Chris Dearborn 2016-01-07 14:53:46 EST
We've done a couple deployments with OSP 8 Beta 3, and have not seen this issue.  Evidently a change in beta 3 resolved the problem.  Closing.
Comment 4 Chris Dearborn 2016-01-13 07:13:11 EST
Just did a beta 3 deploy, and this happened again.  Reopening.
Comment 5 Chris Dearborn 2016-01-13 09:22:30 EST
Update: Node introspection appears to succeed, but this error or warning is shown.
Comment 6 Mike Burns 2016-01-13 09:25:55 EST
Lucas, any thoughts?
Comment 7 Mike Burns 2016-01-28 14:49:36 EST
Chris, can we get any logs from Ironic?

Lucas, can you see anything?
Comment 8 Chris Dearborn 2016-01-28 15:02:37 EST
I just completed introspection with OSP8 beta 4, and this is still an issue, though it does appear to be more of a warning than anything else.  Looks pretty bad though:

[stack@director-r13 ~]$ openstack baremetal introspection bulk start
Setting available nodes to manageable...
Starting introspection of node: ed756c4e-5a8b-4d60-ab36-9599131da568
Starting introspection of node: 54653cb5-89ea-48a3-96a3-71a9a07ba90b
Starting introspection of node: ad38de44-6d87-4d2b-b3ee-05831e7aa29f
Starting introspection of node: 4b5eedfa-2319-4656-94cf-44b4ab51f63a
Starting introspection of node: 3645cd3c-c77a-4dc9-9d6c-39607349a859
Starting introspection of node: 4ac88109-0ef3-48a4-abe3-79f380d7c84b
Starting introspection of node: e56cdb20-183a-414c-8ca5-47703d37ed41
Starting introspection of node: 1cfc22bc-9194-4c08-a9b7-bcc3bbd25ae5
Starting introspection of node: b0a32584-4614-4a6f-b649-173d63397a14
Waiting for introspection to finish...
Introspection for UUID 54653cb5-89ea-48a3-96a3-71a9a07ba90b finished successfull                                y.
Introspection for UUID ed756c4e-5a8b-4d60-ab36-9599131da568 finished successfull                                y.
Introspection for UUID ad38de44-6d87-4d2b-b3ee-05831e7aa29f finished successfull                                y.
Introspection for UUID 3645cd3c-c77a-4dc9-9d6c-39607349a859 finished successfull                                y.
Introspection for UUID 4b5eedfa-2319-4656-94cf-44b4ab51f63a finished successfull                                y.
Introspection for UUID e56cdb20-183a-414c-8ca5-47703d37ed41 finished successfull                                y.
Introspection for UUID 4ac88109-0ef3-48a4-abe3-79f380d7c84b finished successfull                                y.
Introspection for UUID 1cfc22bc-9194-4c08-a9b7-bcc3bbd25ae5 finished successfull                                y.
Introspection for UUID b0a32584-4614-4a6f-b649-173d63397a14 finished successfull                                y.
Setting manageable nodes to available...
Node ed756c4e-5a8b-4d60-ab36-9599131da568 has been set to available.
Node 54653cb5-89ea-48a3-96a3-71a9a07ba90b has been set to available.
Node ad38de44-6d87-4d2b-b3ee-05831e7aa29f has been set to available.
Node 4b5eedfa-2319-4656-94cf-44b4ab51f63a has been set to available.
Node 3645cd3c-c77a-4dc9-9d6c-39607349a859 has been set to available.
Node 4ac88109-0ef3-48a4-abe3-79f380d7c84b has been set to available.
Node e56cdb20-183a-414c-8ca5-47703d37ed41 has been set to available.
Node 1cfc22bc-9194-4c08-a9b7-bcc3bbd25ae5 has been set to available.
Request returned failure status.
Error contacting Ironic server: Node b0a32584-4614-4a6f-b649-173d63397a14 is loc                                ked by host director-r13.rcbd.lab, please retry after the current operation is c                                ompleted.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142                                , in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1150                                , in do_provisioning_action
    % action) as task:

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in _                                _exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in r                                eserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226,                                 in reserve_node
    host=node['reservation'])

NodeLocked: Node b0a32584-4614-4a6f-b649-173d63397a14 is locked by host director                                -r13.rcbd.lab, please retry after the current operation is completed.
 (HTTP 409). Attempt 1 of 6
Request returned failure status.
Error contacting Ironic server: Node b0a32584-4614-4a6f-b649-173d63397a14 is loc                                ked by host director-r13.rcbd.lab, please retry after the current operation is c                                ompleted.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142                                , in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1150                                , in do_provisioning_action
    % action) as task:

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in _                                _exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line                                 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in r                                eserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226,                                 in reserve_node
    host=node['reservation'])

NodeLocked: Node b0a32584-4614-4a6f-b649-173d63397a14 is locked by host director                                -r13.rcbd.lab, please retry after the current operation is completed.
 (HTTP 409). Attempt 2 of 6
Node b0a32584-4614-4a6f-b649-173d63397a14 has been set to available.
Introspection completed.
[stack@director-r13 ~]$
Comment 9 Chris Dearborn 2016-01-28 15:05 EST
Created attachment 1119266 [details]
ironic logs from the director node

Added all ironic logs from the director node.
Comment 10 Mike Burns 2016-02-04 15:00:21 EST
Lucas, suspicion is that this is a non-fatal warning.  is this true?
Comment 11 Lucas Alvares Gomes 2016-02-17 09:32:20 EST
Hi @Mike, @Chris,

The NodeLocked means that when the operation was tried a conductor was doing something in the node (sync'ing power state, transitioning the power state from A to B, deploying etc...). This error is retry-able as we can see in the output there (and it's being automatically retried) [0].

I think the problem here is that the CLI command is leaking the exception. Ironic will print out exceptions in case it's in debug mode. @Chris can you please check the /etc/ironic/ironic.conf configuration file and see if Ironic is running in debug mode? e.g

[DEFAULT]
debug=True

[0] NodeLocked: Node 2eb0d12c-c49a-41df-aeb8-f54c03b67d55 is locked by host director-r13.rcbd.lab, please retry after the current operation is completed.
 (HTTP 409). Attempt 1 of 6
Comment 12 Chris Dearborn 2016-02-17 10:13:02 EST
Just checked, and in /etc/ironic/ironic.conf:

# If set to false, will disable INFO logging level, making
# WARNING the default. (boolean value)
#verbose=true
verbose=False
Comment 13 Chris Dearborn 2016-02-17 10:13:40 EST
Also from ironic.conf:

# Print debugging output (set logging level to DEBUG instead
# of default INFO level). (boolean value)
#debug=false
debug=True
Comment 14 Lucas Alvares Gomes 2016-02-18 04:56:53 EST
@Chris,

Thanks, yeah setting debug=False will stop leaking those tracebacks on the API side. It was considered a "feature" in Ironic until recently. Theorically we should not run this in debug mode in production (debug messages may contain secrets and these kind of information), but in reality people do so we introduced a new configuration option to turn on/off those exceptions in the API side even if it's in debug mode, the patch is this one https://review.openstack.org/#/c/256615/.

If you think it's important to run things in debug mode but do not want the traceback I can check about back porting it into OSP.
Comment 15 Miles Gould 2016-10-03 11:41:38 EDT
*** Bug 1356292 has been marked as a duplicate of this bug. ***
Comment 16 Miles Gould 2016-10-03 11:44:28 EDT
We've rewritten this logic completely in OSP10, and now it should not show ugly warnings to users.

Note You need to log in before you can comment on or make changes to this bug.