Bug 1284247 - Retry timeout when setting node state to available
Summary: Retry timeout when setting node state to available
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ga
Target Release: 9.0 (Mitaka)
Assignee: Brad P. Crochet
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks: 1343744
Reported: 2015-11-22 11:45 UTC by Udi Kalifon
Modified: 2016-06-28 13:48 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1343744
Environment:
Last Closed: 2016-06-28 13:48:32 UTC
Target Upstream Version:
Embargoed:



Description Udi Kalifon 2015-11-22 11:45:15 UTC
Description of problem:
When introspection retries, it prints an exception and a traceback:

[stack@puma01 ~]$ openstack baremetal introspection bulk start
Setting available nodes to manageable...
Starting introspection of node: f1651455-d1a3-4716-9818-303c55a90e89
Starting introspection of node: 404100ea-df81-448d-ab75-7cdaa3f45373
Starting introspection of node: f2d7b8b0-cbd7-4466-9aac-3d5d07af5559
Starting introspection of node: f1c24a81-34c1-4646-bcf6-92c82a8c591e
Starting introspection of node: ab1cd1a1-d52b-459e-b03a-2a4753e8692c
Starting introspection of node: 1bbc969d-479d-4f9b-becf-713be9085289
Waiting for introspection to finish...
Introspection for UUID f1651455-d1a3-4716-9818-303c55a90e89 finished successfully.
Introspection for UUID 404100ea-df81-448d-ab75-7cdaa3f45373 finished successfully.
Introspection for UUID f2d7b8b0-cbd7-4466-9aac-3d5d07af5559 finished successfully.
Introspection for UUID f1c24a81-34c1-4646-bcf6-92c82a8c591e finished successfully.
Introspection for UUID ab1cd1a1-d52b-459e-b03a-2a4753e8692c finished successfully.
Introspection for UUID 1bbc969d-479d-4f9b-becf-713be9085289 finished successfully.
Setting manageable nodes to available...
Node f1651455-d1a3-4716-9818-303c55a90e89 has been set to available.
Node 404100ea-df81-448d-ab75-7cdaa3f45373 has been set to available.
Node f2d7b8b0-cbd7-4466-9aac-3d5d07af5559 has been set to available.
Node f1c24a81-34c1-4646-bcf6-92c82a8c591e has been set to available.
Request returned failure status.
Error contacting Ironic server: Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c is locked by host puma01.scl.lab.tlv.redhat.com, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1149, in do_provisioning_action
    % action) as task:

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])

NodeLocked: Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c is locked by host puma01.scl.lab.tlv.redhat.com, please retry after the current operation is completed.
 (HTTP 409). Attempt 1 of 6
Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c has been set to available.
Request returned failure status.


Version-Release number of selected component (if applicable):
openstack-ironic-conductor-4.2.0-2.1.el7ost.noarch
openstack-ironic-inspector-2.2.2-1.el7ost.noarch
python-ironic-inspector-client-1.2.0-5.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch
openstack-ironic-common-4.2.0-2.1.el7ost.noarch
openstack-ironic-api-4.2.0-2.1.el7ost.noarch


How reproducible:
When retries occur


Steps to Reproduce:
1. Install OSP-d puddle 2015.11.19.2 (final beta)
2. Run introspection of bare metal nodes

Comment 2 Dmitry Tantsur 2015-11-23 09:39:36 UTC
First of all, this bug is not directly related to ironic and/or inspector ("thanks" to bulk start command for being so confusing). I think the root cause is that we've dropped the ironicclient patch for bumping retries number. tripleoclient must do it now instead when trying to update node provision state.
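A minimal sketch of the client-side retry suggested above, under stated assumptions: `NodeLockedError`, `set_provision_state_with_retry`, and the `set_state` callable are hypothetical names for illustration, not tripleoclient or ironicclient APIs. It only shows the idea of retrying a provision-state update while the conductor still reports the node as locked (HTTP 409).

```python
import time


class NodeLockedError(Exception):
    """Hypothetical stand-in for the HTTP 409 'node is locked' error."""


def set_provision_state_with_retry(set_state, node_uuid, target,
                                   max_retries=12, retry_interval=5,
                                   sleep=time.sleep):
    """Call set_state(node_uuid, target), retrying while the node is locked.

    set_state is any callable that raises NodeLockedError on a conflict.
    Returns the number of attempts it took to succeed.
    """
    attempts = max_retries + 1  # the first try plus max_retries retries
    for attempt in range(1, attempts + 1):
        try:
            set_state(node_uuid, target)
            return attempt
        except NodeLockedError:
            if attempt == attempts:
                raise  # out of retries: surface the conflict to the caller
            sleep(retry_interval)


# Usage with a fake backend that reports a lock on the first two calls.
calls = []


def fake_set_state(node_uuid, target):
    calls.append((node_uuid, target))
    if len(calls) < 3:
        raise NodeLockedError("node %s is locked" % node_uuid)


# "provide" is the Ironic verb that moves a node from manageable to available.
attempts_used = set_provision_state_with_retry(
    fake_set_state, "ab1cd1a1-d52b-459e-b03a-2a4753e8692c", "provide",
    sleep=lambda seconds: None)  # skip real sleeping in the example
```

The retry count and interval here are illustrative values, not the ones any released client uses.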

Comment 3 Dmitry Tantsur 2015-11-23 10:46:55 UTC
Ofer, could you explain why you changed the component? This error does not even involve discoverd. I've set it to the correct component previously.

Comment 4 Ofer Blaut 2015-11-29 10:37:27 UTC
Hi

We don't have python-rdomanager-oscplugin in OSPd8, so I moved it to ironic.

I will move it to ospd.

Ofer

Comment 8 Jaromir Coufal 2016-06-07 19:52:38 UTC
Doc text is needed only if the issue is in the unified CLI and there is a workaround using the Ironic CLI.

Comment 19 Brad P. Crochet 2016-06-16 19:07:45 UTC
(In reply to Dmitry Tantsur from comment #2)
> First of all, this bug is not directly related to ironic and/or inspector
> ("thanks" to bulk start command for being so confusing). I think the root
> cause is that we've dropped the ironicclient patch for bumping retries
> number. tripleoclient must do it now instead when trying to update node
> provision state.

Can you point me to the patch you speak of?

Comment 20 Dmitry Tantsur 2016-06-17 07:31:28 UTC
It will be tricky to find it now; you can probably look at the OSPd7 patches. But it was essentially bumping the default value of the max_retries argument to the Ironic client __init__.
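As a rough illustration of what such a bump could look like from the caller's side, here is a sketch that assumes the Ironic client accepts `max_retries` and `retry_interval` keyword arguments and waits a fixed interval between retries. The helper names, the example auth URL, and the chosen values are hypothetical; the assumed default of 5 retries is only inferred from the "Attempt 1 of 6" line in the report.

```python
# Assumed client defaults (not confirmed by this report): 5 retries after
# the first attempt, 2 seconds apart.
ASSUMED_DEFAULT_MAX_RETRIES = 5
ASSUMED_DEFAULT_RETRY_INTERVAL = 2  # seconds


def bumped_client_kwargs(base_kwargs, max_retries=12, retry_interval=5):
    """Return a copy of base_kwargs with larger retry settings filled in.

    Values already present in base_kwargs are left untouched.
    """
    kwargs = dict(base_kwargs)
    kwargs.setdefault("max_retries", max_retries)
    kwargs.setdefault("retry_interval", retry_interval)
    return kwargs


def worst_case_wait(max_retries, retry_interval):
    """Seconds spent sleeping if every retry is needed (fixed interval)."""
    return max_retries * retry_interval


# Hypothetical connection settings for the example.
kwargs = bumped_client_kwargs({"os_auth_url": "http://192.0.2.1:5000/v2.0"})

default_wait = worst_case_wait(ASSUMED_DEFAULT_MAX_RETRIES,
                               ASSUMED_DEFAULT_RETRY_INTERVAL)
bumped_wait = worst_case_wait(kwargs["max_retries"], kwargs["retry_interval"])

# With python-ironicclient installed, the kwargs would be passed along
# roughly like this (an assumption, not verified against a given release):
#   from ironicclient import client
#   ironic = client.get_client(1, **kwargs)
```

Under these assumptions the client would tolerate a locked node for about a minute instead of about ten seconds before giving up.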

Comment 21 Brad P. Crochet 2016-06-23 11:34:42 UTC
Can you reproduce this and provide the output with the --debug flag turned on? I have not been able to reproduce this on my end.

Comment 22 Udi Kalifon 2016-06-28 13:36:10 UTC
This does not reproduce with the latest 9.0 (puddle from Jun 27). I tried to make introspection fail by powering off the nodes, and also by providing a wrong IPMI address. The node that failed went into maintenance mode, and the process behaved as expected.

