Bug 1343744 - Retry timeout when setting node state to available [NEEDINFO]
Summary: Retry timeout when setting node state to available
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Derek Higgins
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Duplicates: 1290066
Depends On: 1284247
Blocks:
Reported: 2016-06-07 19:43 UTC by Jay Dobies
Modified: 2016-12-14 15:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1284247
Environment:
Last Closed: 2016-12-14 15:36:15 UTC
rbartal: needinfo? (jason.dobies)



Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:2948 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 enhancement update 2016-12-14 19:55:27 UTC

Description Jay Dobies 2016-06-07 19:43:29 UTC
+++ This bug was initially created as a clone of Bug #1284247 +++

Description of problem:
When introspection retries, it raises an exception and prints a traceback:

[stack@puma01 ~]$ openstack baremetal introspection bulk start
Setting available nodes to manageable...
Starting introspection of node: f1651455-d1a3-4716-9818-303c55a90e89
Starting introspection of node: 404100ea-df81-448d-ab75-7cdaa3f45373
Starting introspection of node: f2d7b8b0-cbd7-4466-9aac-3d5d07af5559
Starting introspection of node: f1c24a81-34c1-4646-bcf6-92c82a8c591e
Starting introspection of node: ab1cd1a1-d52b-459e-b03a-2a4753e8692c
Starting introspection of node: 1bbc969d-479d-4f9b-becf-713be9085289
Waiting for introspection to finish...
Introspection for UUID f1651455-d1a3-4716-9818-303c55a90e89 finished successfully.
Introspection for UUID 404100ea-df81-448d-ab75-7cdaa3f45373 finished successfully.
Introspection for UUID f2d7b8b0-cbd7-4466-9aac-3d5d07af5559 finished successfully.
Introspection for UUID f1c24a81-34c1-4646-bcf6-92c82a8c591e finished successfully.
Introspection for UUID ab1cd1a1-d52b-459e-b03a-2a4753e8692c finished successfully.
Introspection for UUID 1bbc969d-479d-4f9b-becf-713be9085289 finished successfully.
Setting manageable nodes to available...
Node f1651455-d1a3-4716-9818-303c55a90e89 has been set to available.
Node 404100ea-df81-448d-ab75-7cdaa3f45373 has been set to available.
Node f2d7b8b0-cbd7-4466-9aac-3d5d07af5559 has been set to available.
Node f1c24a81-34c1-4646-bcf6-92c82a8c591e has been set to available.
Request returned failure status.
Error contacting Ironic server: Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c is locked by host puma01.scl.lab.tlv.redhat.com, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1149, in do_provisioning_action
    % action) as task:

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])

NodeLocked: Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c is locked by host puma01.scl.lab.tlv.redhat.com, please retry after the current operation is completed.
 (HTTP 409). Attempt 1 of 6
Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c has been set to available.
Request returned failure status.


Version-Release number of selected component (if applicable):
openstack-ironic-conductor-4.2.0-2.1.el7ost.noarch
openstack-ironic-inspector-2.2.2-1.el7ost.noarch
python-ironic-inspector-client-1.2.0-5.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch
openstack-ironic-common-4.2.0-2.1.el7ost.noarch
openstack-ironic-api-4.2.0-2.1.el7ost.noarch


How reproducible:
When retries occur


Steps to Reproduce:
1. Install OSP-d puddle 2015.11.19.2 (final beta)
2. Run introspection of bare metal nodes

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-11-22 06:45:17 EST ---

Since this issue was entered in bugzilla without a release flag set, rhos-8.0? has been automatically added to ensure that it is properly evaluated for this release.

--- Additional comment from Dmitry Tantsur on 2015-11-23 04:39:36 EST ---

First of all, this bug is not directly related to ironic and/or inspector ("thanks" to the bulk start command for being so confusing). I think the root cause is that we've dropped the ironicclient patch that bumped the number of retries. tripleoclient must now do that itself when it updates the node provision state.
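
For illustration only, the retry behaviour discussed above can be sketched in plain Python. This is not the ironicclient or tripleoclient code. `NodeLocked`, `call_with_retries` and the parameter names `max_retries` and `retry_interval` are hypothetical stand-ins, and the defaults are assumptions picked to match the "Attempt 1 of 6" line in the log (one initial call plus five retries).

```python
import time


class NodeLocked(Exception):
    """Hypothetical stand-in for an HTTP 409 "node is locked" response."""


def call_with_retries(func, max_retries=5, retry_interval=2.0, sleep=time.sleep):
    """Call func(), retrying on NodeLocked up to max_retries extra times.

    The defaults are illustrative assumptions, not values taken from
    any real client.
    """
    attempts = max_retries + 1  # the first call plus the retries
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except NodeLocked:
            if attempt == attempts:
                raise  # retries exhausted, surface the conflict to the caller
            sleep(retry_interval)  # give the lock holder time to finish
```

A caller that changes the provision state would wrap that single request in `call_with_retries` and raise `max_retries` or `retry_interval` if the node lock is typically held longer than the default window.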

--- Additional comment from Dmitry Tantsur on 2015-11-23 05:46:55 EST ---

Ofer, could you explain why you changed the component? This error does not even involve discoverd. I've set it to the correct component previously.

--- Additional comment from Ofer Blaut on 2015-11-29 05:37:27 EST ---

Hi

We don't have python-rdomanager-oscplugin in OSPd8, so I moved it to ironic.

I will move it to ospd.

Ofer

--- Additional comment from Mike Burns on 2016-02-23 08:18:14 EST ---

Marking urgent due to blocker flag.

--- Additional comment from John Skeoch on 2016-04-18 05:09:59 EDT ---

User yeylon@redhat.com's account has been closed

Comment 1 Dmitry Tantsur 2016-08-19 08:15:16 UTC
*** Bug 1290066 has been marked as a duplicate of this bug. ***

Comment 2 Dmitry Tantsur 2016-10-05 08:34:56 UTC
This issue was fixed upstream by bumping (again) the retry timeout when creating an ironic client for mistral workflows. I think we can close it.
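
As a rough illustration of why a larger retry timeout helps, the sketch below computes how long a client keeps retrying before it gives up and checks whether that window outlasts a node lock. The function names and all numbers are hypothetical. They are not taken from the upstream fix.

```python
def retry_window_seconds(max_retries, retry_interval):
    """Total time spent sleeping between attempts before giving up."""
    return max_retries * retry_interval


def outlasts_lock(max_retries, retry_interval, lock_seconds):
    """True if the retry window is at least as long as the lock is held."""
    return retry_window_seconds(max_retries, retry_interval) >= lock_seconds


# Illustrative numbers only: a lock held for 30 seconds.
print(outlasts_lock(5, 2, 30))   # 10-second window
print(outlasts_lock(12, 5, 30))  # 60-second window
```

With these assumed values the first configuration gives up while the node is still locked, and the second keeps retrying until the lock is released.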

Comment 4 Raviv Bar-Tal 2016-10-10 14:21:46 UTC
Hi Jay,
I tested this on puma servers and did not reproduce the error you're getting.
If you still have this problem, please contact me and I will check your environment.

B.R
Raviv

Comment 6 errata-xmlrpc 2016-12-14 15:36:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

