Bug 1343744

Summary: Retry timeout when setting node state to available
Product: Red Hat OpenStack Reporter: Jay Dobies <jason.dobies>
Component: openstack-tripleo-commonAssignee: Derek Higgins <derekh>
Status: CLOSED ERRATA QA Contact: Raviv Bar-Tal <rbartal>
Severity: high Docs Contact:
Priority: urgent    
Version: 8.0 (Liberty)CC: apevec, athomas, brad, dtantsur, hbrock, jason.dobies, jcoufal, jduncan, jslagle, lhh, mburns, oblaut, racedoro, rbartal, rhel-osp-director-maint, sclewis, slinaber, srevivo, tvignaud, ukalifon
Target Milestone: rcKeywords: Triaged
Target Release: 10.0 (Newton)Flags: rbartal: needinfo? (jason.dobies)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1284247 Environment:
Last Closed: 2016-12-14 15:36:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1284247    
Bug Blocks:    

Description Jay Dobies 2016-06-07 19:43:29 UTC
+++ This bug was initially created as a clone of Bug #1284247 +++

Description of problem:
When introspection retries it throws an exception and a traceback:

[stack@puma01 ~]$ openstack baremetal introspection bulk start
Setting available nodes to manageable...
Starting introspection of node: f1651455-d1a3-4716-9818-303c55a90e89
Starting introspection of node: 404100ea-df81-448d-ab75-7cdaa3f45373
Starting introspection of node: f2d7b8b0-cbd7-4466-9aac-3d5d07af5559
Starting introspection of node: f1c24a81-34c1-4646-bcf6-92c82a8c591e
Starting introspection of node: ab1cd1a1-d52b-459e-b03a-2a4753e8692c
Starting introspection of node: 1bbc969d-479d-4f9b-becf-713be9085289
Waiting for introspection to finish...
Introspection for UUID f1651455-d1a3-4716-9818-303c55a90e89 finished successfully.
Introspection for UUID 404100ea-df81-448d-ab75-7cdaa3f45373 finished successfully.
Introspection for UUID f2d7b8b0-cbd7-4466-9aac-3d5d07af5559 finished successfully.
Introspection for UUID f1c24a81-34c1-4646-bcf6-92c82a8c591e finished successfully.
Introspection for UUID ab1cd1a1-d52b-459e-b03a-2a4753e8692c finished successfully.
Introspection for UUID 1bbc969d-479d-4f9b-becf-713be9085289 finished successfully.
Setting manageable nodes to available...
Node f1651455-d1a3-4716-9818-303c55a90e89 has been set to available.
Node 404100ea-df81-448d-ab75-7cdaa3f45373 has been set to available.
Node f2d7b8b0-cbd7-4466-9aac-3d5d07af5559 has been set to available.
Node f1c24a81-34c1-4646-bcf6-92c82a8c591e has been set to available.
Request returned failure status.
Error contacting Ironic server: Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c is locked by host puma01.scl.lab.tlv.redhat.com, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1149, in do_provisioning_action
    % action) as task:

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])

NodeLocked: Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c is locked by host puma01.scl.lab.tlv.redhat.com, please retry after the current operation is completed.
 (HTTP 409). Attempt 1 of 6
Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c has been set to available.
Request returned failure status.


Version-Release number of selected component (if applicable):
openstack-ironic-conductor-4.2.0-2.1.el7ost.noarch
openstack-ironic-inspector-2.2.2-1.el7ost.noarch
python-ironic-inspector-client-1.2.0-5.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch
openstack-ironic-common-4.2.0-2.1.el7ost.noarch
openstack-ironic-api-4.2.0-2.1.el7ost.noarch


How reproducible:
When retries occur


Steps to Reproduce:
1. Install OSP-d puddle 2015.11.19.2 (final beta)
2. Run introspection of bare metal nodes

--- Additional comment from Red Hat Bugzilla Rules Engine on 2015-11-22 06:45:17 EST ---

Since this issue was entered in bugzilla without a release flag set, rhos-8.0? has been automatically added to ensure that it is properly evaluated for this release.

--- Additional comment from Dmitry Tantsur on 2015-11-23 04:39:36 EST ---

First of all, this bug is not directly related to ironic and/or inspector ("thanks" to bulk start command for being so confusing). I think the root cause is that we've dropped the ironicclient patch for bumping retries number. tripleoclient must do it now instead when trying to update node provision state.

--- Additional comment from Dmitry Tantsur on 2015-11-23 05:46:55 EST ---

Ofer, could you explain why you changed the component? This error does not even involve discoverd. I've set it to the correct component previously.

--- Additional comment from Ofer Blaut on 2015-11-29 05:37:27 EST ---

Hi

We don't have python-rdomanager-oscplugin in OSPd8 so i moved it to ironic 

I will move it to ospd 

Ofer

--- Additional comment from Mike Burns on 2016-02-23 08:18:14 EST ---

marking urgent due to blocker flag

--- Additional comment from John Skeoch on 2016-04-18 05:09:59 EDT ---

User yeylon@redhat.com's account has been closed

Comment 1 Dmitry Tantsur 2016-08-19 08:15:16 UTC
*** Bug 1290066 has been marked as a duplicate of this bug. ***

Comment 2 Dmitry Tantsur 2016-10-05 08:34:56 UTC
This issue was fixed upstream by bumping (again) the retry timeout when creating an ironic client for mistral workflows. I think we can close it.

Comment 4 Raviv Bar-Tal 2016-10-10 14:21:46 UTC
Hi Jay,
I tested this on puma servers and did not reproduce the error your getting.
If you still have this problem please contact my and I will check your environment.

B.R
Raviv

Comment 6 errata-xmlrpc 2016-12-14 15:36:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html