Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1284247

Summary: Retry timeout when setting node state to available
Product: Red Hat OpenStack
Component: python-tripleoclient
Version: 8.0 (Liberty)
Target Release: 9.0 (Mitaka)
Target Milestone: ga
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WORKSFORME
Severity: high
Priority: urgent
Keywords: Triaged
Type: Bug
Doc Type: Bug Fix
Reporter: Udi Kalifon <ukalifon>
Assignee: Brad P. Crochet <brad>
QA Contact: Shai Revivo <srevivo>
CC: apevec, arubin, athomas, brad, dtantsur, hbrock, jason.dobies, jcoufal, jslagle, lhh, mburns, oblaut, racedoro, rhel-osp-director-maint, srevivo, tvignaud, ukalifon
Bug Blocks: 1343744
Cloned As: 1343744
Last Closed: 2016-06-28 13:48:32 UTC

Description Udi Kalifon 2015-11-22 11:45:15 UTC
Description of problem:
When introspection retries it throws an exception and a traceback:

[stack@puma01 ~]$ openstack baremetal introspection bulk start
Setting available nodes to manageable...
Starting introspection of node: f1651455-d1a3-4716-9818-303c55a90e89
Starting introspection of node: 404100ea-df81-448d-ab75-7cdaa3f45373
Starting introspection of node: f2d7b8b0-cbd7-4466-9aac-3d5d07af5559
Starting introspection of node: f1c24a81-34c1-4646-bcf6-92c82a8c591e
Starting introspection of node: ab1cd1a1-d52b-459e-b03a-2a4753e8692c
Starting introspection of node: 1bbc969d-479d-4f9b-becf-713be9085289
Waiting for introspection to finish...
Introspection for UUID f1651455-d1a3-4716-9818-303c55a90e89 finished successfully.
Introspection for UUID 404100ea-df81-448d-ab75-7cdaa3f45373 finished successfully.
Introspection for UUID f2d7b8b0-cbd7-4466-9aac-3d5d07af5559 finished successfully.
Introspection for UUID f1c24a81-34c1-4646-bcf6-92c82a8c591e finished successfully.
Introspection for UUID ab1cd1a1-d52b-459e-b03a-2a4753e8692c finished successfully.
Introspection for UUID 1bbc969d-479d-4f9b-becf-713be9085289 finished successfully.
Setting manageable nodes to available...
Node f1651455-d1a3-4716-9818-303c55a90e89 has been set to available.
Node 404100ea-df81-448d-ab75-7cdaa3f45373 has been set to available.
Node f2d7b8b0-cbd7-4466-9aac-3d5d07af5559 has been set to available.
Node f1c24a81-34c1-4646-bcf6-92c82a8c591e has been set to available.
Request returned failure status.
Error contacting Ironic server: Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c is locked by host puma01.scl.lab.tlv.redhat.com, please retry after the current operation is completed.
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1149, in do_provisioning_action
    % action) as task:

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 152, in acquire
    driver_name=driver_name, purpose=purpose)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 221, in __init__
    self.release_resources()

  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 203, in __init__
    self._lock()

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 242, in _lock
    reserve_node()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)

  File "/usr/lib/python2.7/site-packages/retrying.py", line 229, in call
    raise attempt.get()

  File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
    six.reraise(self.value[0], self.value[1], self.value[2])

  File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

  File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 235, in reserve_node
    self.node_id)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 228, in reserve
    db_node = cls.dbapi.reserve_node(tag, node_id)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 226, in reserve_node
    host=node['reservation'])

NodeLocked: Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c is locked by host puma01.scl.lab.tlv.redhat.com, please retry after the current operation is completed.
 (HTTP 409). Attempt 1 of 6
Node ab1cd1a1-d52b-459e-b03a-2a4753e8692c has been set to available.
Request returned failure status.


Version-Release number of selected component (if applicable):
openstack-ironic-conductor-4.2.0-2.1.el7ost.noarch
openstack-ironic-inspector-2.2.2-1.el7ost.noarch
python-ironic-inspector-client-1.2.0-5.el7ost.noarch
python-ironicclient-0.8.1-1.el7ost.noarch
openstack-ironic-common-4.2.0-2.1.el7ost.noarch
openstack-ironic-api-4.2.0-2.1.el7ost.noarch


How reproducible:
When retries occur


Steps to Reproduce:
1. Install OSP-d puddle 2015.11.19.2 (final beta)
2. Run introspection of bare metal nodes

Comment 2 Dmitry Tantsur 2015-11-23 09:39:36 UTC
First of all, this bug is not directly related to ironic and/or inspector ("thanks" to bulk start command for being so confusing). I think the root cause is that we've dropped the ironicclient patch for bumping retries number. tripleoclient must do it now instead when trying to update node provision state.
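Comment 2 suggests the fix belongs in tripleoclient: retry the provision-state update when Ironic returns HTTP 409 (NodeLocked) instead of failing on the first Conflict, as in the traceback above. A minimal, hypothetical sketch of such a retry loop — the names `set_provision_state_with_retries`, `Conflict`, and `update_fn` are illustrative, not the actual tripleoclient API:

```python
import time


class Conflict(Exception):
    """Stand-in for the HTTP 409 error raised by the Ironic client."""


def set_provision_state_with_retries(node_uuid, state, update_fn,
                                     max_retries=5, retry_interval=2):
    """Call update_fn(node_uuid, state), retrying on Conflict.

    max_retries=5 mirrors the "Attempt 1 of 6" seen in the log:
    one initial attempt plus five retries.
    """
    for attempt in range(max_retries + 1):
        try:
            return update_fn(node_uuid, state)
        except Conflict:
            if attempt == max_retries:
                raise  # give up after the final attempt
            time.sleep(retry_interval)
```

With this pattern, a transient NodeLocked conflict (the node still reserved by the conductor while introspection finishes) is absorbed rather than surfacing as the traceback shown in the description.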

Comment 3 Dmitry Tantsur 2015-11-23 10:46:55 UTC
Ofer, could you explain why you changed the component? This error does not even involve discoverd. I've set it to the correct component previously.

Comment 4 Ofer Blaut 2015-11-29 10:37:27 UTC
Hi

We don't have python-rdomanager-oscplugin in OSPd8, so I moved it to Ironic.

I will move it to OSPd.

Ofer

Comment 8 Jaromir Coufal 2016-06-07 19:52:38 UTC
Doc text is needed only if the issue is in the unified CLI and there is a workaround using the Ironic CLI.

Comment 19 Brad P. Crochet 2016-06-16 19:07:45 UTC
(In reply to Dmitry Tantsur from comment #2)
> First of all, this bug is not directly related to ironic and/or inspector
> ("thanks" to bulk start command for being so confusing). I think the root
> cause is that we've dropped the ironicclient patch for bumping retries
> number. tripleoclient must do it now instead when trying to update node
> provision state.

Can you point me to the patch you speak of?

Comment 20 Dmitry Tantsur 2016-06-17 07:31:28 UTC
It will be tricky to find it now, you can probably look at OSPd7 patches. But it was essentially bumping the default values of max_retries argument to Ironic client __init__.
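Comment 20 describes the dropped patch as raising the default value of the `max_retries` argument in the Ironic client's `__init__`. A toy sketch of that pattern — the class name and the bumped value of 30 are purely illustrative, not the actual python-ironicclient code:

```python
# Illustrative only: bump the default retry count in a client's __init__
# so every caller inherits it without passing the argument explicitly.
DEFAULT_MAX_RETRIES = 5  # upstream default; matches "Attempt 1 of 6" in the log


class PatchedClient:
    """Stand-in for an Ironic-style HTTP client (names are hypothetical)."""

    def __init__(self, endpoint, max_retries=30, retry_interval=2):
        # The described patch raises max_retries above the upstream
        # default of 5 so transient NodeLocked (HTTP 409) conflicts
        # are retried away instead of reaching the caller.
        self.endpoint = endpoint
        self.max_retries = max_retries
        self.retry_interval = retry_interval
```

The same effect could instead be achieved per-call site, but changing the constructor default covers every code path that creates a client.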

Comment 21 Brad P. Crochet 2016-06-23 11:34:42 UTC
Can you reproduce this and provide the output with the --debug flag turned on? I have not been able to reproduce this on my end.

Comment 22 Udi Kalifon 2016-06-28 13:36:10 UTC
This does not reproduce with the latest 9.0 (puddle from Jun 27). I tried to make introspection fail by powering off the nodes, and also by providing a wrong IPMI address. The failing node goes into maintenance mode, and the process behaved as expected.