Bug 1212134
Summary: 'instack-ironic-deployment --discover-nodes' is failing with 'node locked by host' error

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat OpenStack | Reporter | Ronelle Landy <rlandy> |
| Component | python-ironicclient | Assignee | Dmitry Tantsur <dtantsur> |
| Status | CLOSED ERRATA | QA Contact | Toure Dunnon <tdunnon> |
| Severity | unspecified | Docs Contact | |
| Priority | medium | | |
| Version | 7.0 (Kilo) | CC | apevec, dsneddon, dtantsur, jliberma, kobi.ginon, lhh, mlopes, mtanino, tsekiyam, ukalifon, whayutin |
| Target Milestone | ga | Keywords | Automation, Triaged |
| Target Release | 7.0 (Kilo) | | |
| Hardware | Unspecified | OS | Unspecified |
| Whiteboard | | | |
| Fixed In Version | | Doc Type | Bug Fix |
| Story Points | --- | | |
| Clone Of | | Environment | |
| Last Closed | 2015-08-05 13:22:45 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |

Doc Text:

> Previously, certain operations in OpenStack Bare Metal Provisioning (Ironic) would fail to run while the node was in a `locked` state. This update implements a `retry` function in the Ironic client. As a result, certain operations take longer to run, but do not fail due to `node locked` errors.
Description

Ronelle Landy 2015-04-15 15:42:11 UTC

---

Dmitry Tantsur (comment 4):

I believe the proper fix is for ironicclient to retry on 409, so I'll work upstream on it. Then this whole class of problems will be gone.

---

Upstream patch for retrying: https://review.openstack.org/#/c/174359

Upstream patch landed in master, pending stable/kilo: https://review.openstack.org/#/c/175301/

ironicclient in delorean has been rebased on the latter patch, so it should be fixed now.

---

https://bugzilla.redhat.com/show_bug.cgi?id=1233452 ... shows similar errors during overcloud deploy, so I wonder if we need a longer retry time. Anyway, let's continue in that report; the two are slightly different.

---

(In reply to Dmitry Tantsur from comment #4)
> I believe the proper fix is for ironicclient to retry on 409, so I'll work
> upstream on it. Then this whole class of problems will be gone.

The retry is not always the solution. See bug https://bugzilla.redhat.com/show_bug.cgi?id=1241424

---

Yeah, but that's another bug.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1548

---

So, Dmitry Tantsur, what is the solution for this issue at the end of the day? We are suffering from it: in most cases the introspection passes, but since we upgraded to 7.3 I can see many more cases of failure.

What I have identified is that when one bare metal node finishes introspection and the second one is at more or less the same stage, I start to see the warning message. The second bare metal node starts a shutdown, but it seems to be delayed during the shutdown, while introspection finishes without waiting for the second node to shut down completely. The user then starts the overcloud deployment, since he got the prompt back and is familiar with those 'harmless warnings', but the deployment will fail.

So is there a solution for the reported issue? Does my description shed more light on this, or is it a completely different case?

regards

---

Dmitry Tantsur (comment 19):

Hello! It's hard to judge at first glance, but I suspect your issue might be a slightly different one. Which version of OSP are you using? Could you please paste the output of 'ironic node-list' after the (failed) introspection?

---

(In reply to Dmitry Tantsur from comment #19)
> Hello!
>
> It's hard to judge at first glance, but I suspect your issue might be a
> slightly different one. Which version of OSP are you using? Could you please
> paste the output of 'ironic node-list' after the (failed) introspection?

Hi Dmitry, thanks for the prompt reply. I'm using OSPd 8:

    python-ironic-inspector-client-1.2.0-6.el7ost.noarch
    openstack-ironic-inspector-2.2.6-1.el7ost.noarch
    openstack-ironic-conductor-4.2.5-3.el7ost.noarch
    openstack-ironic-common-4.2.5-3.el7ost.noarch
    python-ironicclient-0.8.1-1.el7ost.noarch
    openstack-ironic-api-4.2.5-3.el7ost.noarch

Below is the output on screen; I have one controller and one compute in this test. You will notice that only one of them changed state to available, and then a warning appears when you try to start deployment. As I mentioned, I could see on screen that the second node started its shutdown but did not manage to finish it; introspection finished without waiting for it.

    19:53:37 Node 49ebe10b-e8c4-4cf2-835a-d8147181d6fd power state is in transition. Waiting up to 120 seconds for it to complete.
    19:53:47 performing introspection.
    20:00:34 Request returned failure status.
    20:00:34 Error contacting Ironic server: Node 49ebe10b-e8c4-4cf2-835a-d8147181d6fd is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 1 of 6
    20:00:38 Request returned failure status.
    20:00:38 Error contacting Ironic server: Node 49ebe10b-e8c4-4cf2-835a-d8147181d6fd is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 2 of 6
    20:00:42 Request returned failure status.
    20:00:42 Error contacting Ironic server: Node 49ebe10b-e8c4-4cf2-835a-d8147181d6fd is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 3 of 6
    20:00:46 Request returned failure status.
    20:00:46 Error contacting Ironic server: Node 49ebe10b-e8c4-4cf2-835a-d8147181d6fd is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 4 of 6
    20:00:50 Request returned failure status.
    20:00:50 Error contacting Ironic server: Node 49ebe10b-e8c4-4cf2-835a-d8147181d6fd is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 5 of 6
    20:00:54 Request returned failure status.
    20:00:54 Error contacting Ironic server: Node 49ebe10b-e8c4-4cf2-835a-d8147181d6fd is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409). Attempt 6 of 6
    20:00:54 Node 49ebe10b-e8c4-4cf2-835a-d8147181d6fd is locked by host localhost.localdomain, please retry after the current operation is completed. (HTTP 409)
    20:00:54 Setting nodes for introspection to manageable...
    20:00:54 Starting introspection of node: 301e92a5-002d-4f3e-a614-af67f4a5dc4c
    20:00:54 Starting introspection of node: 49ebe10b-e8c4-4cf2-835a-d8147181d6fd
    20:00:54 Waiting for introspection to finish...
    20:00:54 Introspection for UUID 301e92a5-002d-4f3e-a614-af67f4a5dc4c finished successfully.
    20:00:54 Introspection for UUID 49ebe10b-e8c4-4cf2-835a-d8147181d6fd finished successfully.
    20:00:54 Setting manageable nodes to available...
    20:00:54 Node 301e92a5-002d-4f3e-a614-af67f4a5dc4c has been set to available.
    20:00:57 Ironic Node introspection succeeded
    20:00:57 performing overcloud deployment.
    20:01:10 Error: only 0 of 1 requested ironic nodes are tagged to profile compute (for flavor compute)
    20:01:10 Recommendation: tag more nodes using ironic node-update <NODE ID> replace properties/capabilities=profile:compute,boot_option:local
    20:01:10 Configuration has 1 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.

---

Hi Dmitry,

Please read comment 20 first. I would like to add that I found the code you suggested (https://review.openstack.org/#/c/175301/) already present in my OSPd 8 version. I'm just not sure whether I want to increase the number of retries (DEFAULT_MAX_RETRIES). How do I do that from the ironic.conf file? Which field is it, or is there another way to do it?

regards
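The client-side fix described in the Doc Text boils down to a simple pattern: retry the request only when the server answers with a 409 conflict, sleeping between attempts, and give up after a fixed number of tries (the log above shows 6 attempts, 4 seconds apart). The sketch below is illustrative only, not the actual ironicclient code; `Conflict` and `call_with_retries` are names made up here:

```python
import time


class Conflict(Exception):
    """Stand-in for the HTTP 409 'node locked' error raised by the Ironic API."""


def call_with_retries(func, max_retries=6, retry_interval=4, sleep=time.sleep):
    """Call func(), retrying only on Conflict; any other error propagates at once."""
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Conflict:
            if attempt == max_retries:
                raise  # out of attempts: surface the 409 to the caller
            sleep(retry_interval)
```

For example, a call that hits a locked node twice and then succeeds would return normally on the third attempt instead of failing with "node locked by host".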
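On the final question: DEFAULT_MAX_RETRIES is a constant in python-ironicclient (the client), not a server option, so there is no ironic.conf field for it. Client versions that include the upstream retry patch expose command-line flags and environment variables to override it; the names below are from upstream python-ironicclient and should be verified against the installed version with `ironic help`:

```shell
# Client-side retry tuning (assumed option names; confirm with `ironic help`):
export IRONIC_MAX_RETRIES=12     # number of attempts on HTTP 409
export IRONIC_RETRY_INTERVAL=5   # seconds to wait between attempts
ironic node-list

# The same settings can be passed as flags on a single invocation:
ironic --max-retries 12 --retry-interval 5 node-list
```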