Bug 1647005
| Summary: | Ansible Networking - cannot delete BM guests on Overcloud | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Arkady Shtempler <ashtempl> | ||||
| Component: | openstack-tripleo-heat-templates | Assignee: | Jakub Libosvar <jlibosva> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Arkady Shtempler <ashtempl> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 14.0 (Rocky) | CC: | bfournie, cjanisze, dradez, jamsmith, jlibosva, mburns, michapma, shrjoshi | ||||
| Target Milestone: | beta | Keywords: | Triaged, ZStream | ||||
| Target Release: | 16.0 (Train on RHEL 8.1) | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | openstack-tripleo-heat-templates-11.3.1-0.20191202212740.a4800ba.el8ost | Doc Type: | Known Issue | ||||
| Doc Text: |
The nova-compute ironic driver tries to update a BM node while the node is being cleaned up. Cleaning takes approximately five minutes, but nova-compute only attempts to update the node for approximately two minutes. After the timeout, nova-compute stops retrying and puts the nova instance into the ERROR state.
As a workaround, set the following configuration option for the nova-compute service:
[ironic]
api_max_retries = 180
As a result, nova-compute keeps trying to update the BM node for longer and eventually succeeds.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-02-06 14:39:53 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
I troubleshot the issue, and from the logs it seems that cleaning just takes too long and the nova-compute service gives up too early. The conductor logs show that deletion started at 15:51:24.789 and the node went to the "available" state at 15:55:32.908. The nova ironic driver started to update "something" at 15:52:47.540, while the node was still in the transition state between cleaning and available, and stopped trying at 15:54:51.352, before the node moved to available. As a workaround we can bump api_max_retries on the ironic client side.

Given that we have a reasonable workaround for this issue, I'm lowering severity and removing the blocker flag.

See also https://bugzilla.redhat.com/show_bug.cgi?id=1563303, specifically the comment about changing api_max_retries: https://bugzilla.redhat.com/show_bug.cgi?id=1563303#c15

We got another bug, 1678868, which could be a dup of this one, but the environment is different. I'm raising the priority and severity of this bug and assigning it to myself.

*** Bug 1678868 has been marked as a duplicate of this bug. ***

Jakub - it looks like the referenced patch [0] has merged; it's not clear whether the patch in the Launchpad bug [1] is also needed. Can this bug move to POST, or is more work needed?
[0] https://review.opendev.org/#/c/636571/
[1] https://review.opendev.org/#/c/638119/

(In reply to Bob Fournier from comment #6)
> Jakub - it looks like the referenced patch [0] has merged, its not clear if
> the patch in the Launchpad bug [1] is also needed? Can this bug move to
> POST or is more work needed?
>
> [0] https://review.opendev.org/#/c/636571/
> [1] https://review.opendev.org/#/c/638119/
Whoops, the wrong patch is linked from this BZ. The correct one is in the LP: https://review.opendev.org/#/c/638119/ is what we need. Thanks for pointing that out.

I'd say let's move this to POST. I think the issue is addressed; if it resurfaces, we'll address it again.

Sorry, I prematurely set this to ON_DEV; the patches haven't merged yet.
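The timing argument in the troubleshooting comment can be checked with a little arithmetic. The sketch below is illustrative only: it uses the four timestamps quoted from the logs, and it assumes the `[ironic]` defaults `api_max_retries = 60` and `api_retry_interval = 2` seconds (not stated in this bug; verify them against your own nova.conf). The helper names `seconds_between` and `retry_window` are hypothetical, and the retry budget is approximated as retries multiplied by interval, ignoring request latency.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # time-of-day format of the quoted log timestamps


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two same-day log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


def retry_window(max_retries: int, retry_interval: float) -> float:
    """Approximate retry budget in seconds (hypothetical helper)."""
    return max_retries * retry_interval


# Timestamps quoted from the conductor and nova-compute logs.
cleaning = seconds_between("15:51:24.789", "15:55:32.908")  # delete start -> available
observed = seconds_between("15:52:47.540", "15:54:51.352")  # first update -> gave up
needed = seconds_between("15:52:47.540", "15:55:32.908")    # first update -> available

# Assumed defaults (60 retries, 2 s apart) versus the workaround value.
default_window = retry_window(60, 2)
workaround_window = retry_window(180, 2)

print(f"cleaning took {cleaning:.1f}s; nova-compute retried for {observed:.1f}s")
print(f"needed {needed:.1f}s; default budget {default_window}s; "
      f"workaround budget {workaround_window}s")
```

Under these assumptions the default budget of about two minutes ends before the node becomes available, which is consistent with the roughly two minutes of retries seen in the logs, while `api_max_retries = 180` leaves about six minutes, enough to outlast this cleaning run.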
Jakub - should this be backported to stable/train so it will be in OSP-16?

(In reply to Bob Fournier from comment #10)
> Jakub - should this be backported to stable/train so it will be in OSP-16?
I requested a backport here: https://review.opendev.org/#/c/696300/
I'm not sure, though, whether it breaks the backporting policy, as it introduces a new config option. We'll see. If it gets accepted, I'll re-schedule the bug to be included in OSP16.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2020:0283 |
Created attachment 1502461 [details] Nova ERROR

I'm trying to delete a BM guest on the Overcloud using:
openstack server delete <BM-ID>
Note: the above command doesn't print any error or warning message.
Then, after a few minutes, the Status field for this specific BM in the output of:
openstack server list
changes to ERROR.
There were no ERRORs in the Neutron server.log (3 controllers); the only ERROR I saw was in nova-compute.log (see attachment).