Created attachment 1502461
Nova ERROR

I'm trying to delete a BM guest on the Overcloud using:

openstack server delete <BM-ID>

Note: the above command doesn't print any error or warning message.

Then, after some time (a few minutes), the Status field in the "openstack server list" output for this specific BM goes to ERROR.

There were no ERRORs in the Neutron server.log (3 controllers); the only ERROR I saw was in nova-compute.log (see attachment).
I troubleshot the issue, and from the logs it seems that cleaning simply takes too long and the nova-compute service gives up too early.

The conductor logs show the deletion starting at 15:51:24.789 and the node reaching the "available" state at 15:55:32.908. The nova ironic driver started to update "something" at 15:52:47.540, so the node was still in a transitional state between cleaning and available. The ironic driver then stopped trying at 15:54:51.352, before the node moved to available.

As a workaround, we can bump api_max_retries on the ironic client side. Given that we have a reasonable workaround for this issue, I'm lowering the severity and removing the blocker flag.
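To make the timing gap concrete, here is a rough sketch that works out the intervals from the timestamps quoted above and estimates how many retries would have been needed to outlast cleaning. The 2-second retry interval is an assumption (it is believed to match the default of nova's [ironic] api_retry_interval, but this was not checked against the affected deployment), and the helper names are made up for illustration.

```python
from datetime import datetime
import math

# Timestamps are copied from the log analysis in this comment (same day).
FMT = "%H:%M:%S.%f"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two HH:MM:SS.mmm timestamps on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


def min_retries(wait_seconds: float, retry_interval: float) -> int:
    """Smallest retry count whose total wait covers wait_seconds."""
    return math.ceil(wait_seconds / retry_interval)


# Conductor: deletion started -> node reached "available".
cleaning = seconds_between("15:51:24.789", "15:55:32.908")
# Nova ironic driver: first update attempt -> gave up.
driver_waited = seconds_between("15:52:47.540", "15:54:51.352")
# How long the driver would have had to keep retrying to see "available".
driver_needed = seconds_between("15:52:47.540", "15:55:32.908")

# ASSUMPTION: 2 s between retries; adjust to the deployment's actual value.
ASSUMED_RETRY_INTERVAL = 2.0

print(f"cleaning took        {cleaning:.3f} s")
print(f"driver retried for   {driver_waited:.3f} s")
print(f"driver needed to wait {driver_needed:.3f} s")
print("minimum retries at assumed interval:",
      min_retries(driver_needed, ASSUMED_RETRY_INTERVAL))
```

Under that assumed interval, the driver gave up roughly 40 seconds before the node became available, so any api_max_retries value chosen as a workaround should leave comfortable headroom over the minimum computed here, since cleaning time varies with hardware.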
See also https://bugzilla.redhat.com/show_bug.cgi?id=1563303, specifically the comment here about changing api_max_retries https://bugzilla.redhat.com/show_bug.cgi?id=1563303#c15
We got another bug, 1678868, which could be a duplicate of this one, although the environment is different. I'm raising the priority and severity of this bug and assigning it to myself.
*** Bug 1678868 has been marked as a duplicate of this bug. ***
Jakub - it looks like the referenced patch [0] has merged; it's not clear whether the patch in the Launchpad bug [1] is also needed. Can this bug move to POST, or is more work needed?

[0] https://review.opendev.org/#/c/636571/
[1] https://review.opendev.org/#/c/638119/
(In reply to Bob Fournier from comment #6)
> Jakub - it looks like the referenced patch [0] has merged, its not clear if
> the patch in the Launchpad bug [1] is also needed? Can this bug move to
> POST or is more work needed?
>
> [0] https://review.opendev.org/#/c/636571/
> [1] https://review.opendev.org/#/c/638119/

Whoops, the wrong patch is linked from this BZ. The correct one is in the LP: https://review.opendev.org/#/c/638119/ is what we need. Thanks for pointing that out.
I'd say let's move this to POST. I think the issue is addressed; if it resurfaces, we'll address it again.
Sorry, I prematurely set this to ON_DEV; the patches haven't merged yet.
Jakub - should this be backported to stable/train so it will be in OSP-16?
(In reply to Bob Fournier from comment #10)
> Jakub - should this be backported to stable/train so it will be in OSP-16?

I requested a backport here: https://review.opendev.org/#/c/696300/

I'm not sure, though, whether it breaks the backporting policy, as it introduces a new config option. We'll see. If it gets accepted, I'll reschedule the bug to be included in OSP16.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:0283