Bug 1827537 - Can't remove/replace baremetal node when its IPMI interface is not available
Summary: Can't remove/replace baremetal node when its IPMI interface is not available
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Irina
QA Contact: Paras Babbar
URL:
Whiteboard:
Depends On:
Blocks: 2023628
Reported: 2020-04-24 06:45 UTC by Takashi Kajinami
Modified: 2023-09-07 22:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-07 14:01:17 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | OSP-13459 | 0 | None | None | None | 2022-03-09 10:51:25 UTC

Description Takashi Kajinami 2020-04-24 06:45:49 UTC
Description of problem:

This issue was originally discussed in bz 1814123.

In the failure we observed, the IPMI interface of the broken baremetal node had become unavailable.
Although ironic stops polling the power status of a node whose IPMI interface is down, it still needs to
access the IPMI interface to power off the node during the deploy process.

This causes a failure when we remove or replace that node, because deleting the nova instance fails
during the stack update.
To avoid the error, we should remove the baremetal node with
 $ openstack baremetal node delete <baremetal node id>
so that nova skips undeploying the node.
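
The workaround proposed above could be wrapped as in the following minimal sketch. The helper names `run` and `delete_unreachable_node` and the `DRY_RUN` variable are hypothetical and not part of any OpenStack tooling. Only the `openstack baremetal node delete` command comes from this report.

```shell
#!/usr/bin/env bash
# Sketch of the workaround proposed in this report, not a documented procedure.
# Set DRY_RUN=1 to print the command instead of running it.

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: delete the ironic record of a node whose IPMI
# interface is unreachable, before the stack update removes its instance.
delete_unreachable_node() {
  local node_id="${1:-}"
  if [ -z "$node_id" ]; then
    echo "usage: delete_unreachable_node <baremetal node id>" >&2
    return 1
  fi
  run openstack baremetal node delete "$node_id"
}
```

For example, `DRY_RUN=1 delete_unreachable_node <baremetal node id>` prints the command that would be run, which makes it possible to review the action before touching the undercloud.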

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Deploy the overcloud.
2. Disable the IPMI interface of one of the overcloud nodes.
3. Remove or replace that node according to our product documentation [1].

[1] Compute: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/scaling-overcloud-nodes#removing-compute-nodes
    Controller: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/replacing-controller-nodes

Actual results:
The stack goes into UPDATE_FAILED status because of an error while deleting the nova instance.

Expected results:
The stack reaches UPDATE_COMPLETE status without any failures.

Additional info:

Comment 10 Steve Baker 2022-03-14 20:23:56 UTC
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/director_installation_and_usage/index#removing-compute-nodes
16.3.8.i
We've decided that this step is correct as written, and we really don't want to recommend "openstack baremetal node delete" in general. If overcloud node delete fails with the node in maintenance mode, there could be any number of root causes, so no general advice would apply.

However, 16.3.8.i should recommend waiting 2 minutes after setting maintenance mode.
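
The recommended sequence could look like the sketch below. It assumes that the documented step sets maintenance mode with `openstack baremetal node maintenance set` and then removes the node with `openstack overcloud node delete`; check the exact commands against the linked documentation. The helper names, the `DRY_RUN` and `WAIT_SECONDS` variables, and the argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: set maintenance mode, wait 2 minutes, then delete the node.
# Set DRY_RUN=1 to print the commands instead of running them.

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper. Arguments: <ironic node id> <stack name> <nova instance id>
remove_node_in_maintenance() {
  local ironic_node="${1:-}" stack="${2:-}" instance="${3:-}"
  # The 2-minute wait suggested in this comment; overridable for testing.
  local wait_seconds="${WAIT_SECONDS:-120}"
  if [ -z "$ironic_node" ] || [ -z "$stack" ] || [ -z "$instance" ]; then
    echo "usage: remove_node_in_maintenance <ironic node> <stack> <instance>" >&2
    return 1
  fi
  run openstack baremetal node maintenance set "$ironic_node"
  run sleep "$wait_seconds"
  run openstack overcloud node delete --stack "$stack" "$instance"
}
```

Running it with `DRY_RUN=1` first prints the three commands in order, so the sequence can be reviewed before it is applied to a real deployment.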

[2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/director_installation_and_usage/index#replacing-a-controller-node
17.4.4 (no change required; 2 minutes will elapse just reading the docs)

