Description of problem:
When we undeploy a bare metal node, ironic always tries to power off the node via its power driver, and the node goes into an error provisioning state if powering off fails. This also happens when the node has maintenance mode enabled, which indicates that the node is already known not to be functional. IMO, ironic should skip powering off a node that has maintenance enabled, so that we avoid this predictable failure on a broken node.

How reproducible:
Always

Steps to Reproduce:
1. Deploy the overcloud.
2. Disable the IPMI interface of one overcloud node.
3. Undeploy that bare metal node with 'openstack baremetal node undeploy <id>'.

Actual results:
Undeploy fails and the node goes into the error provisioning state.

Expected results:
Undeploy completes without error.

Additional info:
Hi, Generally, we try to avoid provisioning actions on nodes that experience management problems. A lot of operations inside ironic rely on being able to talk to the node (e.g. with cleaning enabled, you would enter cleaning after tear down, and that would also fail). It is expected that you recover from maintenance first, then proceed with any complex actions. If the node is not recoverable, you can just delete it completely with `openstack baremetal node delete`.
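To make the two paths above concrete, here is a small Python model of the behaviour discussed in this thread. It is a sketch, not ironic's code: the `Node` class, its fields, and the helper functions are hypothetical, and `<node>` is a placeholder. Only the `openstack baremetal node ...` commands are real CLI commands, and the model deliberately ignores details such as the nova instance association.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical, simplified view of a bare metal node (not ironic's model)."""
    maintenance: bool      # operator has put the node in maintenance mode
    bmc_reachable: bool    # the power driver (e.g. IPMI) can talk to the BMC
    fixable: bool = True   # operator can restore management access


def teardown_result(node: Node, cleaning_enabled: bool) -> str:
    """Rough model of undeploy as described in this thread.

    Tear-down powers the node off and, with automated cleaning enabled,
    then enters cleaning. Both steps need working management access, and
    in this model maintenance mode does not change that.
    """
    if not node.bmc_reachable:
        return "error"  # power off (or the following cleaning) fails
    return "cleaning" if cleaning_enabled else "available"


def recommended_steps(node: Node) -> List[str]:
    """Operator workflow suggested in this thread."""
    if not node.bmc_reachable and not node.fixable:
        # Unrecoverable node: remove it from ironic instead of undeploying.
        return ["openstack baremetal node delete <node>"]
    steps = []
    if not node.bmc_reachable:
        steps.append("restore management (IPMI) access to the node")
    if node.maintenance:
        steps.append("openstack baremetal node maintenance unset <node>")
    steps.append("openstack baremetal node undeploy <node>")
    return steps


if __name__ == "__main__":
    broken = Node(maintenance=True, bmc_reachable=False, fixable=False)
    print(teardown_result(broken, cleaning_enabled=True))  # error
    print(recommended_steps(broken))  # ['openstack baremetal node delete <node>']
```

The point the model encodes is that maintenance mode is an operator flag, not a substitute for management access: undeploy still needs the BMC, so the choice is between recovering the node first or deleting it.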
Hi Dmitry,

Thanks for the clarification.

> It is expected that you recover from maintenance first, then proceed with any complex actions.

Understood. May I ask one more question about this behaviour?

The actual failure was observed while removing nodes from the overcloud. In that case, "openstack overcloud undeploy" was executed directly to get rid of a failed node before removing the node from nova. However, IIUC, when the bare metal instance is deleted in nova (which can happen, for example, when the heat stack is updated with a new blacklist), nova requests the same unprovisioning from ironic and gets an error if the node's IPMI is not working.

My concern is that the current installation documentation does not mention that the IPMI interface must be working when removing nodes from the overcloud, and that nova instance deletion fails (which in turn makes the TripleO stack update fail) if the IPMI of the node being removed is not working.

Do you think this concern is valid? If so, I'll file a bug against the installation documentation to add a note about this expected error.

Thank you,
Takashi
> My concern is that current installation doc does not mention that we expect ipmi interface devices working when removing nodes from overcloud

It's a good point. We need to make clear that a different procedure must be followed for nodes without management access. I think this procedure is documented somewhere already; maybe we should point customers to it? Feel free to reopen this bug and retarget it to documentation.