Description of problem:
Switching a node from manage to provide kicks off automated_clean, but the node will not PXE boot if it is in maintenance state.

Version-Release number of selected component (if applicable):
most recent OSP 13

How reproducible:
~~~
ironic node-set-maintenance compute1 true
ironic node-set-provision-state compute1 manage
ironic node-set-provision-state compute1 provide
~~~
~~~
(undercloud) [stack@director ~]$ sudo grep clean /etc/ironic -R | grep -v ':#'
/etc/ironic/ironic.conf:automated_clean=true
(...)
~~~

Steps to Reproduce:
1.
2.
3.

Actual results:
compute1 will go to clean and will be booted by ironic, but will fail on iPXE boot.

Expected results:
compute1 should refuse to go to clean and should not boot; the user should be presented with an error message of some sort.

Additional info:
It was discussed as part of https://storyboard.openstack.org/#!/story/1563644 and the community wanted to keep the current behavior. We can try having this conversation again, but of course I cannot guarantee different results.
Per Comment 1, this is as expected and the accepted upstream behavior. Closing.
Hi, Can we keep this open and re-discuss with upstream? The current situation is *very* misleading for administrators. At least a warning message would be the minimum. The logical behavior here would be that: automated clean does *not* kick in when the node is in maintenance state and some error message is thrown, etc. With the current behavior, the node even PXE boots and after some amount of time goes into clean_failed but with no obvious reason for the administrator. - Andreas
OK, let's keep this open and we'll discuss this with upstream/investigate a warning message at minimum.
I created a KCS. Upon inspection of the bug report and change reviews, this doesn't look trivial to fix. If there's nothing else we can do, then I'm fine with the knowledge base solution only. However, I perceive this as a (minor) issue, so if we can fix it by adding a warning message or otherwise making it easier for admins to understand what's going on, that would be appreciated!
Andreas - I think our best bet is the KCS article as you indicated in Comment 5. The state machine is designed to function like this and adding a warning message isn't possible since it would require querying the node before the action was taken.
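For illustration only, the "query the node before the action" step could be sketched on the operator side roughly as follows. This is a minimal sketch, not ironic code: `maintenance_warning` and `provide_with_precheck` are hypothetical helper names, and the node is assumed to be a plain mapping with `name`, `maintenance` and `maintenance_reason` keys that the caller has already fetched.

```python
from typing import Callable, Mapping, Optional


def maintenance_warning(node: Mapping[str, object]) -> Optional[str]:
    """Return a warning if the node is in maintenance, else None.

    `node` is an assumed, already-fetched node record (hypothetical shape).
    """
    if node.get("maintenance"):
        reason = node.get("maintenance_reason") or "no reason recorded"
        return (
            f"Node {node.get('name')} is in maintenance ({reason}); "
            "automated cleaning is likely to fail."
        )
    return None


def provide_with_precheck(
    node: Mapping[str, object], do_provide: Callable[[], None]
) -> Optional[str]:
    """Run `do_provide` only if the node is not in maintenance.

    Returns the warning text when the action was skipped, otherwise None.
    """
    warning = maintenance_warning(node)
    if warning is None:
        do_provide()
    return warning
```

The extra lookup before the state change is exactly the cost mentioned above: the check lives in the caller, not in the state machine.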
As a last resort, I'm going to propose a patch with an option to not start cleaning in maintenance. The default behavior will not change (as desired upstream), but we'll be able to change it for TripleO. If this approach is rejected, I'll have no options other than to close the bug.
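A rough model of the proposed behavior, assuming a boolean option that defaults to the current upstream behavior. This is a sketch of the idea only, not the actual patch: the parameter name `allow_in_maintenance`, the `Node` class and the `NodeInMaintenance` exception are all hypothetical.

```python
from dataclasses import dataclass


class NodeInMaintenance(Exception):
    """Raised when cleaning is refused because the node is in maintenance."""


@dataclass
class Node:
    name: str
    maintenance: bool
    provision_state: str = "manageable"


def start_automated_clean(node: Node, allow_in_maintenance: bool = True) -> str:
    """Move the node to cleaning unless the (hypothetical) option forbids it.

    With the default of True, behavior is unchanged: cleaning starts even
    if the node is in maintenance. With False, the request is rejected
    up front with an explicit error.
    """
    if node.maintenance and not allow_in_maintenance:
        raise NodeInMaintenance(
            f"Node {node.name} is in maintenance; refusing to start cleaning"
        )
    node.provision_state = "cleaning"
    return node.provision_state
```

With the option left at its default, `start_automated_clean(Node("compute1", maintenance=True))` still starts cleaning; a deployment that sets it to false would get the error instead.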
Fix has merged to master.
TripleO patch proposed
As there are multiple fixes here, including an additional configuration parameter (and the bz severity is low), marking this for OSP-16. Prior to 16 we will have to rely on the KCS article that Andreas created.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:0283