Bug 1712561 - Switching a node from manage to provide kicks off automated_clean but will not PXE boot if node is in maintenance state
Summary: Switching a node from manage to provide kicks off automated_clean but will not...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: beta
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Dmitry Tantsur
QA Contact: mlammon
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-21 19:18 UTC by Andreas Karis
Modified: 2020-02-06 14:41 UTC
CC List: 7 users

Fixed In Version: openstack-tripleo-heat-templates-11.3.1-0.20191126041653.414d4d9.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-06 14:40:53 UTC
Target Upstream Version:
Embargoed:




Links
OpenStack Storyboard 1563644; last updated 2019-05-22 08:27:10 UTC
OpenStack gerrit 366828; master: MERGED; ironic: Add an option to abort cleaning and deployment if node is in maintenance (I9f3ee44f39c448eb2609c5989acd36e7da844...); last updated 2019-12-06 19:00:51 UTC
OpenStack gerrit 683970; master: MERGED; puppet-ironic: Support configuring [conductor]allow_provisioning_in_maintenance (I2c24180025aaaa9526807faf4913850d2f0f07...); last updated 2019-12-06 19:00:57 UTC
OpenStack gerrit 683975; master: MERGED; tripleo-heat-templates: Ironic: disallow deployment and cleaning in maintenance mode (I3b3f6037970e741f93549878e4e36d362...); last updated 2019-12-06 19:01:08 UTC
Red Hat Knowledge Base (Solution) 4192271; Switching a node from manage to provide kicks of automated_clean but will not PXE boot if node is in maintenance state ...; last updated 2019-06-03 13:25:31 UTC
Red Hat Product Errata RHEA-2020:0283; last updated 2020-02-06 14:41:32 UTC

Description Andreas Karis 2019-05-21 19:18:05 UTC
Description of problem:
Switching a node from manage to provide kicks off automated_clean, but the node will not PXE boot if it is in maintenance state.

Version-Release number of selected component (if applicable):
most recent OSP 13

How reproducible:
~~~
ironic node-set-maintenance compute1 true
ironic node-set-provision-state compute1 manage
ironic node-set-provision-state compute1 provide
~~~

~~~
(undercloud) [stack@director ~]$ sudo grep clean /etc/ironic -R | grep -v ':#'
/etc/ironic/ironic.conf:automated_clean=true
(...)
~~~
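For newer clients, the same reproduction can be expressed with the unified `openstack baremetal` commands instead of the deprecated `ironic` CLI. This is a minimal sketch, not part of the original report: the function name `reproduce_clean_in_maintenance` is hypothetical, and it assumes an authenticated undercloud shell (for example after sourcing stackrc) with `automated_clean=true` as in the grep output above.

```shell
#!/bin/sh
# Sketch: the reproduction steps from this report, using the unified
# OpenStack client instead of the deprecated "ironic" CLI.
# Assumes an authenticated shell and [conductor]automated_clean=true.
reproduce_clean_in_maintenance() {
    node=$1
    # Equivalent of: ironic node-set-maintenance <node> true
    openstack baremetal node maintenance set "$node" || return 1
    # Equivalent of: ironic node-set-provision-state <node> manage
    openstack baremetal node manage "$node" || return 1
    # Equivalent of: ironic node-set-provision-state <node> provide
    # With automated cleaning enabled, this starts cleaning even though
    # the node is in maintenance (the behavior reported here).
    openstack baremetal node provide "$node" || return 1
    # Show the resulting provision state and maintenance flag.
    openstack baremetal node show "$node" -f value -c provision_state -c maintenance
}
```

Usage: `reproduce_clean_in_maintenance compute1`. On an affected release the node is expected to enter cleaning and, per the report, eventually fail it.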


Actual results:
compute1 goes into cleaning and is booted by ironic, but the iPXE boot fails.

Expected results:
compute1 should refuse to enter cleaning and should not be booted; the user should be shown an error message of some sort.

Additional info:

Comment 1 Dmitry Tantsur 2019-05-22 08:27:10 UTC
It was discussed as part of https://storyboard.openstack.org/#!/story/1563644 and the community wanted to keep the current behavior. We can try having this conversation again, but of course I cannot guarantee different results.

Comment 2 Bob Fournier 2019-06-03 12:56:08 UTC
Per Comment 1, this is expected and is the accepted upstream behavior. Closing.

Comment 3 Andreas Karis 2019-06-03 13:12:34 UTC
Hi,

Can we keep this open and re-discuss it with upstream? The current situation is *very* misleading for administrators. At minimum, there should be a warning message.

The logical behavior would be that automated cleaning does *not* kick in when the node is in maintenance state and an error message is shown instead. With the current behavior, the node even PXE boots and, after some time, goes into clean_failed with no obvious reason for the administrator.

- Andreas

Comment 4 Bob Fournier 2019-06-03 13:23:51 UTC
OK, let's keep this open and we'll discuss this with upstream/investigate a warning message at minimum.

Comment 5 Andreas Karis 2019-06-03 13:26:56 UTC
I created a KCS article. After looking at the bug report and the change reviews, this doesn't look trivial to fix. If there's nothing else we can do, I'm fine with only the knowledge base solution. However, I do see this as a (minor) issue, so if we can fix it by adding a warning message or otherwise making it easier for admins to understand what's going on, that would be appreciated!

Comment 6 Bob Fournier 2019-07-26 16:29:11 UTC
Andreas - I think our best bet is the KCS article as you indicated in Comment 5.  The state machine is designed to function like this and adding a warning message isn't possible since it would require querying the node before the action was taken.

Comment 7 Dmitry Tantsur 2019-07-29 09:08:41 UTC
As a last resort, I'm going to propose a patch with an option to not start cleaning in maintenance. The default behavior will not change (as desired upstream), but we'll be able to change it for TripleO. If this approach is rejected, I'll have no option other than closing the bug.
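The option described here appears in the gerrit links above as `[conductor]allow_provisioning_in_maintenance`. The ironic configuration reference documents its default as True, which keeps the old behavior; with False, automated cleaning and making a node available fail for nodes in maintenance. The sketch below is a hypothetical helper, `effective_allow_provisioning_in_maintenance`, that reports the effective value from an ironic.conf-style file, in the spirit of the grep in the description. It is a simple line-based parser: it assumes one `key = value` per line and ignores oslo.config features such as config directories or multiple files.

```shell
#!/bin/sh
# Hypothetical helper: print the effective value of
# [conductor]allow_provisioning_in_maintenance from one ironic.conf-style
# file. Ironic documents the default as True, so "true" is printed when the
# option is not set in the [conductor] section.
effective_allow_provisioning_in_maintenance() {
    conf=$1
    value=$(awk -F= '
        /^[ \t]*\[/ {
            # Remember the current section header, e.g. "[conductor]".
            section = $0
            gsub(/[ \t]/, "", section)
            next
        }
        section == "[conductor]" && $1 ~ /^[ \t]*allow_provisioning_in_maintenance[ \t]*$/ {
            # Keep the last occurrence; strip surrounding whitespace.
            v = $2
            gsub(/[ \t]/, "", v)
            last = v
        }
        END { print last }
    ' "$conf") || return 1
    if [ -z "$value" ]; then
        value=True
    fi
    # Normalise to lowercase, e.g. "False" -> "false".
    printf '%s\n' "$value" | tr '[:upper:]' '[:lower:]'
}
```

Pass the path of whichever ironic.conf the conductor actually reads. Commented-out lines (starting with `#`) and settings outside `[conductor]` are ignored.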

Comment 10 Bob Fournier 2019-09-21 16:51:29 UTC
Fix has merged to master.

Comment 11 Dmitry Tantsur 2019-09-23 14:17:01 UTC
TripleO patch proposed

Comment 12 Bob Fournier 2019-09-26 13:26:29 UTC
As there are multiple fixes here, including an additional configuration parameter (and the BZ severity is low), I'm marking this for OSP 16. Prior to 16, we will have to rely on the KCS article that Andreas created.
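On releases without the new option, a client-side pre-check is one way to avoid the confusing failed-clean outcome. This is not the KCS procedure and not an ironic feature: `provide_unless_maintenance` is a hypothetical wrapper that reads the node's `maintenance` field with `openstack baremetal node show` and refuses to call `provide` when it is True. It assumes the `value` formatter prints `True` or `False` for that field.

```shell
#!/bin/sh
# Hypothetical client-side guard (not an ironic feature and not the KCS
# procedure): refuse to "provide" a node that is in maintenance mode, so
# that automated cleaning does not start and later fail.
provide_unless_maintenance() {
    node=$1
    # The "value" formatter is assumed to print True or False here.
    maint=$(openstack baremetal node show "$node" -f value -c maintenance) || return 1
    if [ "$maint" = "True" ]; then
        echo "ERROR: node $node is in maintenance mode;" \
             "run 'openstack baremetal node maintenance unset $node' first" >&2
        return 2
    fi
    # Note: maintenance could still be set between the check and this call.
    openstack baremetal node provide "$node"
}
```

Because the check and the `provide` call are two separate API requests, the wrapper is a convenience rather than a guarantee; the server-side option is what actually enforces the behavior.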

Comment 16 errata-xmlrpc 2020-02-06 14:40:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283

