1712561 - Switching a node from manage to provide kicks off automated_clean but will not PXE boot if node is in maintenance state

Bug 1712561 - Switching a node from manage to provide kicks off automated_clean but will not PXE boot if node is in maintenance state

Summary: Switching a node from manage to provide kicks off automated_clean but will not...

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	16.0 (Train)
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	beta
Target Release:	16.0 (Train on RHEL 8.1)
Assignee:	Dmitry Tantsur
QA Contact:	mlammon
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-05-21 19:18 UTC by Andreas Karis
Modified:	2020-02-06 14:41 UTC
CC List:	7 users
Fixed In Version:	openstack-tripleo-heat-templates-11.3.1-0.20191126041653.414d4d9.el8ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-02-06 14:40:53 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
OpenStack Storyboard	1563644	None	None	None	2019-05-22 08:27:10 UTC
OpenStack gerrit	366828	'None'	master: MERGED	ironic: Add an option to abort cleaning and deployment if node is in maintenance (I9f3ee44f39c448eb2609c5989acd36e7da844...	2019-12-06 19:00:51 UTC
OpenStack gerrit	683970	'None'	master: MERGED	puppet-ironic: Support configuring [conductor]allow_provisioning_in_maintenance (I2c24180025aaaa9526807faf4913850d2f0f07...	2019-12-06 19:00:57 UTC
OpenStack gerrit	683975	'None'	master: MERGED	tripleo-heat-templates: Ironic: disallow deployment and cleaning in maintenance mode (I3b3f6037970e741f93549878e4e36d362...	2019-12-06 19:01:08 UTC
Red Hat Knowledge Base (Solution)	4192271	None	None	Switching a node from manage to provide kicks of automated_clean but will not PXE boot if node is in maintenance state ...	2019-06-03 13:25:31 UTC
Red Hat Product Errata	RHEA-2020:0283	None	None	None	2020-02-06 14:41:32 UTC

Description Andreas Karis 2019-05-21 19:18:05 UTC

Description of problem:
Switching a node from manage to provide kicks off automated_clean, but the node will not PXE boot if it is in maintenance state.

Version-Release number of selected component (if applicable):
most recent OSP 13

How reproducible:
~~~
ironic node-set-maintenance compute1 true
ironic node-set-provision-state compute1 manage
ironic node-set-provision-state compute1 provide
~~~

~~~
(undercloud) [stack@director ~]$ sudo grep clean /etc/ironic -R | grep -v ':#'
/etc/ironic/ironic.conf:automated_clean=true
(...)
~~~

Actual results:
compute1 goes into cleaning and is booted by ironic, but fails on iPXE boot.

Expected results:
compute1 should refuse to go into cleaning and should not boot; the user should be presented with an error message.

Additional info:

Comment 1 Dmitry Tantsur 2019-05-22 08:27:10 UTC

It was discussed as part of https://storyboard.openstack.org/#!/story/1563644 and the community wanted to keep the current behavior. We can try having this conversation again, but of course I cannot guarantee different results.

Comment 2 Bob Fournier 2019-06-03 12:56:08 UTC

Per Comment 1, this is as expected and the accepted upstream behavior.  Closing.

Comment 3 Andreas Karis 2019-06-03 13:12:34 UTC

Hi,

Can we keep this open and re-discuss with upstream? The current situation is *very* misleading for administrators. A warning message would be the minimum.

The logical behavior here would be that automated clean does *not* kick in when the node is in maintenance state and an error message is shown instead. With the current behavior, the node even PXE boots and after some time goes into clean_failed, with no obvious reason given to the administrator.

- Andreas

Comment 4 Bob Fournier 2019-06-03 13:23:51 UTC

OK, let's keep this open and we'll discuss this with upstream/investigate a warning message at minimum.

Comment 5 Andreas Karis 2019-06-03 13:26:56 UTC

I created a KCS. Upon inspection of the bug report and change reviews, this doesn't look trivial to fix. If there's nothing else we can do, then I'm fine with the knowledge base solution alone. However, I consider this a (minor) issue, so if we can fix it by adding a warning message or otherwise making it easier for admins to understand what's going on, that would be appreciated!

Comment 6 Bob Fournier 2019-07-26 16:29:11 UTC

Andreas - I think our best bet is the KCS article as you indicated in Comment 5.  The state machine is designed to function like this and adding a warning message isn't possible since it would require querying the node before the action was taken.
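Since the warning cannot come from the state machine itself, one client-side option is to query the node before requesting `provide`. The following is only a sketch under stated assumptions: it assumes the `openstack baremetal` CLI is installed and that `openstack baremetal node show <node> -f json` reports `maintenance`, `maintenance_reason` and `provision_state` keys. The helper names `provide_blockers` and `fetch_node` are hypothetical and not part of ironic or TripleO.

```python
import json
import subprocess


def provide_blockers(node: dict) -> list:
    """Return reasons why `provide` should not be requested for this node.

    `node` is the parsed JSON of `openstack baremetal node show <node> -f json`
    (field names assumed, see the note above).
    """
    reasons = []
    if node.get("maintenance"):
        reason = node.get("maintenance_reason") or "no reason recorded"
        reasons.append(f"node is in maintenance ({reason})")
    if node.get("provision_state") != "manageable":
        state = node.get("provision_state")
        reasons.append(f"provision_state is {state!r}, expected 'manageable'")
    return reasons


def fetch_node(name: str) -> dict:
    """Query the node through the CLI; requires undercloud credentials."""
    result = subprocess.run(
        ["openstack", "baremetal", "node", "show", name, "-f", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

An administrator could call `provide_blockers(fetch_node("compute1"))` and only proceed with `provide` when the returned list is empty. This does not change ironic's behavior; it only surfaces the maintenance flag before cleaning is triggered.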

Comment 7 Dmitry Tantsur 2019-07-29 09:08:41 UTC

As a last resort, I'm going to propose a patch with an option to not start cleaning in maintenance. The default behavior will not change (as desired upstream), but we'll be able to change it for TripleO. If this approach is rejected, I'll have no option other than to close the bug.
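To illustrate the idea of such an option, here is a minimal sketch of a guard that keeps today's behavior by default and refuses to start cleaning only when the option is turned off. It is not the actual ironic patch: the function `ensure_provisioning_allowed` and the exception `NodeInMaintenanceError` are hypothetical. The only name taken from this bug is the option `allow_provisioning_in_maintenance`, which the linked puppet-ironic review places in the `[conductor]` section.

```python
class NodeInMaintenanceError(Exception):
    """Hypothetical error type; not ironic's real exception class."""


def ensure_provisioning_allowed(node, action,
                                allow_provisioning_in_maintenance=True):
    """Raise if `action` must not start because the node is in maintenance.

    With the default (True) nothing changes: the action is allowed even in
    maintenance. With False, the request is rejected with a clear message.
    """
    if node.get("maintenance") and not allow_provisioning_in_maintenance:
        raise NodeInMaintenanceError(
            f"refusing to start {action} on node {node.get('name')}: "
            "node is in maintenance"
        )
```

With the option left at its default, the call returns silently for a node in maintenance; with it set to `False`, the same call raises and the administrator gets an explicit reason instead of a later clean failure.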

Comment 10 Bob Fournier 2019-09-21 16:51:29 UTC

Fix has merged to master.

Comment 11 Dmitry Tantsur 2019-09-23 14:17:01 UTC

TripleO patch proposed

Comment 12 Bob Fournier 2019-09-26 13:26:29 UTC

As there are multiple fixes here, including an additional configuration parameter (and the bz severity is low), marking this for OSP-16. Prior to 16 we will have to rely on the KCS article that Andreas created.

Comment 16 errata-xmlrpc 2020-02-06 14:40:53 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283