Bug 1826505

Summary: [Masters] Power Off message for a node in maintenance state needs an updated message referring to workloads
Product: OpenShift Container Platform
Reporter: bjacot
Component: Console Metal3 Plugin
Assignee: Jiri Tomasek <jtomasek>
Status: CLOSED ERRATA
QA Contact: mlammon
Severity: high
Priority: unspecified
Version: 4.4
CC: abeekhof, aos-bugs, gharden, mlammon, ukalifon
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-07-13 17:29:58 UTC
Type: Bug
Bug Depends On: 1801238, 1840133    

Description bjacot 2020-04-21 19:59:32 UTC
Description of problem:
The Power Off message does not reflect the node status. Even though the admin has moved the node to "Under Maintenance", the admin cannot power off the node. The admin has to select "power off immediately" and is then prompted with a message warning that workloads will not be moved before powering off.
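
For context, a minimal sketch of what the Power Off action presumably amounts to, assuming it maps to the spec.online field of the metal3 BareMetalHost; the host name below is a hypothetical placeholder, not taken from this bug:

# Hedged sketch: power off a host by setting spec.online to False on its
# BareMetalHost resource. Assumes the metal3.io/v1alpha1 API and the
# openshift-machine-api namespace; "master-0" is a hypothetical name.
from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

crd.patch_namespaced_custom_object(
    group="metal3.io",
    version="v1alpha1",
    namespace="openshift-machine-api",
    plural="baremetalhosts",
    name="master-0",                   # hypothetical host name
    body={"spec": {"online": False}},  # False requests power off
)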

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-04-21-033933

How reproducible:
100%

Steps to reproduce:
1. Deploy a cluster.
2. Install CNV 2.3.0 (which includes the node maintenance operator).
3. Log in to the OpenShift Container Platform console.
4. Select Compute -> Bare Metal Hosts.
5. Select the 3-dot menu on a host and select "Start Maintenance".
   Give a reason and wait for the node status to change to "Under maintenance"
   (see the sketch after these steps for checking this from the API).
6. Select Compute -> Bare Metal Hosts again.
7. Select the 3-dot menu on the host that was just put "Under maintenance".
8. Select Power Off.
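
As referenced in step 5, a minimal sketch of checking the maintenance state from the API, assuming the NodeMaintenance CRD is served at nodemaintenance.kubevirt.io/v1beta1 (the group/version may differ between operator releases):

# Hedged sketch: list NodeMaintenance objects and print which node each
# one targets and its current phase. CRD group/version are assumptions
# that may vary with the node maintenance operator release.
from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

nms = crd.list_cluster_custom_object(
    group="nodemaintenance.kubevirt.io",
    version="v1beta1",
    plural="nodemaintenances",
)
for nm in nms.get("items", []):
    print(nm["spec"]["nodeName"], nm.get("status", {}).get("phase"))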


Actual results:
The Power Off button is greyed out and not selectable.
The "power off immediately" checkbox has to be checked first.
A warning message is shown for this node even though it is already under maintenance:

"Workloads currently running on this host will not be moved before powering off. This may cause service disruptions."

Expected results:
Since the node is already under maintenance, there should be no workloads left to move. It should be possible to power off the node without the workload warning.

Additional info:
In the Power Off window I also see:

"To power off gracefully, Start maintenance on this host to move all managed workloads to other hosts in the cluster."

"Start maintenance" is greyed out even though it is presented as an option the admin can click; it is not selectable.

Comment 1 Udi Kalifon 2020-04-22 07:15:48 UTC
I can reproduce this for master nodes, but it works correctly for worker nodes. It is also very strange that there is no "Start Maintenance" option for master nodes on the Bare Metal Hosts page: you have to start maintenance from the Nodes page, then power off from the hosts page, ignoring the prompt that tells you to start maintenance first...

Comment 2 Jiri Tomasek 2020-04-22 07:41:17 UTC
This is caused by https://bugzilla.redhat.com/show_bug.cgi?id=1801238 and happens only on masters. Because the master Bare Metal Host is not properly assigned to its Node, the host cannot see that the node is in maintenance and therefore does not take it into account when resolving the power off action.

@Udi, not being able to start maintenance from a master host is caused by the same bug: the host can't 'see' its assigned node, so maintenance cannot be started from the host.
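
To illustrate the linkage described above: a minimal sketch of inspecting a host's spec.consumerRef, which is what ties a BareMetalHost to its Machine (and, through it, the Node); the host name below is a hypothetical placeholder:

# Hedged sketch: a properly assigned BareMetalHost carries a
# spec.consumerRef pointing at its Machine; on affected masters this
# reference is missing, so the console cannot resolve the node state.
from kubernetes import client, config

config.load_kube_config()
crd = client.CustomObjectsApi()

host = crd.get_namespaced_custom_object(
    group="metal3.io",
    version="v1alpha1",
    namespace="openshift-machine-api",
    plural="baremetalhosts",
    name="master-0",                    # hypothetical host name
)
print(host["spec"].get("consumerRef"))  # None => host not linked to a Machine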

Comment 3 bjacot 2020-04-22 12:47:48 UTC
I applied the workaround scripts that were shared in https://bugzilla.redhat.com/show_bug.cgi?id=1801238 and got the expected behavior.

Comment 4 Jiri Tomasek 2020-05-28 10:07:33 UTC
Fixed by https://bugzilla.redhat.com/show_bug.cgi?id=1840133

Comment 5 Udi Kalifon 2020-06-17 13:24:31 UTC
Verified: 4.5.0-0.nightly-2020-06-17-001505

Comment 7 errata-xmlrpc 2020-07-13 17:29:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409