Description of problem:
The Power Off dialog does not reflect the node status. Even though the admin has moved the node to "Under maintenance", the admin cannot power off the node. The admin has to select "Power off immediately" and is then prompted with a message that workloads won't be moved before powering off.

Version-Release number of selected component (if applicable):
4.4.0-0.nightly-2020-04-21-033933

How reproducible:
100%

Steps to Reproduce:
1. Deploy a cluster.
2. Install CNV 2.30 (which includes the node maintenance operator).
3. Log in to OpenShift Container Platform.
4. Select Compute -> Bare Metal Hosts.
5. Select the 3 dots on a node and select "Start Maintenance". Give a reason and wait for the node status to change to "Under maintenance".
6. Select Compute -> Bare Metal Hosts.
7. Select the 3 dots on the node that was just changed to "Under maintenance".
8. Select Power Off.

Actual results:
The Power Off button is greyed out and not selectable until the "Power off immediately" checkbox is checked. A warning is shown for this node even though it is already under maintenance: "Workloads currently running on this host will not be moved before powering off. This may cause service disruptions."

Expected results:
Since the node is already under maintenance, there should not be any workload to move. The admin should be able to power off the node without the workload warning.

Additional info:
The Power Off window shows "To power off gracefully, Start maintenance on this host to move all managed workloads to other hosts in the cluster." "Start maintenance" is greyed out: it appears to be an option the admin can click, but it is not selectable.
I can recreate this for master nodes, but it works well for worker nodes. It's also very strange that there is no "Start Maintenance" action for master nodes on the Bare Metal Hosts page. You have to start maintenance from the Nodes page, then power off from the Bare Metal Hosts page, and ignore the message telling you to start maintenance first...
This is caused by https://bugzilla.redhat.com/show_bug.cgi?id=1801238. It happens only on masters. Because the master Bare Metal Host is not properly assigned to its Node, the host can't see that the node is in maintenance and therefore does not take it into account when resolving the power off situation. @Udi, not being able to start maintenance from the master host is caused by the same bug. The host can't 'see' its assigned node, so it is not possible to start the node's maintenance from the host.
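To illustrate the explanation above, here is a minimal sketch of how a missing host-to-node link would produce the reported dialog. It is not the console's actual code: the `Node` and `BareMetalHost` classes, their fields, and the `power_off_dialog` function are hypothetical names chosen for illustration, and the dialog rules are inferred only from the behaviour described in this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a cluster Node."""
    name: str
    under_maintenance: bool = False


@dataclass
class BareMetalHost:
    """Hypothetical stand-in for a Bare Metal Host.

    node is None when the host-to-node link cannot be resolved,
    which is the situation described for master hosts.
    """
    name: str
    node: Optional[Node] = None


def power_off_dialog(host: BareMetalHost) -> dict:
    """Model the Power Off dialog state for a host (assumed rules)."""
    # Maintenance is only visible to the host through its linked node.
    in_maintenance = host.node is not None and host.node.under_maintenance
    return {
        # Without a linked node there is nothing to put into maintenance.
        "can_start_maintenance": host.node is not None and not in_maintenance,
        # Outside (visible) maintenance, the admin must opt in explicitly.
        "requires_immediate_checkbox": not in_maintenance,
        "shows_workload_warning": not in_maintenance,
    }


worker = BareMetalHost("worker-0", Node("worker-0", under_maintenance=True))
# The master's node is under maintenance, but the host has no link to it.
master = BareMetalHost("master-0", node=None)

print(power_off_dialog(worker))
print(power_off_dialog(master))
```

In this model the worker host powers off without a warning, while the master host still shows the workload warning and a disabled "Start maintenance" option, matching the symptoms reported for masters.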
I applied the workaround scripts that were shared in https://bugzilla.redhat.com/show_bug.cgi?id=1801238 and got the expected behavior.
Fixed by https://bugzilla.redhat.com/show_bug.cgi?id=1840133
Verified: 4.5.0-0.nightly-2020-06-17-001505
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409