Bug 1824241 - Node Maintenance feature should consolidate to one place in UI (Compute-> Nodes)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Console Metal3 Plugin
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.5.0
Assignee: Jiri Tomasek
QA Contact: Udi Kalifon
URL:
Whiteboard:
Depends On: 1801238 1840133
Blocks:
Reported: 2020-04-15 15:39 UTC by mlammon
Modified: 2020-07-13 17:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:27:56 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift console pull 5533 0 None closed Bug 1824241: Relabel host maintenance actions to match node maintenance action 2020-07-13 17:29:30 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:28:28 UTC

Description mlammon 2020-04-15 15:39:55 UTC
Description:
Node Maintenance feature should consolidate to one place in UI (Compute-> Nodes)

Currently, "Start Maintenance" is available in Compute -> Bare Metal Hosts for workers only; it is missing for master nodes in the UI.

Compute -> Nodes uses "Start Node Maintenance", and it is available for both worker and master nodes in the UI.

This issue requests that the action be dropped from Bare Metal Hosts altogether in favor of one common location (Compute -> Nodes). I think it's confusing to have it in both places; if the resulting workflow is the same, why have both?

Also, in Compute -> Nodes it is called "Start Node Maintenance",
     while in Compute -> Bare Metal Hosts it is called "Start Maintenance".
* This is also confusing: it looks like a different workflow in the UI.


Tested version steps:
Version:
4.4.0-0.nightly-2020-04-04-025830

Steps to reproduce:
1. Deploy a cluster
2. Install CNV 2.30 (which includes the node maintenance operator)
3. Log in to OpenShift Container Platform
4. Select Compute -> Bare Metal Hosts
5. Select the three dots on a master ("Start Maintenance" is missing)

1. Select Compute -> Nodes
2. Select the three dots on a master ("Start Node Maintenance" is available)

Note: CNV origin
https://github.com/kubevirt/hyperconverged-cluster-operator



How reproducible:
100%

Comment 1 Andrew Beekhof 2020-05-13 12:32:56 UTC
Hi Jiri, any thoughts on how to provide a good experience for this?

Comment 2 Jiri Tomasek 2020-05-15 11:01:53 UTC
I think the reasoning behind also allowing maintenance to be set from the host is that it directly affects the workflow for host hardware maintenance. If the host runs workloads (via a node), then setting maintenance on the node ensures that the host can be powered off and won't be acquired for a different node, as it still has one associated. Hardware maintenance (RAM replacement etc.) can then be safely performed without having to deprovision the host completely.

I've asked Andy (UX) to comment here as well.

Comment 3 Andy Braren 2020-05-15 20:29:45 UTC
It sounds like there are a few pieces to consider here. Unfortunately I don't have an environment handy to check the current implementation, but I'll try to imagine it based on my mockups. I'm sure there are usability & text improvements that could be made in the future, but those may be RFEs rather than "bugs".

1. "Start Node Maintenance" vs "Start Maintenance" label

I agree that the action menu labels should be the same. "Start Maintenance" should work for both.

2. "Start Maintenance" action is missing for master nodes/hosts

My technical understanding may be incorrect, but I believe starting maintenance/draining a master node isn't recommended, which is why it isn't shown. The current Web Console design convention is to hide certain actions rather than disable them or explain why they're disabled, so this behavior seems to be correct. I can discuss alternative approaches with the design team if we think this convention should be reconsidered.

3. Having the same action in 2 places

UXD did some usability testing related to how starting maintenance, and the Host/Node relationship more generally, should be treated in the UI. Our test only had 5 participants (4 external) and starting maintenance was a secondary task, but we found that users were split 2/3 on which resource they expected to interact with for maintenance/power/etc. and it depended on their mental model of what each resource represents. In post-task discussions there was general agreement that the relationship between the two is fuzzy and certain actions like maintenance could make sense on both. "I want to start maintenance on my hardware" may lead users to look at the BareMetalHost resource first, for example.

I think there's more work to do in the UI to explain the 1:1 relationship between nodes/hosts better and simplify the two-step workflow Jiri described for power/maintenance to make it more seamless, but having the maintenance action available on BareMetalHosts is more of a convenient shortcut for users who expect the action to be there than a bug.

Here's a slide deck covering some of that research for those interested:
https://docs.google.com/presentation/d/1rAF3LetD5D8NZZ7-sZ1OyamlDdjON-WrDOmdihJ2cDU/edit#slide=id.g5e2bbf155f_0_301

Comment 5 Andrew Beekhof 2020-05-26 02:04:55 UTC
@andy, regarding point 2, bug #1801238 prevents machines and nodes from being associated for masters, and that breakage is why the functionality doesn't show up (the UI doesn't know how to find the Node that matches the master Machine).
1801238 also affects a number of other UI interactions (such as shutdown).

I wouldn't normally have a problem with the functionality being available in both places (point 3), but the current state (all machines work in one area, only some work in the other) is worse than not having it.
I advocate either:
- disabling maintenance from machines
- somehow indicating that machines representing masters cannot be operated on in this way
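
For illustration only, the lookup problem and the second option above can be sketched roughly as follows. This is a minimal sketch, not the console's actual code: the types, the function `resolveMaintenanceAction`, and the reason strings are all hypothetical, and namespaces are ignored. It assumes the host points to its Machine through `consumerRef` and the Machine points to its Node through `nodeRef`; when that second link is missing (as described for masters), the sketch returns a reason instead of silently hiding the action.

```typescript
// Hypothetical, simplified shapes; real resources carry many more fields.
interface BareMetalHost {
  name: string;
  // Assumed to mirror spec.consumerRef: the Machine that uses this host.
  consumerRef?: { kind: string; name: string };
}

interface Machine {
  name: string;
  // Assumed to mirror status.nodeRef: the Node backing this Machine.
  nodeRef?: { name: string };
}

type MaintenanceAction =
  | { available: true; label: string; nodeName: string }
  | { available: false; reason: string };

// One shared label, so the host and node action menus cannot drift apart.
const START_MAINTENANCE_LABEL = "Start Maintenance";

// Follow host -> Machine -> Node; report why the action is unavailable
// when any link in that chain is missing.
function resolveMaintenanceAction(
  host: BareMetalHost,
  machines: Machine[],
): MaintenanceAction {
  const ref = host.consumerRef;
  if (!ref || ref.kind !== "Machine") {
    return { available: false, reason: "Host is not associated with a Machine" };
  }
  const machine = machines.find((m) => m.name === ref.name);
  if (!machine) {
    return { available: false, reason: `Machine ${ref.name} was not found` };
  }
  if (!machine.nodeRef) {
    return { available: false, reason: `Machine ${machine.name} has no linked Node` };
  }
  return {
    available: true,
    label: START_MAINTENANCE_LABEL,
    nodeName: machine.nodeRef.name,
  };
}
```

With a result shaped like this, a menu could show the action disabled with the reason as a tooltip, rather than omitting it for masters without explanation.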

Comment 9 mlammon 2020-06-17 15:11:31 UTC
Deployed OCP 4.5 and installed CNV 2.4
oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-17-001505   True        False         44m     Cluster version is 4.5.0-0.nightly-2020-06-17-001505

Followed the steps above and it now works as expected.
The action is functioning in both Compute -> Nodes and Compute -> Bare Metal Hosts, and it is now labeled "Start Maintenance" in both.
This verifies the issue.

Comment 10 errata-xmlrpc 2020-07-13 17:27:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

