Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1824241

Summary:	Node Maintenance feature should consolidate to one place in UI (Compute-> Nodes)
Product:	OpenShift Container Platform	Reporter:	mlammon
Component:	Console Metal3 Plugin	Assignee:	Jiri Tomasek <jtomasek>
Status:	CLOSED ERRATA	QA Contact:	Udi Kalifon <ukalifon>
Severity:	low	Docs Contact:
Priority:	unspecified
Version:	4.4	CC:	abeekhof, abraren, aos-bugs, gharden, jtomasek, msluiter
Target Milestone:	---
Target Release:	4.5.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-07-13 17:27:56 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1801238, 1840133
Bug Blocks:

Description mlammon 2020-04-15 15:39:55 UTC

Description:
Node Maintenance feature should consolidate to one place in UI (Compute-> Nodes)

Currently, "Start Maintenance" is available in Compute -> Bare Metal Hosts for workers only and is missing for master nodes in the UI.

Compute -> Nodes uses "Start Node Maintenance", and it is available for both worker and master nodes in the UI.

This issue requests that it be dropped from Bare Metal Hosts altogether in favor of one common location (Compute -> Nodes). I think it's confusing to have it in both places; if the end-result workflow is the same, why have both?

Also, in Compute -> Nodes it is called "Start Node Maintenance",
     while in Compute -> Bare Metal Hosts it is called "Start Maintenance".
* This is also confusing. It looks like a different workflow in the UI.


Tested version steps:
Version:
4.4.0-0.nightly-2020-04-04-025830

Steps to reproduce:
1. Deploy cluster
2. Install CNV 2.30 (which includes the node maintenance operator)
3. Log in to OpenShift Container Platform
4. Select Compute -> Bare Metal Hosts
5. Select the 3 dots on a master (maintenance action is missing)

1. Select Compute -> Nodes
2. Select the 3 dots on a master (maintenance action is available)

Note: CNV origin
https://github.com/kubevirt/hyperconverged-cluster-operator



How reproducible:
100%

Comment 1 Andrew Beekhof 2020-05-13 12:32:56 UTC

Hi Jiri, any thoughts on how to provide a good experience for this?

Comment 2 Jiri Tomasek 2020-05-15 11:01:53 UTC

I think the reasoning behind also allowing maintenance to be set from the host is that it directly affects the workflow for host hardware maintenance. If a host runs workloads (via a node), then setting maintenance on the node ensures that the host can then be powered off and won't be acquired for a different node, as it still has one associated. Hardware maintenance (RAM replacement, etc.) can then be performed safely without having to deprovision the host completely.

I've asked Andy (UX) to comment here as well.

Comment 3 Andy Braren 2020-05-15 20:29:45 UTC

It sounds like there are a few pieces to consider here. Unfortunately I don't have an environment handy to check the current implementation, but I'll try to imagine it based on my mockups. I'm sure there are usability & text improvements that could be made in the future, but those may be RFEs rather than "bugs".

1. "Start Node Maintenance" vs "Start Maintenance" label

I agree that the action menu labels should be the same. "Start Maintenance" should work for both.

2. "Start Maintenance" action is missing for master nodes/hosts

My technical understanding may be incorrect, but I believe starting maintenance/draining a master node isn't recommended, which is why it isn't shown. The current Web Console design convention is to hide certain actions rather than disable them or explain why they're disabled, so this behavior seems to be correct. I can discuss alternative approaches with the design team if we think this convention should be reconsidered.

3. Having the same action in 2 places

UXD did some usability testing related to how starting maintenance, and the Host/Node relationship more generally, should be treated in the UI. Our test only had 5 participants (4 external) and starting maintenance was a secondary task, but we found that users were split 2/3 on which resource they expected to interact with for maintenance/power/etc. and it depended on their mental model of what each resource represents. In post-task discussions there was general agreement that the relationship between the two is fuzzy and certain actions like maintenance could make sense on both. "I want to start maintenance on my hardware" may lead users to look at the BareMetalHost resource first, for example.

I think there's more work to do in the UI to explain the 1:1 relationship between nodes/hosts better and simplify the two-step workflow Jiri described for power/maintenance to make it more seamless, but having the maintenance action available on BareMetalHosts is more of a convenient shortcut for users who expect the action to be there than a bug.

Here's a slide deck covering some of that research for those interested:
https://docs.google.com/presentation/d/1rAF3LetD5D8NZZ7-sZ1OyamlDdjON-WrDOmdihJ2cDU/edit#slide=id.g5e2bbf155f_0_301

Comment 4 Jiri Tomasek 2020-05-22 08:42:54 UTC

1. fixed by https://github.com/openshift/console/pull/5533

2. fixed by https://bugzilla.redhat.com/show_bug.cgi?id=1801238

Comment 5 Andrew Beekhof 2020-05-26 02:04:55 UTC

@andy, regarding point 2, bug #1801238 prevents machines and nodes from being associated for masters, and that breakage is why the functionality doesn't show up (because the UI doesn't know how to find the Node that matches the master Machine).
1801238 also affects a number of other UI interactions (such as shutdown).
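
To illustrate the lookup problem, here is a minimal Python sketch, not the console's actual code. It assumes a simplified model in which a host references a Machine and a Machine references a Node; the class names, field names, and the find_node_for_host helper are all hypothetical. When the Machine-to-Node link is missing, as it is for masters, the lookup returns nothing and the maintenance action has no Node to act on.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Machine:
    name: str
    # Hypothetical field; None models the missing Machine-to-Node association.
    node_name: Optional[str]


@dataclass
class Host:
    name: str
    # Hypothetical field; name of the Machine that consumes this host.
    machine_name: Optional[str]


def find_node_for_host(host: Host, machines: Dict[str, Machine]) -> Optional[str]:
    """Follow host -> Machine -> Node; return None if any link is missing."""
    if host.machine_name is None:
        return None
    machine = machines.get(host.machine_name)
    if machine is None:
        return None
    return machine.node_name


# Illustrative data only: the worker is fully linked, the master is not.
machines = {
    "worker-0-machine": Machine("worker-0-machine", node_name="worker-0"),
    "master-0-machine": Machine("master-0-machine", node_name=None),
}
worker_host = Host("worker-0-host", machine_name="worker-0-machine")
master_host = Host("master-0-host", machine_name="master-0-machine")

print(find_node_for_host(worker_host, machines))
print(find_node_for_host(master_host, machines))
```

In this sketch the worker host resolves to its Node while the master host resolves to nothing, which mirrors why the action shows up for workers only.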

I wouldn't normally have a problem with the functionality being available in both places (point 3), but the current state (all machines work in one area, only some work in the other) is worse than not having it.
I advocate either:
- disabling maintenance from machines
- somehow indicating that machines representing masters cannot be operated on in this way
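
As a rough sketch of the second option, the menu entry could carry an explanation instead of silently disappearing. This is a Python illustration only, not the console's implementation; the ActionState type, the maintenance_action helper, and the reason text are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionState:
    label: str
    enabled: bool
    # Hypothetical field: text shown to the user when the action is disabled.
    reason: Optional[str] = None


def maintenance_action(node_name: Optional[str]) -> ActionState:
    """Build the menu entry; explain why it is disabled instead of hiding it."""
    if node_name is None:
        return ActionState(
            label="Start Maintenance",
            enabled=False,
            reason="No Node is associated with this machine",
        )
    return ActionState(label="Start Maintenance", enabled=True)


print(maintenance_action("worker-0"))
print(maintenance_action(None))
```

The first option would correspond to not building the entry at all for machines.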

Comment 9 mlammon 2020-06-17 15:11:31 UTC

Deployed OCP 4.5 and installed CNV 2.4
oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-17-001505   True        False         44m     Cluster version is 4.5.0-0.nightly-2020-06-17-001505

Followed the steps above; it now works as expected.
It functions in both Compute -> Nodes and Compute -> Bare Metal Hosts, and is now labeled "Start Maintenance" in both.
This verifies the fix.

Comment 10 errata-xmlrpc 2020-07-13 17:27:56 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409