Description: The Node Maintenance feature should be consolidated into one place in the UI (Compute -> Nodes).

Current situation: "Start Maintenance" is available in Compute -> Bare Metal Hosts for workers only and is missing for master nodes. Compute -> Nodes uses "Start Node Maintenance", and it is available for both worker and master nodes.

This issue requests that the action be dropped from Bare Metal Hosts altogether in favor of one common location (Compute -> Nodes). I think it's confusing to have it in both places; if the resulting workflow is the same, why have both? Also, the action is called "Start Node Maintenance" in Compute -> Nodes but "Start Maintenance" in Compute -> Bare Metal Hosts. This is also confusing, because it looks like a different workflow in the UI.

Version: 4.4.0-0.nightly-2020-04-04-025830

Steps to reproduce:
1. Deploy cluster
2. Install CNV 2.30 (which includes the node maintenance operator)
3. Log in to OpenShift Container Platform
4. Select Compute -> Bare Metal Hosts
5. Select the 3 dots on a master ("Start Maintenance" is missing)

For comparison:
1. Select Compute -> Nodes
2. Select the 3 dots on a master (the action is available)

Note: CNV origin https://github.com/kubevirt/hyperconverged-cluster-operator

How reproducible: 100%
Hi Jiri, any thoughts on how to provide a good experience for this?
I think the reasoning behind also allowing maintenance to be set from the host is that it directly affects the workflow for host hardware maintenance. If the host runs workloads (via a node), then setting maintenance on the node ensures that the host can be powered off and won't be acquired for a different node, as it still has one associated. HW maintenance (RAM replacement etc.) can then be performed safely without having to deprovision the host completely. I've asked Andy (UX) to comment here as well.
It sounds like there are a few pieces to consider here. Unfortunately I don't have an environment handy to check the current implementation, but I'll try to imagine it based on my mockups. I'm sure there are usability & text improvements that could be made in the future, but those may be RFEs rather than "bugs".

1. "Start Node Maintenance" vs "Start Maintenance" label
I agree that the action menu labels should be the same. "Start Maintenance" should work for both.

2. "Start Maintenance" action is missing for master nodes/hosts
My technical understanding may be incorrect, but I believe starting maintenance/draining a master node isn't recommended, which is why it isn't shown. The current Web Console design convention is to hide certain actions rather than disable them or explain why they're disabled, so this behavior seems to be correct. I can discuss alternative approaches with the design team if we think this convention should be reconsidered.

3. Having the same action in 2 places
UXD did some usability testing on how starting maintenance, and the Host/Node relationship more generally, should be treated in the UI. Our test only had 5 participants (4 external) and starting maintenance was a secondary task, but we found that users were split 2/3 on which resource they expected to interact with for maintenance/power/etc., and it depended on their mental model of what each resource represents. In post-task discussions there was general agreement that the relationship between the two is fuzzy and that certain actions like maintenance could make sense on both. "I want to start maintenance on my hardware" may lead users to look at the BareMetalHost resource first, for example.
I think there's more work to do in the UI to explain the 1:1 relationship between nodes/hosts better and simplify the two-step workflow Jiri described for power/maintenance to make it more seamless, but having the maintenance action available on BareMetalHosts is more of a convenient shortcut for users who expect the action to be there than a bug. Here's a slide deck covering some of that research for those interested: https://docs.google.com/presentation/d/1rAF3LetD5D8NZZ7-sZ1OyamlDdjON-WrDOmdihJ2cDU/edit#slide=id.g5e2bbf155f_0_301
1. Fixed by https://github.com/openshift/console/pull/5533
2. Fixed by https://bugzilla.redhat.com/show_bug.cgi?id=1801238
@andy, regarding point 2: bug #1801238 prevents machines and nodes from being associated for masters, and that breakage is why the functionality doesn't show up (the UI doesn't know how to find the Node that matches the master Machine). 1801238 also affects a number of other UI interactions (such as shutdown).

I wouldn't normally have a problem with the functionality being available in both places (point 3), but the current state (all machines work in one area, only some work in the other) is worse than not having it. I advocate either:
- disabling maintenance from machines
- somehow indicating that machines representing masters cannot be operated on in this way
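To illustrate why a broken Machine/Node association hides the action, here is a minimal sketch of the lookup chain described above. It is not the console's actual code: the function names are hypothetical, and the field names (`spec.consumerRef` on the host, `status.nodeRef` on the Machine) are assumptions about how the resources reference each other.

```python
from typing import List, Optional


def find_node_for_host(host: dict, machines: List[dict], nodes: List[dict]) -> Optional[dict]:
    """Follow host -> Machine -> Node; return None if any link is missing."""
    # Assumed field: the host names the Machine that consumes it.
    consumer = host.get("spec", {}).get("consumerRef")
    if not consumer:
        return None
    machine = next(
        (m for m in machines if m["metadata"]["name"] == consumer["name"]), None
    )
    if machine is None:
        return None
    # Assumed field: the Machine names its Node. For masters affected by
    # bug 1801238 this link is missing, so the lookup stops here.
    node_ref = machine.get("status", {}).get("nodeRef")
    if not node_ref:
        return None
    return next((n for n in nodes if n["metadata"]["name"] == node_ref["name"]), None)


def host_menu_actions(host: dict, machines: List[dict], nodes: List[dict]) -> List[str]:
    """Hypothetical menu builder: hide the action when no Node can be found."""
    if find_node_for_host(host, machines, nodes) is None:
        return []
    return ["Start Maintenance"]
```

With this shape, a worker host whose Machine carries a node reference gets the action, while a master host whose Machine has no node reference silently gets none, which matches the behavior reported in this bug.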
Deployed OCP 4.5 and installed CNV 2.4.

oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-06-17-001505   True        False         44m     Cluster version is 4.5.0-0.nightly-2020-06-17-001505

Followed the steps above and it is now working as expected. The action functions in both Compute -> Nodes and Compute -> Bare Metal Hosts, and is now labeled "Start Maintenance" in both.

This verifies the issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409