Description of problem:
The graceful power-off popup is not shown when powering off a node via Compute -> BMH -> <name of node> -> Actions -> Power Off, but it works correctly when using Compute -> BMH -> <name of node> -> kebab button -> Power Off. See attachments.

Version-Release number of selected component (if applicable):
4.5.6

How reproducible:
100%

Steps to Reproduce:
1. Install OCP.
2. Install OpenShift Virtualization from OperatorHub (kubevirt-hyperconverged - Red Hat Operators) for use with the Node Maintenance Operator (NMO).
3. Compute -> BMH -> Start Maintenance or Compute -> Nodes -> Start Maintenance, and wait until the node goes into "Under Maintenance".
4. Compute -> BMH -> <name of node> -> kebab button -> Power Off. This produces the popup "Host is ready to be gracefully powered off. The host is currently under maintenance and all workloads have already been moved." **** This is what we want to see.
Or, in the bug case:
4. Compute -> BMH -> openshift-master-0-0 -> Actions -> Power Off. This gives you a different popup that assumes the host is not ready to be gracefully powered off and asks the user to check "Power off immediately".

Actual results:
The graceful power-off popup is not shown.

Expected results:
The graceful power-off (confirmation) popup is shown.

Additional info:

$ oc describe nodemaintenances.nodemaintenance.kubevirt.io nm-45zcv
Name:         nm-45zcv
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  nodemaintenance.kubevirt.io/v1beta1
Kind:         NodeMaintenance
Metadata:
  Creation Timestamp:  2020-08-26T19:55:49Z
  Finalizers:
    foregroundDeleteNodeMaintenance
  Generate Name:  nm-
  Generation:     1
  Managed Fields:
    API Version:  nodemaintenance.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName:
      f:spec:
        .:
        f:nodeName:
        f:reason:
    Manager:      Mozilla
    Operation:    Update
    Time:         2020-08-26T19:55:49Z
    API Version:  nodemaintenance.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"foregroundDeleteNodeMaintenance":
      f:status:
        .:
        f:evictionPods:
        f:phase:
        f:totalpods:
    Manager:         node-maintenance-operator
    Operation:       Update
    Time:            2020-08-26T19:56:30Z
  Resource Version:  71970
  Self Link:         /apis/nodemaintenance.kubevirt.io/v1beta1/nodemaintenances/nm-45zcv
  UID:               4f45006b-4687-45ba-b289-e298d7f229d2
Spec:
  Node Name:  master-0-0
  Reason:     replace server
Status:
  Eviction Pods:  31
  Phase:          Succeeded
  Totalpods:      51
Events:           <none>
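For reference, the "Start Maintenance" action creates a NodeMaintenance CR like the one described above, so the same state can be set up from the CLI. A minimal sketch using the field names from the describe output; the node name and reason are just the values from this cluster:

$ cat <<EOF | oc create -f -
apiVersion: nodemaintenance.kubevirt.io/v1beta1
kind: NodeMaintenance
metadata:
  generateName: nm-
spec:
  nodeName: master-0-0
  reason: replace server
EOF

# check that the phase has reached Succeeded before trying the Power Off popup
$ oc get nodemaintenances.nodemaintenance.kubevirt.io -o custom-columns=NAME:.metadata.name,PHASE:.status.phase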
Created attachment 1712738 [details] The Kebab (good) popup
Created attachment 1712740 [details] The Actions (bad) popup or not graceful
*** Bug 1872896 has been marked as a duplicate of this bug. ***
I'm actually seeing the opposite behavior, where the popup opened from the kebab does not correctly detect DaemonSet and unmanaged static pods and thus wrongly shows that graceful shutdown is possible.
Created attachment 1717239 [details] poweroff-popupmessage
Checked on an OCP 4.6 BM cluster with payload 4.6.0-0.nightly-2020-09-27-075304. Compute -> BMH -> Start Maintenance on one BMH; after its status is "Under maintenance", click "Power off" in the kebab and check the popup message: "To power off gracefully, start maintenance on this host to move all managed workloads to other nodes in the cluster." It also shows the daemonset pods and unmanaged static pods. Since the BMH is already under maintenance, there is no point in telling the user to "start maintenance" in the message. Does this need to be improved?
1. Install OCP.
2. Install OpenShift Virtualization from OperatorHub (kubevirt-hyperconverged - Red Hat Operators) for use with the Node Maintenance Operator (NMO).
3. Compute -> BMH -> Start Maintenance; after several minutes the BMH goes into "Under Maintenance" status.

$ oc describe nodemaintenances.kubevirt.io worker-0-0-xx89n
Name:         worker-0-0-xx89n
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  kubevirt.io/v1alpha1
Kind:         NodeMaintenance
Metadata:
  Creation Timestamp:  2020-09-29T02:20:38Z
  Finalizers:
    foregroundDeleteNodeMaintenance
  Generate Name:  worker-0-0-
  Generation:     2
  Managed Fields:
    API Version:  kubevirt.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName:
      f:spec:
        .:
        f:nodeName:
    Manager:      Mozilla
    Operation:    Update
    Time:         2020-09-29T02:20:38Z
    API Version:  kubevirt.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"foregroundDeleteNodeMaintenance":
      f:status:
        .:
        f:evictionPods:
        f:lastError:
        f:phase:
        f:totalpods:
    Manager:         node-maintenance-operator
    Operation:       Update
    Time:            2020-09-29T02:23:31Z
  Resource Version:  1171099
  Self Link:         /apis/kubevirt.io/v1alpha1/nodemaintenances/worker-0-0-xx89n
  UID:               e96b1a66-aba0-442e-a7ef-19e3ffa844a8
Spec:
  Node Name:  worker-0-0
Status:
  Eviction Pods:  20
  Last Error:     drain did not complete after 1m0s interval. retrying
  Phase:          Succeeded
  Totalpods:      34
Events:           <none>

4. Compute -> BMH -> click the ... (kebab) button for the node in 'Under Maintenance' status -> click "Power Off". Since not all workloads have been moved successfully, we see the popup message shown in the attachment in comment 6.

@Rastislav Wagner, is this popup message what we expect? IMO it is acceptable, but I would like to confirm with you.
After checking the code, I think this should be acceptable. Moving to VERIFIED
Sorry, I forgot to mention that in both scenarios:
1. Compute -> BMH -> click the kebab button of a node under maintenance -> click 'Power Off'
2. Compute -> BMH -> click a node under maintenance -> Actions -> click 'Power Off'
the user gets the same popup message, in which 'start maintenance' is greyed out (disabled) and the remaining workloads are shown.
I just tested with cluster version 4.6.0-fc.8 and I am still seeing the issue after putting a master node into maintenance. I put one into maintenance and then clicked "Power off"; the modal that appears is the "not graceful" one. You have to check [ ] Power off immediately before the Power Off button becomes clickable. My understanding is that you should be able to power off gracefully after the node is in maintenance. I think this should be re-opened.
Failed QA
@mlammon maybe you still have daemon sets and static pods running on the node? Setting the host to maintenance won't migrate those, and I think we should still warn the user.
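A quick way to double-check is to list what is still scheduled on the node after maintenance, for example (the node name here is just an example):

$ oc get pods --all-namespaces --field-selector spec.nodeName=master-0-0 -o wide

Whatever is left there should mostly be daemon set pods and static pods, which the maintenance drain does not move.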
I have seen this work before and let the user know the host can be safely powered off (graceful). The original report was that the graceful popup is not shown when powering off via Compute -> BMH -> <name of node> -> Actions -> Power Off, but works correctly via Compute -> BMH -> <name of node> -> kebab button -> Power Off. Now it doesn't provide the graceful popup in either location. As a user who has put a node into maintenance, I would argue that it "would appear to be safe now"; otherwise, what is the point of the feature? @abeekoff?
@rawagner, regarding "maybe you still have daemon sets and static pods running on the node? Setting the host to maintenance won't migrate those, and I think we should still warn the user": I am not disagreeing with your statement, but from the user's perspective, putting a node into maintenance mode and then still being warned feels somewhat redundant.
It is not a blocker, so targeting 4.7, but we will try to fix it in 4.6 anyway.
I would suggest that if the node is in Maintenance mode there should be no warning. They've evacuated everything that can be evacuated, and anything that can't be could be reported as part of the maintenance request ("All possible workloads have been migrated, but X static pods and Y daemonsets have been skipped"). Otherwise, look for static pods and daemonsets, and show the warning if appropriate. Does that sound overly complicated?
Created attachment 1719094 [details] Verified POP UP Message
Installed nightly 4.6.0-0.nightly-2020-10-03-051134 and CNV 2.5.0, which includes NMO.
1. Put master-0-0 into maintenance.
2. Checked the Power Off button from Compute -> BMH -> kebab (master-0-0) -> Power Off as well as from "Actions"; both produced the same Power Off Host popup: "Host is ready to be gracefully powered off. The host is currently under maintenance and all workloads have already been moved, but 8 static pods and 13 daemon sets have been skipped."
The user then just needs to confirm "Power off" (see attachment). This can now be verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196