Description of problem:
The graceful power-off popup is not shown when powering off a node via Compute -> BMH -> <name of node> -> Actions -> Power Off, but it works correctly when using Compute -> BMH -> <name of node> -> kebab button -> Power Off. See attachments.

Version-Release number of selected component (if applicable):
4.5.6

How reproducible:
100%

Steps to Reproduce:
1. Install OCP.
2. Install OpenShift Virtualization from OperatorHub (kubevirt-hyperconverged - Red Hat Operators) for use with the Node Maintenance Operator (NMO).
3. Compute -> BMH -> Start Maintenance or Compute -> Nodes -> Start Maintenance, and wait until the node goes into "Under Maintenance".
4. Compute -> BMH -> <name of node> -> kebab button -> Power Off. This produces the popup "Host is ready to be gracefully powered off. The host is currently under maintenance and all workloads have already been moved." **** This is what we want to see.
Or, in the bug case:
4. Compute -> BMH -> openshift-master-0-0 -> Actions -> Power Off. This gives you a different popup that assumes the host is not ready to be gracefully powered off and asks the user to check "Power off immediately".

Actual results:
The graceful power-off popup is not shown.

Expected results:
The graceful power-off (confirmation) popup is shown.

Additional info:

$ oc describe nodemaintenances.nodemaintenance.kubevirt.io nm-45zcv
Name:         nm-45zcv
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  nodemaintenance.kubevirt.io/v1beta1
Kind:         NodeMaintenance
Metadata:
  Creation Timestamp:  2020-08-26T19:55:49Z
  Finalizers:
    foregroundDeleteNodeMaintenance
  Generate Name:  nm-
  Generation:     1
  Managed Fields:
    API Version:  nodemaintenance.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName:
      f:spec:
        .:
        f:nodeName:
        f:reason:
    Manager:      Mozilla
    Operation:    Update
    Time:         2020-08-26T19:55:49Z
    API Version:  nodemaintenance.kubevirt.io/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"foregroundDeleteNodeMaintenance":
      f:status:
        .:
        f:evictionPods:
        f:phase:
        f:totalpods:
    Manager:         node-maintenance-operator
    Operation:       Update
    Time:            2020-08-26T19:56:30Z
  Resource Version:  71970
  Self Link:         /apis/nodemaintenance.kubevirt.io/v1beta1/nodemaintenances/nm-45zcv
  UID:               4f45006b-4687-45ba-b289-e298d7f229d2
Spec:
  Node Name:  master-0-0
  Reason:     replace server
Status:
  Eviction Pods:  31
  Phase:          Succeeded
  Totalpods:      51
Events:           <none>
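For reference, the "Start Maintenance" action creates a NodeMaintenance CR like the one described above, so the same state can be set up from the CLI. A minimal sketch using the field names from the describe output; the node name and reason are just the values from this cluster:

$ cat <<EOF | oc create -f -
apiVersion: nodemaintenance.kubevirt.io/v1beta1
kind: NodeMaintenance
metadata:
  generateName: nm-
spec:
  nodeName: master-0-0
  reason: replace server
EOF

# check that the phase has reached Succeeded before trying the Power Off popup
$ oc get nodemaintenances.nodemaintenance.kubevirt.io -o custom-columns=NAME:.metadata.name,PHASE:.status.phase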
Created attachment 1712738 [details] The Kebab (good) popup
Created attachment 1712740 [details] The Actions (bad) popup or not graceful
*** Bug 1872896 has been marked as a duplicate of this bug. ***
I'm actually seeing the opposite behavior, where the popup opened from the kebab does not correctly detect DaemonSet and unmanaged static pods and thus wrongly shows that graceful shutdown is possible.
Created attachment 1717239 [details] poweroff-popupmessage
Checked on an OCP 4.6 BM cluster with payload 4.6.0-0.nightly-2020-09-27-075304. Compute -> BMH -> Start Maintenance on one BMH; after its status is "Under maintenance", click "Power off" in the kebab and check the popup message: "To power off gracefully, start maintenance on this host to move all managed workloads to other nodes in the cluster." It also shows the daemonset pods and unmanaged static pods. Since the BMH is already under maintenance, there is no point in telling the user to "start maintenance" in the message. Does this need to be improved?
1. Install OCP.
2. Install OpenShift Virtualization from OperatorHub (kubevirt-hyperconverged - Red Hat Operators) for use with the Node Maintenance Operator (NMO).
3. Compute -> BMH -> Start Maintenance; after several minutes the BMH goes into "Under Maintenance" status.

$ oc describe nodemaintenances.kubevirt.io worker-0-0-xx89n
Name:         worker-0-0-xx89n
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  kubevirt.io/v1alpha1
Kind:         NodeMaintenance
Metadata:
  Creation Timestamp:  2020-09-29T02:20:38Z
  Finalizers:
    foregroundDeleteNodeMaintenance
  Generate Name:  worker-0-0-
  Generation:     2
  Managed Fields:
    API Version:  kubevirt.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:generateName:
      f:spec:
        .:
        f:nodeName:
    Manager:      Mozilla
    Operation:    Update
    Time:         2020-09-29T02:20:38Z
    API Version:  kubevirt.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"foregroundDeleteNodeMaintenance":
      f:status:
        .:
        f:evictionPods:
        f:lastError:
        f:phase:
        f:totalpods:
    Manager:         node-maintenance-operator
    Operation:       Update
    Time:            2020-09-29T02:23:31Z
  Resource Version:  1171099
  Self Link:         /apis/kubevirt.io/v1alpha1/nodemaintenances/worker-0-0-xx89n
  UID:               e96b1a66-aba0-442e-a7ef-19e3ffa844a8
Spec:
  Node Name:  worker-0-0
Status:
  Eviction Pods:  20
  Last Error:     drain did not complete after 1m0s interval. retrying
  Phase:          Succeeded
  Totalpods:      34
Events:           <none>

4. Compute -> BMH -> click the ... (kebab) button for the node in 'Under Maintenance' status -> click "Power Off". Since not all workloads have been moved successfully, we see the popup message shown in the attachment in comment 6.

@Rastislav Wagner, is this popup message what we expect? IMO it is acceptable, but I would like to confirm with you.
After checking the code, I think this should be acceptable. Moving to VERIFIED
Sorry, I forgot to mention that in both scenarios:
1. Compute -> BMH -> click the kebab button of a node under maintenance -> click 'Power Off'
2. Compute -> BMH -> click a node under maintenance -> Actions -> click 'Power Off'
the user gets the same popup message, in which 'start maintenance' is greyed out (disabled) and the remaining workloads are shown.
I just tested with cluster version 4.6.0-fc.8 and I am still seeing the issue after putting a master node into maintenance. I put one into maintenance and then clicked "Power off"; the modal that appears is the "not graceful" one. You have to check [ ] Power off immediately before the Power Off button becomes clickable. My understanding is that you should be able to power off gracefully after the node is in maintenance. I think this should be re-opened.
Failed QA
@mlammon maybe you still have daemon sets and static pods running on the node? Setting the host to maintenance won't migrate those, and I think we should still warn the user.
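A quick way to double-check is to list what is still scheduled on the node after maintenance, for example (the node name here is just an example):

$ oc get pods --all-namespaces --field-selector spec.nodeName=master-0-0 -o wide

Whatever is left there should mostly be daemon set pods and static pods, which the maintenance drain does not move.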
I have seen this work before and let the user know the host can be safely powered off (graceful). The original report was that the graceful popup is not shown when powering off via Compute -> BMH -> <name of node> -> Actions -> Power Off, but works correctly via Compute -> BMH -> <name of node> -> kebab button -> Power Off. Now it doesn't provide the graceful popup in either location. As a user who has put a node into maintenance, I would argue that it "would appear to be safe now"; otherwise, what is the point of the feature? @abeekoff?
@rawagner, regarding "maybe you still have daemon sets and static pods running on the node? Setting the host to maintenance won't migrate those, and I think we should still warn the user": I am not disagreeing with your statement, but from the user's perspective, putting a node into maintenance mode and then still being warned feels somewhat redundant.
It is not a blocker, so targeting 4.7, but we will try to fix it in 4.6 anyway.
I would suggest that if the node is in Maintenance mode there should be no warning. They've evacuated everything that can be evacuated, and anything that can't be could be reported as part of the maintenance request ("All possible workloads have been migrated, but X static pods and Y daemonsets have been skipped"). Otherwise, look for static pods and daemonsets, and show the warning if appropriate. Does that sound overly complicated?
Created attachment 1719094 [details] Verified POP UP Message
Installed nightly 4.6.0-0.nightly-2020-10-03-051134 and CNV 2.5.0, which includes NMO.
1. Put master-0-0 into maintenance.
2. Checked the Power Off button from Compute -> BMH -> kebab (master-0-0) -> Power Off as well as from "Actions"; both produced the same Power Off Host popup: "Host is ready to be gracefully powered off. The host is currently under maintenance and all workloads have already been moved, but 8 static pods and 13 daemon sets have been skipped."
The user then just needs to confirm "Power off" (see attachment). This can now be verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196