Bug 1812354 - NMO should update .status.pendingPods more frequently [NEEDINFO]
Summary: NMO should update .status.pendingPods more frequently
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node Maintenance Operator
Version: 4.3.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 4.6.0
Assignee: Marc Sluiter
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-11 05:56 UTC by shahan
Modified: 2022-08-15 07:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-26 12:19:34 UTC
Target Upstream Version:
Embargoed:
jtomasek: needinfo-
abeekhof: needinfo?


Description shahan 2020-03-11 05:56:20 UTC
Description of problem:
The "moving workload" progress always shows 0% when starting maintenance on a host.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-03-06-003558

How reproducible:
Always

Steps to Reproduce:
1. Install the Container-native virtualization Operator on an IPI bare metal cluster
2. Go to the /BareMetalHost list page and select a host to put into maintenance
3. Click the "starting maintenance" popover on the list page

Actual results:
The popover shows the progress of moving workloads, but it always shows 0% until the host reaches the "under maintenance" state.

Expected results:
The popover should show the actual progress.

Additional info:

Comment 1 Jiri Tomasek 2020-05-11 14:05:39 UTC
This is caused by node maintenance operator not reporting the pod counts frequently enough. The pod counts are updated often in the operator logs but not in resource status.

Comment 2 Andrew Beekhof 2020-05-13 12:42:03 UTC
Looks like a UI issue, bouncing to Tomas

Comment 3 Jiri Tomasek 2020-05-15 10:29:53 UTC
The UI calculates the percentage from the pod counts reported in the NM CR [1]. status.pendingPods is not updated frequently enough (IIRC just once a minute). The NMO log, on the other hand, reports in detail how pods are being evicted. So it would be good if NMO updated status.pendingPods at the same frequency. That would make the maintenance progress percentage actually useful.

[1] https://github.com/openshift/console/blob/master/frontend/packages/metal3-plugin/src/selectors/node-maintenance.ts#L14
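
To illustrate why a stale status.pendingPods pins the progress at 0%, here is a minimal sketch of such a percentage calculation. It is not the console's actual code (the UI is TypeScript, see [1]); the function name and the totalToEvict parameter are hypothetical, and it assumes the UI knows how many pods had to be evicted in total.

```go
package main

import "fmt"

// progressPercent returns the drain progress as a whole percentage.
// totalToEvict is a hypothetical count of pods that had to be evicted when
// maintenance started; pending is the current length of status.pendingPods.
func progressPercent(totalToEvict, pending int) int {
	if totalToEvict <= 0 {
		return 100 // assumption: nothing to evict counts as done
	}
	// Clamp pending into [0, totalToEvict] so the result stays in 0..100.
	if pending < 0 {
		pending = 0
	}
	if pending > totalToEvict {
		pending = totalToEvict
	}
	return (totalToEvict - pending) * 100 / totalToEvict
}

func main() {
	// While the status is not refreshed, pending keeps its initial value.
	fmt.Println(progressPercent(8, 8)) // 0
	// After a status update that reports 2 pods left.
	fmt.Println(progressPercent(8, 2)) // 75
}
```

As long as pending is only refreshed rarely, the result stays at the value computed from the last status update, no matter how many pods the drain has already evicted.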

Comment 4 Andrew Beekhof 2020-07-22 12:30:11 UTC
Since the work needed here is in the NMO, we'll take this one back.
Sorry for the noise

Comment 5 Marc Sluiter 2020-07-27 09:24:40 UTC
Hey, there were some code changes already, which should result in more frequent status updates. In detail, this is what happens:

- NMO calls the k8s node drain code, which logs in detail what happens with pods. NMO has no chance to "intercept" these detailed steps in order to update the CR status.
- After some timeout NMO gives up if not all pods have been evicted yet. That timeout was already reduced from 1 minute to 30 seconds. That's when NMO updates the CR status.
- After that, NMO waits 5 seconds before triggering another drain. Repeat until done.

So atm we get a fresh status after at most 35 seconds. Is that good enough, or do we need an even shorter period? E.g. set the drain timeout to 10 seconds + wait 5 seconds = fresh status every 15 seconds?

Andrew, Jiri: thoughts?

