Description of problem:
The "moving workload" progress always shows 0% when starting maintenance on a host.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-03-06-003558

How reproducible:
Always

Steps to Reproduce:
1. Install the Container-native virtualization Operator on an IPI BareMetal cluster
2. Go to the /BareMetalHost list page and select a host to put into maintenance
3. Click the "Starting maintenance" popover on the list page

Actual results:
The popover shows the progress of moving workloads, but it stays at 0% until the host reaches "Under maintenance".

Expected results:
The popover should show the actual progress.

Additional info:
This is caused by the node maintenance operator not reporting the pod counts frequently enough. The pod counts are updated often in the operator logs, but not in the resource status.
Looks like a UI issue, bouncing to Tomas
The UI calculates the percentage from the pod counts reported in the NodeMaintenance CR [1]. status.pendingPods is not updated frequently enough (IIRC it is just once a minute). On the other hand, the NMO log reports in detail how pods are being evicted. So it would be good if NMO updated status.pendingPods at the same frequency; that would make the maintenance progress percentage actually useful.

[1] https://github.com/openshift/console/blob/master/frontend/packages/metal3-plugin/src/selectors/node-maintenance.ts#L14
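For reference, this is roughly the arithmetic the selector at [1] does. The actual console code is TypeScript; this is only a sketch in Go, and the total-pod-count field name is an assumption (only status.pendingPods is confirmed above):

package main

import "fmt"

// NodeMaintenanceStatus models just the two fields the progress calculation needs.
type NodeMaintenanceStatus struct {
	TotalPods   int // assumed: number of pods on the node when maintenance started
	PendingPods int // status.pendingPods: pods still waiting to be evicted
}

// progressPercent returns how far the eviction has gotten according to the CR status.
func progressPercent(s NodeMaintenanceStatus) int {
	if s.TotalPods == 0 {
		return 0
	}
	return (s.TotalPods - s.PendingPods) * 100 / s.TotalPods
}

func main() {
	// Example: 10 pods total, 10 still pending -> 0% shown in the popover.
	fmt.Println(progressPercent(NodeMaintenanceStatus{TotalPods: 10, PendingPods: 10}))
}

Since pendingPods is only refreshed once per status update, the computed value sits at 0% until the first update lands, which matches what the popover shows.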
Since the work needed here is in the NMO, we'll take this one back. Sorry for the noise
Hey, there were already some code changes which should result in a more frequent update of the status. In detail, this is what happens:
- NMO calls the k8s node drain code, which logs in detail what happens with the pods. NMO has no chance to "intercept" these detailed steps for updating the CR status.
- After some timeout, NMO gives up in case not all pods were evicted yet. That timeout was already reduced from 1 minute to 30 seconds. That's when NMO updates the CR status.
- After that, NMO waits 5 seconds before triggering another drain. Repeat until done.
So at the moment we get a fresh status after at most 35 seconds. Is that good enough, or do we need an even shorter period? E.g. a drain timeout of 10 seconds + a 5 second wait = a fresh status every 15 seconds? Andrew, Jiri: thoughts?
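To make the timing concrete, here is a minimal sketch of that retry loop (not the actual NMO code; drainNode and updateStatus are hypothetical stand-ins for the k8s drain helper and the CR status write):

package main

import (
	"fmt"
	"time"
)

const (
	drainTimeout      = 30 * time.Second // current value, after the reduction from 1 min
	waitBetweenDrains = 5 * time.Second
)

// drainNode stands in for the k8s drain call; it returns the number of pods
// still pending when the timeout hits (hypothetical signature, stubbed out here).
func drainNode(timeout time.Duration) (pendingPods int) { return 0 }

// updateStatus stands in for writing status.pendingPods on the CR (hypothetical).
func updateStatus(pendingPods int) { fmt.Println("pendingPods:", pendingPods) }

func maintainNode() {
	for {
		pending := drainNode(drainTimeout) // blocks for up to drainTimeout
		updateStatus(pending)              // status refreshed once per iteration
		if pending == 0 {
			return
		}
		time.Sleep(waitBetweenDrains)
	}
}

func main() { maintainNode() }

With drainTimeout = 30s and waitBetweenDrains = 5s, the reported status can be up to 35 seconds stale; dropping the drain timeout to 10s would bound the staleness at 15 seconds per iteration.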