Bug 1812354
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | NMO should update .status.pendingPods more frequently | | |
| Product | OpenShift Container Platform | Reporter | shahan <hasha> |
| Component | Node Maintenance Operator | Assignee | Marc Sluiter <msluiter> |
| Status | CLOSED CURRENTRELEASE | QA Contact | Omri Hochman <ohochman> |
| Severity | low | Docs Contact | |
| Priority | medium | | |
| Version | 4.3.z | CC | abeekhof, aos-bugs, jokerman, jtomasek |
| Target Milestone | --- | Keywords | Triaged |
| Target Release | 4.6.0 | Flags | jtomasek: needinfo-, abeekhof: needinfo? |
| Hardware | Unspecified | | |
| OS | Unspecified | | |
| Whiteboard | | | |
| Fixed In Version | | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | Environment | |
| Last Closed | 2020-08-26 12:19:34 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
Description
shahan 2020-03-11 05:56:20 UTC
This is caused by the Node Maintenance Operator not reporting the pod counts frequently enough. The pod counts are updated often in the operator logs, but not in the resource status. Looks like a UI issue, bouncing to Tomas.

The UI calculates the percentage from the pod counts reported in the NodeMaintenance CR [1]. status.pendingPods is not being updated frequently enough (IIRC it is just once a minute); on the other hand, the NMO log reports in detail how pods are being evicted. So it would be good if NMO updated status.pendingPods at the same frequency. That would make the maintenance progress percentage actually useful.

[1] https://github.com/openshift/console/blob/master/frontend/packages/metal3-plugin/src/selectors/node-maintenance.ts#L14

Since the work needed here is in the NMO, we'll take this one back. Sorry for the noise.

Hey, there were already some code changes which should result in more frequent status updates. In detail, this is what happens:

- NMO calls the k8s node drain code, which logs in detail what happens with the pods. There is no chance for NMO to "intercept" these detailed steps for updating the CR status.
- After some timeout, NMO gives up in case not all pods were evicted yet. That timeout was already reduced from 1 minute to 30 seconds. That's when NMO updates the CR status.
- After that, NMO waits 5 seconds before triggering another drain. Repeat until done.

So at the moment we get a fresh status after at most 35 seconds. Is that good enough, or do we need an even shorter period? E.g. set the drain timeout to 10 seconds + wait 5 seconds = fresh status every 15 seconds? Andrew, Jiri: thoughts?
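To make the timing described above concrete, here is a minimal sketch in Go of a drain/status-update loop. It is not the actual NMO source; the names drainNode, updateStatusPendingPods, drainTimeout, and waitBetweenDrains are hypothetical. It only models the behaviour from the list: a bounded drain attempt, one status update per attempt, and a short pause before retrying.

```go
// Minimal sketch (not the actual NMO code) of the drain loop described above.
// All identifiers here are illustrative placeholders.
package drainloop

import (
	"context"
	"log"
	"time"
)

const (
	drainTimeout      = 30 * time.Second // a single drain attempt is abandoned after this
	waitBetweenDrains = 5 * time.Second  // pause before triggering the next drain attempt
)

// drainNode stands in for the Kubernetes drain helper; it returns the pods
// that are still pending eviction when the attempt ends.
func drainNode(ctx context.Context, node string) ([]string, error) {
	// ... call the drain library with ctx carrying the timeout ...
	return nil, nil
}

// updateStatusPendingPods stands in for patching .status.pendingPods on the
// NodeMaintenance CR.
func updateStatusPendingPods(node string, pending []string) error {
	// ... update the CR status subresource ...
	return nil
}

// maintainNode keeps draining until no pods are pending. The CR status is
// refreshed only once per iteration, i.e. at most every
// drainTimeout + waitBetweenDrains (about 35 seconds with the values above).
func maintainNode(node string) error {
	for {
		ctx, cancel := context.WithTimeout(context.Background(), drainTimeout)
		pending, err := drainNode(ctx, node)
		cancel()
		if err != nil {
			log.Printf("drain attempt for node %s did not finish: %v", node, err)
		}

		if updateErr := updateStatusPendingPods(node, pending); updateErr != nil {
			return updateErr
		}
		if err == nil && len(pending) == 0 {
			return nil // all pods evicted, drain complete
		}
		time.Sleep(waitBetweenDrains)
	}
}
```

With the shorter timeout floated at the end of the comment (a 10-second drain timeout plus the 5-second wait), the same loop shape would refresh the status roughly every 15 seconds.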