Bug 1840577
| Field | Value |
|---|---|
| Summary | Node failed on draining during machine remediation |
| Product | OpenShift Container Platform |
| Component | Cloud Compute |
| Cloud Compute sub component | BareMetal Provider |
| Version | 4.5 |
| Status | CLOSED DUPLICATE |
| Severity | high |
| Priority | unspecified |
| Reporter | vsibirsk |
| Assignee | Beth White <beth.white> |
| QA Contact | Amit Ugol <augol> |
| CC | ipinto, stbenjam |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Last Closed | 2020-05-28 13:00:45 UTC |
| Attachments | machine-controller log (attachment 1692607) |
*** This bug has been marked as a duplicate of bug 1828003 ***
Created attachment 1692607 [details]
machine-controller log

Description of problem:
During the machine remediation process, the old machine gets stuck in the "deleting" phase because draining its node never completes.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Configure an MHC (MachineHealthCheck) object (a sketch of creating one is at the end of this comment).
2. "Kill" one of the nodes (stop the kubelet service).

Actual results:
The machine is stuck in the "deleting" phase.

Expected results:
The node is drained and the machine is deleted.

Additional info (full log attached):

```
I0526 15:05:54.505913 1 info.go:20] unable to drain node "worker-0.cnvcl2.lab.eng.tlv2.redhat.com"
I0526 15:05:54.505917 1 info.go:20] there are pending nodes to be drained: worker-0.cnvcl2.lab.eng.tlv2.redhat.com
W0526 15:05:54.505924 1 controller.go:364] drain failed for machine "cnvcl2-worker-0-f2zl4": [global timeout!! Skip eviction retries for pod "virt-api-77b78bb6c4-gcqd8", error when waiting for pod "recycle-pvs-9dd87fbff-bh85z" terminating: timed out waiting for the condition, error when waiting for pod "virt-template-validator-86bd85989d-fksgt" terminating: timed out waiting for the condition, error when waiting for pod "alertmanager-main-0" terminating: timed out waiting for the condition, error when waiting for pod "virt-launcher-test-cirros-vk-d6687" terminating: timed out waiting for the condition, error when waiting for pod "prometheus-k8s-0" terminating: timed out waiting for the condition, global timeout!! Skip eviction retries for pod "router-default-5cf67ff54-mx66h"]
```
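For reference, a minimal sketch of step 1 using the Kubernetes Python client. The MachineHealthCheck name, selector label, timeouts, and `maxUnhealthy` value below are illustrative assumptions, not the exact object used when this bug was hit.

```python
# Sketch (assumed values): create a MachineHealthCheck so the machine-api
# controllers remediate worker machines whose Node goes NotReady/Unknown.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
api = client.CustomObjectsApi()

mhc = {
    "apiVersion": "machine.openshift.io/v1beta1",
    "kind": "MachineHealthCheck",
    "metadata": {"name": "workers-mhc", "namespace": "openshift-machine-api"},
    "spec": {
        # Illustrative selector; match it to your worker Machine labels.
        "selector": {
            "matchLabels": {
                "machine.openshift.io/cluster-api-machine-role": "worker",
            }
        },
        # A stopped kubelet drives the Node's Ready condition to Unknown,
        # which trips these conditions after the timeout.
        "unhealthyConditions": [
            {"type": "Ready", "status": "Unknown", "timeout": "300s"},
            {"type": "Ready", "status": "False", "timeout": "300s"},
        ],
        "maxUnhealthy": "40%",
    },
}

api.create_namespaced_custom_object(
    group="machine.openshift.io",
    version="v1beta1",
    namespace="openshift-machine-api",
    plural="machinehealthchecks",
    body=mhc,
)
```

In practice the same object is usually applied as a YAML manifest with `oc apply`; the Python form is used here only to keep the sketch self-contained and runnable.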