Bug 1840577

Summary:

Node failed on draining during machine remediation

Product:

OpenShift Container Platform

Reporter:

vsibirsk

Component:

Cloud Compute

Assignee:

Beth White <beth.white>

Cloud Compute sub component:

BareMetal Provider

QA Contact:

Amit Ugol <augol>

Status:

CLOSED DUPLICATE

Docs Contact:

Severity:

high

Priority:

unspecified

CC:

ipinto, stbenjam

Version:

4.5

Target Milestone:

---

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2020-05-28 13:00:45 UTC

Type:

Bug

Attachments:

Description	Flags
machine-controller log	none

Description vsibirsk 2020-05-27 09:07:26 UTC

Created attachment 1692607
machine-controller log

Description of problem:
During machine remediation, the old machine gets stuck in the "deleting" phase because draining its node never completes.

Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Configure a MachineHealthCheck (MHC) object.
2. "Kill" one of the nodes (stop the kubelet service).

Actual results:
The machine is stuck in the "deleting" phase.

Expected results:
The node is drained and the machine is deleted.


Additional info:
(full log attached)
I0526 15:05:54.505913       1 info.go:20] unable to drain node "worker-0.cnvcl2.lab.eng.tlv2.redhat.com"
I0526 15:05:54.505917       1 info.go:20] there are pending nodes to be drained: worker-0.cnvcl2.lab.eng.tlv2.redhat.com
W0526 15:05:54.505924       1 controller.go:364] drain failed for machine "cnvcl2-worker-0-f2zl4": [global timeout!! Skip eviction retries for pod "virt-api-77b78bb6c4-gcqd8", error when waiting for pod "recycle-pvs-9dd87fbff-bh85z" terminating: timed out waiting for the condition, error when waiting for pod "virt-template-validator-86bd85989d-fksgt" terminating: timed out waiting for the condition, error when waiting for pod "alertmanager-main-0" terminating: timed out waiting for the condition, error when waiting for pod "virt-launcher-test-cirros-vk-d6687" terminating: timed out waiting for the condition, error when waiting for pod "prometheus-k8s-0" terminating: timed out waiting for the condition, global timeout!! Skip eviction retries for pod "router-default-5cf67ff54-mx66h"]
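
For triage, the drain failure message above can be split into the pods that blocked the drain and the reason each one was reported. The following is a minimal sketch, not part of the machine-controller: the function name `blocking_pods` and the two regular expressions are assumptions made for illustration, derived only from the message wording in this log, and the sample message is shortened to three of the reported pods.

```python
import re
from typing import Dict, List

# Drain failure message copied from the log above, shortened to three pods.
DRAIN_ERROR = (
    'drain failed for machine "cnvcl2-worker-0-f2zl4": '
    '[global timeout!! Skip eviction retries for pod "virt-api-77b78bb6c4-gcqd8", '
    'error when waiting for pod "alertmanager-main-0" terminating: '
    'timed out waiting for the condition, '
    'global timeout!! Skip eviction retries for pod "router-default-5cf67ff54-mx66h"]'
)

# Patterns are assumptions based on the wording seen in this log only.
EVICTION_SKIPPED = re.compile(r'Skip eviction retries for pod "([^"]+)"')
TERMINATION_TIMEOUT = re.compile(r'error when waiting for pod "([^"]+)" terminating')


def blocking_pods(message: str) -> Dict[str, List[str]]:
    """Group the pod names found in a drain failure message by reported reason."""
    return {
        "eviction_skipped": EVICTION_SKIPPED.findall(message),
        "termination_timeout": TERMINATION_TIMEOUT.findall(message),
    }


if __name__ == "__main__":
    for reason, pods in blocking_pods(DRAIN_ERROR).items():
        print(reason, pods)
```

Running the same function over the full attached log line would list all seven pods named in the warning, which makes it easier to compare against the pods discussed in the duplicate bug.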

Comment 1 Stephen Benjamin 2020-05-28 13:00:45 UTC


*** This bug has been marked as a duplicate of bug 1828003 ***