Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1626523

Summary: [starter-us-east-1] drain hang due to node unresponsive
Product: OpenShift Container Platform
Reporter: Justin Pierce <jupierce>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED NOTABUG
QA Contact: DeShuai Ma <dma>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.11.0
CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-07 18:42:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
unresponsive node listings (Flags: none)

Description Justin Pierce 2018-09-07 14:30:27 UTC
Created attachment 1481598 [details]
unresponsive node listings

Description of problem:
During an upgrade to v3.11, oc adm drain hung on node ip-172-31-52-2.ec2.internal. The condition persisted for over 30 minutes with no perceptible progress.

See attached listings for detail.

Version-Release number of selected component (if applicable):
master: 3.11.0-0.21.0
node: v3.10.19

Additional info:
Logs will be attached at log level 4.

Comment 7 Justin Pierce 2018-09-07 18:42:14 UTC
This node hit the fs.inotify.max_user_watches limit and became unresponsive. The process freed up after running: sudo sysctl fs.inotify.max_user_watches=1048576. This value is now managed in later versions of openshift-ansible.
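For reference, the diagnosis and fix above can be sketched as a short shell session. The limit-raising command is the one from this comment; the watch-count inspection and the sysctl.d path (/etc/sysctl.d/99-inotify.conf) are common-practice assumptions, not part of this report:

```shell
# Read the current inotify watch limit; exhausting it makes processes
# that depend on file watches appear to hang.
sysctl fs.inotify.max_user_watches

# Rough count of open inotify file descriptors (instances) across all
# processes, to see which workloads are heavy inotify users. Note this
# counts instances, not individual watches; per-watch detail lives in
# each process's /proc/<pid>/fdinfo entries.
find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l

# Raise the limit at runtime (the command from this comment):
sudo sysctl fs.inotify.max_user_watches=1048576

# Persist across reboots (assumed drop-in path; later openshift-ansible
# versions manage this value themselves):
echo 'fs.inotify.max_user_watches = 1048576' | sudo tee /etc/sysctl.d/99-inotify.conf
```

A runtime sysctl change like this is lost on reboot, which is why later openshift-ansible releases set it persistently during installation.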