Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1626523

Summary: [starter-us-east-1] drain hang due to node unresponsive
Product: OpenShift Container Platform
Reporter: Justin Pierce <jupierce>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED NOTABUG
QA Contact: DeShuai Ma <dma>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.11.0
CC: aos-bugs, jokerman, mmccomas
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-07 18:42:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
unresponsive node listings (Flags: none)

Description Justin Pierce 2018-09-07 14:30:27 UTC
Created attachment 1481598 [details]
unresponsive node listings

Description of problem:
During an upgrade to v3.11, oc adm drain hung on node ip-172-31-52-2.ec2.internal. The condition persisted for over 30 minutes with no perceptible progress.

See attached listings for detail.

Version-Release number of selected component (if applicable):
master: 3.11.0-0.21.0
node: v3.10.19

Additional info:
Logs will be attached at log level 4.

Comment 7 Justin Pierce 2018-09-07 18:42:14 UTC
This node hit the fs.inotify.max_user_watches limit and became unresponsive. The process freed up after running: sudo sysctl fs.inotify.max_user_watches=1048576. This value is now managed in later versions of openshift-ansible.
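For reference, the diagnosis and fix above can be sketched as a short shell session. The limit-raising command is the one from this comment; the watch-count inspection and the sysctl.d path (/etc/sysctl.d/99-inotify.conf) are common-practice assumptions, not part of this report:

```shell
# Read the current inotify watch limit; exhausting it makes processes
# that depend on file watches appear to hang.
sysctl fs.inotify.max_user_watches

# Rough count of open inotify file descriptors (instances) across all
# processes, to see which workloads are heavy inotify users. Note this
# counts instances, not individual watches; per-watch detail lives in
# each process's /proc/<pid>/fdinfo entries.
find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l

# Raise the limit at runtime (the command from this comment):
sudo sysctl fs.inotify.max_user_watches=1048576

# Persist across reboots (assumed drop-in path; later openshift-ansible
# versions manage this value themselves):
echo 'fs.inotify.max_user_watches = 1048576' | sudo tee /etc/sysctl.d/99-inotify.conf
```

A runtime sysctl change like this is lost on reboot, which is why later openshift-ansible releases set it persistently during installation.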