
Bug 1487334

Summary: Node goes "NotReady" with plenty of available resources
Product: OpenShift Container Platform
Reporter: Sten Turpin <sten>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED DUPLICATE
QA Contact: DeShuai Ma <dma>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.6.1
CC: aos-bugs, jokerman, mmccomas, vichoudh
Target Milestone: ---
Keywords: OpsBlocker
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-12 19:31:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Sten Turpin 2017-08-31 16:17:28 UTC
Description of problem: One of three recurring issues observed on Starter clusters this week. A node goes NotReady. The failed state can be associated with a particular pod, but that pod does not appear to be using an inordinate amount of resources.


Version-Release number of selected component (if applicable): atomic-openshift-3.6.173.0.5-1.git.0.f30b99e.el7.x86_64


How reproducible: Rarely, tied to particular pods 


Steps to Reproduce:
1. User runs a pod 
2. Node goes NotReady
3. Ops checks system load and docker stats; nothing appears to be out of reasonable bounds
4. Ops disables or moves the pod 
5. Node recovers 

Actual results:
Node goes NotReady, despite available resources. 

Expected results:
Node should stay in Ready state, or report what failure prevents it from being Ready. 
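
For anyone triaging a similar report, the node object already carries a list of status conditions (`type`, `status`, `reason`, `message`), which is the first place to look for the reason a node is not Ready. The following is a minimal sketch, not the method used in this bug: the helper name `explain_not_ready` and the sample node data are hypothetical and made up for illustration, and the sample is not output from the affected cluster.

```python
import json


def explain_not_ready(node):
    """Return (type, status, reason, message) for each condition that looks unhealthy.

    `node` is a dict shaped like the JSON form of a node object.
    """
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype = cond.get("type")
        status = cond.get("status")
        # "Ready" is healthy only when "True"; every other condition
        # (for example MemoryPressure, DiskPressure) is healthy when "False".
        if ctype == "Ready":
            healthy = (status == "True")
        else:
            healthy = (status == "False")
        if not healthy:
            problems.append(
                (ctype, status, cond.get("reason", ""), cond.get("message", ""))
            )
    return problems


# Hypothetical sample data for illustration only; not taken from this bug.
sample = json.loads("""
{
  "status": {
    "conditions": [
      {"type": "MemoryPressure", "status": "False",
       "reason": "KubeletHasSufficientMemory",
       "message": "kubelet has sufficient memory available"},
      {"type": "Ready", "status": "Unknown",
       "reason": "NodeStatusUnknown",
       "message": "Kubelet stopped posting node status."}
    ]
  }
}
""")

for ctype, status, reason, message in explain_not_ready(sample):
    print(f"{ctype}={status} reason={reason} message={message}")
```

In practice the input would be the output of `oc get node <node-name> -o json`, parsed with `json.load`. A `Ready` condition with status `Unknown` generally means the kubelet stopped reporting, which points away from pod resource usage and toward the kubelet or container runtime on that node.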

Additional info:

Comment 4 Seth Jennings 2017-09-12 19:31:09 UTC
These look similar enough and are in a starter cluster.

*** This bug has been marked as a duplicate of bug 1486914 ***

Comment 5 Vikas Choudhary 2017-09-15 08:23:58 UTC
Sten, since you mention that the issue occurs only with a specific pod, can you please share the pod's YAML file so that the issue can be reproduced on a local system and we can understand what the pod does to make the node NotReady?