Bug 1248662

Summary:	Scheduler failing to launch pod when node is running low on disk space
Product:	OpenShift Container Platform	Reporter:	Steve Speicher <sspeiche>
Component:	Node	Assignee:	Paul Morie <pmorie>
Status:	CLOSED DUPLICATE	QA Contact:	Jianwei Hou <jhou>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.0.0	CC:	aos-bugs, daniel.falkner, decarr, dmcphers, jokerman, libra-bugs, libra-onpremise-devel, misalunk, mmccomas, pep, pweil, sdodson
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-02-03 16:51:42 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1267746

Description Steve Speicher 2015-07-30 14:26:18 UTC

Description of problem:
Only one node was close to running out of disk space (about 200 MB free); all the others had about 7 GB free. So if disk space really was the issue, the bug should be opened against the scheduler so that it accounts for disk space when selecting a node.

(from Wesley Hearn)

> > Here is the end of the output about the builder pod:
> >
> > $ oc get pods nodejs-example-3-build -o json
> >
> >     "status": {
> >         "phase": "Failed",
> >         "message": "Pod cannot be started due to lack of disk space.",
> >         "startTime": "2015-07-28T20:49:57Z"
> >     }
> >
> > Good ole out of disk space.
> >
> > To be clear, build 1 failed clearly. Builds 2 & 3 failed but are reported
> > as succeeded.
> >
> > $ oc get builds
> > NAME               TYPE      STATUS     POD
> > nodejs-example-1   Source    Failed     nodejs-example-1-build
> > nodejs-example-2   Source    Complete   nodejs-example-2-build
> > nodejs-example-3   Source    Complete   nodejs-example-3-build
> >
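The symptom above is a build pod that ended in phase "Failed" while its build was reported as "Complete". A minimal Go sketch of the intended invariant, deriving a build's status only from its pod's phase. The types and the four-value mapping are hypothetical simplifications, not the actual OpenShift build controller or API types (real builds have more phases, e.g. New, Error, Cancelled):

```go
package main

import "fmt"

// PodPhase mirrors the pod lifecycle phases seen in `oc get pods -o json`.
// Hypothetical simplified type, not the Kubernetes API type.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// BuildStatus uses the status names shown by `oc get builds` above.
// Hypothetical simplified type, not the OpenShift API type.
type BuildStatus string

const (
	BuildPending  BuildStatus = "Pending"
	BuildRunning  BuildStatus = "Running"
	BuildComplete BuildStatus = "Complete"
	BuildFailed   BuildStatus = "Failed"
)

// buildStatusFromPod derives a build's status solely from its build pod's phase.
// A pod that ended in "Failed" never maps to "Complete".
func buildStatusFromPod(phase PodPhase) BuildStatus {
	switch phase {
	case PodSucceeded:
		return BuildComplete
	case PodFailed:
		return BuildFailed
	case PodRunning:
		return BuildRunning
	default:
		// Pending or unrecognized phases: the build has not finished yet.
		return BuildPending
	}
}

func main() {
	// nodejs-example-3-build reported phase "Failed" with a disk-space message.
	fmt.Println(buildStatusFromPod(PodFailed)) // prints "Failed"
}
```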

This is running on https://console.stg.openshift.com/console/

Version-Release number of selected component (if applicable): 3.0.0


Steps to Reproduce:
1. Create a project.
2. Select a sample (e.g. nodejs-ex).
3. Attempt to build it.

Actual results: the build fails immediately because its pod cannot be started.


Expected results: the pod starts, the build runs, and the resulting image is pushed.


Additional info:
The project showing this behavior is https://console.stg.openshift.com/console/project/ldpjs/overview

Comment 2 Ben Parees 2015-07-30 15:47:56 UTC

Paul, it's not clear to me why you assigned this to me.

Comment 3 Paul Weil 2015-07-30 15:55:42 UTC

Two things I see on this:

1. The build status doesn't seem to be conveyed properly. The build failed but was marked as completed.
2. A possible bug in which the scheduler does not take node disk space into account. I pinged pmorie about that part.
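Item 2 amounts to adding a disk-space check to the scheduler's node filtering step. A minimal Go sketch of such a filter, assuming a hypothetical NodeInfo type that already knows each node's free disk space and an arbitrary 1 GiB threshold; this is not the actual Kubernetes/OpenShift scheduler code or its predicate API:

```go
package main

import "fmt"

// NodeInfo is a hypothetical, simplified view of a node. A real scheduler
// would derive disk pressure from node status, not from a struct like this.
type NodeInfo struct {
	Name          string
	FreeDiskBytes int64
}

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30
)

// filterByFreeDisk keeps only nodes whose free disk space is at least minFree.
// minFree is an assumed, illustrative threshold.
func filterByFreeDisk(nodes []NodeInfo, minFree int64) []NodeInfo {
	fit := make([]NodeInfo, 0, len(nodes))
	for _, n := range nodes {
		if n.FreeDiskBytes >= minFree {
			fit = append(fit, n)
		}
	}
	return fit
}

func main() {
	// Figures loosely modeled on the report: one node with ~200 MB free,
	// the others with ~7 GB free. Node names are made up.
	nodes := []NodeInfo{
		{Name: "node-a", FreeDiskBytes: 200 * MiB},
		{Name: "node-b", FreeDiskBytes: 7 * GiB},
		{Name: "node-c", FreeDiskBytes: 7 * GiB},
	}
	for _, n := range filterByFreeDisk(nodes, 1*GiB) {
		fmt.Println(n.Name) // prints node-b, then node-c
	}
}
```

With this check in place, the nearly full node is simply never a candidate, so a build pod would land on one of the nodes with about 7 GB free instead of failing with "lack of disk space".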

Comment 4 Ben Parees 2015-07-30 17:07:55 UTC

The build status issue was already fixed here:
https://github.com/openshift/origin/pull/3936

This bug was opened for the scheduler issue, which is really more of an RFE than a bug, IMHO.

Comment 5 Derek Carr 2016-02-03 16:51:42 UTC


*** This bug has been marked as a duplicate of bug 1252520 ***