Bug 1248662

Summary: Scheduler failing to launch pod when node is running low on disk space
Product: OpenShift Container Platform Reporter: Steve Speicher <sspeiche>
Component: Node    Assignee: Paul Morie <pmorie>
Status: CLOSED DUPLICATE QA Contact: Jianwei Hou <jhou>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0.0    CC: aos-bugs, daniel.falkner, decarr, dmcphers, jokerman, libra-bugs, libra-onpremise-devel, misalunk, mmccomas, pep, pweil, sdodson
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-02-03 16:51:42 UTC Type: Bug
Bug Depends On:    
Bug Blocks: 1267746    

Description Steve Speicher 2015-07-30 14:26:18 UTC
Description of problem:
Only one node was close to running out of disk space (it had ~200 MB free); all the others had ~7 GB free. So if disk space really was the issue, the bug should be opened against the scheduler so that it accounts for disk space when selecting a node.

(from Wesley Hearn)

> > Here is the end of the output about the builder pod:
> >
> > $ oc get pods nodejs-example-3-build -o json
> >
> >     "status": {
> >         "phase": "Failed",
> >         "message": "Pod cannot be started due to lack of disk space.",
> >         "startTime": "2015-07-28T20:49:57Z"
> >     }
> >
> > Good ole out of disk space.
> >
> > To be clear, build 1 failed clearly. Builds 2 & 3 failed but are reported
> > as succeeded.
> >
> > $ oc get builds
> > NAME               TYPE      STATUS     POD
> > nodejs-example-1   Source    Failed     nodejs-example-1-build
> > nodejs-example-2   Source    Complete   nodejs-example-2-build
> > nodejs-example-3   Source    Complete   nodejs-example-3-build
> >

This is running on https://console.stg.openshift.com/console/

Version-Release number of selected component (if applicable): 3.0.0


How reproducible:
Create a project, select the sample (say nodejs-ex), and attempt to build it.


Steps to Reproduce:
1.
2.
3.

Actual results: The build fails immediately because the pod cannot be started.


Expected results: The pod starts, the build runs, and the resulting image is pushed.


Additional info:
The project showing this behavior is https://console.stg.openshift.com/console/project/ldpjs/overview

Comment 2 Ben Parees 2015-07-30 15:47:56 UTC
It's not clear to me why you assigned this to me, Paul.

Comment 3 Paul Weil 2015-07-30 15:55:42 UTC
Two things I see on this:

1. The build status doesn't seem to be conveyed properly. The build failed but was marked as completed.
2. A possible bug in the scheduler: it doesn't take node disk space into account. I pinged pmorie about that part.

Comment 4 Ben Parees 2015-07-30 17:07:55 UTC
The build status issue was already fixed here:
https://github.com/openshift/origin/pull/3936

This bug was opened for the scheduler issue, which is really more of an RFE than a bug, IMHO.

Comment 5 Derek Carr 2016-02-03 16:51:42 UTC

*** This bug has been marked as a duplicate of bug 1252520 ***