Bug 1248662 - Scheduler failing to launch pod when node is running low on disk space
Status: CLOSED DUPLICATE of bug 1252520
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.0.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Paul Morie
QA Contact: Jianwei Hou
Depends On:
Blocks: 1267746
Reported: 2015-07-30 10:26 EDT by Steve Speicher
Modified: 2016-02-03 11:51 EST

Doc Type: Bug Fix
Last Closed: 2016-02-03 11:51:42 EST
Type: Bug


Description Steve Speicher 2015-07-30 10:26:18 EDT
Description of problem:
Only one node was close to running out of disk space (~200 MB free); all the others had ~7 GB free. So if disk space really was the issue, this bug should be opened against the scheduler, which should account for disk space when selecting a node.

(from Wesley Hearn)

> > Here is the end of the output about the builder pod:
> >
> > $ oc get pods nodejs-example-3-build -o json
> >
> >     "status": {
> >         "phase": "Failed",
> >         "message": "Pod cannot be started due to lack of disk space.",
> >         "startTime": "2015-07-28T20:49:57Z"
> >     }
> >
> > Good ole out of disk space.
> >
> > To be clear, build 1 failed clearly. Builds 2 & 3 failed but are reported
> > as succeeded.
> >
> > $ oc get builds
> > NAME               TYPE      STATUS     POD
> > nodejs-example-1   Source    Failed     nodejs-example-1-build
> > nodejs-example-2   Source    Complete   nodejs-example-2-build
> > nodejs-example-3   Source    Complete   nodejs-example-3-build
> >
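The mismatch described in the quoted output (builds reported as Complete while their pods failed) could be detected with a check along the following lines. This is only an illustrative sketch: `find_mismatched_builds` is a hypothetical helper, the dictionaries are hand-written to mirror the quoted `oc get builds` table, and the Failed phase for pods 1 and 2 is an assumption based on the statement that all three builds failed (only pod 3's status is actually shown).

```python
# Hypothetical consistency check between build status and builder pod phase.
# The data structures are illustrative and are NOT the real `oc` JSON schema.

def find_mismatched_builds(builds, pod_phases):
    """Return names of builds reported Complete whose builder pod phase is Failed."""
    mismatched = []
    for build in builds:
        phase = pod_phases.get(build["pod"])
        if build["status"] == "Complete" and phase == "Failed":
            mismatched.append(build["name"])
    return mismatched


# Rows copied from the quoted `oc get builds` output.
builds = [
    {"name": "nodejs-example-1", "status": "Failed", "pod": "nodejs-example-1-build"},
    {"name": "nodejs-example-2", "status": "Complete", "pod": "nodejs-example-2-build"},
    {"name": "nodejs-example-3", "status": "Complete", "pod": "nodejs-example-3-build"},
]

# Pod 3's phase comes from the quoted JSON; pods 1 and 2 are assumed Failed.
pod_phases = {
    "nodejs-example-1-build": "Failed",
    "nodejs-example-2-build": "Failed",
    "nodejs-example-3-build": "Failed",
}

print(find_mismatched_builds(builds, pod_phases))
```

With the data above, the check flags builds 2 and 3, which matches the report that those builds failed but were shown as succeeded.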

This is running on https://console.stg.openshift.com/console/

Version-Release number of selected component (if applicable): 3.0.0


How reproducible:
Create a project, select the sample (say nodejs-ex) and attempt to build it


Steps to Reproduce:
1.
2.
3.

Actual results: fails immediately, due to inability to start a pod


Expected results: pod starts, build runs, result image pushed


Additional info:
The project showing this behavior is https://console.stg.openshift.com/console/project/ldpjs/overview
Comment 2 Ben Parees 2015-07-30 11:47:56 EDT
Not clear why you assigned this to me, Paul?
Comment 3 Paul Weil 2015-07-30 11:55:42 EDT
Two things I see on this:

1. The build status doesn't seem to be properly conveyed.  The build failed but was marked as complete.
2. A possible bug with the scheduler not taking node disk space into account - I pinged pmorie about that portion
Comment 4 Ben Parees 2015-07-30 13:07:55 EDT
The build status issue was already fixed here:
https://github.com/openshift/origin/pull/3936

This bug was opened for the scheduler issue, which is really an RFE more than a bug imho.
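
For illustration only, the scheduler behavior requested here might look roughly like the sketch below: nodes with too little free disk are filtered out before one is chosen. Everything in it is an assumption made for the example, including the `Node` type, the function names, the node names, and the 1 GiB threshold; it does not represent the actual OpenShift or Kubernetes scheduler code.

```python
# Minimal sketch of a disk-aware node filter. All names and the threshold
# are hypothetical; this is not the real scheduler implementation.
from dataclasses import dataclass

MIN_FREE_BYTES = 1 * 1024**3  # assumed threshold: 1 GiB


@dataclass
class Node:
    name: str
    free_disk_bytes: int


def filter_nodes_by_disk(nodes, min_free_bytes):
    """Keep only nodes that have at least min_free_bytes of free disk."""
    return [n for n in nodes if n.free_disk_bytes >= min_free_bytes]


def pick_node(nodes, min_free_bytes):
    """Pick the eligible node with the most free disk, or None if none qualify."""
    candidates = filter_nodes_by_disk(nodes, min_free_bytes)
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free_disk_bytes)


# Mirrors the situation in the description: one node at ~200 MB free,
# the others at ~7 GB free.
nodes = [
    Node("node-a", 200 * 1024**2),
    Node("node-b", 7 * 1024**3),
    Node("node-c", 7 * 1024**3),
]

chosen = pick_node(nodes, MIN_FREE_BYTES)
print(chosen.name if chosen else "no node has enough free disk")
```

In this sketch the nearly full node is never selected, so the pod would land on one of the nodes with ~7 GB free instead of failing at startup.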
Comment 5 Derek Carr 2016-02-03 11:51:42 EST

*** This bug has been marked as a duplicate of bug 1252520 ***
