1248662 – Scheduler failing to launch pod when node is running low on disk space

Keywords:
Status:	CLOSED DUPLICATE of bug 1252520
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node
Sub Component:
Version:	3.0.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Paul Morie
QA Contact:	Jianwei Hou
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1267746

Reported:	2015-07-30 14:26 UTC by Steve Speicher
Modified:	2019-10-10 10:01 UTC
CC List:	12 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-02-03 16:51:42 UTC
Target Upstream Version:
Embargoed:

Description Steve Speicher 2015-07-30 14:26:18 UTC

Description of problem:
Only one node was close to running out of disk space (~200 MB free); all the others had ~7 GB free. So if disk space really was the issue, this bug should be opened against the scheduler so that it accounts for free disk space when selecting a node.
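
To illustrate the requested behavior, here is a minimal sketch of a disk-space-aware node filter. It is not the OpenShift or Kubernetes scheduler code; the `Node` type, the `schedulable_nodes` function, the node names, and the 1 GiB threshold are all hypothetical and chosen only to mirror the numbers in this report.

```python
from dataclasses import dataclass

# Hypothetical threshold: skip nodes with less than 1 GiB free.
MIN_FREE_BYTES = 1 * 1024**3


@dataclass
class Node:
    name: str             # hypothetical node name
    free_disk_bytes: int  # free disk space reported for the node


def schedulable_nodes(nodes, min_free_bytes=MIN_FREE_BYTES):
    """Return nodes with at least min_free_bytes free, most free space first."""
    eligible = [n for n in nodes if n.free_disk_bytes >= min_free_bytes]
    return sorted(eligible, key=lambda n: n.free_disk_bytes, reverse=True)


# Figures mirror the report: one node at ~200 MB free, the others at ~7 GB.
nodes = [
    Node("node-a", 200 * 1024**2),
    Node("node-b", 7 * 1024**3),
    Node("node-c", 7 * 1024**3),
]

candidates = schedulable_nodes(nodes)
print([n.name for n in candidates])
```

With a filter like this, the nearly full node would be excluded from placement rather than accepting a pod that cannot start.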

(from Wesley Hearn)

> > Here is the end of the output about the builder pod:
> >
> > $ oc get pods nodejs-example-3-build -o json
> >
> >     "status": {
> >         "phase": "Failed",
> >         "message": "Pod cannot be started due to lack of disk space.",
> >         "startTime": "2015-07-28T20:49:57Z"
> >     }
> >
> > Good ole out of disk space.
> >
> > To be clear, build 1 failed clearly. Builds 2 & 3 failed but are reported
> > as succeeded.
> >
> > $ oc get builds
> > NAME               TYPE      STATUS     POD
> > nodejs-example-1   Source    Failed     nodejs-example-1-build
> > nodejs-example-2   Source    Complete   nodejs-example-2-build
> > nodejs-example-3   Source    Complete   nodejs-example-3-build
> >
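
The mismatch quoted above (a build reported as Complete while its pod is Failed) could be detected with a cross-check along these lines. This is only a sketch: the dictionary keys and the `mismatched_builds` helper are hypothetical and do not reflect the actual API schema. Only the phase of nodejs-example-3-build is included, because that is the only pod status shown in this report.

```python
def mismatched_builds(builds, pod_phases):
    """Return names of builds reported Complete whose pod phase is Failed."""
    return [
        b["name"]
        for b in builds
        if b["status"] == "Complete" and pod_phases.get(b["pod"]) == "Failed"
    ]


# Build rows copied from the `oc get builds` output quoted above.
builds = [
    {"name": "nodejs-example-1", "status": "Failed", "pod": "nodejs-example-1-build"},
    {"name": "nodejs-example-2", "status": "Complete", "pod": "nodejs-example-2-build"},
    {"name": "nodejs-example-3", "status": "Complete", "pod": "nodejs-example-3-build"},
]

# Pod phase taken from the `oc get pods ... -o json` output quoted above.
pod_phases = {"nodejs-example-3-build": "Failed"}

print(mismatched_builds(builds, pod_phases))
```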

This is running on https://console.stg.openshift.com/console/

Version-Release number of selected component (if applicable): 3.0.0


How reproducible:
Create a project, select the sample (say nodejs-ex) and attempt to build it


Steps to Reproduce:
1.
2.
3.

Actual results: fails immediately, due to inability to start a pod


Expected results: pod starts, build runs, result image pushed


Additional info:
The project showing this behavior is https://console.stg.openshift.com/console/project/ldpjs/overview

Comment 2 Ben Parees 2015-07-30 15:47:56 UTC

Not clear why you assigned this to me, Paul?

Comment 3 Paul Weil 2015-07-30 15:55:42 UTC

Two things I see on this:

1. The build status doesn't seem to be properly conveyed. The build failed but was marked as complete.
2. A possible bug with the scheduler not taking node disk space into account - I pinged pmorie about that portion

Comment 4 Ben Parees 2015-07-30 17:07:55 UTC

The build status issue was already fixed here:
https://github.com/openshift/origin/pull/3936

This bug was opened for the scheduler issue, which is really more of an RFE than a bug, IMHO.

Comment 5 Derek Carr 2016-02-03 16:51:42 UTC


*** This bug has been marked as a duplicate of bug 1252520 ***
