Bug 1297521 - Scaling up pod causes loop with Node is out of disk
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.1.0
Hardware: Unspecified   OS: Unspecified
Priority: high   Severity: high
Assigned To: Andy Goldstein
QA Contact: DeShuai Ma
Depends On:
Blocks: 1267746
 
Reported: 2016-01-11 13:45 EST by Ryan Howe
Modified: 2017-03-08 13 EST
CC: 9 users

Fixed In Version: atomic-openshift-3.1.1.900-1.git.1.bacd67f.el7
Doc Type: Bug Fix
Last Closed: 2016-05-12 12:26:35 EDT
Type: Bug


Attachments
GUI images of issue (194.42 KB, image/gif), 2016-01-11 13:45 EST, Ryan Howe


External Trackers
Tracker ID: Red Hat Product Errata RHSA-2016:1064
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update
Last Updated: 2016-05-12 16:19:17 EDT

Description Ryan Howe 2016-01-11 13:45:35 EST
Created attachment 1113648
GUI images of issue

Description of problem:
When a pod is manually scaled up, OpenShift tries to place it on a node that has no disk space left and then gets stuck in a loop, repeatedly trying to deploy the pod.
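For reference, one way to watch the failure loop from the CLI; this is a minimal sketch, and the interval and grep pattern are illustrative rather than taken from this report:

# The count of OutOfDisk pods keeps climbing while the deployment loops
watch -n 5 "oc get pods | grep -c OutOfDisk"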


Version-Release number of selected component (if applicable):
3.1 


Steps to Reproduce:
1. Schedule a pod to a node.
2. Fill up the node's disk space.
3. Manually scale the pod up via the GUI or CLI (a rough sketch follows this list).
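A rough shell sketch of these steps; the resource name (dc/logging-fluentd), file path, and size are placeholders rather than values from this report:

# 1. Note which node the pod runs on
oc get pods -o wide
# 2. On that node, exhaust the root filesystem (size and path are illustrative)
fallocate -l 10G /var/tmp/fill-disk
# 3. Scale up and watch new pods fail with OutOfDisk
oc scale dc/logging-fluentd --replicas=2
oc get pods -w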

Actual results:
OpenShift keeps looping, creating many pods that all fail with the same OutOfDisk error.

Expected results:
The scale-up should fail once with an error instead of looping.

Additional info:
https://lists.openshift.redhat.com/openshift-archives/users/2016-January/msg00033.html

# oc get pods
....
logging-fluentd-1-x5h0i   0/1       OutOfDisk          0          1s
logging-fluentd-1-xl4hz   0/1       OutOfDisk          0          12s
logging-fluentd-1-xqhul   0/1       OutOfDisk          0          10s
logging-fluentd-1-ykpku   0/1       OutOfDisk          0          13s
logging-fluentd-1-z2map   0/1       OutOfDisk          0          7s

[root@master-001 ~]# oc get pods | wc -l
116
[root@master-001 ~]# oc get pods | wc -l
119
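As a stopgap, the already-failed pods can be cleaned up in bulk. A hedged sketch; the status pattern is an assumption about the oc get output format shown above:

# Delete pods that already failed with OutOfDisk; more may appear until the controller is scaled down
oc get pods | awk '/OutOfDisk/ {print $1}' | xargs -r oc delete pod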
Comment 1 Andy Goldstein 2016-01-11 14:11:56 EST
This should be resolved with the next rebase into origin. The following upstream PRs add the ability to prevent scheduling to nodes that are out of disk:

https://github.com/kubernetes/kubernetes/pull/16178
https://github.com/kubernetes/kubernetes/pull/16179
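Once a rebase containing those PRs lands, the node should report an out-of-disk condition that the scheduler respects. A hedged way to check it from the CLI; the condition name follows upstream Kubernetes of that era, and the node name is a placeholder:

# Show the node's OutOfDisk condition; True means the scheduler should skip this node
oc describe node <node-name> | grep -A 2 OutOfDisk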
Comment 2 Andy Goldstein 2016-01-12 09:21:59 EST
Not a 3.1.1 blocker
Comment 3 Eric Paris 2016-02-02 11:32:17 EST
Upstream fixes merged Oct 29 and Nov 2. This will be fixed when the rebase lands.
Comment 4 Derek Carr 2016-02-03 11:28:47 EST
The upstream PRs have landed in the openshift/origin repository.
Comment 5 DeShuai Ma 2016-02-24 00:07:35 EST
Verified on openshift v3.1.1.905.

Steps:
1. Get the nodes
[root@openshift-115 dma]# oc get node
NAME                               STATUS                     AGE
openshift-115.lab.sjc.redhat.com   Ready,SchedulingDisabled   1d
openshift-136.lab.sjc.redhat.com   Ready                      1d

2. Create an rc and scale it to replicas=0 (see the command sketch after the output below)
[root@openshift-115 dma]# oc get rc -n dma
CONTROLLER   CONTAINER(S)   IMAGE(S)                                                                           SELECTOR                                               REPLICAS   AGE
mysql-1      mysql          brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mysql-56-rhel7:latest   deployment=mysql-1,deploymentconfig=mysql,name=mysql   0          18m
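For completeness, the scale-down referenced in this step would look something like the following, mirroring the step-4 command further down:

oc scale rc/mysql-1 --replicas=0 -n dma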

3. Create a large file to fill the disk to 100% usage (a sketch follows the df output)
[root@openshift-136 ~]# df -lh
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/rhel72-root   10G   10G   20K 100% /
devtmpfs                 1.9G     0  1.9G   0% /dev
tmpfs                    1.9G     0  1.9G   0% /dev/shm
tmpfs                    1.9G  190M  1.7G  11% /run
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                497M  197M  300M  40% /boot
tmpfs                    380M     0  380M   0% /run/user/0
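A hedged sketch of how the disk can be filled for this step; the file path is an assumption, and dd stops on its own once the filesystem is full:

dd if=/dev/zero of=/fill.img bs=1M
df -lh /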

4. Scale the rc to replicas=3
# oc scale rc/mysql-1 --replicas=3 -n dma

5. Check the pod status (a quick confirmation check follows the events below)
[root@openshift-115 dma]# oc get pod -n dma
NAME            READY     STATUS    RESTARTS   AGE
mysql-1-8ss17   0/1       Pending   0          1m
mysql-1-aj620   0/1       Pending   0          1m
mysql-1-ufryk   0/1       Pending   0          1m
[root@openshift-115 dma]# oc describe pod/mysql-1-8ss17 -n dma|grep FailedScheduling
  1m		33s		7	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-aj620 -n dma|grep FailedScheduling
  2m		11s		12	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-ufryk -n dma|grep FailedScheduling
  2m		14s		13	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
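With the fix, the pods stay Pending with FailedScheduling events instead of being created and failed over and over. One quick way to confirm the loop is gone, using the same style of check as in the original report:

# This count should stay at the requested 3 replicas (plus the header line) rather than growing
oc get pod -n dma | wc -l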
Comment 8 errata-xmlrpc 2016-05-12 12:26:35 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064
