Created attachment 1113648 [details]
GUI images of issue

Description of problem:
When manually scaling up a pod, OpenShift tries to place it on a node that has no disk space left, and then gets stuck in a loop of deployment attempts.

Version-Release number of selected component (if applicable):
3.1

Steps to Reproduce:
1. Schedule a pod to a node
2. Fill up the node's disk space
3. Manually scale the pod up via the GUI or CLI

Actual results:
The controller loops, creating many pods that all fail with the same error.

Expected results:
Fail once with an error.

Additional info:
https://lists.openshift.redhat.com/openshift-archives/users/2016-January/msg00033.html

# oc get pods
....
logging-fluentd-1-x5h0i   0/1   OutOfDisk   0   1s
logging-fluentd-1-xl4hz   0/1   OutOfDisk   0   12s
logging-fluentd-1-xqhul   0/1   OutOfDisk   0   10s
logging-fluentd-1-ykpku   0/1   OutOfDisk   0   13s
logging-fluentd-1-z2map   0/1   OutOfDisk   0   7s

[root@master-001 ~]# oc get pods | wc -l
116
[root@master-001 ~]# oc get pods | wc -l
119
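A minimal sketch (not the actual controller code) of why the pod count keeps climbing: the kubelet on the full node rejects every pod with OutOfDisk, a failed pod no longer counts toward the desired replica count, so the replication controller creates a replacement on every sync. The dict shape and `sync` function below are illustrative only.

```python
# Hypothetical model of the replication-controller sync loop; the real logic
# lives in the controller manager, this only illustrates the failure mode.
def sync(desired, pods):
    # Failed pods do not count toward the desired replica count ...
    active = [p for p in pods if p["phase"] != "Failed"]
    # ... so a replacement is created for every pod the node rejected.
    for _ in range(desired - len(active)):
        # The kubelet on the full node rejects each pod: it ends up
        # Failed with reason OutOfDisk.
        pods.append({"phase": "Failed", "reason": "OutOfDisk"})
    return pods

pods = []
for _ in range(5):       # five sync iterations
    pods = sync(1, pods)
print(len(pods))         # failed pods accumulate: 5
```

This is why `oc get pods | wc -l` grows between invocations even though the desired replica count never changes.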
This should be resolved with the next rebase into origin. The following upstream PRs add the ability to prevent scheduling to nodes that are out of disk:

https://github.com/kubernetes/kubernetes/pull/16178
https://github.com/kubernetes/kubernetes/pull/16179
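A rough sketch of the change those PRs describe: the node reports an out-of-disk condition, and the scheduler drops any node whose condition is True from the candidate list, so pods stay Pending instead of looping through OutOfDisk failures. The node names and dict shapes here are illustrative, not the real API objects.

```python
def schedulable_nodes(nodes):
    """Drop nodes whose OutOfDisk condition is "True" (illustrative only)."""
    fit = []
    for node in nodes:
        conditions = {c["type"]: c["status"] for c in node["conditions"]}
        if conditions.get("OutOfDisk") == "True":
            continue  # node has no disk left; never a scheduling candidate
        fit.append(node["name"])
    return fit

nodes = [
    {"name": "node-a", "conditions": [{"type": "OutOfDisk", "status": "True"}]},
    {"name": "node-b", "conditions": [{"type": "OutOfDisk", "status": "False"}]},
]
print(schedulable_nodes(nodes))  # ['node-b']
```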
Not a 3.1.1 blocker
Upstream fixes merged Oct 29 and Nov 2. This will be fixed when the rebase lands.
The upstream PRs have landed in the openshift/origin repository.
Verified on openshift v3.1.1.905.

Steps:
1. Get the nodes:
[root@openshift-115 dma]# oc get node
NAME                               STATUS                     AGE
openshift-115.lab.sjc.redhat.com   Ready,SchedulingDisabled   1d
openshift-136.lab.sjc.redhat.com   Ready                      1d

2. Create an rc and scale the pod to replicas=0:
[root@openshift-115 dma]# oc get rc -n dma
CONTROLLER   CONTAINER(S)   IMAGE(S)                                                                            SELECTOR                                                REPLICAS   AGE
mysql-1      mysql          brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mysql-56-rhel7:latest   deployment=mysql-1,deploymentconfig=mysql,name=mysql    0          18m

3. Create a large file to fill the node's disk to 100% usage:
[root@openshift-136 ~]# df -lh
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/rhel72-root   10G   10G   20K 100% /
devtmpfs                 1.9G     0  1.9G   0% /dev
tmpfs                    1.9G     0  1.9G   0% /dev/shm
tmpfs                    1.9G  190M  1.7G  11% /run
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                497M  197M  300M  40% /boot
tmpfs                    380M     0  380M   0% /run/user/0

4. Scale the rc to replicas=3:
# oc scale rc/mysql-1 --replicas=3 -n dma

5. Check the pod status:
[root@openshift-115 dma]# oc get pod -n dma
NAME            READY   STATUS    RESTARTS   AGE
mysql-1-8ss17   0/1     Pending   0          1m
mysql-1-aj620   0/1     Pending   0          1m
mysql-1-ufryk   0/1     Pending   0          1m

[root@openshift-115 dma]# oc describe pod/mysql-1-8ss17 -n dma | grep FailedScheduling
  1m   33s   7    {default-scheduler }   Warning   FailedScheduling   no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-aj620 -n dma | grep FailedScheduling
  2m   11s   12   {default-scheduler }   Warning   FailedScheduling   no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-ufryk -n dma | grep FailedScheduling
  2m   14s   13   {default-scheduler }   Warning   FailedScheduling   no nodes available to schedule pods
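A small sketch of why the verification above shows a stable pod count: with the fix, no node fits, so new pods stay Pending, and Pending pods count toward the replica count, so the replication controller creates exactly three pods and stops (illustrative model only, not the real controller code).

```python
def sync_fixed(desired, pods):
    # With the fix, unschedulable pods stay Pending; Pending pods count
    # as active replicas, so no replacements are created.
    active = [p for p in pods if p["phase"] in ("Pending", "Running")]
    for _ in range(desired - len(active)):
        pods.append({"phase": "Pending"})  # FailedScheduling: no nodes available
    return pods

pods = []
for _ in range(5):             # five sync iterations
    pods = sync_fixed(3, pods)
print(len(pods))               # stays at 3
```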
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:1064