Bug 1252520

Summary: Openshift master spawns pods when out of disk space
Product: OpenShift Container Platform
Reporter: Kenny Woodson <kwoodson>
Component: Node
Assignee: Paul Morie <pmorie>
Status: CLOSED ERRATA
QA Contact: DeShuai Ma <dma>
Severity: medium
Priority: unspecified
Version: 3.0.0
CC: agrimm, dma, dmcphers, jokerman, libra-bugs, libra-onpremise-devel, mmccomas, pep, sspeiche, tdawson
Hardware: All
OS: Linux
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-05-12 16:24:28 UTC
Bug Blocks: 1303130, 1267746

Description Kenny Woodson 2015-08-11 15:36:05 UTC
Description of problem:
I left an OpenShift v3 cluster running for a week or two without checking on it.  When I came back, I noticed that some of the services in my application were not functioning.  In particular, my web UI, which connects to a database, was showing an error about not being able to connect to it.

I noticed that /var/log/messages had filled up /var on both of my nodes.

When I ran `oc get pods`, I saw that the master had spun up 5000 pods of the mysql instance, all in the OutOfDisk error state.  The same was true of the other pods running on that machine.
<snip>
NAME                              READY     REASON    RESTARTS   AGE
mysql-1-00bg6                     0/1       OutOfDisk  0          8h
mysql-1-01579                     0/1       OutOfDisk  0          8h
mysql-1-01d0m                     0/1       OutOfDisk  0          6h
mysql-1-02293                     0/1       OutOfDisk  0          11h
...
<snip>
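
With thousands of pods in the listing, a per-state tally is easier to read than the raw output. The following is a minimal sketch, not something from the original report: `count_by_reason` is a hypothetical helper name, and it assumes the state is in the third whitespace-separated column, as in the listing above.

```shell
# Tally pods by the value in the third column (REASON in the listing above).
# Reads `oc get pods` style output on stdin, so it also works on saved output.
# Assumption: the first line is the header, and lines with fewer than three
# fields (such as "..." or "<snip>") should be ignored.
count_by_reason() {
  awk 'NR > 1 && NF >= 3 { count[$3]++ }
       END { for (r in count) print r, count[r] }' | sort
}

# Usage (assumes a logged-in oc client):
#   oc get pods | count_by_reason
```

Reading from stdin keeps the helper independent of the client, so it can be tried against a saved copy of the listing.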

Version-Release number of selected component (if applicable):
openshift-master-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64
openshift-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64
openshift-sdn-ovs-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64
openshift-node-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64
tuned-profiles-openshift-node-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64

How reproducible:
Unsure.

Steps to Reproduce:
1. Install an application.
2. Fill up /var on the nodes.
3. Verify that openshift-master attempts to create lots of pods.
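
One possible way to carry out step 2 on a disposable test node is sketched below. This is an assumption about how to reproduce, not what happened in the original report (there, /var/log/messages grew on its own). `write_filler` and the file name `filler.img` are hypothetical names chosen for illustration.

```shell
# Write a zero-filled file of the requested size (in MiB) into a directory.
# Assumption: run as a user who may write to the target directory.
# If the size exceeds the free space, dd stops when the file system is full
# and returns a non-zero status, which is the state this bug needs.
write_filler() {
  dir="$1"
  size_mb="$2"
  dd if=/dev/zero of="$dir/filler.img" bs=1048576 count="$size_mb" 2>/dev/null
}

# Usage on a test node only (the size is an example value):
#   write_filler /var 9000
#   df -h /var
#   rm /var/filler.img    # clean up afterwards
```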

Actual results:
openshift-master kept attempting to create new pods after the disk space had run out.

Expected results:
openshift-master should recognize that the node(s) are in a bad state and stop scheduling new pods onto them.

Additional info:
Docker uses direct LVM on a separate 100GB disk (xvdb).  /var is on xvda3 and is 8GB.

Comment 2 Derek Carr 2016-02-03 16:51:42 UTC
*** Bug 1248662 has been marked as a duplicate of this bug. ***

Comment 3 Paul Morie 2016-02-03 22:10:52 UTC
The fix for this should be in master now that the latest rebase has landed. Please retest.

Comment 4 DeShuai Ma 2016-02-24 05:07:53 UTC
Verified on OpenShift v3.1.1.905.

Steps:
1. Get the node
[root@openshift-115 dma]# oc get node
NAME                               STATUS                     AGE
openshift-115.lab.sjc.redhat.com   Ready,SchedulingDisabled   1d
openshift-136.lab.sjc.redhat.com   Ready                      1d

2. Create an rc and scale it to 0 replicas
[root@openshift-115 dma]# oc get rc -n dma
CONTROLLER   CONTAINER(S)   IMAGE(S)                                                                           SELECTOR                                               REPLICAS   AGE
mysql-1      mysql          brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mysql-56-rhel7:latest   deployment=mysql-1,deploymentconfig=mysql,name=mysql   0          18m

3. Create a large file to fill the disk to 100% usage
[root@openshift-136 ~]# df -lh
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/rhel72-root   10G   10G   20K 100% /
devtmpfs                 1.9G     0  1.9G   0% /dev
tmpfs                    1.9G     0  1.9G   0% /dev/shm
tmpfs                    1.9G  190M  1.7G  11% /run
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                497M  197M  300M  40% /boot
tmpfs                    380M     0  380M   0% /run/user/0
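
To confirm the full file system without reading the table by eye, the df output can be filtered by its Use% column. This is a minimal sketch; `full_filesystems` is a hypothetical helper name, and it assumes the column layout shown above (Use% fifth, mount point sixth, no spaces in mount points).

```shell
# Print the mount points whose usage is at or above a threshold (default 100).
# Reads df output on stdin; the first line is treated as the header.
full_filesystems() {
  threshold="${1:-100}"
  awk -v t="$threshold" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)              # "100%" -> "100"
    if (pct + 0 >= t + 0) print $6
  }'
}

# Usage (-P keeps each file system on a single line):
#   df -P | full_filesystems 100
```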

4. Scale the rc to 3 replicas
# oc scale rc/mysql-1 --replicas=3 -n dma

5. Check the pod status
[root@openshift-115 dma]# oc get pod -n dma
NAME            READY     STATUS    RESTARTS   AGE
mysql-1-8ss17   0/1       Pending   0          1m
mysql-1-aj620   0/1       Pending   0          1m
mysql-1-ufryk   0/1       Pending   0          1m
[root@openshift-115 dma]# oc describe pod/mysql-1-8ss17 -n dma|grep FailedScheduling
  1m		33s		7	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-aj620 -n dma|grep FailedScheduling
  2m		11s		12	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-ufryk -n dma|grep FailedScheduling
  2m		14s		13	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
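
The per-pod checks in step 5 could be collapsed into one loop. The sketch below is an illustration under assumptions: `failed_scheduling_attempts` is a hypothetical helper, and it relies on the event layout shown above, where the third whitespace-separated field is the event count.

```shell
# Sum the count column of all FailedScheduling events in `oc describe` output.
# Reads the describe output on stdin and prints 0 when no such event exists.
failed_scheduling_attempts() {
  awk '/FailedScheduling/ { total += $3 } END { print total + 0 }'
}

# Usage (assumes a logged-in oc client and the dma project from the steps above):
#   for p in $(oc get pod -n dma -o name); do
#     printf '%s: ' "$p"
#     oc describe "$p" -n dma | failed_scheduling_attempts
#   done
```

A non-zero count for every pending pod, together with no pods in the OutOfDisk state, matches the result reported in this comment.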

Comment 6 errata-xmlrpc 2016-05-12 16:24:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064