Bug 1252520 - Openshift master spawns pods when out of disk space
Summary: Openshift master spawns pods when out of disk space
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.0.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Paul Morie
QA Contact: DeShuai Ma
Duplicates: 1248662
Depends On:
Blocks: OSOPS_V3 1267746
Reported: 2015-08-11 15:36 UTC by Kenny Woodson
Modified: 2016-05-12 16:24 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-05-12 16:24:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2016:1064 normal SHIPPED_LIVE Important: Red Hat OpenShift Enterprise 3.2 security, bug fix, and enhancement update 2016-05-12 20:19:17 UTC

Description Kenny Woodson 2015-08-11 15:36:05 UTC
Description of problem:
I left an OpenShift v3 cluster running for a week or two without checking on it. When I came back, some of the services in my application were not functioning; in particular, the web UI that connects to a database was reporting that it could not reach it.

I noticed that /var/log/messages had filled up the /var filesystem on both of my nodes.

When I ran `oc get pods`, I saw that the master had spun up 5000 pods of the mysql instance, all failed with the status OutOfDisk, as had the other pods running on that machine.
NAME                              READY     REASON    RESTARTS   AGE
mysql-1-00bg6                     0/1       OutOfDisk  0          8h
mysql-1-01579                     0/1       OutOfDisk  0          8h
mysql-1-01d0m                     0/1       OutOfDisk  0          6h
mysql-1-02293                     0/1       OutOfDisk  0          11h

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install an application.
2. Fill up /var
3. Verify that openshift-master attempts to create lots of pods.
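The steps above can be sketched as shell commands (a sketch only; the filler path and sizes are illustrative, and the fill step must run on a node, not the master):

```shell
# On a node: fill /var. fallocate reserves space instantly; fall
# back to dd if the filesystem does not support fallocate.
fallocate -l 8G /var/tmp/filler || dd if=/dev/zero of=/var/tmp/filler bs=1M
df -h /var

# On the master: watch pod creation. Before the fix, replacement
# pods pile up in the OutOfDisk state.
oc get pods
oc get pods | grep -c OutOfDisk

# Clean up when done.
rm -f /var/tmp/filler
```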

Actual results:
The OpenShift master repeatedly attempted to create new pods after the node had run out of disk space.

Expected results:
The OpenShift master should recognize that the node(s) are in a bad state and stop scheduling new pods on them.

Additional info:
Docker uses direct-lvm on a separate 100 GB disk (xvdb); /var is on xvda3 and is 8 GB.
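The disk layout described above can be confirmed on a node with standard tools (device names are those from this report; `docker info` output varies by version):

```shell
# Block devices and mount points: xvdb (100 GB) backs docker's
# direct-lvm pool, while /var lives on the 8 GB xvda3.
lsblk
df -h /var
# Confirm the devicemapper/direct-lvm storage driver:
docker info | grep -A 3 'Storage Driver'
```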

Comment 2 Derek Carr 2016-02-03 16:51:42 UTC
*** Bug 1248662 has been marked as a duplicate of this bug. ***

Comment 3 Paul Morie 2016-02-03 22:10:52 UTC
The fix for this should be in master now that the latest rebase has landed; please retest.

Comment 4 DeShuai Ma 2016-02-24 05:07:53 UTC
Verified on openshift v3.1.1.905:

1. Get the node
[root@openshift-115 dma]# oc get node
NAME                               STATUS                     AGE
openshift-115.lab.sjc.redhat.com   Ready,SchedulingDisabled   1d
openshift-136.lab.sjc.redhat.com   Ready                      1d

2. Create an rc and scale it to replicas=0
[root@openshift-115 dma]# oc get rc -n dma
CONTROLLER   CONTAINER(S)   IMAGE(S)                                                                           SELECTOR                                               REPLICAS   AGE
mysql-1      mysql          brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mysql-56-rhel7:latest   deployment=mysql-1,deploymentconfig=mysql,name=mysql   0          18m

3. Create a large file to fill the disk to 100% usage
[root@openshift-136 ~]# df -lh
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/rhel72-root   10G   10G   20K 100% /
devtmpfs                 1.9G     0  1.9G   0% /dev
tmpfs                    1.9G     0  1.9G   0% /dev/shm
tmpfs                    1.9G  190M  1.7G  11% /run
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                497M  197M  300M  40% /boot
tmpfs                    380M     0  380M   0% /run/user/0

4. Scale the rc to replicas=3
# oc scale rc/mysql-1 --replicas=3 -n dma

5. Check the pod status
[root@openshift-115 dma]# oc get pod -n dma
mysql-1-8ss17   0/1       Pending   0          1m
mysql-1-aj620   0/1       Pending   0          1m
mysql-1-ufryk   0/1       Pending   0          1m
[root@openshift-115 dma]# oc describe pod/mysql-1-8ss17 -n dma|grep FailedScheduling
  1m		33s		7	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-aj620 -n dma|grep FailedScheduling
  2m		11s		12	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-ufryk -n dma|grep FailedScheduling
  2m		14s		13	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods

Comment 6 errata-xmlrpc 2016-05-12 16:24:28 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

