Bug 1252520 - Openshift master spawns pods when out of disk space
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Kubernetes
Version: 3.0.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Assigned To: Paul Morie
QA Contact: DeShuai Ma
Duplicates: 1248662
Blocks: OSOPS_V3 1267746
Reported: 2015-08-11 11:36 EDT by Kenny Woodson
Modified: 2016-05-12 12:24 EDT (History)
CC List: 10 users

Doc Type: Bug Fix
Last Closed: 2016-05-12 12:24:28 EDT
Type: Bug


Attachments: None
Description Kenny Woodson 2015-08-11 11:36:05 EDT
Description of problem:
I left a cluster of openshift v3 up for a week or two without checking on it.  When I came back, I noticed that some of the services in my application were not functioning.  In particular, the web UI that connects to a database was showing an error that it could not reach it.

I noticed that /var/log/messages had filled up /var on both of my nodes' file systems.

When I checked `oc get pods`, I noticed that the master had spun up 5000 pods of the mysql instance, all in the OutOfDisk error state, along with the other pods running on that machine.
<snip>
NAME                              READY     REASON    RESTARTS   AGE
mysql-1-00bg6                     0/1       OutOfDisk  0          8h
mysql-1-01579                     0/1       OutOfDisk  0          8h
mysql-1-01d0m                     0/1       OutOfDisk  0          6h
mysql-1-02293                     0/1       OutOfDisk  0          11h
...
<snip>

Version-Release number of selected component (if applicable):
openshift-master-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64
openshift-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64
openshift-sdn-ovs-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64
openshift-node-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64
tuned-profiles-openshift-node-3.0.1.0-0.git.205.2c9a9b0.el7ose.x86_64

How reproducible:
Unsure.

Steps to Reproduce:
1. Install an application.
2. Fill up /var
3. Verify that openshift-master attempts to create lots of pods.
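The steps above can be sketched as a script for a disposable test node. This is illustrative only: `fill.img` is a hypothetical path, and filling a filesystem is destructive, so do not run it on a cluster you care about.

```shell
# Sketch of the reproduction. Assumes GNU coreutils df and util-linux
# fallocate; run only on a throwaway node.

avail_kb() {
  # Available space (in KB) on the filesystem holding "$1".
  df -k --output=avail "$1" | tail -n 1 | tr -d ' '
}

fill_fs() {
  # Allocate one file consuming all remaining space under "$1".
  fallocate -l "$(avail_kb "$1")K" "$1/fill.img"
}

# 1. Install an application (e.g. via `oc new-app`).
# 2. Exhaust /var on the node:   fill_fs /var
# 3. Watch pod creation:         watch 'oc get pods | grep -c OutOfDisk'
```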

Actual results:
Openshift-master attempted to create new pods even though disk space had run out.

Expected results:
Openshift-master should recognize that the node(s) are in a bad state and not schedule any more pods.

Additional info:
Docker uses direct-lvm on a separate 100 GB disk (xvdb).  /var is on xvda3 and is 8 GB.
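The condition the master should be reacting to can be approximated client-side with a df-based threshold check. This is a minimal sketch only: the function names and the 90% threshold are illustrative assumptions, not OpenShift's actual kubelet logic or defaults.

```shell
# Illustrative approximation of an out-of-disk node condition.
# Assumes GNU coreutils df (--output support).

root_used_pct() {
  # Used percentage of the filesystem holding "$1", digits only.
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

node_condition() {
  # $1: used percent, $2: threshold percent.
  if [ "$1" -ge "$2" ]; then
    echo OutOfDisk
  else
    echo Ready
  fi
}

# Example: node_condition "$(root_used_pct /var)" 90
```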
Comment 2 Derek Carr 2016-02-03 11:51:42 EST
*** Bug 1248662 has been marked as a duplicate of this bug. ***
Comment 3 Paul Morie 2016-02-03 17:10:52 EST
The fix for this should be in master now that the latest rebase has landed, please retest.
Comment 4 DeShuai Ma 2016-02-24 00:07:53 EST
Verified on openshift v3.1.1.905.

steps:
1. Get the node
[root@openshift-115 dma]# oc get node
NAME                               STATUS                     AGE
openshift-115.lab.sjc.redhat.com   Ready,SchedulingDisabled   1d
openshift-136.lab.sjc.redhat.com   Ready                      1d

2. Create an rc and scale it to replicas=0
[root@openshift-115 dma]# oc get rc -n dma
CONTROLLER   CONTAINER(S)   IMAGE(S)                                                                           SELECTOR                                               REPLICAS   AGE
mysql-1      mysql          brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhscl/mysql-56-rhel7:latest   deployment=mysql-1,deploymentconfig=mysql,name=mysql   0          18m

3. Create a large file to fill the disk to 100% usage
[root@openshift-136 ~]# df -lh
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/rhel72-root   10G   10G   20K 100% /
devtmpfs                 1.9G     0  1.9G   0% /dev
tmpfs                    1.9G     0  1.9G   0% /dev/shm
tmpfs                    1.9G  190M  1.7G  11% /run
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                497M  197M  300M  40% /boot
tmpfs                    380M     0  380M   0% /run/user/0

4. Scale the rc to replicas=3
# oc scale rc/mysql-1 --replicas=3 -n dma

5. Check the pod status
[root@openshift-115 dma]# oc get pod -n dma
NAME            READY     STATUS    RESTARTS   AGE
mysql-1-8ss17   0/1       Pending   0          1m
mysql-1-aj620   0/1       Pending   0          1m
mysql-1-ufryk   0/1       Pending   0          1m
[root@openshift-115 dma]# oc describe pod/mysql-1-8ss17 -n dma|grep FailedScheduling
  1m		33s		7	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-aj620 -n dma|grep FailedScheduling
  2m		11s		12	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
[root@openshift-115 dma]# oc describe pod/mysql-1-ufryk -n dma|grep FailedScheduling
  2m		14s		13	{default-scheduler }			Warning		FailedScheduling	no nodes available to schedule pods
Comment 6 errata-xmlrpc 2016-05-12 12:24:28 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064
