Bug 1386055 - [infrastructure_public_371]Shouldn't schedule pod on node when node become 'DiskPressure=True'
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Avesh Agarwal
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-18 02:37 UTC by DeShuai Ma
Modified: 2016-10-20 16:35 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-20 16:35:12 UTC
Target Upstream Version:



Description DeShuai Ma 2016-10-18 02:37:04 UTC
Description of problem:
When a node reports DiskPressure=True, the scheduler should not place new pods on it.

Version-Release number of selected component (if applicable):
openshift v3.4.0.12
kubernetes v1.4.0+776c994
etcd 3.1.0-alpha.1

How reproducible:
Always

Steps to Reproduce:
1. Create a pod on a node and create a large file inside it to drive the node toward disk pressure (see the pod spec sketch after these steps):
$ oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/k8s/hello-pod-tmp-hostpath.yaml
$ oc exec hello-pod -- dd if=/dev/zero of=/tmp/test1 bs=10M count=1024

2. When the node reports 'DiskPressure=True', create another pod:
$ oc describe node openshift-128.lab.sjc.redhat.com|grep DiskPressure
$ oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/k8s/hello-pod.yaml

3. Check the second pod's status:
$ oc describe pod hello-pod
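
The YAML files referenced above are not attached to this bug; judging by the file name, the first pod presumably mounts a hostPath volume at /tmp so that the dd command in step 1 fills the node's filesystem. A minimal sketch along those lines (the image is taken from the pod description below; the volume type and host path are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
spec:
  containers:
  - name: hello-pod
    image: docker.io/deshuai/hello-pod:latest
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: tmp
      mountPath: /tmp        # the dd command writes ~10GiB here
  volumes:
  - name: tmp
    hostPath:
      path: /tmp             # assumed: writes land on the node's filesystem, pushing it toward DiskPressure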

Actual results:
3. The second pod becomes 'Evicted':
[root@openshift-105 ~]# oc describe pod hello-pod
Name:			hello-pod
Namespace:		default
Security Policy:	anyuid
Node:			openshift-128.lab.sjc.redhat.com/
Start Time:		Mon, 17 Oct 2016 22:07:59 -0400
Labels:			name=hello-pod
Status:			Failed
Reason:			Evicted
Message:		Pod The node was low on compute resources.
IP:			
Controllers:		<none>
Containers:
  hello-pod:
    Image:	docker.io/deshuai/hello-pod:latest
    Port:	8080/TCP
    Volume Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7t1xx (ro)
    Environment Variables:	<none>
Volumes:
  tmp:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-7t1xx:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-7t1xx
QoS Class:	BestEffort
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  18m		18m		1	{default-scheduler }						Normal		Scheduled	Successfully assigned hello-pod to openshift-128.lab.sjc.redhat.com
  18m		18m		1	{kubelet openshift-128.lab.sjc.redhat.com}			Warning		Evicted		The node was low on compute resources.

Expected results:
3. The second pod should stay Pending; the scheduler should not schedule pods on a node once it reports 'DiskPressure=True'.

Additional info:

Comment 1 DeShuai Ma 2016-10-18 06:12:23 UTC
The latest Kubernetes does not have this issue.

Comment 2 Derek Carr 2016-10-19 22:06:07 UTC
Can you include the kubeletArguments snippet that you used to configure the node?

Comment 3 Derek Carr 2016-10-20 16:08:01 UTC
FWIW, I tried a simple reproduction that just set nodefs.available<$(high_value) so the node would automatically report DiskPressure, and saw that pods were not scheduled, as expected. It's possible the scheduler cache could have been latent, but it would be good to see the full node-config.yaml.
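
For anyone following along, the setting Derek describes goes under kubeletArguments in the node's node-config.yaml. A minimal sketch, with a deliberately high placeholder threshold (not the value Derek actually used):

kubeletArguments:
  eviction-hard:
    - "nodefs.available<100Gi"   # placeholder: anything above the node's free space makes it report DiskPressure immediately

The node service needs a restart to pick up the change; once 'oc describe node' shows DiskPressure=True, newly created pods are expected to stay Pending rather than land on that node.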

Comment 4 Avesh Agarwal 2016-10-20 16:23:39 UTC
I also tried on the latest OSE (it is close to 3.4.0.12), cannot reproduce it, and it works as expected:

#oc describe node --config=./openshift.local.config/master/admin.kubeconfig | grep  DiskPres
  DiskPressure 		True 	Thu, 20 Oct 2016 12:21:59 -0400 	Thu, 20 Oct 2016 12:18:38 -0400 	KubeletHasDiskPressure 		kubelet has disk pressure
  3m		3m		2	{kubelet 192.168.124.61}			Normal		NodeHasNoDiskPressure	Node 192.168.124.61 status is now: NodeHasNoDiskPressure
  3m		3m		1	{kubelet 192.168.124.61}			Normal		NodeHasDiskPressure	Node 192.168.124.61 status is now: NodeHasDiskPressure


And the pod status is pending with the following event:

Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  2m		7s		14	{default-scheduler }			Warning		FailedScheduling	pod (hello-pod) failed to fit in any node
fit failure on node (192.168.124.61): NodeUnderDiskPressure

Comment 5 Avesh Agarwal 2016-10-20 16:35:12 UTC
Hi DeShuai,

In my setup to simulate Disk Pressure, I had:
kubeletArguments:
  eviction-hard:
    - "nodefs.available<12Gi"

In my setup and Derek's setup, we could not reproduce it. As Derek said, one possibility is a latent scheduler cache. In any case, it would be good to look at your node-config.yaml to see what it has.

I am closing it for the time being. Please reopen if you see it consistently.

