Bug 1386055 - [infrastructure_public_371]Shouldn't schedule pod on node when node become 'DiskPressure=True'
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Avesh Agarwal
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-18 02:37 UTC by DeShuai Ma
Modified: 2016-10-20 16:35 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-20 16:35:12 UTC
Target Upstream Version:



Description DeShuai Ma 2016-10-18 02:37:04 UTC
Description of problem:
When a node reports DiskPressure=True, the scheduler should not place new pods on it.

Version-Release number of selected component (if applicable):
openshift v3.4.0.12
kubernetes v1.4.0+776c994
etcd 3.1.0-alpha.1

How reproducible:
Always

Steps to Reproduce:
1. Create a pod on a node and create a large file inside it to drive the node toward disk pressure (see the pod spec sketch after these steps):
$ oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/k8s/hello-pod-tmp-hostpath.yaml
$ oc exec hello-pod -- dd if=/dev/zero of=/tmp/test1 bs=10M count=1024

2. When the node reports 'DiskPressure=True', create another pod:
$ oc describe node openshift-128.lab.sjc.redhat.com|grep DiskPressure
$ oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/k8s/hello-pod.yaml

3. Check the second pod's status:
$ oc describe pod hello-pod
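
The YAML files referenced above are not attached to this bug; judging by the file name, the first pod presumably mounts a hostPath volume at /tmp so that the dd command in step 1 fills the node's filesystem. A minimal sketch along those lines (the image is taken from the pod description below; the volume type and host path are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
spec:
  containers:
  - name: hello-pod
    image: docker.io/deshuai/hello-pod:latest
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: tmp
      mountPath: /tmp        # the dd command writes ~10GiB here
  volumes:
  - name: tmp
    hostPath:
      path: /tmp             # assumed: writes land on the node's filesystem, pushing it toward DiskPressure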

Actual results:
3. The second pod becomes 'Evicted':
[root@openshift-105 ~]# oc describe pod hello-pod
Name:			hello-pod
Namespace:		default
Security Policy:	anyuid
Node:			openshift-128.lab.sjc.redhat.com/
Start Time:		Mon, 17 Oct 2016 22:07:59 -0400
Labels:			name=hello-pod
Status:			Failed
Reason:			Evicted
Message:		Pod The node was low on compute resources.
IP:			
Controllers:		<none>
Containers:
  hello-pod:
    Image:	docker.io/deshuai/hello-pod:latest
    Port:	8080/TCP
    Volume Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7t1xx (ro)
    Environment Variables:	<none>
Volumes:
  tmp:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-7t1xx:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-7t1xx
QoS Class:	BestEffort
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From						SubobjectPath	Type		Reason		Message
  ---------	--------	-----	----						-------------	--------	------		-------
  18m		18m		1	{default-scheduler }						Normal		Scheduled	Successfully assigned hello-pod to openshift-128.lab.sjc.redhat.com
  18m		18m		1	{kubelet openshift-128.lab.sjc.redhat.com}			Warning		Evicted		The node was low on compute resources.

Expected results:
3. The second pod should stay Pending; the scheduler should not schedule pods on a node once it reports 'DiskPressure=True'.

Additional info:

Comment 1 DeShuai Ma 2016-10-18 06:12:23 UTC
The latest Kubernetes does not have this issue.

Comment 2 Derek Carr 2016-10-19 22:06:07 UTC
Can you include the kubeletArguments snippet that you used to configure the node?

Comment 3 Derek Carr 2016-10-20 16:08:01 UTC
FWIW, I tried a simple reproduction that just set nodefs.available<$(high_value) so the node would automatically report DiskPressure, and saw that pods were not scheduled, as expected. It's possible the scheduler cache could have been latent, but it would be good to see the full node-config.yaml.
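
For anyone following along, the setting Derek describes goes under kubeletArguments in the node's node-config.yaml. A minimal sketch, with a deliberately high placeholder threshold (not the value Derek actually used):

kubeletArguments:
  eviction-hard:
    - "nodefs.available<100Gi"   # placeholder: anything above the node's free space makes it report DiskPressure immediately

The node service needs a restart to pick up the change; once 'oc describe node' shows DiskPressure=True, newly created pods are expected to stay Pending rather than land on that node.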

Comment 4 Avesh Agarwal 2016-10-20 16:23:39 UTC
I also tried on the latest OSE (it is close to 3.4.0.12), cannot reproduce it, and it works as expected:

#oc describe node --config=./openshift.local.config/master/admin.kubeconfig | grep  DiskPres
  DiskPressure 		True 	Thu, 20 Oct 2016 12:21:59 -0400 	Thu, 20 Oct 2016 12:18:38 -0400 	KubeletHasDiskPressure 		kubelet has disk pressure
  3m		3m		2	{kubelet 192.168.124.61}			Normal		NodeHasNoDiskPressure	Node 192.168.124.61 status is now: NodeHasNoDiskPressure
  3m		3m		1	{kubelet 192.168.124.61}			Normal		NodeHasDiskPressure	Node 192.168.124.61 status is now: NodeHasDiskPressure


And the pod status is pending with the following event:

Events:
  FirstSeen	LastSeen	Count	From			SubobjectPath	Type		Reason			Message
  ---------	--------	-----	----			-------------	--------	------			-------
  2m		7s		14	{default-scheduler }			Warning		FailedScheduling	pod (hello-pod) failed to fit in any node
fit failure on node (192.168.124.61): NodeUnderDiskPressure

Comment 5 Avesh Agarwal 2016-10-20 16:35:12 UTC
Hi DeShuai,

In my setup to simulate Disk Pressure, I had:
kubeletArguments:
  eviction-hard:
    - "nodefs.available<12Gi"

In my setup and Derek's setup, we could not reproduce it. As Derek said, one possibility is a latent scheduler cache. In any case, it would be good to look at your node-config.yaml to see what it has.

I am closing it for the time being. Please reopen if you see it consistently.

