Bug 1554471

Summary: Limit ranges are being applied with cpu-cfs-quota set to false
Product: OpenShift Container Platform
Component: Node
Version: 3.7.0
Target Release: 3.7.z
Hardware: x86_64
OS: Linux
Severity: high
Priority: unspecified
Status: CLOSED ERRATA
Type: Bug
Reporter: Taneem Ibrahim <tibrahim>
Assignee: Derek Carr <decarr>
QA Contact: DeShuai Ma <dma>
CC: aos-bugs, avagarwa, bleanhar, decarr, dma, jokerman, mmccomas, sjenning
Last Closed: 2018-06-27 07:59:11 UTC
Bug Blocks: 1558155, 1558157

Description Taneem Ibrahim 2018-03-12 18:08:17 UTC
Description of problem:

The node-config.yaml has cpu-cfs-quota set to false. However, when a limit range is set on CPU (for example, 100 millicores for the pod), it still gets applied. According to our documentation, setting cpu-cfs-quota to false should prevent any CPU limits from being enforced.

https://docs.openshift.com/container-platform/3.7/admin_guide/overcommit.html#enforcing-cpu-limits
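
For context, a minimal sketch of the configuration involved (names and the default/request values below are illustrative, not copied from the affected cluster):

node-config.yaml (disables CFS quota enforcement on the node):

kubeletArguments:
  cpu-cfs-quota:
  - 'false'

LimitRange applying a 100 millicore CPU default limit/request to containers in the project:

apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range        # illustrative name
spec:
  limits:
  - type: Container
    default:
      cpu: 100m                # default CPU limit
    defaultRequest:
      cpu: 100m                # default CPU request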

Verified that CPU is not being throttled:

oc exec <POD_NAME> -- cat /sys/fs/cgroup/cpu/cpu.stat
nr_periods 0
nr_throttled 0
throttled_time 0

Version-Release number of selected component (if applicable):

v3.7.9

How reproducible:

Possible.


Steps to Reproduce:
1. Set cpu-cfs-quota to false.
2. Create a tomcat pod with an application deployed.
3. Set the cpu limit and requests to 100 millicores (a minimal pod spec sketch follows the error output below).
4. The pod health check will fail after 600 seconds. The following error message is present in the console:

--> Scaling test-1-0-5-snapshot-23 to 1
error: update acceptor rejected test-1-0-5-snapshot-23: pods for rc '<redacted>' took longer than 600 seconds to become available
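
A minimal pod spec sketch for step 3 (the image and names are placeholders; the actual workload was a tomcat application behind a deployment with a health check):

apiVersion: v1
kind: Pod
metadata:
  name: cfs-quota-test          # illustrative name
spec:
  containers:
  - name: app
    image: tomcat               # placeholder image
    resources:
      requests:
        cpu: 100m               # request at 100 millicores
      limits:
        cpu: 100m               # limit at 100 millicores
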
Actual results:


Expected results:

The cpu limits should be ignored and the health check should succeed because cpu-cfs-quota is set to false.

Additional info:

Comment 3 Avesh Agarwal 2018-03-13 20:34:45 UTC
I will try to reproduce it locally and get back to you.

Comment 4 Avesh Agarwal 2018-03-13 22:11:11 UTC
I cannot reproduce this in origin's 3.9 branch. I will check 3.7 now.

Comment 6 Avesh Agarwal 2018-03-14 17:34:17 UTC
I am still investigating it and will get back to you once I have something to share.

Comment 7 Avesh Agarwal 2018-03-14 18:17:02 UTC
Here is my investigation so far:

I tested it on 3.9 and the latest upstream kubernetes master branch, and I am seeing the same behaviour in both. It also seems to match the behaviour in 3.7.

1. cpu-cfs-quota to true
1a) If the pod specifies only a request, there was no throttling (nr_periods 0, nr_throttled 0, throttled_time 0) and the pod was able to use as much cpu as it wanted. (The pod had 5 CPUs, and I ran dd if=/dev/zero of=/dev/null; it was able to go up to 20% (around 1 CPU), which is expected with dd.) Also, the pod set its CpuQuota to 0 and CpuPeriod to 0.

1b) If the pod specifies a limit, there was no throttling as long as it stayed below its limit. There was throttling (nr_periods, nr_throttled, and throttled_time kept increasing) when the pod tried to go beyond its limit, and it always remained within its limit. (The pod had 5 CPUs, its limit was 500m, and I ran dd if=/dev/zero of=/dev/null; it was always around 10%.) Also, the pod set its CpuQuota to 50000 and CpuPeriod to 100000.


2. cpu-cfs-quota to false:
2a) If the pod specifies only a request, there was no throttling (nr_periods 0, nr_throttled 0, throttled_time 0) and the pod was able to use as much cpu as it wanted. (The pod had 5 CPUs, and I ran dd if=/dev/zero of=/dev/null; it was able to go up to 20% (around 1 CPU), which is expected with dd.) Also, the pod set its CpuQuota to 0 and CpuPeriod to 0.

2b) If the pod specifies a limit, there was no throttling (nr_periods 0, nr_throttled 0, throttled_time 0), but the pod always remained within its limit. (The pod had 5 CPUs, its limit was 500m, and I ran dd if=/dev/zero of=/dev/null; it was always around 10%.) Also, the pod set its CpuQuota to 0 and CpuPeriod to 0.

As per the above, all cases are working as expected except 2b, which needs to be checked to see whether it is working as expected or not. In 2b, the main discrepancy is that cpu-cfs-quota was set to false and the pod set its CpuQuota to 0 and CpuPeriod to 0 as expected, so why did it remain at 10% and not go up to 20%, even though there was no throttling?

Comment 8 Avesh Agarwal 2018-03-14 18:58:15 UTC
Just to capture some offline conversation:

Testing was done with docker 1.12.6. I was checking the values here https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_container_linux.go#L65-L71 and they seem to be set correctly as per my last comment.
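
For reference, the conversion in the linked kuberuntime code works out to roughly: the CFS period is fixed at 100000us and quota = limit_in_millicores * period / 1000 (my paraphrase, not an exact copy of the upstream helper). A quick shell check against the values observed in comment 7:

# period is fixed at 100000us; quota = millicores * period / 1000
echo $(( 500 * 100000 / 1000 ))    # prints 50000, matching CpuQuota for the 500m limit in case 1b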

Comment 9 Avesh Agarwal 2018-03-14 21:49:47 UTC
I checked that the values shown in docker inspect are the same as what is being passed by the kubelet to docker. I will keep investigating, but so far I am not seeing anything wrong with the kubelet.


I found another bz which seems to deal with the same area: https://bugzilla.redhat.com/show_bug.cgi?id=1455071; just putting it here for reference as I go through the bz.

Comment 10 Avesh Agarwal 2018-03-14 22:28:54 UTC
In the 2b case (as described in comment https://bugzilla.redhat.com/show_bug.cgi?id=1554471#c7), I see the following:

# kubectl --kubeconfig /var/run/kubernetes/admin.kubeconfig exec nginx5-d2czw -- cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000
# kubectl --kubeconfig /var/run/kubernetes/admin.kubeconfig exec nginx5-d2czw -- cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
-1
# kubectl --kubeconfig /var/run/kubernetes/admin.kubeconfig exec nginx5-d2czw -- cat /sys/fs/cgroup/cpu/cpu.shares
10


docker version: docker-1.12.6-71.git3e8e77d.el7.x86_64
kernel version: 3.10.0-709.el7.x86_64

Comment 11 Derek Carr 2018-03-15 18:47:09 UTC
I think I have isolated the root issue.

The container cgroup's cfs quota is unbounded, but the pod-level cgroup is bounded. I will need some time to see how much needs to change upstream to keep this unbounded when pod-level cgroups are enabled.
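
One way to see this from the node is to compare the two cgroups directly (a sketch; the exact paths depend on the cgroup driver, QoS class, and runtime, so treat them as illustrative):

# container-level quota: already unbounded (-1) when cpu-cfs-quota=false, as in comment 10
cat /sys/fs/cgroup/cpu/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<POD_UID>.slice/docker-<CONTAINER_ID>.scope/cpu.cfs_quota_us
# pod-level quota: this is the cgroup that was still being given a bound
cat /sys/fs/cgroup/cpu/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<POD_UID>.slice/cpu.cfs_quota_us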

Comment 13 Derek Carr 2018-03-16 18:56:03 UTC
Upstream PR posted:
https://github.com/kubernetes/kubernetes/pull/61294

Comment 14 Avesh Agarwal 2018-03-16 19:12:39 UTC
(In reply to Derek Carr from comment #13)
> Upstream PR posted:
> https://github.com/kubernetes/kubernetes/pull/61294

Thanks a lot Derek. I did some testing and it is working as expected.

Comment 16 Derek Carr 2018-03-19 17:35:49 UTC
Upstream PR merged.

Origin PR for 3.10 here:
https://github.com/openshift/origin/pull/19028

Will do pro-active picks with cloned bzs.

Comment 20 Brenton Leanhardt 2018-04-05 12:19:08 UTC
This shipped with https://access.redhat.com/errata/product/290/ver=3.7/rhel---7/x86_64/RHBA-2018:0636

Comment 21 DeShuai Ma 2018-04-11 10:09:37 UTC
Verified on OCP v3.7.43

Steps to verify:
Set cpu-cfs-quota=true in /etc/origin/node/node-config.yaml
kubeletArguments:
  cpu-cfs-quota:
  - 'true'
//Case 1 (cpu-cfs-quota=true + without limits)
1. Create a pod without limits
2. rsh into the pod and run 'dd if=/dev/zero of=/dev/null'
3. In another terminal, rsh into the pod and run 'while true; do sleep 4; ps aux|grep dd ; done'. The cpu usage is about 99% of one core:
sh-4.2# while true; do sleep 4; ps aux|grep dd ; done
root         19 98.0  0.0   4348   344 ?        R+   08:59   0:24 dd if=/dev/zero of=/dev/null
4. On the host where the pod is running, use `htop` to watch the cpu usage. One of the four cpu cores is at about 99%.

//Case 2 (cpu-cfs-quota=true + with limits)
1. Create a pod with limits.cpu=500m
2. rsh into the pod and run 'dd if=/dev/zero of=/dev/null'
3. In another terminal, rsh into the pod and run 'while true; do sleep 4; ps aux|grep dd ; done'. The cpu usage is about 50% of one core.
4. On the host where the pod is running, use `htop` to watch the cpu usage. One of the four cpu cores is at about 50%.
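
For cases 3 and 4, the same stanza in /etc/origin/node/node-config.yaml is flipped to false (the node service has to be restarted for the flag to take effect):

kubeletArguments:
  cpu-cfs-quota:
  - 'false'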


//Case 3 (cpu-cfs-quota=false + without limits)
1. Create a pod without limits
2. rsh into the pod and run 'dd if=/dev/zero of=/dev/null'
3. In another terminal, rsh into the pod and run 'while true; do sleep 4; ps aux|grep dd ; done'. The cpu usage is about 99% of one core.
4. On the host where the pod is running, use `htop` to watch the cpu usage. One of the four cpu cores is at about 99%.

//Case 4 (cpu-cfs-quota=false + with limits)
1. Create a pod with limits.cpu=500m
2. rsh into the pod and run 'dd if=/dev/zero of=/dev/null'
3. In another terminal, rsh into the pod and run 'while true; do sleep 4; ps aux|grep dd ; done'. The cpu usage is about 99% of one core.
4. On the host where the pod is running, use `htop` to watch the cpu usage. One of the four cpu cores is at about 99%.
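
To cross-check cases 3 and 4 from inside the pod (mirroring the reporter's original check and comment 10), the container-level quota should read -1 and no throttling should be recorded; per comments 11 and 13 the fix leaves the pod-level cgroup on the node unbounded as well:

oc exec <POD_NAME> -- cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us   # expect -1 (no quota) with cpu-cfs-quota=false
oc exec <POD_NAME> -- cat /sys/fs/cgroup/cpu/cpu.stat           # expect nr_periods/nr_throttled/throttled_time to stay 0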

Comment 23 errata-xmlrpc 2018-06-27 07:59:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2009