Bug 1464367 - Incorrect memory limit calculation for kubepod in the cgroup hierarchy
Status: CLOSED NOTABUG
Product: OpenShift Container Platform
Classification: Red Hat
Component: Kubernetes
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Assigned To: Derek Carr
QA Contact: Qixuan Wang
Reported: 2017-06-23 05:12 EDT by Qixuan Wang
Modified: 2017-06-23 16:46 EDT
CC List: 4 users

Last Closed: 2017-06-23 16:46:11 EDT
Type: Bug


Description Qixuan Wang 2017-06-23 05:12:56 EDT
Description of problem:
Comparing the value of /sys/fs/cgroup/memory/kubepods.slice/memory.limit_in_bytes with experimental-allocatable-ignore-eviction enabled and disabled, I found that it is always equal to Node.Capacity, not Node.Allocatable. According to https://github.com/kubernetes/community/pull/348/files (line 170: "kubepods or kubepods.slice (Node Allocatable enforced here by Kubelet)"), this cgroup is where the kubelet enforces Node Allocatable.


Version-Release number of selected component (if applicable):
openshift v3.6.121
kubernetes v1.6.1+5115d708d7
etcd 3.2.0

How reproducible:
Always

Steps to Reproduce:
1. No eviction threshold is configured for the kubelet, but a hard eviction threshold of memory.available<100Mi is enabled by default.

2. Check the memory limit in the node description and in the kubepods cgroup:
# oc describe node <node> | grep -A7 Capacity
# cat /sys/fs/cgroup/memory/kubepods.slice/memory.limit_in_bytes

3. Add the following to the node's node-config.yaml and restart the node service:
kubeletArguments:
  experimental-allocatable-ignore-eviction:
    - 'true'
# systemctl restart atomic-openshift-node

4. Check the memory limit in the node description and in the kubepods cgroup again.
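The comparison in steps 2 and 4 can be scripted. The Python sketch below is illustrative only: read_kubepods_limit, ki_to_bytes and matching_quantities are hypothetical helpers, and it assumes cgroup v1 with the systemd cgroup driver (so the kubepods cgroup is kubepods.slice, as in step 2) and that `oc describe node` prints memory with the Ki suffix.

```python
# Hypothetical helpers for steps 2 and 4: compare the kubepods cgroup memory
# limit with the Capacity and Allocatable values from `oc describe node`.
# Assumes cgroup v1 and the systemd cgroup driver (kubepods.slice).
from pathlib import Path

KUBEPODS_LIMIT = "/sys/fs/cgroup/memory/kubepods.slice/memory.limit_in_bytes"


def read_kubepods_limit(path=KUBEPODS_LIMIT):
    """Read memory.limit_in_bytes (a single integer) from the cgroup file."""
    return int(Path(path).read_text().strip())


def ki_to_bytes(quantity):
    """Convert a quantity such as '3688620Ki' (as printed by oc) to bytes."""
    if not quantity.endswith("Ki"):
        raise ValueError(f"expected a Ki quantity, got {quantity!r}")
    return int(quantity[:-2]) * 1024


def matching_quantities(limit_bytes, capacity, allocatable):
    """Return the names of the node values that equal the cgroup limit."""
    candidates = (("Capacity", capacity), ("Allocatable", allocatable))
    return [name for name, q in candidates if ki_to_bytes(q) == limit_bytes]
```

On the node, `matching_quantities(read_kubepods_limit(), "3688620Ki", "3586220Ki")` returns `['Capacity']` for the values in step 2 of the actual results; with the step 4 values (Capacity equal to Allocatable) it returns `['Capacity', 'Allocatable']`.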


Actual results:
2. [root@ip-172-18-12-156 ~]# oc describe node <node> | grep -A7 Capacity
Capacity:
 cpu:        1
 memory:    3688620Ki
 pods:        250
Allocatable:
 cpu:        1
 memory:    3586220Ki
 pods:        250

[root@ip-172-18-3-157 ~]# cat /sys/fs/cgroup/memory/kubepods.slice/memory.limit_in_bytes
3777146880

3777146880 = 3688620Ki = Capacity   # I think it's wrong

4. [root@ip-172-18-12-156 ~]# oc describe node <node> | grep -A7 Capacity
Capacity:
 cpu:        1
 memory:    3688620Ki
 pods:        250
Allocatable:
 cpu:        1
 memory:    3688620Ki
 pods:        250

[root@ip-172-18-3-157 node]# cat /sys/fs/cgroup/memory/kubepods.slice/memory.limit_in_bytes
3777146880

3777146880 = 3688620Ki = Capacity  # Correct


Expected results:
2. Taking the default eviction threshold (memory.available<100Mi) into consideration, the Capacity and Allocatable values in the node description are correct. However, /sys/fs/cgroup/memory/kubepods.slice/memory.limit_in_bytes should be equal to Node.Allocatable (3586220Ki = 3672289280 bytes).


Additional info:
Comment 1 Seth Jennings 2017-06-23 16:46:11 EDT
This is by design.  It is confusing though.

For the hard eviction threshold to be triggered, the kubepods cgroup must be allowed to exceed the threshold. So eviction-hard is subtracted from capacity to calculate allocatable, but it is not used in the calculation of memory.limit_in_bytes. kube-reserved and system-reserved are subtracted from capacity to calculate both allocatable AND memory.limit_in_bytes.
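The rule in Comment 1 can be modeled with a few lines of arithmetic. The Python sketch below is not the kubelet's code: node_allocatable_ki and kubepods_limit_bytes are hypothetical names, quantities are in KiB, and kube-reserved and system-reserved default to 0 because the report does not configure them.

```python
# Hypothetical model of the memory accounting described in Comment 1.
# Not the kubelet's implementation; all quantities are in KiB.

KI = 1024  # bytes per KiB


def node_allocatable_ki(capacity_ki, kube_reserved_ki=0, system_reserved_ki=0,
                        eviction_hard_ki=0, ignore_eviction=False):
    """Allocatable = capacity - kube-reserved - system-reserved - eviction-hard.

    With experimental-allocatable-ignore-eviction=true, the hard eviction
    threshold is left out of the subtraction.
    """
    eviction_ki = 0 if ignore_eviction else eviction_hard_ki
    return capacity_ki - kube_reserved_ki - system_reserved_ki - eviction_ki


def kubepods_limit_bytes(capacity_ki, kube_reserved_ki=0, system_reserved_ki=0):
    """kubepods memory.limit_in_bytes: reservations are subtracted, but the
    eviction threshold is not, so pod usage can cross the threshold and
    trigger hard eviction."""
    return (capacity_ki - kube_reserved_ki - system_reserved_ki) * KI


capacity_ki = 3688620          # Capacity from `oc describe node`
eviction_hard_ki = 100 * 1024  # default memory.available<100Mi

for ignore in (False, True):
    allocatable = node_allocatable_ki(capacity_ki, eviction_hard_ki=eviction_hard_ki,
                                      ignore_eviction=ignore)
    print(f"ignore_eviction={ignore}: Allocatable={allocatable}Ki, "
          f"kubepods limit={kubepods_limit_bytes(capacity_ki)} bytes")
```

With the report's numbers this prints Allocatable=3586220Ki when the flag is off and Allocatable=3688620Ki when it is on, while the kubepods limit stays at 3777146880 bytes (Capacity) in both cases, matching the observed results.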
