Bug 1400887 - [DOCS] Need better examples of how to set kubeletArguments
Summary: [DOCS] Need better examples of how to set kubeletArguments
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
low
medium
Target Milestone: ---
Assignee: Michael Burke
QA Contact: Alexander Koksharov
Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-02 08:56 UTC by Alexander Koksharov
Modified: 2020-08-13 08:44 UTC (History)
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-21 20:48:23 UTC
Target Upstream Version:
Embargoed:



Description Alexander Koksharov 2016-12-02 08:56:41 UTC
Document URL: 
https://docs.openshift.com/container-platform/3.3/admin_guide/out_of_resource_handling.html#out-of-resource-schedulable-resources-and-eviction-policies

Section Number and Name: 

Describe the issue: 
Documentation says:
  kubeletArguments:
    eviction-hard: 
      - "memory.available<500Mi"
    system-reserved:
      - "1.5Gi"

  - Node memory capacity of 10Gi.
  - Operator wants to reserve 10 percent of memory capacity for system daemons (kernel, node, etc.).
  - Operator wants to evict pods at 95 percent memory utilization to reduce thrashing and incidence of system OOM.

Issue:
Somehow, the 1.5Gi is calculated as 1Gi (10% of 10Gi) + 0.5Gi (5% of 10Gi).
The math behind this needs more clarification.


Suggestions for improvement: 

Additional information:

Comment 1 Ashley Hardin 2016-12-05 21:11:53 UTC
@Derek, looks like this bit of content was sourced from you. Can you please help us clarify the math here?

Comment 2 Derek Carr 2016-12-12 21:45:00 UTC
I agree this is confusing; let me try to explain it better.

A node reports two values:

1. capacity is how much resource is on the machine
2. allocatable is how much resource is made available for scheduling.

The goal is to allow the scheduler to fully allocate a node and to not have evictions occur.

Evictions should only occur if pods use more than their requested amount of resource.

If a node has 10Gi of capacity, and we want to reserve 10% of that capacity for the system daemons, we do the following:

capacity = 10Gi
system-reserved = 10Gi * .10 = 1Gi

The node allocatable value in this setting becomes:

allocatable = capacity - system-reserved = 9Gi

This means by default, the scheduler will schedule pods requesting up to 9Gi of memory onto that node.

If we want to turn on eviction so that eviction is triggered when available memory falls below 5% of capacity, we need the scheduler to see allocatable as 8.5Gi.

To do this, the math becomes the following:

capacity = 10Gi
eviction-threshold = 10Gi * .05 = .5Gi
system-reserved = (10Gi * .10) + eviction-threshold = 1.5Gi
allocatable = capacity - system-reserved = 8.5Gi

The key piece of information is that you need to set system-reserved equal to the amount of resource you want to reserve for system daemons plus the amount of resource you want to reserve before triggering evictions.
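The arithmetic above can be sketched in a few lines of Python. This is only a worked example of the scenario's numbers (10Gi capacity, 10% daemon reservation, 5% hard-eviction threshold); it is not anything the kubelet itself computes for you:

```python
# Worked example of the node-allocatable math from the scenario above.
# All values are in Gi.

capacity = 10.0                        # total memory on the node (Gi)
daemon_reservation = capacity * 0.10   # 10% kept back for system daemons
eviction_threshold = capacity * 0.05   # hard eviction fires below this

# system-reserved must cover both the daemon reservation and the eviction
# threshold; otherwise the scheduler can pack the node right up to the
# eviction line and evictions occur even when pods stay within requests.
system_reserved = daemon_reservation + eviction_threshold

allocatable = capacity - system_reserved

print(system_reserved)  # 1.5
print(allocatable)      # 8.5
```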

Comment 3 Ashley Hardin 2016-12-14 18:26:33 UTC
Work in progress: 
https://github.com/openshift/openshift-docs/pull/3384

Comment 4 Alexander Koksharov 2017-01-04 13:02:39 UTC
Hello Derek,

Thank you. Your explanation is clear, but it does not cover all the variables.

If we have only 'allocatable' and we want to trigger eviction when we run out of it, then it is as you described - we reserve some memory for system services and some more as a threshold.

But there are also the separate options "eviction-hard:" and "eviction-soft:". How are they used? Where do they fit into the scenario you described?

Comment 5 Derek Carr 2017-01-04 22:41:13 UTC
The usage of a soft eviction is more common when you are targeting a certain level of utilization, but you are willing to tolerate temporary spikes.  I would recommend that the soft eviction threshold is always less than the hard eviction threshold, but the time period is operator specific.  The system reservation should also cover the soft eviction threshold.

Let's update the original scenario as follows:

If a node has 10Gi of capacity, and we want to reserve 10% of that capacity for the system daemons, we do the following:

capacity = 10Gi
system-reserved = 10Gi * .10 = 1Gi

The node allocatable value in this setting becomes:

allocatable = capacity - system-reserved = 9Gi

This means by default, the scheduler will schedule pods requesting up to 9Gi of memory onto that node.

If we want to turn on eviction so that it is triggered when the node observes that available memory has fallen below 10% of capacity for 30s, or immediately when it falls below 5% of capacity, we need the scheduler to see allocatable as 8Gi.  So basically, you need to ensure your system reservation covers the greater of your eviction thresholds.

Comment 6 Derek Carr 2017-01-04 22:42:34 UTC
I copy/pasted bad text in my previous comment.

The usage of a soft eviction is more common when you are targeting a certain level of utilization, but you are willing to tolerate temporary spikes.  I would recommend that the soft eviction threshold is always less than the hard eviction threshold, but the time period is operator specific.  The system reservation should also cover the soft eviction threshold.

If we want to turn on eviction so that it is triggered when the node observes that available memory has fallen below 10% of capacity for 30s, or immediately when it falls below 5% of capacity, we need the scheduler to see allocatable as 8Gi.  So basically, you need to ensure your system reservation covers the greater of your eviction thresholds.
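A kubeletArguments fragment matching this updated scenario might look like the following. This is a sketch only: eviction-soft, eviction-soft-grace-period, eviction-hard, and system-reserved are standard kubelet eviction arguments, but the values here simply restate the 10Gi-node example above (1Gi soft threshold = 10%, 500Mi hard threshold = 5%, 2Gi reservation = 1Gi for daemons plus the 1Gi greater eviction threshold):

```yaml
# Sketch only: values assume the 10Gi-capacity node from the scenario.
kubeletArguments:
  eviction-soft:
    - "memory.available<1Gi"     # 10% of capacity
  eviction-soft-grace-period:
    - "memory.available=30s"     # tolerate spikes for 30s before evicting
  eviction-hard:
    - "memory.available<500Mi"   # 5% of capacity, evict immediately
  system-reserved:
    - "2Gi"                      # 1Gi daemons + 1Gi (greater eviction threshold)
```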

Comment 7 Ashley Hardin 2017-01-05 18:12:18 UTC
Thanks! The PR is now updated.

Comment 8 Ashley Hardin 2017-01-26 20:54:40 UTC
@Alexander Does this look good now?
https://github.com/openshift/openshift-docs/pull/3384

Comment 9 openshift-github-bot 2017-01-31 15:41:27 UTC
Commits pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/c7c279eb08db7c95414087abde8f7e9d9e00523e
Bug 1400887, Added clarifying details around kubeletArguments in the Example Scenario section

https://github.com/openshift/openshift-docs/commit/940f08ef59992c746e9ba84a95e87d67031db8c5
Merge pull request #3384 from ahardin-rh/set-kubeletArguments

Bug 1400887, Added clarifying details around kubeletArguments in the Example Scenario section

Comment 22 Michael Burke 2017-05-05 16:02:10 UTC
@derek

Comment 24 openshift-github-bot 2017-06-19 02:05:10 UTC
Commit pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/cf0e80637c72ad19e895bac705e9db9e4d99657e
Merge pull request #4266 from mburke5678/oor-reorg

BUG 1400887 Reorganize the Out of Resource Handling Topic

Comment 25 Michael Burke 2017-06-21 20:48:23 UTC
Released to 3.5

Comment 26 Vikram Goyal 2017-06-22 00:13:13 UTC
@Michael - please provide a link to the released docs before closing.

Comment 27 Michael Burke 2017-06-22 15:41:32 UTC
Released to 3.5
https://docs.openshift.org/latest/admin_guide/out_of_resource_handling.html

